When Systems Fail

Peer review no longer does what most people think it does. Instead of reliably sorting signal from noise, it often protects insiders, rewards narrative alignment, and quietly discourages replication or invalidation. COVID simply made those dynamics visible to a much wider audience.

Contrary to the cartoon, we have:

Science is the belief in the ignorance of experts.

— Richard Feynman

What begins in educational institutions often ends up in society.

— Thomas D. Zweifel

Bad science begets bad policy.

— Roger Pielke, Jr.

After I published Peer Review Is a Guild (and its Daily Sceptic version, “How to Fix Science”), many readers said they had run into the same problem. An AI entrepreneur had been circling the same question: “how do we tell what’s actually true once money, careers, and institutions get involved?”

His concern wasn’t journal politics; it was error accumulation: how wrong information propagates, how long it persists, and how expensive it becomes once it’s embedded in institutions. Coincidentally, I had written on this topic three years ago, when (somewhat oxymoronically named) “OpenAI” enforced Fauci’s (wrongthink) dogma in ChatGPT, which I renamed “ChatCCP” for its censorious thoughtcrime blockades.

We coincide in this regard as well. He described “(his) interest in finding the truth and helping AI systems train on and understand what’s real rather than Wikipedia and Reddit, avoiding the turbocharged vicious circle that ChatGPT can propagate as it starts looking at more of its own (re-)generations (of error).”

“I study wrongology: the amount of information that’s wrong in society and how it starts and propagates. I realized there’s far more misinformation and ambiguity about the past than about the future, probably 100:1. There’s no money in replication, underwhelming results, career-limiting findings, or invalidation. We have the best science money can buy, and it does. Making bets on the future helps reveal hidden information. But we don’t have a way to separate truth from popularity or gamesmanship in the existing literature. As a result, trillions of dollars are spent on claims that were never robust to begin with. Grantmakers don’t have a reliable way to tell whether their money is doing what they think it is. AI trains on garbage and is tested on things it already knows. It needs better training data and better benchmarks. I want to build a structured, adversarial knowledge market, a way to resolve past and present uncertainty using real stakes and open challenge. I’ve designed an initial system and want to test it with a small group.”

Here’s corroboration: when “science” and politics intersected, a favored narrative became the only “peer review”-acceptable pathway.

The Common Insight

The shared realization is straightforward, even uncomfortable: we are exceptionally good at producing information, yet remarkably poor at deciding which information survives contact with reality. Peer review does not reliably solve this problem. More often, it freezes it in place. There is little incentive to replicate results, scant reward for invalidation, and almost no penalty for being confidently wrong. Over time, there is no cumulative public record that distinguishes good judgment from bad—only résumés, citations, and journal mastheads standing in as proxies for accuracy, despite how weakly they correlate with it.

Markets, games, and competitive systems operate differently. Not because their participants are wiser or more virtuous, but because outcomes are visible and records accumulate. Claims are tested repeatedly, errors are exposed, and performance becomes legible over time. Signal emerges not through authority, but through iteration.

The idea—arrived at independently by people approaching the problem from very different directions—is to borrow that logic for knowledge itself. Not to turn science into gambling. Not to install a new priesthood. But to create a system in which claims are testable, challengeable, and legible in the open, so that accuracy can finally compete on equal footing with prestige.
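To make "records accumulate" concrete, here is a minimal sketch of what a public track record could look like. It is not the entrepreneur's system, which is not described in detail here. It assumes participants attach a probability to each claim and that claims eventually resolve true or false, and it swaps in the Brier score (mean squared error of the stated probabilities) as one standard way to score accuracy. The names TrackRecord, record, and leaderboard, and the two participants, are hypothetical.

```python
from collections import defaultdict


def brier_score(forecasts):
    """Mean squared gap between stated probabilities and 0/1 outcomes.

    Lower is better: 0.0 is perfect, 0.25 is what always saying 50% earns.
    """
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


class TrackRecord:
    """Hypothetical public ledger of resolved claims, keyed by participant."""

    def __init__(self):
        self._resolved = defaultdict(list)

    def record(self, participant, probability, came_true):
        """Log one resolved claim: the stated probability and what happened."""
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        self._resolved[participant].append((probability, 1 if came_true else 0))

    def leaderboard(self):
        """Rank participants by accuracy alone, best (lowest score) first."""
        scores = {name: brier_score(f) for name, f in self._resolved.items()}
        return sorted(scores.items(), key=lambda item: item[1])


# Illustrative, made-up entries: a confident insider and a cautious outsider
# each weigh in on the same two claims, one of which fails to hold up.
board = TrackRecord()
board.record("prestigious_lab", 0.95, False)
board.record("prestigious_lab", 0.90, True)
board.record("outsider", 0.60, False)
board.record("outsider", 0.70, True)

ranking = board.leaderboard()
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy ledger the outsider ranks first, because being confidently wrong costs more than being cautiously wrong. Nothing in the score looks at credentials, which is the whole point: the ranking depends only on what was claimed and what happened.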

Why Share This Now?

The point of sharing this is not to announce a finished product or to declare a unified theory. It is simply to see who else may be circling the same question. If people working in medicine, artificial intelligence, economics, and entirely outside institutions are independently sketching similar ideas, then the next step is not debate but experimentation. That means small pilots, low-stakes tests, and claims that are safe enough to explore yet real enough to matter. Think of it less as a revolution and more as a sandbox—an open space to see what works before anything hardens into doctrine.

How to Get Involved

If this resonates and you’d like to help think through, test, or prototype ways of openly adjudicating claims—whether for curiosity, rigor, or eventual commercial use—say so in the comments or reach out. This is early, informal, and exploratory. The point is not to agree on everything, but to start trying things that the current system doesn’t allow.

Further Reading

https://www.cuttingthroughthenoise.net/peer-review

https://www.cuttingthroughthenoise.net/intro

https://www.cuttingthroughthenoise.net/science

https://www.cuttingthroughthenoise.net/debate

Addendum

Here are some topic areas to serve as an initial “sandbox” for testing gamified peer review. These are intentionally non-political, non-COVID, non-identity topics, good for testing process rather than ideology.

Medicine / Health

Do vitamin D supplements meaningfully reduce fracture risk in older adults?

Do routine annual physicals improve long-term outcomes?

Are statins beneficial for primary prevention in low-risk populations?

Does physical therapy outperform rest for common low-back pain?

Nutrition / Metabolism

Are seed oils independently associated with inflammation in humans?

Does intermittent fasting improve metabolic markers independent of weight loss?

Are multivitamins beneficial in populations without deficiency?

Public Policy / Economics

Do minimum-wage increases reduce employment at the local level?

Does congestion pricing reduce traffic over a 5-year horizon?

Do cash-transfer programs improve long-term earnings?

Science / Methods

How often do highly cited studies replicate within effect-size bounds?

Do pre-registered studies show smaller effect sizes than exploratory ones?

Are meta-analyses systematically biased by publication selection?

These topics lend themselves naturally to experimentation because they share a few important characteristics. The claims can be stated clearly, the underlying data are generally available, and there are genuine competing interpretations rather than purely rhetorical disagreements. They also have real-world relevance without carrying the kind of social or political toxicity that derails discussion before it begins. Taken together, they make ideal test cases for learning how open adjudication might actually work in practice before attempting anything at larger scale.