There is a paradox in expertise: the more you know, the more likely you are to be confidently wrong.

The PhD Forecaster Loses to the Naïve Baseline

Imagine a climate economist with a PhD, twenty years in the field, published in peer-reviewed journals, consulted by central banks. The economist builds an elaborate model to forecast GDP growth three years out. The model incorporates hundreds of variables, calibrated on historical data, refined through iterations of backtesting.

Now imagine a graduate student with no expertise who says: "Next year's GDP growth will be the same as this year's."

Which forecast is more accurate?

In comparisons like this, scored against actual outcomes, the graduate student often wins. The economist's elaborate model adds confidence without adding accuracy, and it frequently performs worse than the naïve baseline.

This is not a criticism of the individual economist or the quality of their training. It is a structural property of domains where the future does not reliably resemble the past. In those domains, expertise about the past does not translate to accuracy about the future.

The economist feels more confident because their training demands precision. Uncertainty is the enemy of being publishable, fundable, and promotable. So the economist's confidence rises. Accuracy does not rise with it.

Why Expertise Inflates Confidence in Extremistan

Here's the mechanism: expertise inflates confidence without improving accuracy in the tails.

The economist knows the past. The economist can construct post-hoc explanations for why the past happened the way it did. The economist can debate the relative importance of monetary policy, fiscal stimulus, technological change, and demographics.

All of that knowledge is real. None of it predicts the future when the future contains events outside the training distribution.

What the economist doesn't know, and what nobody can know, is the next structural break. The next pandemic. The next financial panic. The next geopolitical rupture. The next technological disruption. These events are invisible to the models because nothing like them appears in the data the models were calibrated on.

In Extremistan, where outcomes are dominated by rare, large events, the accuracy that matters is accuracy at the tail. And expertise at predicting the mode (the most likely outcome) does not translate to accuracy at predicting the tail (the rare, extreme outcome).

The economist's confidence rises because the model fits well in sample. Accuracy at the tail stays low because any finite data set contains too few extreme events to pin the tail down.

The Famous Forecaster Problem

Philip Tetlock's twenty-year study of 284 political forecasters collected 28,000 predictions and measured them against outcomes. The result was stark: the average expert performed barely better than chance. More stunning: the more famous the expert, the worse their calibration.

Fame goes to bold, confident predictions. Bold predictions tend to miss. So the people most famous for their forecasting are, on average, the worst forecasters.

This pattern is not unique to political forecasting. It recurs in economics, geopolitics, technology forecasting, and financial markets. The confident expert is the one on television. The cautious expert is ignored. The television expert is more likely to be wrong.

Why? Because caution does not make for compelling television, compelling consulting reports, or compelling op-eds. Confidence does. The market for expertise rewards confidence, not accuracy. So expertise optimizes for confidence.

The Surgeon's Confidence vs. Reality

I highlighted surgeons in the epistemic arrogance piece because the stakes are life and death. The example is worth returning to here because it shows the exact mechanism.

Surgeons have genuine expertise. They've performed thousands of procedures. They have real skill. Their hands are trained. Their judgment about which patients are candidates for surgery is valuable.

But surgeons systematically underestimate post-operative mortality. When surgeons estimate that "95% of my patients survive this procedure," actual historical data shows roughly 80% survival.

Why? Because the surgeon is anchoring on vivid cases in recent memory, cases that went well, rather than on base rates. Because the surgeon has confidence in their own skill, which is real, but treats that confidence as a guarantee of outcome, which it is not. Because complications are rare enough that they never enter the surgeon's conscious estimate.

The surgeon's estimate comes from genuine expertise. The surgeon's estimate is still too optimistic, and confidently so.
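To put a number on that gap, here is a minimal sketch using the Brier score, a standard way to score probability statements against yes/no outcomes (lower is better). The cohort of 100 patients is hypothetical, built only to match the rough 95% and 80% figures above, and the function name is my own.

```python
def brier_score(stated_probability, outcomes):
    """Mean squared gap between one stated probability and a list of 0/1 outcomes."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum((stated_probability - o) ** 2 for o in outcomes) / len(outcomes)


# Hypothetical cohort: 80 survivors (1) and 20 deaths (0), per the rough figures in the text.
cohort = [1] * 80 + [0] * 20

surgeon_score = brier_score(0.95, cohort)    # the surgeon's stated survival rate
base_rate_score = brier_score(0.80, cohort)  # simply quoting the historical base rate

print(f"surgeon: {surgeon_score:.4f}  base rate: {base_rate_score:.4f}")
```

Under these assumptions the base rate scores better than the surgeon's confident estimate, even though the base rate requires no surgical skill at all.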

The Famous Expert Track Record

When was the last time you saw a famous forecaster acknowledge that their previous forecasts were wildly inaccurate?

It is rarer than you might expect.

Instead, what you get is: "I was directionally correct, even if my timing was off." Or: "I identified the key risks, even if the magnitude was different." Or: "That outcome was within my stated range of possibility."

All of these are unfalsifiable post-hoc narratives. The industry's standard defense—"I was almost right"—cannot be disconfirmed because "almost right" can accommodate almost any outcome.

The only useful standard is the forecast stated in advance, with an explicit error bar, compared against the actual outcome. Almost nobody in forecast-heavy domains meets this standard.
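As a rough illustration of what holding a forecaster to that standard could look like, the sketch below checks how often actual outcomes landed inside error bars stated in advance. The records and the 90% stated confidence are invented for the example, and the function name is hypothetical.

```python
def interval_coverage(records):
    """Fraction of outcomes that fell inside the forecaster's stated range.

    Each record is (low, high, actual), where low and high were stated in advance.
    """
    records = list(records)
    if not records:
        raise ValueError("need at least one record")
    hits = sum(1 for low, high, actual in records if low <= actual <= high)
    return hits / len(records)


# Hypothetical record: five ranges the forecaster called "90% confident" in advance.
stated_confidence = 0.90
records = [
    (1.5, 2.5, 2.1),
    (1.0, 2.0, 3.4),
    (2.0, 3.0, 2.8),
    (2.5, 3.5, 0.9),
    (1.0, 3.0, 1.2),
]

coverage = interval_coverage(records)
print(f"stated: {stated_confidence:.0%}  observed: {coverage:.0%}")
```

If observed coverage sits well below the stated confidence over a reasonably long record, the error bars were too narrow. That is a fact about the forecaster's calibration, and no after-the-fact narrative changes it.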

Which is why the famous expert's track record, when you actually investigate it, is often indistinguishable from chance.

What to Do About It

Here's the practical rule: before trusting an expert's prediction, ask for their track record against naïve baselines.

"Naïve baseline" means: last year's number, a random walk, or a simple trend extrapolation.

If the expert's track record beats the naïve baseline meaningfully and consistently, the expert might be worth listening to.

If the expert's track record is indistinguishable from the naïve baseline, or worse, treat the expert as a skilled storyteller, not an accurate forecaster. Pay them nothing.
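One way to run this test is to compare the expert's average error with the error of the "last year's number" baseline over the same periods. The sketch below assumes you have the expert's advance forecasts and the realized values. All numbers are made up, the function names are hypothetical, and mean absolute error is only one reasonable choice of error measure.

```python
from statistics import mean


def mean_absolute_error(forecasts, actuals):
    """Average absolute gap between forecasts and realized values."""
    if len(forecasts) != len(actuals) or not actuals:
        raise ValueError("forecasts and actuals must be non-empty and the same length")
    return mean(abs(f - a) for f, a in zip(forecasts, actuals))


def naive_last_value_forecasts(history):
    """Baseline: forecast each period as the previous period's realized value."""
    return history[:-1]


def skill_vs_naive(expert_forecasts, history):
    """Ratio of expert error to baseline error.

    expert_forecasts[i] is the advance forecast for history[i + 1].
    Below 1.0 the expert beat the baseline; at or above 1.0 the expert added nothing.
    """
    actuals = history[1:]
    naive_mae = mean_absolute_error(naive_last_value_forecasts(history), actuals)
    if naive_mae == 0:
        raise ValueError("baseline error is zero; the ratio is undefined")
    return mean_absolute_error(expert_forecasts, actuals) / naive_mae


# Hypothetical annual growth figures (percent) and the expert's advance forecasts.
history = [2.0, 2.5, 1.0, 3.0, 2.0]
expert_forecasts = [3.0, 3.5, 2.5, 4.0]

ratio = skill_vs_naive(expert_forecasts, history)
print(f"expert error / baseline error = {ratio:.2f}")
```

A single ratio over four periods proves little. The point of "meaningfully and consistently" is that the ratio should stay clearly below 1.0 across many periods and several baselines before the expert earns any trust.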

Most experts in Extremistan domains will not meet this test. Most have never been systematically measured against naïve baselines. Most have never been required to state error bars. Most have never been held to advance predictions rather than post-hoc narratives.

Those are your clues that the confidence is doing the work, not the accuracy.