I want to tell you when to trust an expert and when to treat them as a compelling storyteller wearing a credential.
The distinction is simple: it depends on whether the expert's domain allows expertise to actually exist.
Which Domains Produce Real Experts
Some domains have the conditions necessary for genuine expertise:
Chess: Feedback is immediate (you win or lose); rules are stable; positions are complex enough to reward deep knowledge; thousands of games create a vast training set. Grandmasters are real experts. Magnus Carlsen is genuinely better than a random person, measurably, consistently.
Dentistry: Feedback is relatively rapid (you can see if a filling worked); the underlying biology is stable; you can learn from thousands of cases; the domain has been stable for decades. Good dentists are demonstrably better than bad ones at fixing teeth.
Physics: Feedback is precise (experiments confirm or refute theories); rules are stable (gravity works the same way today as yesterday); predictions are testable and falsifiable. A physicist's expertise is genuine.
Accounting: Rules are mostly fixed; feedback is clear (the numbers reconcile or they don't); complexity is high enough to require training; the domain has been stable. An accountant's expertise is real.
Aviation: Feedback is stark (the plane lands safely or it doesn't); rules are stable (aerodynamics don't change); engineering builds on more than a century of accumulated knowledge; pilots train thousands of hours on systems they deeply understand. Commercial aviation safety is built on genuine expertise.
All of these domains share properties:
- Tight feedback loops (you know quickly if you were right)
- Stable rules (the rule today is the rule tomorrow)
- Sufficient complexity to reward deep knowledge
- High volume of cases to learn from
These are the domains where expertise compounds.
Which Domains Produce Empty Suits
Other domains have exactly the opposite properties:
Stockbrokers: Feedback is delayed years or never (a recommended stock might outperform for years then crash); rules change constantly (market regimes, correlations, volatility); complexity is high but not in ways that reward specific expertise; selection effects hide poor forecasters. On average, stockbrokers underperform passive indexes.
Macroeconomists: Feedback is delayed or confounded (you can never cleanly test a prediction because the world changed between forecast and outcome); rules are unstable (monetary policy regimes change, structural factors shift); the domain is Extremistan (dominated by rare large events not in training data); credit-claiming for lucky forecasts is universal. Once forecasts are scored against naive baselines, a macroeconomist's added accuracy is indistinguishable from chance.
Political forecasters: Feedback is delayed years and confounded by intervening events; rules are unstable (political coalitions, technologies, demographics shift); selection effects favor bold prediction over accuracy; famous forecasters are often the worst. Philip Tetlock's study found they barely beat random guessing.
Intelligence analysts: Feedback is delayed or missing (classified information from after the forecast may never be released); rules are deliberately adversarial (people try to fool you); prediction is systematically overconfident; career success is decoupled from forecast accuracy. No evidence suggests intelligence analysts consistently outperform naive baselines.
Court judges: Feedback is absent (you don't know if your parole decision was correct; the person either does or doesn't reoffend, and confounding variables make causation impossible to trace); rules are complex but not stable (sentencing guidelines change); individual judgment varies wildly for identical cases. No evidence of judicial expertise in predicting recidivism.
Financial advisors: Feedback is delayed, confounded, and blamed on external factors (the advisor was right about direction, just wrong about timing; the portfolio underperformed because markets were weird); rules are unstable; selection effects hide poor advisors. Evidence consistently shows that advisors underperform simple passive strategies.
All of these domains share the opposite properties:
- Broken or delayed feedback loops
- Unstable rules or rules that change between training and application
- Extremistan dynamics (dominated by rare large events)
- Strong incentives to claim credit for successes and disclaim responsibility for failures
These are domains where expertise cannot compound, yet experts proliferate because the demand for expertise exceeds the supply of accurate forecasting.
The Gold Standard: Tetlock's Twenty-Year Study
Philip Tetlock, a psychologist and political scientist then at Berkeley, did something radical: he collected roughly 28,000 predictions from 284 experts over about twenty years and measured them against outcomes.
The results should have destroyed the consulting industry:
- The average expert performed barely better than chance.
- The average expert performed worse than a simple "last year's number continues" baseline.
- More famous experts performed worse than less famous ones.
- Experts who were wrong refused to update their frameworks; they produced post-hoc narratives explaining why they were "almost right."
Famous forecasters, who had built reputations on bold and confident predictions, were the least accurate.
Why? Because fame goes to confidence, and confidence in Extremistan domains is inversely correlated with accuracy.
A cautious forecaster who says "I'm not sure, but probably something like this" gets ignored and never becomes famous. A confident forecaster who says "This is definitely going to happen" gets a book contract and a consulting fee. The confident forecaster is also usually wrong, but by then they've moved on to a new prediction.
The Track Record Test
Here's my practical test before trusting any expert's forecast:
Ask for their track record against naive baselines.
Naive baseline = last year's number, random walk, or simple trend extrapolation.
If the expert consistently beats the naive baseline, the expert might be worth listening to.
If the expert's track record is indistinguishable from the naive baseline, treat the expert as a skilled storyteller, not a forecaster. Pay them nothing.
Most experts in Extremistan domains will fail this test. Most have never been systematically measured against naive baselines. Most have never been required to state error bars on their predictions. Most have never been held accountable for advance predictions rather than post-hoc narratives.
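The test can be sketched in a few lines of Python. This is a minimal illustration, not a standard scoring method: the function names, the choice of mean absolute error, the "skill" ratio, and all of the figures are assumptions made for the example. The naive baseline used here is the "last year's number continues" rule.

```python
from statistics import mean


def mean_absolute_error(predictions, outcomes):
    """Average absolute gap between predictions and what actually happened."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    return mean(abs(p - o) for p, o in zip(predictions, outcomes))


def track_record_test(expert_forecasts, outcomes, prior_outcomes):
    """Score an expert against the 'last year's number continues' baseline.

    expert_forecasts[i] is the expert's advance prediction of outcomes[i].
    prior_outcomes[i] is the previous period's outcome, which the naive
    baseline simply repeats as its prediction of outcomes[i].

    Returns (expert_mae, naive_mae, skill). skill > 0 means the expert beat
    the baseline; skill <= 0 means the expert added nothing.
    """
    expert_mae = mean_absolute_error(expert_forecasts, outcomes)
    naive_mae = mean_absolute_error(prior_outcomes, outcomes)
    if naive_mae == 0:
        # The baseline was perfect; the expert can at best tie it.
        skill = 0.0 if expert_mae == 0 else float("-inf")
    else:
        skill = 1 - expert_mae / naive_mae
    return expert_mae, naive_mae, skill


# Invented figures for illustration only (say, annual growth in percent).
prior_outcomes = [2.0, 3.0, 1.0, 2.0]    # what happened the year before
outcomes = [3.0, 1.0, 2.0, 4.0]          # what actually happened
expert_forecasts = [4.0, 3.5, 0.5, 2.0]  # what the expert said in advance

expert_mae, naive_mae, skill = track_record_test(
    expert_forecasts, outcomes, prior_outcomes
)
print(f"expert MAE={expert_mae:.2f}, naive MAE={naive_mae:.2f}, skill={skill:.2f}")
```

In this made-up record the expert's average error is larger than the baseline's, so the skill figure comes out negative. Four data points prove nothing either way; the point of the sketch is only that the comparison needs advance predictions, recorded outcomes, and a baseline, and that it takes minutes to run once those exist.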
The "I Was Almost Right" Defense
After missing a forecast, the expert says: "I was directionally correct, even if my timing was off."
Or: "I identified the main risks, even if the magnitude was different."
Or: "That outcome was within my stated range of possibility."
This is the industry's universal defense, and it is unfalsifiable. Any outcome can be retrofitted into "almost right."
As a test of expertise, it asks nothing and proves nothing.
When the Expert's Domain Actually Has Expertise
Now contrast this with a domain where expertise actually exists.
Your oncologist makes a diagnosis based on pathology, imaging, and clinical presentation. You measure accuracy: did the patient respond to the indicated treatment? Accuracy matters. Expertise is real.
Your accountant reconciles your books. You measure accuracy: do the numbers add up? Do you owe what's stated? Accuracy matters. Expertise is real.
Your pilot lands your plane. You measure accuracy: did the plane land safely? Expertise is real.
In these domains, a wrong call is exposed quickly. Responsibility is clear. Poor performers do not survive as experts because the consequences are immediate.
In stockbroking, macroeconomics, political forecasting, and intelligence analysis, bad forecasters do not disappear. They reframe their failures as "almost right," find a new client, and continue earning.
How to Separate Real Experts from Credential-Wearing Entertainers
Ask these questions:
- Does the domain have tight feedback loops? Can I know quickly if the expert was right? If feedback is delayed or absent, the domain cannot produce expertise.
- Are the rules stable? Do the underlying laws stay the same, or do they change between the expert's training period and application? If they change, expertise in the training set does not transfer.
- What is the expert's track record against a naive baseline? Have they been systematically measured? Or are they just repeating the "I was almost right" narrative?
- Does the expert have skin in the game? Do they bear the downside of being wrong? Or is their downside bounded while their upside is unlimited? If they don't pay for failure, discount their confidence.
- Are they the same people who were wrong last time? If yes, why are they being consulted on this decision?
Most of the people we call experts will fail at least three of these tests. That's not a reason to never listen to anyone. It's a reason to listen skeptically, treating predictions as one input among many, not as truth-claims from people who have demonstrated accurate forecasting.
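The five questions can be written down as a simple checklist. The sketch below is one hypothetical way to do it in Python: the class name, the field names, and the three-failure cutoff in `verdict` are assumptions for illustration (the cutoff borrows the "at least three" figure above as a rough threshold), and the yes/no answers are judgments you supply yourself.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ExpertCheck:
    """One answer per question; True means the expert passes that test."""
    tight_feedback: bool        # can you know quickly if they were right?
    stable_rules: bool          # do the rules hold between training and use?
    beats_naive_baseline: bool  # measured track record better than naive?
    skin_in_the_game: bool      # do they bear the downside of being wrong?
    right_last_time: bool       # False if they are the people who were wrong before


def failed_tests(check):
    """Names of the questions the expert fails, in checklist order."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


def verdict(check, cutoff=3):
    """Rough reading of the checklist; the cutoff is an assumed threshold."""
    failures = len(failed_tests(check))
    if failures >= cutoff:
        return "storyteller: treat the forecast as one input among many"
    return "possible expert: the forecast may deserve some weight"


# Hypothetical examples; the answers are illustrative judgments, not data.
pundit = ExpertCheck(False, False, False, False, True)
dentist = ExpertCheck(True, True, True, True, True)

print(failed_tests(pundit), "->", verdict(pundit))
print(failed_tests(dentist), "->", verdict(dentist))
```

Writing the answers down before hearing the forecast is the useful part; the tally itself is crude and should not be read as a measurement.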
The Scandal of Prediction
Why does the expert industry persist despite documented inaccuracy? Because:
- Organizations need plans, and plans need numbers. The numbers might be wrong, but something is needed, so forecasters produce numbers.
- Forecasters face no consequences for being wrong. They are paid whether accurate or not. The client's loss is not their loss.
- The "almost right" defense is unfalsifiable. Any outcome can be rationalized as consistent with the forecast.
This structural problem is not going away. It's baked into how organizations operate. But understanding it lets you discount expert claims appropriately and make better decisions yourself.
Before you trust an expert, ask: does this domain actually produce expertise, or does it produce confident people? The distinction is everything.