Why most AI stock-picking tools are lying — and the one question that exposes them

Almost every 'AI stock predictor' advertises an accuracy number you can't verify. There's a single question that separates the honest tools from the marketing fronts — and most of the market fails it. Here's the question, why it works, and our own real numbers (including the markets where our model is below a coin flip).

By William Wu · 11 March 2026 · 6 min read

Search "AI stock prediction" and you'll drown in numbers. 87% accuracy. 92% win rate. "★★★★★ — our AI returned 340% last year." Every one of them has a confident chart, five gold stars, and a checkout button.

Here's the uncomfortable truth: almost none of those numbers can be verified, and most of them are constructed to be unverifiable on purpose.

I build one of these tools. I'm going to tell you exactly how the magic trick works, and give you a single question that exposes it in about ten seconds.

The one question

"Can I see every prediction you've ever made — including the losses — timestamped before the outcome was known?"

That's it. That's the whole test.

A tool that's telling the truth can answer "yes, here's the public log" instantly. A tool that's selling you a fantasy will give you one of these instead:

"Our results are based on rigorous backtesting." (Backtests are not predictions. More on this below.)
"Past performance doesn't guarantee future results." (True, but notice they didn't show you the past performance — just the disclaimer.)
A wall of cherry-picked screenshots of winning trades. (Where are the losing ones?)
"Our proprietary algorithm is confidential." (The results don't have to be confidential to protect the algorithm.)

If they can't show you a complete, timestamped, loss-included track record, the accuracy number is marketing. Full stop.

How the trick works

There are three standard ways to manufacture an impressive accuracy number. None of them involve actually predicting anything.

1. Backtest the past until it confesses

A backtest asks: "if I had run this strategy in the past, how would it have done?" Done honestly, it's useful. Done the lazy way — which is the industry default — you take years of historical data, search for the parameters that would have worked best in hindsight, and report those.

If you torture a price dataset long enough, it will confess to anything. There are always some settings that would have made you rich. They almost never keep working. We wrote a whole piece on this: walk-forward validation and why most backtests lie.

A backtest is a story about the past. A prediction is a claim about the future, made before the future arrives. They are not the same thing, and a tool that only shows you backtests has never actually been tested.
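To make the hindsight search concrete, here is a minimal sketch in Python. It is not our production pipeline. It assumes simulated returns that are pure noise by construction, plus a hypothetical family of trailing-sum strategies (the names make_noise_returns and hit_rate are invented for this illustration). It picks whichever setting looked best on the first half of the data, then scores that same setting on the second half, which the search never saw.

```python
import random

def make_noise_returns(n, seed):
    """Simulated daily returns with no predictable structure (assumption: pure noise)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.01) for _ in range(n)]

def hit_rate(returns, lookback, direction, start, end):
    """Share of days in [start, end) where the call matched the day's sign.

    The call is the sign of the trailing `lookback`-day sum, multiplied by
    `direction` (+1 follows the trend, -1 bets against it).
    """
    hits = total = 0
    for t in range(start, end):
        signal = direction * sum(returns[t - lookback:t])
        if signal == 0 or returns[t] == 0:
            continue  # no call made, or nothing to score
        hits += (signal > 0) == (returns[t] > 0)
        total += 1
    return hits / total

MAX_LOOKBACK = 60
returns = make_noise_returns(2000, seed=7)
split = 1000  # first half is searched in hindsight, second half is held out

# Every (lookback, direction) pair is a candidate "strategy".
candidates = [(k, d) for k in range(1, MAX_LOOKBACK + 1) for d in (1, -1)]

# The lazy backtest: keep whichever candidate looked best on the past.
best = max(candidates,
           key=lambda c: hit_rate(returns, c[0], c[1], MAX_LOOKBACK, split))

in_sample = hit_rate(returns, *best, MAX_LOOKBACK, split)
out_of_sample = hit_rate(returns, *best, split, len(returns))

print(f"best setting in hindsight: lookback={best[0]}, direction={best[1]}")
print(f"hit rate on the data it was tuned on: {in_sample:.1%}")
print(f"hit rate on data it never saw:        {out_of_sample:.1%}")
```

Because each strategy appears alongside its mirror image, the best in-sample hit rate can never be below 50%, even though the data contains nothing to predict. The held-out figure carries no such guarantee, and it is the only one of the two that resembles a prediction.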

2. Report the survivors

Make 1,000 predictions, of which 400 win and 600 lose. Quietly delete 400 of the losers. Screenshot the 400 that won. Of the 600 calls left on the record, 400 are wins, so now you have a "67% win rate" and a folder of beautiful green candles for your ads.

This is trivially easy and almost impossible to catch from the outside — unless the tool committed to a public, immutable log before the outcomes were known. Which is exactly why most of them don't.
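One way to make quiet deletions detectable is a hash chain, where every log entry commits to the one before it. The sketch below is a generic illustration of that idea, not a description of how any particular tool (ours included) stores its log. The record fields, tickers, and timestamps are made up for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash, record):
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(log, record):
    """Add a record to the log, chaining it to the current last entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})

def verify(log):
    """Return True only if every entry still chains to its predecessor."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True

# Hypothetical predictions, logged before any outcome is known.
log = []
append(log, {"made_at": "2026-01-05T09:00:00Z", "ticker": "AAA", "call": "Bullish"})
append(log, {"made_at": "2026-01-05T09:00:00Z", "ticker": "BBB", "call": "Bearish"})
append(log, {"made_at": "2026-01-06T09:00:00Z", "ticker": "CCC", "call": "Bullish"})

print(verify(log))     # the intact log checks out

# Quietly drop the middle call, as if it had been a loser.
pruned = log[:1] + log[2:]
print(verify(pruned))  # the chain no longer links up
```

A chain like this only catches deletions from the middle. Dropping the most recent entries goes unnoticed unless the latest hash has also been published somewhere the tool's operator cannot edit, so the public commitment matters as much as the hashing.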

3. Pick the metric that flatters you

"92% accuracy" — at what? Accuracy on what universe, over what horizon, against what baseline?

If a model only ever predicts "the market will go up," it'll be "right" about 53% of the time in a bull market — not because it has skill, but because markets drift up. Quote that as "53% accuracy" and it sounds like skill. It isn't. The honest comparison is always against a dumb baseline: a coin flip, or "always bullish." If a tool won't show you the baseline, the number is meaningless.
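The baseline comparison is simple arithmetic, sketched below in Python. The outcome series is made up to match the 53% figure above (53 up days, 47 down days), and the function names are illustrative rather than taken from any library.

```python
def accuracy(predictions, outcomes):
    """Share of predictions that matched the realised direction."""
    return sum(p == o for p, o in zip(predictions, outcomes)) / len(outcomes)

def edge_report(predictions, outcomes):
    """Compare a tool's accuracy with two dumb baselines."""
    acc = accuracy(predictions, outcomes)
    always_bullish = accuracy(["up"] * len(outcomes), outcomes)
    return {
        "accuracy": acc,
        "always_bullish_baseline": always_bullish,
        "edge_vs_coin_flip": acc - 0.5,
        "edge_vs_always_bullish": acc - always_bullish,
    }

# Illustrative up-drifting market: 53 up days and 47 down days.
outcomes = ["up"] * 53 + ["down"] * 47

# A "model" that has learned nothing except to say "up" every day.
lazy_model = ["up"] * 100

report = edge_report(lazy_model, outcomes)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

The lazy model scores 0.53 and appears to beat a coin flip by three points, yet its edge over "always bullish" is exactly zero. That last line is the one a headline accuracy number leaves out.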

Our real numbers — including the embarrassing ones

I'd be a hypocrite if I wrote all that and didn't show you ours. So here's the honest, uncomfortable version, pulled from our public verified log as of writing:

Across all 16 markets we cover: about 49.7% directional accuracy on the verified record — a hair under a coin flip on the blended number. Not 90%. Not five stars. A coin flip, in the open, where you can check it.
Our best markets — Canada and US large-caps — sit around 53%. Genuinely above coin-flip, and the segments our methodology was built around. "Around 53%," published, beats "92%," hidden, every single time.
Several of our markets are below 50%. Taiwan, for one, is around 47% so far — the model is genuinely bad there, and I'll happily tell you why in a separate piece. We don't quietly drop the weak markets from the average. They're in the blended number, dragging it down, on purpose.
High-confidence calls currently perform worse than low-confidence ones — a sign our confidence scoring is over-fit on the small set of cases where the model commits hard. We're working on it, and saying so out loud.

Now — why on earth would I publish that? My headline blend is essentially a coin flip. Several of my markets are below random. No marketing team on Earth would let me write this paragraph; they'd want a fake "★★★★★ 90% accuracy" badge instead.

Because that paragraph is the product. Anyone can fit XGBoost to Yahoo Finance data in an afternoon. What's rare — almost non-existent in retail finance — is a tool that logs every call before the outcome, keeps the losses, publishes the markets where it's weak, and scores itself against reality in public. The number isn't the moat. The honesty is. A coin-flip you can verify is worth more than a miracle you can't.

A 53% directional edge on Canadian and US large-caps, compounded over many independent decisions, is genuinely useful as one input among many. It is not a money printer, and the same model is below random on several markets. Both things are true, and you deserve to know both — wins and losses, openly — before you trust any of it. That's why every call we make is at /predictions, scored against what actually happened.
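Whether a hit rate near 53% is distinguishable from a coin flip depends on how many resolved calls sit behind it. The sketch below uses the Wilson score interval, a standard confidence interval for a proportion, to show that dependence. The sample sizes are hypothetical round numbers chosen for illustration; they are not our actual prediction counts, which are in the public log.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a hit rate (z=1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("need at least one resolved prediction")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width

# Hypothetical sample sizes, both with a 53% observed hit rate.
for hits, n in [(106, 200), (10_600, 20_000)]:
    low, high = wilson_interval(hits, n)
    verdict = "includes" if low <= 0.5 <= high else "excludes"
    print(f"{hits}/{n}: interval {low:.3f} to {high:.3f} ({verdict} 50%)")
```

With 200 resolved calls, the interval around 53% comfortably contains 50%, so the record cannot rule out a coin flip. With 20,000 it cannot be a coin flip at that confidence level. The same headline percentage supports very different claims, which is why a hit rate should travel with its sample size and interval.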

What to do with this

Next time a tool quotes you an accuracy number, run the test:

Ask for the complete, timestamped, loss-included log. No log, no trust.
Ask what baseline they beat. "Accuracy" without a coin-flip comparison is theatre.
Ask which markets and horizons it's bad at. Any honest model has weak spots. A tool that claims to be good everywhere — five stars, 90%, no losses — is lying everywhere.

If a tool passes all three, you've found a rare one. It still won't make you rich, and it can still be wrong in size — but at least the number means something.

We try to pass our own test on every page. Our signals read Bullish, Neutral, or Bearish: directional research, never a "Buy" or "Sell" order, and never a guarantee. Some of those calls will be wrong. We say so, with receipts, at /predictions.

See the evidence for yourself — download the full resolved-prediction dataset, read the live public self-audit (hit-rate confidence intervals, live-vs-backfill split), inspect every model card, or run the research tools on your own data. No hype, just the receipts.

This article is educational content about model evaluation. It is not financial advice, and nothing here is a recommendation to buy or sell any security. Trading Agent is a quantitative research tool operated by WU Capital Limited (New Zealand). See our Methodology page for technical detail.
