Walk-forward validation — why most stock-prediction backtests lie
The 'tested on 5 years of data, returned 70% a year' pitch you see in finance Twitter ads is almost always smoke. Here's what's actually happening, and how a different way of testing, walk-forward validation, cuts most of those numbers down to something honest.

If you've spent any time on finance Twitter or YouTube, you've seen the pitch.
"I backtested this strategy on 5 years of data. Returned 70% per year."
Maybe it's a moving-average crossover. Maybe it's a fancy "AI signal". The result is always the same: a beautiful upward-sloping equity curve, a few exotic indicators, and a paid newsletter at the end.
I want to walk through why those numbers are almost always smoke — and what we do differently at Trading Agent.
The honest version of "backtesting"
A backtest is supposed to answer a simple question: if I had run this strategy in the past, how would it have done?
The right way: pretend you don't know the future, decide trades using only the data available at the time, and see how it turns out.
The lazy way (and the way most "AI prediction" pitches do it): take the entire dataset, find the parameters that would have worked best in hindsight, and report those numbers.
If you torture a stock-price dataset long enough, it will confess to anything. There are always some parameters that would have made you rich. The question is whether those parameters will work going forward — and almost always, they won't.
There are three common ways this goes wrong:
1. Look-ahead bias
The model accidentally uses information from the future. Maybe a feature includes tomorrow's close price. Maybe the indicator is calculated over the whole dataset before the data is split into training and test sets. The model looks like a genius because it can see the answer.
This sounds dumb but is staggeringly common in retail strategies. Anyone who's coded their own indicator in Python has done it at least once.
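Here is a minimal sketch of the "calculated over the whole dataset" version of the bug, in plain Python. The function names and prices are made up for illustration. The leaky version scales every price using the minimum and maximum of the full series, so early values depend on prices that had not happened yet. The causal version only uses what was visible up to each day.

```python
def normalize_leaky(closes):
    # Bug: min and max come from the WHOLE series, including future prices.
    lo, hi = min(closes), max(closes)
    return [(c - lo) / (hi - lo) for c in closes]


def normalize_causal(closes):
    # On day t, only closes[0..t] are visible.
    out = []
    for t in range(len(closes)):
        seen = closes[: t + 1]
        lo, hi = min(seen), max(seen)
        # With a flat history there is no range yet; 0.5 is an arbitrary placeholder.
        out.append(0.5 if hi == lo else (closes[t] - lo) / (hi - lo))
    return out


# Two hypothetical price paths that share the same first three days.
path_up = [100.0, 102.0, 101.0, 105.0, 110.0]
path_down = [100.0, 102.0, 101.0, 90.0, 80.0]

# The leaky feature for days 0-2 changes when only the FUTURE changes.
leaky_differs = normalize_leaky(path_up)[:3] != normalize_leaky(path_down)[:3]
# The causal feature for days 0-2 is identical on both paths.
causal_same = normalize_causal(path_up)[:3] == normalize_causal(path_down)[:3]
```

A quick test for any feature: change only the future prices and recompute. If past feature values move, the feature is leaking.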
2. Overfitting
You try 50 strategies. You report the best one. The other 49 lost money. With that many attempts, even pure noise is very likely to produce at least one strategy that looks brilliant.
A famous version of this: David Leinweber showed you can "predict" the S&P 500 using butter production in Bangladesh. Not because butter actually predicts stocks, but because if you try enough random series, one of them will line up.
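You can reproduce the effect in a few lines. The sketch below is a toy simulation, not a real backtest. The market is pure zero-mean noise, each "strategy" is a random long/short position per day, and all names and parameters (`n_strategies`, `n_days`, the 1% daily volatility) are assumptions chosen for illustration. No strategy has any edge, yet the best of 50 will usually look far better than the average.

```python
import random


def best_of_random_strategies(n_strategies=50, n_days=250, seed=0):
    """Return (best total return, average total return) across random strategies."""
    rng = random.Random(seed)
    # Pure-noise daily returns with zero mean: there is nothing to predict.
    market = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    totals = []
    for _ in range(n_strategies):
        # Each "strategy" is a coin flip per day: long (+1) or short (-1).
        positions = [rng.choice((-1, 1)) for _ in range(n_days)]
        # Simple sum of daily returns (no compounding) keeps the toy readable.
        totals.append(sum(p * r for p, r in zip(positions, market)))
    return max(totals), sum(totals) / len(totals)


best, average = best_of_random_strategies()
print(f"best of 50: {best:+.1%}   average of 50: {average:+.1%}")
```

Reporting only `best` is exactly the "I tried 50 things and showed you one" pitch. The honest number is closer to `average`.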
3. Survivorship bias
You test your strategy on today's S&P 500 list. But that list excludes every company that went bankrupt or was delisted over the last 20 years. So your backtest only ever sees the winners.
A "buy and hold forever" strategy looks fantastic on today's S&P 500 because Lehman Brothers isn't on the list anymore. It would look much worse on the actual list of companies that existed in 2007.
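The arithmetic is easy to see on a toy universe. The tickers and returns below are entirely hypothetical. They only show how dropping the companies that did not survive changes the average.

```python
# Hypothetical universe: (ticker, total return over the period, still listed today?)
universe = [
    ("AAA", 1.50, True),
    ("BBB", 0.40, True),
    ("CCC", -1.00, False),  # went bankrupt: a -100% return
    ("DDD", -0.60, False),  # delisted after a large loss
]


def average_return(rows):
    return sum(ret for _, ret, _ in rows) / len(rows)


# What you get if you test on today's list: survivors only.
survivors_only = average_return([row for row in universe if row[2]])
# What you get on the list as it existed at the start of the period.
point_in_time = average_return(universe)

print(f"survivors only: {survivors_only:+.1%}")  # about +95%
print(f"full universe:  {point_in_time:+.1%}")   # about +7.5%
```

The fix in a real backtest is the same idea at scale: use the index membership as it was on each historical date, not today's list.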
What walk-forward actually does
Walk-forward validation is mostly about discipline.
You split your data into chunks. Say each chunk is 12 months. You train the model on chunk 1 only, then test it on chunk 2 — without ever letting the model see chunk 2 during training. Then you re-train on chunks 1 + 2 and test on chunk 3. And so on.
At each step, the model only knows what would have actually been known on that date.
It sounds boring. It is boring. It's also the only way to get backtest numbers that have any chance of holding up in real trading.
Walk-forward typically cuts retail "70%/year" claims down to something between 0% and 12% per year — usually with much bigger drawdowns than the original number suggests. That's the honest version.
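The chunked procedure described above can be sketched in a few lines. This is a minimal illustration of the expanding-window idea, not the code behind any particular product. The function names, the toy "model" (the sign of the average training return), and the returns are all made up.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def walk_forward_splits(n_chunks: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (training chunk numbers, test chunk number), numbered from 1.

    Expanding window: train on chunk 1, test on 2; train on 1 + 2, test on 3; ...
    """
    for test_chunk in range(2, n_chunks + 1):
        yield list(range(1, test_chunk)), test_chunk


def walk_forward_scores(
    chunks: Sequence[Sequence[float]],
    fit: Callable[[List[float]], int],
    score: Callable[[int, Sequence[float]], float],
) -> List[float]:
    """Refit before each test chunk; the model never sees the chunk it is scored on."""
    scores = []
    for train_ids, test_id in walk_forward_splits(len(chunks)):
        train = [r for i in train_ids for r in chunks[i - 1]]
        model = fit(train)
        scores.append(score(model, chunks[test_id - 1]))
    return scores


def fit_sign(train: List[float]) -> int:
    # Toy "model": bet on whichever direction the training data averaged.
    return 1 if sum(train) > 0 else -1


def hit_rate(direction: int, test: Sequence[float]) -> float:
    # Fraction of test periods that moved in the predicted direction.
    return sum(1 for r in test if r * direction > 0) / len(test)


# Four made-up chunks of returns (in practice, e.g. 12 months each).
chunks = [
    [0.01, 0.02, -0.01],
    [0.01, -0.02, 0.03],
    [-0.01, -0.02, 0.01],
    [0.02, 0.01, 0.01],
]
per_window = walk_forward_scores(chunks, fit_sign, hit_rate)
print(per_window)  # one out-of-sample score per test chunk
```

The useful output is the whole list, not one headline number. If the score is strong in one window and poor in the next, that instability is part of the result.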
What we report
Trading Agent runs walk-forward on every model we ship. The "DirAcc" stat you see on each prediction (directional accuracy across the walk-forward test windows) is the walk-forward number, not the in-sample number.
A few things that fall out of this:
- Most of our DirAcc values land between 52% and 65%: better than a coin flip on a directional call, but nowhere near "AI will make you rich" territory.
- Some tickers come back with DirAcc around 48–50% on certain horizons. We still publish those — because a model that can't tell the direction is itself important information.
- We don't backfill. The "Live log" at /predictions shows the same forecasts we made in real time. You can verify them against your own broker history.
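For readers who want to compute a directional-accuracy number on their own forecasts, here is one common definition as a sketch. It is an illustration only: the exact DirAcc formula is documented on the Methodology page, not here, and the function name, the handling of zero moves, and the sample data are assumptions.

```python
from typing import Sequence


def directional_accuracy(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Share of periods where the predicted and realized moves point the same way.

    Assumption: a move of exactly zero is counted as "not up".
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))
    return hits / len(predicted)


# Made-up forecasts and realized returns for two out-of-sample windows.
windows = [
    ([0.01, -0.02, 0.03, -0.01], [0.02, 0.01, 0.01, -0.03]),  # 3 of 4 correct
    ([-0.01, 0.02, 0.01, 0.02], [0.01, 0.03, -0.02, -0.01]),  # 1 of 4 correct
]
per_window = [directional_accuracy(p, a) for p, a in windows]
print(per_window)  # [0.75, 0.25]
```

Looking at the per-window values, and not only their average, shows how much the accuracy swings from one period to the next.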
Why this matters for you
If somebody is selling you a system and the only number they show is a backtest, ask one question: what test method did you use?
If they can't explain it in two sentences, or the answer is "we used the last 5 years", run.
If they say "walk-forward, with these holdout chunks, and here are the per-window stats", you can take the number more seriously — but you should still treat it as a soft prior, not a guarantee. Markets change. Models decay. Even an honest backtest is a guide, not a promise.
We try to be in the second camp. Some of our forecasts will still be wrong. We say so on every page.
This article is educational content. Nothing here is financial advice. See our Methodology page for technical detail on how Trading Agent's models are built, validated, and shipped.

