Methodology
How we predict stocks
No black box, no LLM, no hidden assumptions. Gradient-boosted trees on past-only technical features, validated walk-forward across 13 markets.
1. Is this an LLM?
No. We deliberately don't use large language models for the prediction itself.
LLMs hallucinate confidently, are non-deterministic (same input, different output), are expensive at scale, and have no theoretical basis for predicting price movement.
Instead we use XGBoost, a gradient-boosted decision tree ensemble, on roughly 30 technical indicators. The output is deterministic: same input always yields the same prediction. We can also show you which features drove each call (see top_features in every prediction card).
2. What features does the model see?
All features are computed from past-only OHLCV data. There is no look-ahead leakage.
• Price-based: returns over multiple lookbacks (1d, 5d, 20d), z-scores, distance from moving averages
• Volume: relative volume vs N-day median, on-balance volume
• Volatility: ATR, realized volatility, Bollinger Band position
• Momentum: RSI, MACD, rate-of-change
• Microstructure: range/body ratios, gap features, intraday high-low spread
• Calendar: day-of-week, day-of-month seasonality
The exact set is in backend/agents/statistical.py. We do not use fundamental data, analyst targets, or news scores in the prediction itself.
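To illustrate what "past-only" means in practice, here is a minimal sketch that builds a few price-based features for bar t using only closes up to and including t. The function name, the lookbacks and the 20-bar window are assumptions chosen for illustration; this is not the contents of backend/agents/statistical.py.

```python
from statistics import mean, pstdev


def past_only_features(closes, t, lookbacks=(1, 5, 20), window=20):
    """Features for bar t built only from closes[0..t] (hypothetical helper)."""
    if t < max(max(lookbacks), window - 1):
        raise ValueError("not enough history before bar t")

    feats = {}
    # Trailing returns: compare today's close with the close n bars ago.
    for n in lookbacks:
        feats[f"ret_{n}d"] = closes[t] / closes[t - n] - 1.0

    # Trailing window ends at t, so nothing after t can influence the result.
    trailing = closes[t - window + 1 : t + 1]
    avg = mean(trailing)
    sd = pstdev(trailing)
    feats[f"z_{window}d"] = 0.0 if sd == 0 else (closes[t] - avg) / sd
    feats[f"dist_sma_{window}d"] = closes[t] / avg - 1.0
    return feats
```

A quick leakage check is to overwrite every bar after t with nonsense and confirm the features for t do not change.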
3. How is it validated?
Walk-forward TimeSeriesSplit cross-validation, not random K-fold. Random K-fold would shuffle the time order and inadvertently let the model train on future data, the #1 silent mistake in financial ML.
Specifically: we train on the oldest N% of data, predict the next chunk, then expand the training window forward. Every prediction is evaluated against data the model has never seen.
The directional accuracy (DirAcc) we surface in every prediction card is this out-of-sample number, not a curve-fit training-set score.
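The expanding-window idea can be sketched in a few lines of plain Python. This is a hand-rolled illustration similar in spirit to scikit-learn's TimeSeriesSplit, not the production code; the function names and the sign-based definition of a "hit" are assumptions.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs where training always precedes testing."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for this many splits")
    first_test = n_samples - n_splits * test_size
    for start in range(first_test, n_samples, test_size):
        # Train on everything before `start`, test on the next chunk.
        yield list(range(0, start)), list(range(start, start + test_size))


def directional_accuracy(predicted, actual):
    """Share of cases where the predicted and realised moves have the same sign."""
    hits = sum((p > 0) == (a > 0) for p, a in zip(predicted, actual))
    return hits / len(actual)
```

With 10 samples and 4 splits the folds are train [0, 1] / test [2, 3], then train [0..3] / test [4, 5], and so on. Every test index is later than every training index, which is the property random K-fold breaks.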
4. How accurate is it really?
Live track record at /predictions: every call is logged with its expiry, then verified against the actual close once the horizon passes.
Realistic expectations: on our verified log, daily directional accuracy has run around 50-55% on liquid US large-caps, our deepest sample. It is lower, sometimes below a coin-flip, on several smaller and Asian markets and at longer horizons. We don't claim 80%+ accuracy; anyone claiming that is either overfit, looking at noise, or lying.
Even a 55% directional edge is genuinely useful when compounded over many independent bets, but it is not a money printer, and it is not uniform across markets. If we wanted to mislead you we'd show a single cherry-picked ticker. Instead we publish every prediction we've ever made (wins, losses, and the full per-market breakdown) at /predictions.
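To put a number on "many independent bets", the sketch below computes the chance that more than half of n calls are right when each call is right with probability p. It assumes the calls are independent and share the same hit rate, which real predictions on correlated tickers will not fully satisfy; the function name is ours.

```python
from math import comb


def prob_majority_correct(p, n):
    """P(more than n/2 of n independent calls are correct), binomial model."""
    need = n // 2 + 1
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(need, n + 1)
    )
```

Under these assumptions a 55% hit rate is hard to tell from a coin flip over a handful of calls and only becomes clearly visible over hundreds, which is why a long, complete log matters more than any single call.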
5. Where does it work? Where does it fail?
Works best so far: US large-caps (deep history, liquid) on daily horizons with stable volatility regimes; our verified US directional accuracy leads every other market. Weaker so far on several Asian markets, where verified accuracy has trailed coin-flip; we surface the full per-market breakdown at /predictions rather than bury it.
Fails predictably on:
• Earnings-day gaps: technical features can't anticipate fundamentals
• Halted or suspended stocks: no recent data to train on
• First-day IPOs: no history at all
• Black swan events (COVID-style shocks, war, regulatory shutdowns)
• Very thin / illiquid micro-caps: noise dominates signal
Each prediction is tagged low / medium / high confidence. A low-confidence call is explicitly not a trade signal.
6. Why not LSTM or Transformers?
Three reasons.
First, tabular financial features are XGBoost's home turf. The deep-learning vs gradient-boosting gap on small tabular data is well-documented: typically 1-2% in either direction, with XGBoost winning more often than not.
Second, Render's free-tier 512MB RAM ceiling won't fit TensorFlow. We could pay for more, but our pricing claim is that you can self-host this without burning $500/month.
Third, determinism and auditability. Tree models let us show you which features drove a specific prediction. Neural networks are harder to explain in a way a user can verify.
7. How does news sentiment work?
We use a deterministic lexicon scorer based on the Loughran-McDonald financial dictionary, plus our own multi-word phrase list. Each headline is scored ±1 and aggregated.
We explicitly do not use LLM-based sentiment. LLMs give different scores for the same headline on different calls, occasionally mark obviously negative news as positive when wrapped in ambiguous language, and are slow and expensive at scale.
A lexicon-based score isn't perfect, but it's reproducible, and we can show you exactly which words triggered the score. News sentiment is a side-signal in the UI; it does not feed into the price prediction itself.
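A toy version of a lexicon scorer shows why the result is reproducible and explainable. The word and phrase lists below are tiny made-up stand-ins, not the Loughran-McDonald dictionary or the real phrase list, and scoring a headline with no net matches as 0 is an assumption of this sketch.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE_WORDS = {"beat", "surge", "upgrade", "record"}
NEGATIVE_WORDS = {"miss", "lawsuit", "downgrade", "recall"}
POSITIVE_PHRASES = {"raises guidance"}
NEGATIVE_PHRASES = {"profit warning"}


def score_headline(headline):
    """Return (score, triggers): score is -1, 0 or +1, triggers are the matches."""
    text = headline.lower()
    tokens = re.findall(r"[a-z']+", text)
    pos = [w for w in tokens if w in POSITIVE_WORDS]
    pos += sorted(p for p in POSITIVE_PHRASES if p in text)
    neg = [w for w in tokens if w in NEGATIVE_WORDS]
    neg += sorted(p for p in NEGATIVE_PHRASES if p in text)
    raw = len(pos) - len(neg)
    score = (raw > 0) - (raw < 0)  # sign of the net count
    return score, {"positive": pos, "negative": neg}


def aggregate(headlines):
    """Mean headline score; 0.0 when there are no headlines."""
    scores = [score_headline(h)[0] for h in headlines]
    return sum(scores) / len(scores) if scores else 0.0
```

Because the scorer is a pure function of the text and the lists, the same headline always gets the same score, and the returned triggers show exactly which words or phrases produced it.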
8. How often does the model retrain?
Every prediction is a freshly-trained model. On each /predict call we:
1. Pull the latest OHLCV for the ticker
2. Compute features
3. Train a fresh XGBoost with walk-forward CV
4. Predict the requested horizon
This means the model always sees the latest data. No stale weights from a deployment last month. The cost is that predictions are slower than serving a pre-trained model, but we accept that latency in exchange for freshness.
9. What is the prediction format?
A point estimate: "AAPL will close at $X in 7 days," with explicit confidence (low / medium / high) and a Bullish / Neutral / Bearish directional read.
The point estimate is rounded to the nearest cent, but the confidence interval is much wider than that. Treat the number as a directional hint with explicit uncertainty, not as a target price. The action label and confidence band are more informative than the exact dollar value.
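One way to picture the shape of a prediction is as a small record. The class and field names below are hypothetical and do not describe the actual API response; only the three confidence levels, the three directional labels, the cent rounding and the rule that a low-confidence call is not a trade signal come from the text.

```python
from dataclasses import dataclass

CONFIDENCE_LEVELS = ("low", "medium", "high")
DIRECTIONS = ("Bullish", "Neutral", "Bearish")


@dataclass(frozen=True)
class Prediction:
    """Hypothetical container for one prediction card."""
    ticker: str
    horizon_days: int
    point_estimate: float  # predicted close, stored rounded to the cent
    confidence: str
    direction: str
    top_features: tuple = ()

    def __post_init__(self):
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence: {self.confidence!r}")
        if self.direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {self.direction!r}")
        # Frozen dataclass, so the rounded value is set via object.__setattr__.
        object.__setattr__(self, "point_estimate", round(self.point_estimate, 2))

    @property
    def not_a_trade_signal(self):
        """Low-confidence calls are explicitly not trade signals."""
        return self.confidence == "low"
```

For example, Prediction("AAPL", 7, 187.4567, "low", "Bullish") (a made-up price) stores 187.46 and reports not_a_trade_signal as True.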
10. Why should you trust us?
Don't take our word for it. Audit us.
Every prediction we have ever served is logged at /predictions, with the actual close filled in once the horizon passes. Per-ticker win rates are at /leaderboard. If our headline numbers don't add up, you will see it.
We can't hide stats that are computed from public log entries. That's the whole point.
AI-system transparency
Required by EU AI Act Art 13 / Art 50 and GDPR Art 13(2)(f)
- Model class
- Gradient-boosted decision trees (XGBoost) + a thin technical-analysis voting layer. No large language model is used at any stage of forecasting. Sentiment scoring on news headlines uses a lightweight finance-tuned classifier, not an LLM either.
- Training data
- Daily-bar OHLCV from yfinance (Yahoo Finance) for ~165 tickers across 13 markets, covering 2010-01-01 to the current trading day. We use only public, end-of-day market data; we do not train on user accounts, watchlists, or any personal information.
- Features
- Past-only technical indicators (RSI 14, MACD, SMA 20/50/200, ATR 14, return autocorrelations, volume z-scores) plus the target's own lagged returns. No fundamental data, no cross-sectional features.
- Validation
- Walk-forward cross-validation with an embargoed test window (model trained on data up to day t, evaluated on t+1 onwards; no future leakage). All accuracy figures shown on the site are out-of-sample direction-accuracy from this walk-forward scheme.
- Retraining cadence
- Models are retrained weekly via a scheduled GitHub Actions workflow that rolls the training window forward. Individual predictions on the dashboard are produced at request time using the most recent saved model.
- Known limitations & biases
- Accuracy is materially lower on low-volume emerging markets (especially Vietnam's .VN listings) where yfinance coverage is patchy and bid-ask spreads are wide.
- Models trained on historical regimes may underperform during structural shocks (e.g. COVID-2020, rate-cycle inflexions). The walk-forward DirAcc reflects historical regimes, not future ones.
- Long-horizon forecasts (1-month and beyond) are flagged low confidence by design; technicals-only forecasts at that range tend toward random.
- The model does not account for corporate actions, fundamental shocks, regulatory events, or news beyond what is already priced into the technical signal at training time.
- No automated decisions
- We do not produce automated decisions with legal or similarly significant effects within the meaning of GDPR Article 22. The user retains full discretion over every investment decision.
- Logging & auditability
- Every prediction produced by the seed cron and by the public demo widget is written to the public predictions_log with timestamp, predicted move, and (after the horizon expires) the actual realised move. The log is downloadable as CSV and offers ongoing third-party audit of our accuracy claims.
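An audit of the CSV export could look roughly like the sketch below, which recomputes per-ticker directional hit rates. The column names (ticker, predicted_move, actual_move) and the sample rows are invented for illustration; check the header of the real file before adapting it.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; an empty actual_move means the horizon has not expired.
SAMPLE = """timestamp,ticker,predicted_move,actual_move
2024-01-02,AAPL,0.012,0.004
2024-01-03,AAPL,-0.008,0.011
2024-01-04,MSFT,0.005,0.009
2024-01-05,MSFT,0.003,
"""


def win_rates(csv_text):
    """Per-ticker share of verified calls whose predicted sign matched reality."""
    hits = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["actual_move"] == "":
            continue  # not yet verifiable, so skip rather than count as a miss
        predicted = float(row["predicted_move"])
        actual = float(row["actual_move"])
        total[row["ticker"]] += 1
        hits[row["ticker"]] += (predicted > 0) == (actual > 0)
    return {ticker: hits[ticker] / total[ticker] for ticker in total}
```

On the sample rows this gives 0.5 for AAPL and 1.0 for MSFT; the pending MSFT row is ignored until its actual move is filled in.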
Audit it yourself
Every prediction, every actual close. Pick a ticker and check our work.