Methodology

How the prediction model works

No black box, no LLM, no hidden assumptions. Gradient-boosted trees on past-only technical features, validated walk-forward across 16 markets.

1. Is this an LLM?

No. We deliberately don't use large language models for the prediction itself.

LLMs hallucinate confidently, are non-deterministic (same input → different output), are expensive at scale, and have no theoretical basis for predicting price movement.

Instead we use XGBoost — a gradient-boosted decision tree ensemble — on roughly 30 technical indicators. The output is deterministic: same input always yields the same prediction. We can also show you which features drove each call (see top_features in the public prediction log).

2. What features does the model see?

All features are computed from past-only OHLCV data. There is no look-ahead leakage.

• Price-based: returns over multiple lookbacks (1d, 5d, 20d), z-scores, distance from moving averages
• Volume: relative volume vs N-day median, on-balance volume
• Volatility: ATR, realized volatility, Bollinger Band position
• Momentum: RSI, MACD, rate-of-change
• Microstructure: range/body ratios, gap features, intraday high-low spread
• Calendar: day-of-week, day-of-month seasonality

The exact set is in backend/agents/statistical.py. We do not use fundamental data, analyst targets, or news scores in the prediction itself.
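As a rough illustration of what "past-only" means in practice, here is a minimal sketch of two of the price-based features listed above. The function names and the synthetic price series are assumptions made for this example; this is not the code in backend/agents/statistical.py. The point is that a feature row for day t reads only prices up to and including day t, so changing later prices cannot change it.

```python
from statistics import mean, pstdev

def trailing_return(closes, t, lookback):
    """Simple return from close[t - lookback] to close[t]; reads nothing after t."""
    if t - lookback < 0:
        return None  # not enough history yet
    return closes[t] / closes[t - lookback] - 1.0

def trailing_zscore(closes, t, window):
    """Z-score of close[t] against the trailing window that ends at t (inclusive)."""
    if t - window + 1 < 0:
        return None
    hist = closes[t - window + 1 : t + 1]
    sd = pstdev(hist)
    return 0.0 if sd == 0 else (closes[t] - mean(hist)) / sd

def features_at(closes, t):
    """Feature row for day t, built only from closes[0..t]."""
    return {
        "ret_1d": trailing_return(closes, t, 1),
        "ret_5d": trailing_return(closes, t, 5),
        "ret_20d": trailing_return(closes, t, 20),
        "z_20d": trailing_zscore(closes, t, 20),
    }

closes = [100.0 + i for i in range(30)]  # synthetic prices, for illustration only
row = features_at(closes, 25)

# Leakage check: overwriting every price after day 25 must leave the row unchanged.
tampered = closes[:26] + [999.0] * 4
assert features_at(tampered, 25) == row
```

The tamper check at the end is a cheap way to test any feature function for look-ahead: if the row changes when future prices change, the feature leaks.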

3. How is it validated?

Walk-forward TimeSeriesSplit cross-validation, not random K-fold. Random K-fold would shuffle the time order and inadvertently let the model train on future data — the #1 silent mistake in financial ML.

Specifically: we train on the oldest N% of data, predict the next chunk, then expand the training window forward. Every prediction is evaluated against data the model has never seen.
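The expanding-window scheme described above can be sketched in a few lines. This is a hand-rolled illustration, not scikit-learn's TimeSeriesSplit and not necessarily how the production code slices its data; the function name and the parameters n_splits and min_train are assumptions for the example.

```python
def expanding_window_splits(n_samples, n_splits, min_train):
    """Yield (train_indices, test_indices) with the training window growing forward.

    Every test index is strictly later than every train index, so the model
    is never scored on data it could have seen during fitting.
    """
    if n_splits < 1 or min_train < 1 or min_train >= n_samples:
        raise ValueError("need n_splits >= 1 and 1 <= min_train < n_samples")
    test_size = (n_samples - min_train) // n_splits
    if test_size < 1:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(n_splits):
        train_end = min_train + k * test_size
        test_end = train_end + test_size
        yield list(range(train_end)), list(range(train_end, test_end))

# Ten time-ordered samples, three folds, at least four samples to train on.
splits = list(expanding_window_splits(n_samples=10, n_splits=3, min_train=4))
for train_idx, test_idx in splits:
    print("train", train_idx, "-> test", test_idx)
```

Unlike random K-fold, no fold here ever places a training sample after a test sample.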

The directional accuracy (DirAcc) we surface — in the public prediction log and on every stock's analysis page — is this out-of-sample number, not a curve-fit training-set score.
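For readers who want the metric spelled out, a minimal sketch of directional accuracy follows. Two details are assumptions of this example rather than documented behaviour: moves are expressed as signed returns, and pairs where either the predicted or the realised move is exactly zero are left unscored.

```python
def directional_accuracy(predicted_moves, actual_moves):
    """Share of scored predictions whose sign matches the realised move."""
    if len(predicted_moves) != len(actual_moves):
        raise ValueError("inputs must have the same length")
    hits = 0
    scored = 0
    for predicted, actual in zip(predicted_moves, actual_moves):
        if predicted == 0 or actual == 0:
            continue  # assumption: flat calls and flat outcomes are not scored
        scored += 1
        if (predicted > 0) == (actual > 0):
            hits += 1
    return hits / scored if scored else None

# Made-up out-of-sample moves, for illustration only.
predicted = [0.010, -0.020, 0.005, 0.010]
actual = [0.020, 0.010, 0.003, -0.010]
dir_acc = directional_accuracy(predicted, actual)
```

The number is only meaningful as an out-of-sample figure when the inputs come from test folds the model did not train on.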

4. How accurate is it really?

Live track record at /predictions — every call is logged with its expiry, then verified against the actual close once the horizon passes.
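The log-then-verify flow can be pictured roughly as below. The record layout, field names, and the verify function are hypothetical, chosen for this sketch; the actual log schema may differ. The sketch also assumes a call becomes scoreable on its expiry date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class LoggedCall:
    # Hypothetical fields; the real log schema is not specified here.
    ticker: str
    made_on: date
    expires_on: date
    close_at_call: float
    predicted_up: bool
    actual_close: Optional[float] = None  # filled in once the horizon has passed

def verify(call, today, close_on_expiry):
    """Return True or False once the horizon has passed, or None while still open."""
    if today < call.expires_on:
        return None  # too early: the outcome is not known yet
    call.actual_close = close_on_expiry
    went_up = close_on_expiry > call.close_at_call
    return went_up == call.predicted_up

call = LoggedCall("AAA", date(2024, 1, 2), date(2024, 1, 3), 100.0, predicted_up=True)
still_open = verify(call, date(2024, 1, 2), None)   # horizon not reached
resolved = verify(call, date(2024, 1, 3), 101.0)    # close rose, call was "up"
```

Because the call is written before the outcome exists, the record cannot be adjusted to fit the result afterwards.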

Realistic expectations: on our verified log, daily directional accuracy has run around 50-55% on liquid US large-caps — our deepest sample. It is lower, sometimes below a coin-flip, on several smaller and Asian markets and at longer horizons. We don't claim 80%+ accuracy — anyone claiming that is either overfit, looking at noise, or lying.

Even a 55% directional edge is genuinely useful when compounded over many independent bets — but it is not a money printer, and it is not uniform across markets. If we wanted to mislead you we'd show a single cherry-picked ticker. Instead we publish every prediction we've ever made — wins, losses, and the full per-market breakdown — at /predictions.
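To put "many independent bets" in numbers, the sketch below asks how far a 55% hit rate sits above a coin flip, measured in standard errors of the observed hit rate. It assumes the bets are independent and equally weighted and ignores trading costs, which real portfolios do not satisfy exactly; the function name is made up for the example.

```python
import math

def edge_in_standard_errors(p, n_bets):
    """Distance of hit rate p from 50%, in standard errors, after n_bets independent calls."""
    standard_error = math.sqrt(p * (1.0 - p) / n_bets)
    return (p - 0.5) / standard_error

for n in (100, 1_000, 10_000):
    print(n, round(edge_in_standard_errors(0.55, n), 2))
```

With 100 calls the edge is about one standard error from a coin flip and hard to tell from luck; it grows with the square root of the number of calls, to roughly 3 standard errors at 1,000 calls and roughly 10 at 10,000.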

5. Where does it work? Where does it fail?

Works best so far: US large-caps (deep history, liquid) on daily horizons with stable volatility regimes — our verified US directional accuracy leads every other market. Weaker so far on several Asian markets, where verified accuracy has trailed coin-flip; we surface the full per-market breakdown at /predictions rather than bury it.

Fails predictably on:
• Earnings-day gaps: technical features can't anticipate fundamentals
• Halted or suspended stocks: no recent data to train on
• First-day IPOs: no history at all
• Black swan events (COVID-style shocks, war, regulatory shutdowns)
• Very thin / illiquid micro-caps: noise dominates signal

The model's verified accuracy is reported per market and per horizon at /predictions, including the markets and horizons where it underperforms a coin flip.

6. Why not LSTM or Transformers?

Three reasons.

First, tabular financial features are XGBoost's home turf. The deep-learning vs gradient-boosting gap on small tabular data is well-documented: typically 1-2% in either direction, with XGBoost winning more often than not.

Second, Render's free-tier 512MB RAM ceiling won't fit TensorFlow. We could pay for more, but our pricing claim is that you can self-host this without burning $500/month.

Third, determinism and auditability. Tree models let us show you which features drove a specific prediction. Neural networks are harder to explain in a way a user can verify.

7. How does news sentiment work?

We use a deterministic lexicon scorer based on the Loughran-McDonald financial dictionary, plus our own multi-word phrase list. Each headline is scored ±1 and aggregated.
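A toy version of such a lexicon scorer is sketched below. The word and phrase lists are tiny stand-ins invented for this example, not the Loughran-McDonald dictionary or the site's own phrase list, and two choices are assumptions: headlines with no net signal score 0, and aggregation is a plain mean.

```python
import re

# Tiny illustrative lists; NOT the actual Loughran-McDonald dictionary.
POSITIVE = {"gain", "gains", "beat", "beats", "strong", "upgrade"}
NEGATIVE = {"loss", "losses", "miss", "misses", "weak", "downgrade", "lawsuit"}
# Hypothetical multi-word phrases, matched before single words.
PHRASES = {"profit warning": -1, "record high": 1}

def score_headline(headline):
    """Return (score, triggers): score is -1, 0 or +1; triggers are the matched terms."""
    text = headline.lower()
    total = 0
    triggers = []
    for phrase, value in PHRASES.items():
        if phrase in text:
            total += value
            triggers.append(phrase)
            text = text.replace(phrase, " ")  # avoid re-counting the phrase's words
    for word in re.findall(r"[a-z']+", text):
        if word in POSITIVE:
            total += 1
            triggers.append(word)
        elif word in NEGATIVE:
            total -= 1
            triggers.append(word)
    score = (total > 0) - (total < 0)  # clamp the net count to -1, 0 or +1
    return score, triggers

def aggregate(headlines):
    """Mean of per-headline scores (assumption: simple unweighted average)."""
    scores = [score_headline(h)[0] for h in headlines]
    return sum(scores) / len(scores) if scores else 0.0

good = score_headline("Acme beats estimates, shares hit record high")
bad = score_headline("Acme issues profit warning after weak quarter")
overall = aggregate([
    "Acme beats estimates, shares hit record high",
    "Acme issues profit warning after weak quarter",
])
```

Returning the trigger list alongside the score is what makes the result inspectable: the same headline always yields the same score, and the matched terms explain it.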

We explicitly do not use LLM-based sentiment. LLMs give different scores for the same headline on different calls, occasionally mark obviously negative news as positive when wrapped in ambiguous language, and are slow and expensive at scale.

A lexicon-based score isn't perfect — but it's reproducible, and we can show you exactly which words triggered the score. News sentiment is a side-signal in the UI; it does not feed into the price prediction itself.

8. How often does the model retrain?

Every prediction is a freshly-trained model. On each /predict call we:

1. Pull the latest OHLCV for the ticker
2. Compute features
3. Train a fresh XGBoost with walk-forward CV
4. Predict the requested horizon

This means the model always sees the latest data. No stale weights from a deployment last month. The cost is that predictions are slower than serving a pre-trained model — but for the trade-off in freshness, we accept the latency.

9. What is the prediction format?

The model is trained to produce a point estimate and directional read for each horizon, but those forward per-stock outputs are not surfaced in the product. What you see is the technical indicators the model trains on and its verified historical accuracy at /predictions — every past call scored against the actual close.

The historical accuracy at /predictions is reported per market and per horizon, scored against the actual close. Treat any past number as evidence with explicit uncertainty, not a forward target.

10. Why should you trust us?

Don't take our word for it. Audit us.

Every prediction we have ever served is logged at /predictions, with the actual close filled in once the horizon passes. Per-ticker win rates are at /leaderboard. If our headline numbers don't add up, you will see it.

We can't hide stats that are computed from public log entries. That's the whole point.

AI-system transparency

Required by EU AI Act Art 13 / Art 50 and GDPR Art 13(2)(f)

Model class

Gradient-boosted decision trees (XGBoost) + a thin technical-analysis voting layer. No large language model is used at any stage of forecasting. Sentiment scoring on news headlines uses a lightweight finance-tuned classifier, not an LLM either.

Training data

Daily-bar OHLCV from yfinance (Yahoo Finance) for ~165 tickers across 16 markets, covering 2010-01-01 to the current trading day. We use only public, end-of-day market data; we do not train on user accounts, watchlists, or any personal information.

Features

Past-only technical indicators (RSI 14, MACD, SMA 20/50/200, ATR 14, return autocorrelations, volume z-scores) plus the target's own lagged returns. No fundamental data, no cross-sectional features.

Validation

Walk-forward cross-validation with an embargoed test window (model trained on data up to day t, evaluated on t+1 onwards; no future leakage). All accuracy figures shown on the site are out-of-sample direction-accuracy from this walk-forward scheme.

Retraining cadence

Models are retrained weekly via a scheduled GitHub Actions workflow that rolls the training window forward. Individual predictions on the dashboard are produced at request time using the most recent saved model.

Known limitations & biases

• Accuracy is materially lower on low-volume emerging markets (especially Vietnam, i.e. .VN listings) where yfinance coverage is patchy and bid-ask spreads are wide.
• Models trained on historical regimes may underperform during structural shocks (e.g. COVID-2020, rate-cycle inflexions). The walk-forward DirAcc reflects historical regimes, not future ones.
• Long-horizon forecasts (1-month and beyond) are flagged low confidence by design; technicals-only forecasts at that range tend toward random.
• The model does not account for corporate actions, fundamental shocks, regulatory events, or news beyond what is already priced into the technical signal at training time.

No automated decisions

We do not produce automated decisions with legal or similarly significant effects within the meaning of GDPR Article 22. The user retains full discretion over every investment decision.

Logging & auditability

Every prediction produced by the seed cron and by the public demo widget is written to the public predictions_log with timestamp, predicted move, and (after the horizon expires) the actual realised move. The log is downloadable as CSV and offers ongoing third-party audit of our accuracy claims.
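An audit of the downloaded CSV could look roughly like the sketch below, which recomputes a directional hit rate per market from resolved rows. The column names (timestamp, market, ticker, predicted_move, actual_move) and the sample rows are invented for illustration; check the header of the real file before adapting this.

```python
import csv
import io
from collections import defaultdict

def hit_rates_by_market(csv_text):
    """Recompute the directional hit rate per market from resolved log rows."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["actual_move"]:
            continue  # horizon not expired yet, nothing to score
        predicted = float(row["predicted_move"])
        actual = float(row["actual_move"])
        if predicted == 0 or actual == 0:
            continue  # assumption: flat calls and flat outcomes are not scored
        totals[row["market"]] += 1
        hits[row["market"]] += int((predicted > 0) == (actual > 0))
    return {market: hits[market] / totals[market] for market in totals}

# Made-up rows with hypothetical column names, for illustration only.
SAMPLE = """timestamp,market,ticker,predicted_move,actual_move
2024-01-02T00:00:00Z,US,AAA,0.010,0.004
2024-01-02T00:00:00Z,US,BBB,-0.005,0.002
2024-01-02T00:00:00Z,VN,CCC.VN,0.003,-0.001
2024-01-03T00:00:00Z,US,AAA,0.002,
"""

rates = hit_rates_by_market(SAMPLE)
```

If the figures recomputed this way from the real log disagree with the headline numbers on the site, that discrepancy is exactly what the public log is meant to expose.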

Audit it yourself

Every prediction, every actual close. Pick a ticker and check our work.

See live predictions log
