Model Architecture

Strategy Overview

How our LSTM + Meta-Label pipeline makes daily Nikkei 225 predictions

What This System Does

Every trading day, before the Tokyo market opens, the system takes the expected open price and produces a directional signal (BUY, SELL, or HOLD) together with a confidence-weighted position size. The position is entered at the open and closed at the same day's close, so nothing is held overnight.

1. Collect daily OHLCV data for 17 global indices, 6 commodities, and 3 side tickers.
2. Engineer 500+ features: momentum, volatility, entropy, structural breaks.
3. Compress via PCA, then feed into LSTM neural networks.
4. A Meta-Label filter decides whether to trade and at what size.

Key Innovations

Fractional Differentiation (FFD)

Traditional approaches use either raw prices (non-stationary, prone to spurious regressions) or simple returns (stationary, but with the long-term memory destroyed). FFD (fixed-width window fractional differentiation) applies a fractional difference of order d, using only the minimum differentiation needed for stationarity so that as much price memory as possible is preserved. For each ticker, d is chosen as the smallest value for which the differentiated series passes the Augmented Dickey-Fuller (ADF) stationarity test.
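A minimal sketch of FFD and the search for the smallest d, in plain Python. The weight recursion is the standard one for the binomial expansion of (1 − B)^d; the truncation threshold `1e-4`, the 0.05-step grid for d, and the helper names are assumptions, not values from this system. `is_stationary` is a hypothetical callback standing in for the ADF test (for example a wrapper around statsmodels' `adfuller` that checks the p-value against a cutoff).

```python
def ffd_weights(d, threshold=1e-4):
    """Weights of the fixed-width window fractional difference (1 - B)^d.

    Recursion: w_k = -w_{k-1} * (d - k + 1) / k, truncated once |w_k|
    falls below `threshold` (assumed value).
    """
    w = [1.0]
    k = 1
    while True:
        w_k = -w[-1] * (d - k + 1) / k
        if abs(w_k) < threshold:
            break
        w.append(w_k)
        k += 1
    return w


def frac_diff_ffd(series, d, threshold=1e-4):
    """Apply the truncated weights as a fixed-width rolling dot product.

    The first len(w) - 1 points lack a full window and are dropped, so the
    series must be longer than the window for any output to remain.
    """
    w = ffd_weights(d, threshold)
    width = len(w)
    return [
        sum(w[k] * series[t - k] for k in range(width))
        for t in range(width - 1, len(series))
    ]


def find_min_d(series, is_stationary, grid=None, threshold=1e-4):
    """Return the smallest d on `grid` whose FFD series passes `is_stationary`.

    `is_stationary` is a hypothetical callback, e.g. an ADF-test wrapper.
    """
    if grid is None:
        grid = [i / 20 for i in range(21)]  # 0.00, 0.05, ..., 1.00 (assumed grid)
    for d in grid:
        if is_stationary(frac_diff_ffd(series, d, threshold)):
            return d
    return None  # no d on the grid passed the test
```

With d = 1 the weights reduce to [1, −1], i.e. ordinary first differences; values of d between 0 and 1 give slowly decaying weights, which is where the retained memory comes from.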

Triple-Barrier Labeling

Instead of simple up/down labels, the system applies the Triple-Barrier method (López de Prado, 2018). Each trading day is labeled +1 (profit above threshold), −1 (loss beyond threshold), or 0 (within threshold). The threshold adapts to recent volatility, so labels reflect meaningful moves rather than noise.
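A minimal sketch of volatility-scaled triple-barrier labeling in plain Python. The source does not give the barrier width, the volatility estimator, or the holding horizon, so `k=1.0`, a 20-day standard deviation of simple returns, and `horizon=5` are assumptions, and barriers are checked on closing prices only. The function names are hypothetical.

```python
import statistics


def rolling_vol(prices, span=20):
    """Trailing standard deviation of simple daily returns (None until `span` returns exist)."""
    rets = [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]
    vol = [None] * len(prices)
    for t in range(span, len(prices)):
        # rets[t - span:t] are the `span` returns ending at day t
        vol[t] = statistics.stdev(rets[t - span:t])
    return vol


def triple_barrier_labels(prices, vol, k=1.0, horizon=5):
    """Label day t by which barrier the price path touches first.

    Upper/lower barriers sit at +/- k * vol[t] around prices[t]; the vertical
    barrier is `horizon` days later. Touching neither horizontal barrier gives 0.
    """
    labels = [None] * len(prices)
    for t in range(len(prices) - horizon):
        if vol[t] is None:
            continue  # not enough history to set a threshold
        upper = prices[t] * (1 + k * vol[t])
        lower = prices[t] * (1 - k * vol[t])
        label = 0
        for s in range(t + 1, t + horizon + 1):
            if prices[s] >= upper:
                label = 1
                break
            if prices[s] <= lower:
                label = -1
                break
        labels[t] = label
    return labels
```

Because the barriers scale with `vol[t]`, the same 1% move can earn a ±1 label in a calm period and a 0 label in a turbulent one.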

Dual-LSTM Architecture

The system chains two LSTM (Long Short-Term Memory) networks. The first, the Stacking LSTM, learns compressed temporal patterns from 60-day windows of PCA-reduced features. The second, the Primary LSTM, combines those patterns with the original features and produces a 3-class prediction (sell / hold / buy), along with hidden representations used by the meta-model.
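The input shaping for the Stacking LSTM can be sketched with NumPy, assuming PCA is fit on the training rows only (so test-period statistics do not leak into the projection). The helper names, the toy sizes (50 features, 10 components, a 200-row training split) and the random data are illustrative assumptions; in practice features would usually be standardised before PCA.

```python
import numpy as np


def pca_fit(X_train, n_components):
    """Fit PCA on training rows only: returns (column means, top principal axes)."""
    mean = X_train.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project (possibly unseen) rows onto the fitted principal axes."""
    return (X - mean) @ components.T


def make_windows(Z, window=60):
    """Stack overlapping windows: sample i holds rows i .. i + window - 1.

    A sample's target should be aligned with its last row (day i + window - 1).
    """
    n_samples = Z.shape[0] - window + 1
    return np.stack([Z[i:i + window] for i in range(n_samples)])


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))          # toy data: 300 days x 50 features
mean, comps = pca_fit(X[:200], n_components=10)
Z = pca_transform(X, mean, comps)
windows = make_windows(Z, window=60)    # shape (samples, timesteps, components)
```

The resulting `(samples, 60, components)` array is the usual batch layout for a recurrent layer that consumes one 60-day sequence per sample.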

Meta-Labeling

A Random Forest meta-model acts as a gatekeeper. It receives the Primary LSTM's hidden state, predicted probabilities, direction, and rolling hit rate, then decides whether the primary signal is trustworthy enough to trade. This two-stage design is meant to filter out false signals: the primary model picks a side, and the meta-model learns when that call tends to be wrong, which can improve precision even when the primary model is only moderately accurate.
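A minimal sketch of how the meta-model's training targets and its rolling hit-rate input could be built, in plain Python. Here a call counts as correct only when its side equals the triple-barrier label; that rule, the 20-day window, and the function names are assumptions (a common variant uses the sign of the realised return instead). The resulting rows, together with the hidden state, probabilities and direction, would then go to a classifier such as scikit-learn's `RandomForestClassifier`, which is not shown.

```python
def meta_labels(primary_side, realized_label):
    """Binary meta-labels: 1 if the primary call matched the realised label, else 0.

    Days where the primary model says HOLD (side 0) propose no trade and get None.
    """
    out = []
    for side, y in zip(primary_side, realized_label):
        if side == 0:
            out.append(None)
        else:
            out.append(1 if side == y else 0)
    return out


def rolling_hit_rate(meta, window=20):
    """Share of correct calls among trades in the previous `window` days.

    Only days before t are used, so the value for day t has no lookahead,
    assuming each label is known by the next day (true for open-to-close
    trades, but not for multi-day barrier horizons).
    """
    rates = []
    for t in range(len(meta)):
        past = [m for m in meta[max(0, t - window):t] if m is not None]
        rates.append(sum(past) / len(past) if past else None)
    return rates
```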

Probability-Based Bet Sizing

Position size is not fixed. The meta-model's confidence probability is converted to a position fraction via a CDF transform, then discretised to steps of 0.2 (0%, 20%, 40%, … 100%) to prevent jittery allocation changes. Higher confidence results in a larger position; low confidence means a smaller or zero position.
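The source does not say which CDF transform is used. A minimal sketch, assuming the standard-normal form from López de Prado (2018, Ch. 10) for a binary meta-label: z = (p − 0.5) / √(p(1 − p)), size = 2Φ(z) − 1, with negative sizes clipped to zero and the result snapped to the nearest 0.2. The final order would be the Primary LSTM's direction (+1 or −1) times this fraction. Function names and the `eps` guard are assumptions.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bet_size(p, step=0.2, eps=1e-9):
    """Map meta-model probability p (trade is correct) to a position fraction in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # avoid division by zero at p = 0 or 1
    z = (p - 0.5) / math.sqrt(p * (1.0 - p))
    m = 2.0 * normal_cdf(z) - 1.0    # in (-1, 1); negative when p < 0.5
    m = max(m, 0.0)                  # low confidence -> no position
    # Snap to the nearest multiple of `step`; round(..., 10) removes float noise
    return round(round(m / step) * step, 10)
```

For example, p = 0.6 maps to a raw size of about 0.16 and snaps to 0.2, while p = 0.9 gives about 0.82 and snaps to 0.8; any p at or below 0.5 yields no position.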

Risk Management

The system incorporates multiple risk gates that can automatically reduce or halt trading:

Data Sources

Category            Tickers
Target              Nikkei 225 (^N225)
Global indices      ^DJI, ^IXIC, ^GSPC, ^RUT, ^GDAXI, ^FTSE, ^HSI, ^KS11
Commodities         GC=F (Gold), SI=F (Silver), CL=F (Oil), NG=F (Gas), HG=F (Copper), ZC=F (Corn)
Volatility / Rates  ^VIX, ^TNX (10Y), ^TYX (30Y)

Theoretical Foundation

The strategy design follows methodologies from Marcos López de Prado, Advances in Financial Machine Learning (Wiley, 2018).

Key concepts: Fractional Differentiation (Ch. 5), Triple-Barrier Labels (Ch. 3), Meta-Labeling (Ch. 3.6), Purged K-Fold CV (Ch. 7), Bet Sizing (Ch. 10), Feature Importance (Ch. 8), Structural Breaks (Ch. 17), Entropy Features (Ch. 18).