Strategy Overview — NI225 Signal Desk

What This System Does

Every trading day, before the Tokyo market opens, the system takes the expected open price and produces a directional signal — BUY, SELL, or HOLD — together with a confidence-weighted position size. The position is entered at the open and closed at the same day's close, so there is no overnight risk.

1. Collect daily OHLCV data for 17 global indices, 6 commodities, and 3 side tickers
2. Engineer 500+ features: momentum, volatility, entropy, structural breaks
3. Compress via PCA, then feed into LSTM neural networks
4. Meta-Label filter decides whether to trade and at what size

Key Innovations

Fractional Differentiation (FFD)

Traditional approaches either use raw prices (non-stationary, which invites spurious regressions) or simple returns (stationary, but with long-term memory erased). FFD, the fixed-width window form of fractional differentiation, differences each series by a fractional order d, using the minimum d needed for stationarity so that as much price memory as possible is preserved. For each ticker, the optimal d is the smallest value at which the differenced series passes the ADF (Augmented Dickey-Fuller) stationarity test.
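
As a minimal sketch of the differencing step, the weights follow the recursion w_0 = 1, w_k = -w_(k-1) * (d - k + 1) / k, and are truncated once they fall below a cutoff. The function names, the cutoff of 1e-5, and the max_len guard are illustrative assumptions, not the system's actual settings. The per-ticker search for d (raising d until an ADF test rejects a unit root) is not shown, since it would need a statistics library.

```python
def ffd_weights(d, threshold=1e-5, max_len=10_000):
    """Fixed-width window weights for fractional differencing of order d.

    threshold and max_len are assumed values for this sketch.
    """
    weights = [1.0]
    k = 1
    while k < max_len:
        w = -weights[-1] * (d - k + 1) / k
        if abs(w) < threshold:
            break  # drop the negligible tail so the window has fixed width
        weights.append(w)
        k += 1
    return weights


def frac_diff_ffd(series, d, threshold=1e-5):
    """Apply the truncated weights to a price (or log-price) series.

    The first len(weights) - 1 observations are dropped because they
    lack a full window of history.
    """
    w = ffd_weights(d, threshold)
    width = len(w)
    return [
        sum(w[k] * series[t - k] for k in range(width))
        for t in range(width - 1, len(series))
    ]
```

With d = 1 the weights collapse to [1, -1], which reproduces an ordinary first difference; with d = 0 the series is returned unchanged. Fractional values of d sit between those two extremes.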

Triple-Barrier Labeling

Instead of using simple up/down labels, the system applies the Triple-Barrier method (López de Prado, 2018). Each trading day is labeled +1 (profit above threshold), −1 (loss beyond threshold), or 0 (within threshold). The threshold adapts dynamically based on recent volatility, so labels reflect meaningful moves rather than noise.
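
The labeling rule can be sketched for the open-to-close holding period described earlier. This is a simplified illustration: the day's close plays the role of the time barrier, and the profit and loss barriers are a multiple of trailing volatility. The lookback of 20 days and the width of 0.5 are hypothetical parameters, and the volatility estimator (population standard deviation of past open-to-close returns) is an assumption.

```python
from statistics import pstdev


def label_days(opens, closes, lookback=20, width=0.5):
    """Label each day +1, -1, or 0 from its open-to-close return.

    lookback and width are assumed values. Days without enough
    history to estimate volatility are labeled None.
    """
    rets = [c / o - 1.0 for o, c in zip(opens, closes)]
    labels = []
    for t, r in enumerate(rets):
        if t < lookback:
            labels.append(None)
            continue
        # Volatility uses only returns before day t, avoiding lookahead.
        threshold = width * pstdev(rets[t - lookback:t])
        if r > threshold:
            labels.append(1)
        elif r < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels
```

Because the threshold scales with recent volatility, the same absolute move can be labeled 0 in a turbulent period and +1 or −1 in a quiet one.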

Dual-LSTM Architecture

The system uses two LSTM (Long Short-Term Memory) networks in sequence. The first, the Stacking LSTM, learns compressed temporal patterns from 60-day windows of PCA-reduced features. The second, the Primary LSTM, combines those patterns with the original features and produces a 3-class prediction (sell / hold / buy) along with hidden representations used by the final meta-model.
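
The 60-day windowing that feeds the first network can be sketched as below. Only the input preparation is shown, not the networks themselves; the function name and the list-of-rows layout are assumptions made for illustration.

```python
def make_windows(rows, window=60):
    """Slice a chronological list of feature rows into overlapping windows.

    rows[i] is the PCA-reduced feature vector for day i. Each window
    holds `window` consecutive days, and consecutive windows are
    shifted by one day.
    """
    return [rows[end - window:end] for end in range(window, len(rows) + 1)]
```

A history of N days yields N - 59 windows of shape (60, n_features) when window is 60, and each window contains only data up to its last day.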

Meta-Labeling

A Random Forest meta-model acts as a gatekeeper. It receives the Primary LSTM's hidden state, predicted probabilities, direction, and rolling hit rate, then decides whether the primary signal is trustworthy enough to trade. This two-stage design is intended to filter out false positives, so precision can improve even when the primary model is only moderately accurate.
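
A rough sketch of how the meta-model's training data could be assembled is shown here. The target follows the usual meta-labeling definition (1 when the primary model's directional call was correct, 0 otherwise); the helper names, the 20-day hit-rate window, and the neutral default of 0.5 are assumptions, and the Random Forest itself is not shown.

```python
def meta_label(primary_side, realized_label):
    """1 if the primary model took a side and that side was right, else 0."""
    return int(primary_side != 0 and primary_side == realized_label)


def rolling_hit_rate(past_meta_labels, window=20):
    """Share of correct calls over the most recent `window` signals.

    Returns 0.5 when there is no history yet (an assumed neutral default).
    """
    recent = past_meta_labels[-window:]
    return sum(recent) / len(recent) if recent else 0.5


def meta_features(hidden_state, class_probs, primary_side, hit_rate):
    """Concatenate the inputs named above into one flat feature row."""
    return list(hidden_state) + list(class_probs) + [float(primary_side), float(hit_rate)]
```

The meta-model is then a binary classifier on these rows, and its predicted probability of a correct call is what the sizing step consumes.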

Probability-Based Bet Sizing

Position size is not fixed. The meta-model's confidence probability is converted to a position fraction via a CDF transform, then discretised to steps of 0.2 (0%, 20%, 40%, … 100%) to prevent jittery allocation changes. Higher confidence results in a larger position; low confidence means a smaller or zero position.
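
One common form of this transform, following the bet-sizing approach in López de Prado (2018), converts the probability p into a z-score, z = (p - 0.5) / sqrt(p(1 - p)), maps it through the standard normal CDF as 2Φ(z) - 1, and rounds to the nearest step. The sketch below assumes that form; the function names, the clipping of negative sizes to zero, and the eps guard are illustrative choices rather than confirmed details of the system.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bet_size(p, step=0.2, eps=1e-9):
    """Map a meta-model probability p to a discretised position fraction."""
    p = min(max(p, eps), 1.0 - eps)       # avoid division by zero at 0 or 1
    z = (p - 0.5) / math.sqrt(p * (1.0 - p))
    raw = max(2.0 * normal_cdf(z) - 1.0, 0.0)  # assumed: no position below p = 0.5
    steps = round(raw / step)             # snap to the nearest multiple of step
    return min(1.0, round(steps * step, 10))
```

Under these assumptions a probability of 0.6 maps to a 20% position and 0.9 maps to 80%, while anything at or below 0.5 maps to zero.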

Risk Management

The system incorporates multiple risk gates that can automatically reduce or halt trading:

Deflated Sharpe Ratio (DSR) — If the live DSR falls below the threshold, positions are halved to protect against deteriorating edge.
Max Drawdown — When drawdown exceeds the defined limit, trading is paused entirely.
Hit Rate — A rolling accuracy check ensures the model maintains a minimum win rate.
SADF Bubble Detection — If bubble regime indicators breach the 95th percentile, the system enters cautious mode.
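
The gates above can be combined into a single size multiplier, as in the following sketch. All threshold values are placeholders, since the overview does not state them. The overview also does not say what happens on a hit-rate breach or what "cautious mode" does to sizing; pausing on a hit-rate breach and halving size in cautious mode are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    size_multiplier: float = 1.0
    triggered: list = field(default_factory=list)


def apply_risk_gates(dsr, drawdown, hit_rate, sadf_percentile, *,
                     dsr_min=0.95, max_drawdown=0.10, min_hit_rate=0.50,
                     sadf_limit=0.95, cautious_multiplier=0.5):
    """Return the position multiplier implied by the four gates.

    drawdown is a positive fraction (0.12 means a 12% drawdown).
    Every default threshold here is hypothetical.
    """
    result = GateResult()
    if drawdown > max_drawdown:
        result.triggered.append("max_drawdown")
        result.size_multiplier = 0.0          # trading paused entirely
    if hit_rate < min_hit_rate:
        result.triggered.append("hit_rate")
        result.size_multiplier = 0.0          # assumed response: pause
    if dsr < dsr_min:
        result.triggered.append("dsr")
        result.size_multiplier *= 0.5         # positions halved
    if sadf_percentile > sadf_limit:
        result.triggered.append("sadf")
        result.size_multiplier *= cautious_multiplier  # assumed cautious mode
    return result
```

The multiplier would be applied on top of the confidence-based position fraction, so a paused state overrides any signal.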

Data Sources

Category	Tickers
Target	Nikkei 225 (^N225)
Global Indices	^DJI, ^IXIC, ^GSPC, ^RUT, ^GDAXI, ^FTSE, ^HSI, ^KS11
Commodities	GC=F (Gold), SI=F (Silver), CL=F (Oil), NG=F (Gas), HG=F (Copper), ZC=F (Corn)
Volatility / Rates	^VIX, ^TNX (10Y), ^TYX (30Y)

Theoretical Foundation

The strategy design follows methodologies from:

Advances in Financial Machine Learning — Marcos López de Prado (2018)
Causal Factor Investing — López de Prado (2023, Cambridge Elements)
Co-authored arXiv papers on causal forest and finance ML (2020–2026)

Key concepts: Fractional Differentiation (Ch. 5), Triple-Barrier Labels (Ch. 3), Meta-Labeling (Sec. 3.6), Purged K-Fold CV (Ch. 7), Bet Sizing (Ch. 10), Feature Importance (Ch. 8), Structural Breaks (Ch. 17), Entropy Features (Ch. 18).
