The spread between two assets isn't just price_A - price_B. You need a hedge ratio (β) that accounts for the scaling relationship:
spread = price_A - β × price_B
Where β is estimated with one of:
- OLS regression: price_A = α + β × price_B + ε. Simple, but assumes a static relationship.
- Kalman filter: updates β dynamically as the relationship evolves, so it can adapt to structural shifts.
- Total Least Squares (TLS): accounts for noise in both variables, and unlike OLS it gives the same fitted line whichever asset you treat as the dependent variable.
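A minimal sketch of the three estimators with NumPy. The function names are hypothetical. The Kalman filter uses a common random-walk state model (β and α each drift as a random walk), and its noise settings `delta` and `obs_var` are illustrative assumptions you would need to tune. The TLS version assumes equal noise variance in both price series, so its result depends on the units of the prices.

```python
import numpy as np


def ols_hedge_ratio(price_a, price_b):
    """OLS fit of price_A = alpha + beta * price_B + eps. Returns (alpha, beta)."""
    X = np.column_stack([np.ones_like(price_b), price_b])
    (alpha, beta), *_ = np.linalg.lstsq(X, price_a, rcond=None)
    return float(alpha), float(beta)


def tls_hedge_ratio(price_a, price_b):
    """Total least squares (orthogonal regression). Returns (alpha, beta).

    The fitted line points along the leading eigenvector of the 2x2 covariance
    matrix of (price_B, price_A), i.e. the direction of largest joint variance.
    """
    cov = np.cov(np.vstack([price_b, price_a]))  # row 0: B (x-axis), row 1: A (y-axis)
    _eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    vx, vy = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue
    beta = vy / vx
    alpha = np.mean(price_a) - beta * np.mean(price_b)
    return float(alpha), float(beta)


def kalman_hedge_ratio(price_a, price_b, delta=1e-4, obs_var=1e-3):
    """Time-varying beta from a Kalman filter. Returns one beta per observation.

    Assumed model: state x_t = [beta_t, alpha_t] is a random walk with covariance
    delta / (1 - delta) * I, and price_A[t] = beta_t * price_B[t] + alpha_t + noise.
    """
    n = len(price_a)
    betas = np.empty(n)
    x = np.zeros(2)                        # initial state guess [beta, alpha]
    P = np.eye(2)                          # initial state uncertainty
    Q = delta / (1 - delta) * np.eye(2)    # how fast beta and alpha may drift
    for t in range(n):
        H = np.array([price_b[t], 1.0])    # observation row
        P = P + Q                          # predict: mean unchanged, uncertainty grows
        S = H @ P @ H + obs_var            # innovation variance (scalar)
        K = P @ H / S                      # Kalman gain
        x = x + K * (price_a[t] - H @ x)   # correct with the forecast error
        P = P - np.outer(K, H @ P)         # P = (I - K H) P
        betas[t] = x[0]
    return betas
```

The Kalman version gives one β per bar. In a backtest, build the spread at bar t with the β from bar t-1, since `betas[t]` already uses price_A at bar t.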
Two series are cointegrated if each one wanders on its own (non-stationary, like a random walk) but some linear combination of them is stationary (mean-reverting). This is different from correlation:
- High correlation, no cointegration: Two stocks that trend up together but can drift apart permanently (e.g., AAPL and MSFT)
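- Low correlation, cointegrated: Two assets whose day-to-day moves can differ but whose spread keeps snapping back to a stable level. Spot gold and gold futures are a classic cointegrated pair (the cost of carry ties futures to spot), though their daily moves are also highly correlated in practice.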
Cointegration is what you need for pairs trading. Correlation is not enough.
Engle-Granger Two-Step:
- Regress price_A on price_B to get residuals
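- Test residuals for stationarity with an ADF test, using Engle-Granger (MacKinnon) critical values rather than standard ADF ones, because the residuals come from an estimated regression
- If the p-value < 0.05, the pair is cointegrated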
- Pros: simple, intuitive
- Cons: sensitive to which variable you put on the left side
Johansen Test:
- Tests for cointegration in a VAR framework
- Can test multiple assets simultaneously (baskets, not just pairs)
- Returns the number of cointegrating relationships and their vectors
- Pros: handles multiple assets, direction-independent
- Cons: more complex, assumes linear relationships
Once you have a stationary spread, normalize it to a z-score:
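z = (spread - mean(spread)) / std(spread)
In a backtest, compute the mean and standard deviation over a trailing window rather than the full sample. Full-sample statistics leak future information into past signals.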
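A rolling version of the z-score, sketched with NumPy. The 60-bar default window is an arbitrary assumption. The window deliberately excludes the current bar, so each z-score uses only data available before that bar.

```python
import numpy as np


def rolling_zscore(spread, window=60):
    """z-score of each point against the previous `window` observations.

    Points without a full window of history are NaN.
    """
    spread = np.asarray(spread, dtype=float)
    z = np.full(spread.shape, np.nan)
    for t in range(window, len(spread)):
        hist = spread[t - window:t]      # history only, excludes spread[t]
        sd = hist.std(ddof=1)
        if sd > 0:
            z[t] = (spread[t] - hist.mean()) / sd
    return z
```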
Trading rules:
| Signal | Z-score condition | Action |
|---|---|---|
| Entry long spread | z < -2.0 | Buy A, sell β units of B |
| Entry short spread | z > +2.0 | Sell A, buy β units of B |
| Exit | z crosses 0 | Close position |
| Stop loss | \|z\| > 3.5 | Close (relationship may be broken) |
The stop loss is critical. When z hits extreme values, the cointegration relationship may have structurally broken. Common causes: M&A, sector rotation, regulatory change. You need to know when to abandon the trade, not just when to enter.
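The trading rules in the table can be sketched as a small state machine in plain Python. The thresholds default to the table's values. `pair_positions` is a hypothetical name, and the `halted` behavior is an assumption about what abandoning the trade means: once the stop fires, the pair is not traded again until it is re-tested.

```python
import math


def pair_positions(z_scores, entry=2.0, stop=3.5):
    """Map z-scores to spread positions: +1 long spread, -1 short spread, 0 flat."""
    positions = []
    pos = 0
    halted = False                  # set once the stop loss fires
    for z in z_scores:
        if halted or math.isnan(z):
            pass                    # no new decision: keep the current position
        elif abs(z) > stop:
            if pos != 0:
                halted = True       # relationship may be broken: abandon the pair
            pos = 0
        elif pos == 0:
            if z < -entry:
                pos = 1             # long spread: buy A, sell beta units of B
            elif z > entry:
                pos = -1            # short spread: sell A, buy beta units of B
        elif (pos == 1 and z >= 0) or (pos == -1 and z <= 0):
            pos = 0                 # z crossed zero: close the position
        positions.append(pos)
    return positions
```

Positions are in units of the spread. For example, +1 means long one share of A and short β shares of B.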
The entire strategy depends on mean reversion being real and persistent. Test this by:
- Half-life of mean reversion: How long does it take the spread to revert halfway to its mean? If it's more than about 60 days, the trade ties up capital too long.
half_life = -log(2) / log(1 + θ)  # θ: slope from regressing Δspread_t on spread_{t-1}; the formula needs -1 < θ < 0
- Hurst exponent: H < 0.5 = mean-reverting, H = 0.5 = random walk, H > 0.5 = trending. You want H well below 0.5.
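Both diagnostics can be sketched with NumPy. The half-life follows the AR(1) formula in the list, with θ estimated by regressing the spread's changes on its lagged level. For H, this uses a simple variogram-style estimator: the standard deviation of k-step differences scales like k^H. That is one of several methods, and classical rescaled-range (R/S) analysis is another. Function names and the lag range are assumptions.

```python
import numpy as np


def half_life(spread):
    """Half-life from the regression  Δs_t = c + θ·s_{t-1} + ε_t."""
    s = np.asarray(spread, dtype=float)
    lagged, delta = s[:-1], np.diff(s)
    X = np.column_stack([np.ones_like(lagged), lagged])
    (_c, theta), *_ = np.linalg.lstsq(X, delta, rcond=None)
    if theta >= 0:
        return float("inf")     # no mean reversion detected
    if theta <= -1:
        return float("nan")     # AR(1) coefficient 1 + θ <= 0: formula does not apply
    return float(-np.log(2) / np.log1p(theta))


def hurst_exponent(series, max_lag=50):
    """Slope of log std(k-step differences) against log k, for k = 2..max_lag."""
    s = np.asarray(series, dtype=float)
    lags = np.arange(2, max_lag + 1)
    tau = np.array([np.std(s[lag:] - s[:-lag]) for lag in lags])
    slope, _intercept = np.polyfit(np.log(lags), np.log(tau), 1)
    return float(slope)
```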
| Pair | Rationale |
|---|---|
| GLD / GDX | Gold price vs. gold miners: mining costs create a mean-reverting spread |
| XLE / CL (crude) | Energy stocks vs. underlying commodity |
| KO / PEP | Same-sector competitors with similar business models |
| SPY / IVV | Near-identical ETFs tracking the same index |
| EWA / EWC | Australia and Canada: commodity-driven economies with correlated macro |