Hidden Regime Pipeline

Oct 16, 2025 · 16 min read · hidden-regime hmm tutorial ·

Share on:

Hidden Regime Pipeline

From Manual HMM to Production-Ready Pipeline

⚠️ EDUCATIONAL CONTENT ONLY - NOT FINANCIAL ADVICE This content is for educational purposes only. It does not constitute financial, investment, or trading advice. Always do your own research and consult with qualified professionals before making any financial decisions. Past performance does not guarantee future results.

See the Notebook source on GitHub.

Your Learning Journey So Far

Notebook 1: Why Use Log Returns?

You learned WHY we use log returns:

✓ Stationarity (required for HMMs)
✓ Time additivity (returns simply add)
✓ Scale invariance (compare $60 and $500 stocks)
✓ Symmetric treatment (gains/losses balanced)
✓ Cross-asset comparability

Key Takeaway: Log returns aren’t convention—they’re mathematical necessity.

Notebook 2: HMM Basics

You learned HOW HMMs work:

✓ Hidden states represent market regimes
✓ Transition matrices capture regime persistence
✓ Emission parameters (μ, σ) define regime characteristics
✓ Viterbi algorithm finds most likely state sequence
✓ Forward-backward gives probability distributions

Key Takeaway: Don’t force “Bear/Bull” labels—let data reveal regimes.

Notebook 3: Production Pipeline (This Notebook)

You’ll learn USING the pipeline for real analysis:

✓ One-line pipeline setup (production efficiency)
✓ Automated regime labeling (threshold-based logic)
✓ Regime characteristics and persistence
✓ Choosing optimal number of states
✓ Best practices for regime detection

Scope Note: This notebook focuses on regime detection. For trading applications (risk management, position sizing, backtesting), see Notebook 4.

What This Notebook Covers

Part 1: Quick Pipeline Setup (Sections 1-2)

Part 2: Understanding Regime Classification (Sections 3-4)

Part 3: Regime Duration & Persistence (Section 5)

Part 4: Visualization (Section 6)

Part 5: Model Selection (Section 7)

Part 6: Best Practices & Next Steps (Section 8)

 1import sys
 2from pathlib import Path
 3import numpy as np
 4import pandas as pd
 5import matplotlib.pyplot as plt
 6import seaborn as sns
 7from datetime import datetime, timedelta
 8import warnings
 9warnings.filterwarnings('ignore')
10
11# Add parent to path
12sys.path.insert(0, str(Path().absolute().parent))
13
14# Import hidden_regime using the full API
15import hidden_regime as hr
16
17# Plotting
18plt.style.use('seaborn-v0_8-whitegrid')
19sns.set_palette(sns.color_palette())
20
21print("Imports complete")
22print(f"hidden-regime version: {hr.__version__}")

Imports complete
hidden-regime version: 1.0.0

1. Quick Start: Pipeline Factory

The library provides factory functions for quick pipeline creation.

 1# Create a complete financial pipeline with one function call
 2ticker = 'NVDA'
 3n_states = 3
 4
 5pipeline = hr.create_financial_pipeline(
 6    ticker=ticker,
 7    n_states=n_states,
 8    include_report=False  # We'll do custom analysis
 9)
10
11# Display pipeline components
12print(f"Pipeline created for {ticker} with {n_states} states")
13print(f"  Data loader: {type(pipeline.data).__name__}")
14print(f"  Observations: {type(pipeline.observation).__name__}")
15print(f"  Model: {type(pipeline.model).__name__}")
16print(f"  Analysis: {type(pipeline.analysis).__name__}")

Pipeline created for NVDA with 3 states
  Data loader: FinancialDataLoader
  Observations: FinancialObservationGenerator
  Model: HiddenMarkovModel
  Analysis: FinancialAnalysis

2. Run the Complete Pipeline

The pipeline.update() method executes all stages:

Data loading: Downloads historical price data
Observation generation: Computes log returns
Model training: Fits HMM using Baum-Welch algorithm
Analysis: Generates regime classifications and metrics

1# Execute pipeline
2report = pipeline.update()
3
4# Get analysis results
5result = pipeline.component_outputs['analysis']
6
7# Display summary
8print(f"Analysis complete: {result.shape[0]} days, {result.shape[1]} metrics")
9print(f"Date range: {result.index[0].date()} to {result.index[-1].date()}")

Training on 500 observations (removed 0 NaN values)
Analysis complete: 500 days, 37 metrics
Date range: 2023-10-17 to 2025-10-14

3. Threshold-Based Regime Mapping

How Regimes Are Classified

The library uses data-driven thresholds on emission means $\mu_k$:

$ \text{regime}_k = \begin{cases} \text{Bear} & \text{if } \mu_k < \theta_{\text{bear}} \\ \text{Bull} & \text{if } \mu_k > \theta_{\text{bull}} \\ \text{Sideways} & \text{otherwise} \end{cases} $

Where $\theta_{\text{bear}}$ and $\theta_{\text{bull}}$ are determined by:

Historical return distributions
Volatility-adjusted cutoffs
Statistical significance tests

Key Point: We do NOT sort states by index and assign “State 0 = Bear, State 1 = Sideways, State 2 = Bull”. The data reveals the regime types.

  1# Get raw data with log returns
  2raw_data = pipeline.data.get_all_data()
  3
  4# Find log_return column (case-insensitive)
  5log_return_col = None
  6for col in raw_data.columns:
  7    if col.lower() == 'log_return':
  8        log_return_col = col
  9        break
 10
 11# Prepare regime data for visualization
 12regimes = result['regime_name'].unique()
 13regime_stats = []
 14
 15for regime in sorted(regimes):
 16    regime_data = result[result['regime_name'] == regime]
 17    n_days = len(regime_data)
 18    pct_time = n_days / len(result) * 100
 19    state_num = regime_data['predicted_state'].iloc[0]
 20    
 21    stats = {
 22        'regime': regime,
 23        'state': state_num,
 24        'days': n_days,
 25        'pct_time': pct_time
 26    }
 27    
 28    # Get log returns for this regime by aligning indices
 29    if log_return_col is not None:
 30        regime_indices = regime_data.index
 31        obs_returns = raw_data.loc[regime_indices, log_return_col].dropna()
 32        
 33        if len(obs_returns) > 0:
 34            avg_log = obs_returns.mean()
 35            avg_pct = hr.log_return_to_percent_change(avg_log) * 100
 36            vol_daily = obs_returns.std() * 100
 37            vol_annual = vol_daily * np.sqrt(252)
 38            
 39            stats.update({
 40                'daily_return': avg_pct,
 41                'daily_vol': vol_daily,
 42                'annual_return': avg_pct * 252,
 43                'annual_vol': vol_annual
 44            })
 45    
 46    regime_stats.append(stats)
 47
 48# Create visualization with larger figure size
 49fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7))
 50
 51# Get regime colors
 52unique_regimes = sorted(result['regime_name'].unique())
 53from hidden_regime.visualization import get_regime_colors
 54color_list = get_regime_colors(len(unique_regimes), color_scheme="colorblind_safe")
 55regime_colors = dict(zip(unique_regimes, color_list))
 56
 57# Left: Pie chart of regime distribution
 58sizes = [s['pct_time'] for s in regime_stats]
 59labels = [f"{s['regime']}\n{s['pct_time']:.1f}%" for s in regime_stats]
 60colors = [regime_colors[s['regime']] for s in regime_stats]
 61
 62wedges, texts, autotexts = ax1.pie(sizes, labels=labels, colors=colors, autopct='',
 63                                     startangle=90, textprops={'fontsize': 13, 'fontweight': 'bold'})
 64ax1.set_title('Regime Distribution\n(% of Time)', fontsize=15, fontweight='bold', pad=20)
 65
 66# Right: Bar chart of returns by regime
 67regime_names = [s['regime'] for s in regime_stats]
 68returns = [s.get('annual_return', 0) for s in regime_stats]
 69bar_colors = [regime_colors[s['regime']] for s in regime_stats]
 70
 71bars = ax2.barh(regime_names, returns, color=bar_colors, alpha=0.7, edgecolor='black', linewidth=1.5)
 72ax2.axvline(x=0, color='black', linestyle='-', linewidth=0.8)
 73ax2.set_xlabel('Annualized Return (%)', fontsize=13)
 74ax2.set_ylabel('Regime', fontsize=13)
 75ax2.set_title('Expected Returns by Regime', fontsize=15, fontweight='bold', pad=20)
 76ax2.grid(True, alpha=0.3, axis='x')
 77ax2.tick_params(axis='both', labelsize=12)
 78
 79# Add value labels on bars
 80for i, (bar, val) in enumerate(zip(bars, returns)):
 81    if val != 0:  # Only add label if we have data
 82        label_x = val + (3 if val > 0 else -3)
 83        ha = 'left' if val > 0 else 'right'
 84        ax2.text(label_x, bar.get_y() + bar.get_height()/2, 
 85                 f'{val:+.1f}%', ha=ha, va='center', fontweight='bold', fontsize=12)
 86
 87plt.tight_layout()
 88plt.show()
 89
 90# Print detailed statistics
 91print("\nDetailed Regime Characteristics")
 92print("=" * 80)
 93for s in regime_stats:
 94    print(f"\n{s['regime']} (State {s['state']})")
 95    print("-" * 80)
 96    print(f"  Frequency:   {s['days']} days ({s['pct_time']:.1f}% of period)")
 97    if 'daily_return' in s:
 98        print(f"  Daily:       mu = {s['daily_return']:+.3f}%  sigma = {s['daily_vol']:.3f}%")
 99        print(f"  Annualized:  mu = {s['annual_return']:+.1f}%  sigma = {s['annual_vol']:.1f}%")
100print("=" * 80)

Detailed Regime Characteristics
================================================================================

Bear (State 0)
--------------------------------------------------------------------------------
  Frequency:   108 days (21.6% of period)
  Daily:       mu = -1.027%  sigma = 4.131%
  Annualized:  mu = -258.9%  sigma = 65.6%

Bull (State 2)
--------------------------------------------------------------------------------
  Frequency:   104 days (20.8% of period)
  Daily:       mu = +2.739%  sigma = 1.221%
  Annualized:  mu = +690.1%  sigma = 19.4%

Sideways (State 1)
--------------------------------------------------------------------------------
  Frequency:   288 days (57.6% of period)
  Daily:       mu = -0.107%  sigma = 1.356%
  Annualized:  mu = -26.9%  sigma = 21.5%
================================================================================

4. Current Regime Status

State Probability Distribution

The HMM provides full probability distribution at each time point:

$ P(\text{state}_k \mid O_1, \ldots, O_T) = \frac{\alpha_T(k) \beta_T(k)}{\sum_{j} \alpha_T(j) \beta_T(j)} $

Where:

$\alpha_t(k)$: Forward probability (probability of observations up to $t$ AND being in state $k$)
$\beta_t(k)$: Backward probability (probability of future observations GIVEN state $k$ at $t$)

Confidence is the maximum probability: $\max_k P(\text{state}_k \mid \text{all data})$

 1# Analyze current regime
 2current = result.iloc[-1]
 3current_regime = current['regime_name']
 4
 5# Count consecutive days in regime
 6days_in_regime = 1
 7for i in range(len(result)-2, -1, -1):
 8    if result.iloc[i]['regime_name'] == current_regime:
 9        days_in_regime += 1
10    else:
11        break
12
13# Get state probabilities
14state_probs = []
15state_names = []
16for state in range(n_states):
17    col_name = f'state_{state}_prob'
18    if col_name in result.columns:
19        prob = current[col_name]
20        # Find regime name for this state
21        state_regime = result[result['predicted_state'] == state]['regime_name'].iloc[0] \
22                      if len(result[result['predicted_state'] == state]) > 0 else f"State {state}"
23        state_probs.append(prob)
24        state_names.append(state_regime)
25
26# Create bar chart
27fig, ax = plt.subplots(figsize=(12, 5))
28
29# Get colors for bars
30unique_regimes = sorted(result['regime_name'].unique())
31from hidden_regime.visualization import get_regime_colors
32color_list = get_regime_colors(len(unique_regimes), color_scheme="colorblind_safe")
33regime_colors = dict(zip(unique_regimes, color_list))
34bar_colors = [regime_colors.get(name, 'gray') for name in state_names]
35
36# Create horizontal bar chart
37bars = ax.barh(state_names, state_probs, color=bar_colors, alpha=0.7, edgecolor='black', linewidth=2)
38
39# Highlight the current regime (max probability)
40max_idx = np.argmax(state_probs)
41bars[max_idx].set_alpha(1.0)
42bars[max_idx].set_linewidth(3)
43
44# Add percentage labels
45for i, (bar, prob) in enumerate(zip(bars, state_probs)):
46    ax.text(prob + 0.02, bar.get_y() + bar.get_height()/2, 
47            f'{prob*100:.1f}%', ha='left', va='center', fontweight='bold', fontsize=11)
48
49# Formatting
50ax.set_xlim(0, 1.1)
51ax.set_xlabel('Probability', fontsize=12)
52ax.set_title(f'Current Market Regime: {current_regime} ' +
53             f'(Confidence: {current["confidence"]*100:.1f}%)', 
54             fontsize=14, fontweight='bold', pad=15)
55ax.grid(True, alpha=0.3, axis='x')
56
57# Add vertical line at confidence threshold
58ax.axvline(x=0.8, color='green', linestyle='--', alpha=0.5, linewidth=2, label='High confidence (>80%)')
59ax.legend(loc='lower right')
60
61plt.tight_layout()
62plt.show()
63
64# Print summary
65print(f"\nCurrent Market Status")
66print("=" * 70)
67print(f"  Date:       {result.index[-1].date()}")
68print(f"  Regime:     {current_regime}")
69print(f"  Confidence: {current['confidence']*100:.1f}%")
70print(f"  Duration:   {days_in_regime} days in current regime")
71print("=" * 70)

Current Market Status
======================================================================
  Date:       2025-10-14
  Regime:     Sideways
  Confidence: 68.0%
  Duration:   24 days in current regime
======================================================================

5. Regime Duration Analysis

Expected Regime Duration

From the HMM transition matrix $\mathbf{A}$, the expected duration in regime $k$ is:

$ E[\tau_k] = \frac{1}{1 - A_{kk}} $

Where $A_{kk} = P(\text{state}_{t+1} = k \mid \text{state}_t = k)$ is the probability of staying in the same regime.

Example: If $A_{kk} = 0.90$, then $E[\tau] = \frac{1}{1-0.90} = 10$ days.

Why Duration Matters

Short regimes (1-3 days): Possible overfitting or high-frequency market noise
Medium regimes (5-15 days): Typical market regime persistence
Long regimes (20+ days): Strong, stable market conditions

Duration analysis validates that the HMM is finding real market patterns, not random noise.

 1# Calculate regime durations
 2def calculate_regime_durations(df):
 3    """Calculate duration of each regime period"""
 4    durations = []
 5    current_regime = df.iloc[0]['regime_name']
 6    current_duration = 1
 7    
 8    for i in range(1, len(df)):
 9        if df.iloc[i]['regime_name'] == current_regime:
10            current_duration += 1
11        else:
12            durations.append({
13                'regime': current_regime,
14                'duration': current_duration,
15                'start': df.index[i-current_duration],
16                'end': df.index[i-1]
17            })
18            current_regime = df.iloc[i]['regime_name']
19            current_duration = 1
20    
21    # Add final period
22    durations.append({
23        'regime': current_regime,
24        'duration': current_duration,
25        'start': df.index[len(df)-current_duration],
26        'end': df.index[-1]
27    })
28    
29    return pd.DataFrame(durations)
30
31durations_df = calculate_regime_durations(result)
32
33# Display statistics
34print("Regime Duration Statistics")
35print("=" * 70)
36print(f"Total regime transitions: {len(durations_df)}\n")
37
38for regime in sorted(durations_df['regime'].unique()):
39    regime_durs = durations_df[durations_df['regime'] == regime]['duration']
40    
41    print(f"{regime}:")
42    print(f"  Occurrences: {len(regime_durs)} periods")
43    print(f"  Average:     {regime_durs.mean():.1f} days  (median: {regime_durs.median():.0f})")
44    print(f"  Range:       {regime_durs.min()} - {regime_durs.max()} days")
45    print()
46
47# Show longest periods
48print("Longest Regime Periods:")
49print("-" * 70)
50longest = durations_df.nlargest(5, 'duration')
51for idx, row in longest.iterrows():
52    print(f"  {row['regime']:<12} {row['duration']:>3} days  " +
53          f"({row['start'].date()} → {row['end'].date()})")
54print("=" * 70)

Regime Duration Statistics
======================================================================
Total regime transitions: 75

Bear:
  Occurrences: 11 periods
  Average:     9.8 days  (median: 6)
  Range:       1 - 40 days

Bull:
  Occurrences: 32 periods
  Average:     3.2 days  (median: 3)
  Range:       1 - 10 days

Sideways:
  Occurrences: 32 periods
  Average:     9.0 days  (median: 5)
  Range:       1 - 39 days

Longest Regime Periods:
----------------------------------------------------------------------
  Bear          40 days  (2025-02-24 → 2025-04-21)
  Sideways      39 days  (2025-07-16 → 2025-09-09)
  Sideways      34 days  (2023-11-15 → 2024-01-04)
  Sideways      24 days  (2025-09-11 → 2025-10-14)
  Sideways      22 days  (2024-11-20 → 2024-12-20)
======================================================================

6. Visualization

Three-panel visualization showing:

Price with regime shading: Visual alignment of regimes with price action
Confidence levels: How certain the model is about each regime
Regime timeline: Sequential regime transitions

 1# Create regime visualization
 2fig, axes = plt.subplots(3, 1, figsize=(16, 12), sharex=True)
 3
 4# Get raw price data
 5raw_data = pipeline.data.get_all_data()
 6close_col = next((col for col in raw_data.columns if col.lower() == 'close'), None)
 7
 8if close_col is None:
 9    raise ValueError("Could not find 'close' column in data")
10
11# Get regime colors
12unique_regimes = sorted(result['regime_name'].unique())
13from hidden_regime.visualization import get_regime_colors
14color_list = get_regime_colors(len(unique_regimes), color_scheme="colorblind_safe")
15regime_colors = dict(zip(unique_regimes, color_list))
16
17# Panel 1: Price with regime shading
18ax = axes[0]
19ax.plot(raw_data.index, raw_data[close_col], linewidth=1.5, color='black', zorder=2)
20ax.set_ylabel('Price ($)', fontsize=12)
21ax.set_title(f'{ticker} Price with Detected Regimes', fontsize=14, fontweight='bold')
22ax.grid(True, alpha=0.3)
23
24# Shade background by regime
25for i in range(len(result)):
26    regime = result.iloc[i]['regime_name']
27    color = regime_colors.get(regime, 'gray')
28    ax.axvspan(result.index[i], result.index[min(i+1, len(result)-1)], 
29               alpha=0.2, color=color, zorder=1)
30
31# Add legend
32from matplotlib.patches import Patch
33legend_elements = [Patch(facecolor=regime_colors.get(r, 'gray'), alpha=0.3, label=r) 
34                   for r in unique_regimes]
35ax.legend(handles=legend_elements, loc='upper left')
36
37# Panel 2: Confidence levels
38ax = axes[1]
39confidence_pct = result['confidence'] * 100
40
41ax.plot(result.index, confidence_pct, linewidth=1.5, color='blue', label='Confidence')
42ax.axhline(y=80, color='green', linestyle='--', alpha=0.5, label='High (>80%)')
43ax.axhline(y=60, color='orange', linestyle='--', alpha=0.5, label='Medium')
44
45# Shade confidence regions
46ax.fill_between(result.index, 0, confidence_pct, 
47                where=(confidence_pct > 80), alpha=0.2, color='green')
48ax.fill_between(result.index, 0, confidence_pct, 
49                where=(confidence_pct < 60), alpha=0.2, color='red')
50
51ax.set_ylabel('Confidence (%)', fontsize=12)
52ax.set_title('Regime Detection Confidence', fontsize=14, fontweight='bold')
53ax.grid(True, alpha=0.3)
54ax.legend(loc='lower left')
55ax.set_ylim(0, 105)
56
57# Panel 3: Regime timeline
58ax = axes[2]
59for i in range(len(result)):
60    regime = result.iloc[i]['regime_name']
61    color = regime_colors.get(regime, 'gray')
62    ax.bar(result.index[i], 1, width=1, color=color, edgecolor='none', alpha=0.7)
63
64ax.set_ylabel('Regime', fontsize=12)
65ax.set_xlabel('Date', fontsize=12)
66ax.set_title('Regime Sequence Timeline', fontsize=14, fontweight='bold')
67ax.set_ylim(0, 1.2)
68ax.set_yticks([])
69ax.grid(True, alpha=0.3, axis='x')
70
71plt.tight_layout()
72plt.show()

7. Model Comparison

Selecting Optimal Number of States

Model complexity tradeoff:

Too few states: Cannot capture market nuances
Too many states: Overfitting, very short regimes

Validation metrics:

Average duration: Should be 5+ days for meaningful regimes
Confidence: Higher confidence indicates clearer regime separation
Number of parameters: $n_{\text{params}} = n^2 + 2n$ for $n$ states
Interpretability: Can you explain what each regime represents?

 1# Compare models with different numbers of states
 2comparison_results = []
 3
 4print("Training models with 2-5 states...\n")
 5
 6for n in [2, 3, 4, 5]:
 7    print(f"  {n} states... ", end='', flush=True)
 8    
 9    test_pipeline = hr.create_financial_pipeline(
10        ticker=ticker,
11        n_states=n,
12        include_report=False
13    )
14    
15    _ = test_pipeline.update()
16    test_result = test_pipeline.component_outputs['analysis']
17    
18    # Calculate metrics
19    n_params = n**2 + 2*n  # Transitions + emissions
20    n_obs = len(test_result)
21    switches = (test_result['predicted_state'].diff() != 0).sum()
22    avg_duration = n_obs / switches if switches > 0 else n_obs
23    avg_conf = test_result['confidence'].mean() * 100
24    
25    comparison_results.append({
26        'States': n,
27        'Parameters': n_params,
28        'Switches': switches,
29        'Avg Duration': f"{avg_duration:.1f}",
30        'Avg Confidence': f"{avg_conf:.1f}%"
31    })
32    
33    print("done")
34
35# Display comparison table
36comp_df = pd.DataFrame(comparison_results)
37print("\n" + "=" * 70)
38print("Model Comparison Results")
39print("=" * 70)
40print(comp_df.to_string(index=False))
41print("=" * 70)

Training models with 2-5 states...

  2 states... Training on 500 observations (removed 0 NaN values)
done
  3 states... Training on 500 observations (removed 0 NaN values)
done
  4 states... Training on 500 observations (removed 0 NaN values)
done
  5 states... Training on 500 observations (removed 0 NaN values)
done

======================================================================
Model Comparison Results
======================================================================
 States  Parameters  Switches Avg Duration Avg Confidence
      2           8        24         20.8          89.2%
      3          15        75          6.7          79.5%
      4          24       101          5.0          78.8%
      5          35       114          4.4          78.7%
======================================================================

Selection Guidelines

Interpreting the results:

States	Typical Use Case	Watch Out For
2	Simple bull/bear classification	May miss sideways markets
3	Bear/Sideways/Bull (recommended)	Good balance
4	Adding crisis or rally states	Check duration > 5 days
5+	Very granular regimes	Often overfits, short regimes

Red flags:

⚠️ Average duration < 5 days → Likely overfitting
⚠️ Confidence < 60% → Poor regime separation
⚠️ Too many switches → Capturing noise, not signal

Recommended: Start with 3 states. Add more only if:

Durations remain meaningful (>5 days)
Each regime has clear interpretation
Validation on out-of-sample data shows improvement

7. Model Comparison

Compare different numbers of states to find the optimal model complexity.

8. Best Practices for Regime Detection

✓ DO These Things

1. Always Use Log Returns

Ensures stationarity (required for HMMs)
Proper statistical properties
Cross-asset comparability

2. Let Data Determine Regimes

Use threshold-based classification
Don’t force “Bear/Bull” labels by sorting state indices
Examine emission parameters before labeling

3. Check Confidence Levels

Low confidence → regime uncertainty or transitions
High confidence → reliable regime signals
Use confidence for position sizing

4. Validate Regime Duration

Very short regimes (< 3 days) → possible overfitting
Typical regimes (5-15 days) → reasonable persistence
Long regimes (20+ days) → strong, stable conditions

5. Compare Multiple Models

Try 2-5 states systematically
Balance fit quality vs complexity
Use validation metrics (duration, confidence)

6. Monitor Regime Changes

Large parameter shifts → market structure changes
Regime transition patterns → leading indicators
Confidence drops → early warning signals

⚠️ DON’T Do These Things

1. Don’t Use Raw Prices

Prices are non-stationary
Violates HMM assumptions
Produces invalid inferences

2. Don’t Force Regime Labels

State 0 ≠ automatically “Bear”
Let emission means determine regime types
Thresholds should be data-driven

3. Don’t Ignore Low Confidence

Low confidence carries information
Often signals regime transitions
Critical for risk management

4. Don’t Assume Stationarity

Market regimes evolve over time
Parameters change with market structure
Regular retraining may be needed

5. Don’t Overtrain

More states ≠ better model
5+ states often overfit
Focus on interpretability

6. Don’t Use Single Random Seed

HMMs have local optima
Results can vary by initialization
Consider multiple runs or robust initialization

Production Checklist

Before deploying for real trading decisions:

Verified log returns are used (not prices or simple returns)
Checked regime durations are reasonable (not too short)
Validated confidence levels are meaningful
Compared 2-5 states and selected optimal complexity
Examined regime characteristics match market intuition
Tested on out-of-sample data
Established monitoring for regime changes
Integrated with risk management system (see Notebook 4)

9. Conclusion & Next Steps

What You’ve Accomplished

Congratulations! You now have a complete understanding of regime detection using HMMs:

From Notebook 1 (Mathematical Foundation):

✓ Why log returns are mathematically necessary
✓ Stationarity, additivity, and scale invariance
✓ Properties required for statistical modeling

From Notebook 2 (HMM Mechanics):

✓ How HMMs represent market regimes
✓ What transition matrices and emissions mean
✓ Viterbi vs forward-backward algorithms
✓ Interpreting states without forcing labels

From Notebook 3 (Production Pipeline) - This Notebook:

✓ One-line pipeline setup
✓ Automated threshold-based regime classification
✓ Regime duration and persistence analysis
✓ Model selection (2-5 states comparison)
✓ Visualization and interpretation
✓ Best practices for production use

The Complete Learning Path

Notebook 1: Mathematical Foundation (WHY)
    ↓
Notebook 2: HMM Mechanics (HOW)
    ↓
Notebook 3: Production Pipeline (USING) ← You are here
    ↓
Notebook 4: Trading Applications (PROFITING) ← Next step

What’s Next: Notebook 4 - Advanced Trading Applications

You can now detect regimes. Notebook 4 teaches you how to USE them for trading:

Part A: Regime-Specific Risk Management

VaR and Expected Shortfall by regime
Maximum drawdown analysis
Sharpe ratios and risk-adjusted returns
Dynamic position sizing based on regime risk

Part B: Technical Indicator Integration

Combining HMM regimes with RSI, MACD, Bollinger Bands
Agreement/disagreement analysis
When to trust HMMs vs indicators
Ensemble signal generation

Part C: Portfolio Applications

Position signals (LONG/SHORT/NEUTRAL)
Confidence-weighted position sizing
Signal strength methodology
Risk-adjusted portfolio allocation

Part D: Backtesting & Validation

Hypothetical performance comparison
Walk-forward analysis
Regime-based strategy testing
Out-of-sample validation

Practical Next Steps

If you want to continue learning:

Open Notebook 4 to learn trading applications
Apply these techniques to your own tickers
Experiment with different time periods
Try different numbers of states

If you’re ready for production:

Review the best practices checklist above
Test on multiple assets and time periods
Validate regime stability over time
Integrate with your existing trading system
Start with paper trading before real capital

If you want to customize:

Modify threshold values for regime classification
Add your own features beyond log returns
Implement custom risk metrics
Integrate with other analysis tools

Key Takeaways

Regime detection is a tool, not a crystal ball.
Use it to:
Understand current market conditions
Adjust risk based on regime characteristics
Combine with other analysis methods
Make more informed trading decisions
Don’t expect it to:
Predict future price movements perfectly
Work in all market conditions
Replace fundamental analysis
Guarantee profitable trades

Resources

Documentation: Hidden Regime Docs
Examples: More examples in /examples directory
Issues: Report bugs at GitHub
Community: Share strategies and learn from others

Ready for trading applications? Open Notebook 4!