HMM 101

An example of how we’ll be using our Hidden Markov Models in this project.

This notebook can be found on GitHub.

Imports and Setup

from datetime import datetime
import sys
import os
from typing import List, Tuple
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report
np.random.seed(42)  # for repeatability

Hidden Regime Setup

# Add the project root to the path
sys.path.insert(0, os.path.join(os.path.dirname(os.getcwd()), '..'))

# Import HR Model and Config
import hidden_regime as hr
from hidden_regime.config.model import HMMConfig
from hidden_regime.models.hmm import HiddenMarkovModel as HMM

# Observations
from hidden_regime.config.observation import ObservationConfig
from hidden_regime.observations import BaseObservationGenerator

Problem Statement

We’re going to create a signal that belongs to one of N_STATES.

We’ll create our observed signal from a Gaussian mixture, randomly switching between the two states and drawing each sample from the active state's distribution.

Note that the centers for the two states are fairly close together - this is because in most financial predictions we’ll be dealing with daily/hourly returns and they are often very small. The tooling in the HMM is designed around those limitations, so we’ll be keeping that in mind here as we build our observation set.
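As a quick aside on why those centers are small, here is a tiny illustration with made-up prices (not data from this project); day-over-day moves on a typical stock are usually on the order of a percent or less:

prices = np.array([100.0, 100.8, 100.3, 101.1, 100.9])  # hypothetical closing prices
daily_returns = np.diff(prices) / prices[:-1]
print(daily_returns)  # roughly [+0.008, -0.005, +0.008, -0.002]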

NUM_SAMPLES = 200
N_STATES = 2
EXAMPLE_MEAN = 0.2
EXAMPLE_STD = 0.05
NAME = 'ExampleData'

Create our Example Observation Generator

This is fairly straightforward - we’re just selecting our column (see NAME above) from the dataset. There is no special signal or feature generation here.

class ExampleObservationGenerator(BaseObservationGenerator):
    def update(self, data: pd.DataFrame) -> pd.DataFrame:
        # Transform data to observations
        observations = data.copy()
        observations[NAME] = self._compute_feature(data)
        return observations

    def _compute_feature(self, data: pd.DataFrame) -> pd.Series:
        """Compute our observation"""
        return data[NAME]

    def plot(self, **kwargs):
        # Visualization logic
        pass
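As a quick sanity check, we can run the generator on a tiny hand-made frame (using the same construction pattern we'll use later when generating data); it should simply pass our column through untouched:

toy = pd.DataFrame({NAME: [0.1, -0.2, 0.15]})  # hypothetical values
toy_cfg = ObservationConfig(generators=[ExampleObservationGenerator])
toy_gen = ExampleObservationGenerator(toy_cfg)
print(toy_gen.update(toy))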

Add a Visualization Tool

Make it easy and repeatable to plot our states, and optionally overlay the observed values on the same graph.

def plot_states(states: pd.Series, values: pd.Series = None, obs_mean: float = None, num_samples: int = None) -> List[matplotlib.axes.Axes]:
    """Convenience function for plotting our states"""
    ax = states.plot.line()
    ax.set_yticks(list(range(N_STATES)))
    ax.set_yticklabels([f'State_{i}' for i in range(N_STATES)])
    ax.set_xlabel('Sample')

    if values is not None:
        # Plot our observations
        ax2 = ax.twinx()
        values.plot.line(ax=ax2, color='orange')
        ax2.set_ylabel(values.name)

        # Add the state means to the plot to provide context
        state_0_mean = pd.Series(np.array([-obs_mean]*num_samples), name='State_0_Mean')
        state_1_mean = pd.Series(np.array([obs_mean]*num_samples), name='State_1_Mean')
        state_0_mean.plot.line(ax=ax2, linestyle='--', color='r')
        state_1_mean.plot.line(ax=ax2, linestyle='--', color='g')

    # Clarify plot information
    _ = plt.title(states.name)

    if values is not None:
        _ = plt.legend()

    return [ax] if values is None else [ax, ax2]

Generate our Observations (Gaussian Mixture)

The algorithm is simple:

  1. When our counter hits zero, randomly determine the next state and how long we’ll stay in it.
  2. Decrement the counter and generate a sample for the current state. Each sample is Gaussian with that state’s center (mu) and standard deviation (std).
  3. Add the sample to our observation set.

What we should see in the data are clear breaks around each state, where it is obvious which state the observation at time t belongs to. If we shrink the centers or increase the standard deviations, the mixture will become much more difficult to separate and the model will struggle.

def generate_observations(
    obs_mean: float,
    obs_std: float,
    num_samples: int,
    obs_name: str) -> Tuple[pd.Series, pd.DataFrame]:
    """Generate our observations and truth data"""

    observationConfig = ObservationConfig(generators=[ExampleObservationGenerator])
    observationGenerator = ExampleObservationGenerator(observationConfig)

    obs = []  # Store our observations
    actual_states = []  # Store our actual states
    state = 0  # Our current state
    count = 0  # How long we've been in a state
    max_time_in_state = int(num_samples/4)  # Bound how long we can possibly hang out in a state before drawing again

    for i in range(num_samples):

        # Counter hit zero, re-roll and figure out the state and how long we'll be there
        if count == 0:
            count = np.random.randint(low=2, high=max_time_in_state)
            state = np.random.randint(low=0, high=2)

        count -= 1  # Decrement our counter

        # Get the parameters for our observation based on the state
        if state == 0:
            mu = -obs_mean
            std = obs_std
        else:
            mu = obs_mean
            std = obs_std

        # Generate the sample at this timestep
        sample = std * np.random.randn() + mu

        # Store data
        obs.append(sample)
        actual_states.append(state)

    # Data generation
    actual_states = pd.Series(actual_states, name='ActualStates')
    observations = observationGenerator.update(pd.Series(obs, name=obs_name).to_frame())

    return actual_states, observations

actual_states, observations = generate_observations(
    obs_mean=EXAMPLE_MEAN,
    obs_std=EXAMPLE_STD,
    obs_name=NAME,
    num_samples=NUM_SAMPLES)

Generate a Histogram of Our Data

We observe a clear separation between signals in this plot.

_ = observations.plot.hist(bins=5)
_ = plt.title('Histogram of Observed Data')

[Figure: histogram of the observed data]

Plot States and Observations

Display our true states and the observed data we’ll be feeding to the model.

_ = plot_states(actual_states, observations[NAME], obs_mean=EXAMPLE_MEAN, num_samples=NUM_SAMPLES)

[Figure: true states with observations overlaid]

Hidden Markov Model Setup

The hmm_cfg object is how we configure our Hidden Markov Model (HMM).

  • We specify the number of expected states
  • We specify what our observation name is (there may be many, select the one we care about)
  • We specify how to compute the model centers

Computing the model’s initial state information is done in a data-driven way. We could randomly assign states, but that is less than ideal. We could assume some knowledge of the states and assign them manually, but that is cheating ;-P. So we let the data tell us what we should have.

Using kmeans lets us specify how many states we can expect and cluster the data around those states. Note that we aren’t specifying what the states are, only how many there are.
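As a rough illustration of what kmeans-style initialization is doing (a sketch of the idea, not the hidden_regime internals), we can cluster the 1-D observations ourselves and look at each cluster's statistics; those per-cluster means and spreads are the sort of data-driven starting points the model wants:

from sklearn.cluster import KMeans

X = observations[NAME].to_numpy().reshape(-1, 1)
km = KMeans(n_clusters=N_STATES, n_init=10, random_state=42).fit(X)
for k in range(N_STATES):
    cluster = X[km.labels_ == k, 0]
    print(f"cluster {k}: mean={cluster.mean():+.3f}, std={cluster.std():.3f}, size={cluster.size}")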

hmm_cfg = HMMConfig(
    n_states=N_STATES,
    observed_signal=NAME,
    initialization_method='kmeans'
)
hmm = HMM(hmm_cfg)

Fit the model

Once we have our data and model, we need to fit the model to the data. The Baum-Welch algorithm is used to train it: it iteratively produces a maximum-likelihood estimate of the model’s parameters (in non-math terms, it picks the transition and emission values that best explain the data). Because the algorithm is iterative, we limit the number of steps it can take. In our simple example, it should converge really quickly (i.e., in fewer than 5 steps).

hmm.fit(observations)
Training on 200 observations (removed 0 NaN values)
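For intuition about what happens inside fit, here is a minimal, self-contained sketch of Baum-Welch for a 1-D Gaussian HMM. This is not the hidden_regime implementation, just the shape of the algorithm: repeat a forward-backward pass and a parameter update until the log-likelihood stops improving or we hit the iteration cap.

from scipy.stats import norm

def baum_welch_sketch(x: np.ndarray, n_states: int = 2, max_iter: int = 100, tol: float = 1e-6):
    """Toy Baum-Welch for 1-D Gaussian emissions (illustration only)."""
    T = len(x)
    # Crude starting guesses (the real model uses kmeans for this, as discussed above)
    pi = np.full(n_states, 1.0 / n_states)                  # initial state probabilities
    A = np.full((n_states, n_states), 1.0 / n_states)       # transition matrix
    mu = np.quantile(x, np.linspace(0.25, 0.75, n_states))  # emission means
    sigma = np.full(n_states, x.std())                      # emission std devs
    prev_ll = -np.inf

    for _ in range(max_iter):
        B = norm.pdf(x[:, None], mu[None, :], sigma[None, :])  # per-state emission likelihoods

        # Forward pass (scaled to avoid underflow)
        alpha = np.zeros((T, n_states))
        c = np.zeros(T)
        alpha[0] = pi * B[0]
        c[0] = alpha[0].sum()
        alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[t]
            c[t] = alpha[t].sum()
            alpha[t] /= c[t]

        # Backward pass (same scaling)
        beta = np.ones((T, n_states))
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[t + 1] * beta[t + 1])) / c[t + 1]

        # E-step: state posteriors and expected transition counts
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.zeros((n_states, n_states))
        for t in range(T - 1):
            num = alpha[t][:, None] * A * (B[t + 1] * beta[t + 1])[None, :]
            xi += num / num.sum()

        # M-step: re-estimate the parameters from the posteriors
        pi = gamma[0]
        A = xi / xi.sum(axis=1, keepdims=True)
        mu = (gamma * x[:, None]).sum(axis=0) / gamma.sum(axis=0)
        sigma = np.sqrt((gamma * (x[:, None] - mu) ** 2).sum(axis=0) / gamma.sum(axis=0))

        ll = np.log(c).sum()  # log-likelihood under the current parameters
        if abs(ll - prev_ll) < tol:
            break             # converged
        prev_ll = ll

    return pi, A, mu, sigma

# Example usage (not run here): baum_welch_sketch(observations[NAME].to_numpy())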

Visualize the Model

Now that we’ve fit the model, let’s understand what it thinks.

Transition Matrix tells us how likely state transitions are. Given how separated our data was, this should be very close to the identity matrix. When we repeat this with less separation in our data, this will become less distinct.

Emission Parameters tell us what the model expects each state to produce.

  • Mean is the average value we expect from an observation in a given state. The fitted means should be close to ±EXAMPLE_MEAN from above.
  • Std Dev is the expected spread of an emission from a state (i.e., how a value varies from its mean). It should be close to the EXAMPLE_STD value above.

As we noted above, the model converges to a solution very quickly here, so the Training Convergence plot is essentially empty - that is expected.
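To make the transition matrix a bit more concrete before we plot, here is how to read one. The numbers below are hypothetical, not the fitted values; rows are the current state, columns are the next state, and a heavy diagonal means "sticky" states:

A_example = np.array([
    [0.95, 0.05],   # from State_0: 95% chance we stay, 5% chance we switch
    [0.06, 0.94],   # from State_1: 6% chance we switch, 94% chance we stay
])
# With geometric state durations, the expected time spent in a state before
# leaving it is 1 / (1 - P(stay)).
for i in range(len(A_example)):
    print(f"State_{i}: expected dwell ~ {1.0 / (1.0 - A_example[i, i]):.1f} samples")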

_ = hmm.plot()

[Figure: model transition matrix, emission parameters, and training convergence]

Make Predictions!

Ok, so we fit the model to the data. Let’s understand what it thought was happening and compare it to what we know actually happened.

y = hmm.predict(observations)

Prediction

As you can see, the model did pretty well! You would hope so; this is a very simple case. Let’s try again with something a bit more interesting.

_ = plot_states(y.predicted_state, observations[NAME], obs_mean=EXAMPLE_MEAN, num_samples=NUM_SAMPLES)

[Figure: predicted states with observations overlaid]

score = accuracy_score(actual_states, y.predicted_state)
print(f"Accuracy score: {score:.3%}")
Accuracy score: 90.000%
report = classification_report(actual_states, y.predicted_state)
print("Classification Report:\n", report)
Classification Report:
               precision    recall  f1-score   support

           0       1.00      0.84      0.91       124
           1       0.79      1.00      0.88        76

    accuracy                           0.90       200
   macro avg       0.90      0.92      0.90       200
weighted avg       0.92      0.90      0.90       200
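One caveat when scoring an unsupervised model like this: the HMM's state indices are arbitrary, so its State_0 isn't guaranteed to line up with state 0 in our truth data (here it happened to). A small hedge, sketched below (not part of the hidden_regime API), is to score both labelings and keep the better one:

def aligned_accuracy(truth: pd.Series, predicted: pd.Series) -> float:
    """Accuracy that tolerates the two state labels being swapped (2-state case only)."""
    return max(accuracy_score(truth, predicted),
               accuracy_score(truth, 1 - predicted))

print(f"Label-aligned accuracy: {aligned_accuracy(actual_states, y.predicted_state):.3%}")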

A More Complicated Problem

Let’s make the states less distinct and see how the model performs.

NEXT_EXAMPLE_MEAN = 0.05  # Closer centers
NEXT_EXAMPLE_STD = 0.08   # Fatter distribution

next_actual_states, next_observations = generate_observations(
    obs_mean=NEXT_EXAMPLE_MEAN,
    obs_std=NEXT_EXAMPLE_STD,
    obs_name=NAME,
    num_samples=NUM_SAMPLES)

Updated Observations

There is much less separation between the two states in our mixture now. This will be much more challenging for our model.

_ = next_observations.plot.hist(bins=5)
_ = plt.title('Histogram of Observed Data')

[Figure: histogram of the updated observations]

Observations

You can see the observations are much less distinct in this picture. There is some slight separation, but given the increased variance in the signals it is tough to visually differentiate between the two states.

_ = plot_states(next_actual_states, next_observations[NAME], obs_mean=NEXT_EXAMPLE_MEAN, num_samples=NUM_SAMPLES)

[Figure: true states with the less-distinct observations overlaid]

Model Update

Re-train our model on our latest observation set.

hmm.fit(next_observations)
Training on 200 observations (removed 0 NaN values)

Training Results

You can see in the Training Convergence plot that it took much more effort for the model to become confident in its understanding of the data than in the previous example. However, it was still able to make reasonable estimates of the states and emission parameters despite the noisier data.

_ = hmm.plot()

[Figure: re-trained model's transition matrix, emission parameters, and training convergence]

Predictions!

Let’s see how we did given the much more interesting problem…

y = hmm.predict(next_observations)

_ = plot_states(y.predicted_state, next_observations[NAME], obs_mean=NEXT_EXAMPLE_MEAN, num_samples=NUM_SAMPLES)

[Figure: predicted states for the harder problem with observations overlaid]

Analysis

The model was better than a coin-flip. Given the data in this example, that is a pretty good outcome!

score = accuracy_score(next_actual_states, y.predicted_state)
print(f"Accuracy score: {score:.3%}")
Accuracy score: 63.000%
report = classification_report(next_actual_states, y.predicted_state)
print("Classification Report:\n", report)
Classification Report:
               precision    recall  f1-score   support

           0       0.69      0.73      0.71       124
           1       0.51      0.47      0.49        76

    accuracy                           0.63       200
   macro avg       0.60      0.60      0.60       200
weighted avg       0.62      0.63      0.63       200

Next Steps

So then, what can we do to improve our model in the face of difficult data?

  • More states: Adding more states to our model can help (see the sketch below). Caution: this comes at a price; model outputs are more difficult to interpret and the model is more difficult to train.
  • More data: Using more observations can help. Caution: this is not always an option; we can’t just get more data from a stock, for example.
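As a quick preview of the first option, the only change needed in the configuration we built earlier is n_states (three states here is just an illustrative choice):

hmm_cfg_3 = HMMConfig(
    n_states=3,
    observed_signal=NAME,
    initialization_method='kmeans'
)
hmm_3 = HMM(hmm_cfg_3)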

We’ll work through these in future posts.

Thanks for reading!