HMM EXAMPLE with Continuous Densities¶
This example closely follows the one in the course notes; however, the numerical values do not match exactly.
In [1]:
#!pip install git+https://github.com/compi1234/pyspch.git
try:
    import pyspch
except ModuleNotFoundError:
    print(
        """
        To enable this notebook on platforms such as Google Colab,
        install the pyspch package and its dependencies by running:
        !pip install git+https://github.com/compi1234/pyspch.git
        """
    )
    raise
In [2]:
%matplotlib inline
import sys, os,time
import numpy as np
import pandas as pd
from IPython.display import display, HTML, clear_output
import ipywidgets as widgets
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
import copy
import pyspch.stats.probdist as Densities
from pyspch.stats import libhmm
import pyspch.display as Spd
import pyspch.core as Spch
# display the result of every top-level expression, not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# graphical and print preferences
cmap_1 = sns.light_palette("caramel", 50, input="xkcd")
cmap_2 = cmap_1[0:25]
cmap_3 = cmap_1[25:50]
cmap = cmap_2
pd.options.display.float_format = '{:,.3f}'.format
# catch future warnings -- occurring in matplotlib
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
In [3]:
mpl.rcParams['figure.figsize'] = [12.0, 6.0]
#mpl.rcParams['ps.papersize'] = 'A4'
mpl.rcParams['xtick.labelsize'] = 14
mpl.rcParams['ytick.labelsize'] = 14
mpl.rcParams['axes.titlepad'] = 15
mpl.rcParams['axes.titlesize'] = 'large'
mpl.rcParams['axes.linewidth'] = 2
mpl.rc('lines', linewidth=3, color='k')
In [4]:
%%HTML
<style type="text/css">
table.dataframe td, table.dataframe th {
border: 2px black solid !important;
column-width: 60px;
color: black !important;
}
</style>
| | prior | mean | var |
|---|---|---|---|
| 0 | 0.200 | [1.0] | [1.0] |
| 1 | 0.200 | [10.0] | [1.0] |
| 2 | 0.200 | [15.0] | [1.0] |
| 3 | 0.200 | [6.0] | [1.0] |
| 4 | 0.200 | [2.0] | [1.0] |
In [6]:
Gauss1.__dict__
Out[6]:
{'priors': None,
'var_smoothing': 1e-09,
'n_classes': 5,
'n_features_in_': 1,
'theta_': array([[ 1.],
[10.],
[15.],
[ 6.],
[ 2.]]),
'var_': array([[1.],
[1.],
[1.],
[1.],
[1.]]),
'classes_': array([1, 2, 3, 4, 5]),
'class_prior_': array([0.2, 0.2, 0.2, 0.2, 0.2])}
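The attributes above mirror sklearn's GaussianNB conventions: `theta_` holds the per-state means and `var_` the per-state variances, one row per state. As a sketch (not the pyspch code path), the per-state log-likelihood of a single observation can be reproduced with a plain Gaussian density; the means and variances below are copied from the table above:

```python
import numpy as np
from scipy.stats import norm

# Per-state means and unit variances, copied from Gauss1 above (5 states, 1 feature)
means = np.array([1.0, 10.0, 15.0, 6.0, 2.0])
stds = np.sqrt(np.ones(5))

x = 1.0  # a single observed frame value
loglik = norm.logpdf(x, loc=means, scale=stds)  # log N(x; mean_k, var_k) for each state k
print(loglik.round(3))  # state 0 (mean 1.0) scores best for x = 1.0
```

An observation near a state's mean scores close to -0.5*log(2*pi) ≈ -0.919 (the peak of a unit-variance Gaussian log-density), which matches the magnitudes seen in the observation-probability tables further down.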
HMM with a Gaussian Observation Model¶
The frame classifiers above did not exploit the inherent sequential structure of our data. We will now use the fact that we expect the data to be drawn from the states in left-to-right order, though we do not enforce this strictly, as occasionally there are random switches between states.
We make two HMMs with the same initial and transition probabilities.
The model has a left-to-right tendency, though a very small probability allows for breaking this topology.
- hmm1 : uses Gauss1 (intuitive) as observation model
- hmm2 : uses Gauss2 (trained) as observation model
We use log-probabilities throughout to accommodate small probabilities and long sequences.
In [7]:
hmm1 = libhmm.HMM(prob_style="log",obs_model=Gauss1,n_states=5,prob_floor=1.e-20)
hmm1.init_topology(type="lr",selfprob=.8)
hmm1.print_model()
HMM STATE MODEL =================
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Pinit(.) | 0.000 | -46.052 | -46.052 | -46.052 | -46.052 |
| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| P(0|.) | -0.223 | -46.052 | -46.052 | -46.052 | -46.052 |
| P(1|.) | -1.609 | -0.223 | -46.052 | -46.052 | -46.052 |
| P(2|.) | -46.052 | -1.609 | -0.223 | -46.052 | -46.052 |
| P(3|.) | -46.052 | -46.052 | -1.609 | -0.223 | -46.052 |
| P(4|.) | -46.052 | -46.052 | -46.052 | -1.609 | 0.000 |
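The printed log-probabilities can be sanity-checked against the constructor arguments: the self-loop value is log of selfprob=0.8, the forward transition is log of the remaining 0.2, and the disallowed transitions sit at the log of prob_floor=1e-20:

```python
import numpy as np

# Log-domain values behind the "lr" topology table above (selfprob=0.8, prob_floor=1e-20)
print(round(np.log(0.8), 3))    # -0.223  : self-loop
print(round(np.log(0.2), 3))    # -1.609  : forward transition
print(round(np.log(1e-20), 3))  # -46.052 : floored / disallowed transition
```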
OBSERVATION MODEL =================
| | prior | mean | var |
|---|---|---|---|
| 0 | 0.200 | [1.0] | [1.0] |
| 1 | 0.200 | [10.0] | [1.0] |
| 2 | 0.200 | [15.0] | [1.0] |
| 3 | 0.200 | [6.0] | [1.0] |
| 4 | 0.200 | [2.0] | [1.0] |
Trellis Computations¶
In the next cell:
- Select model hmm1 or hmm2
- Select a data stream and reference alignment (X1, X2, X3, X4 or X2s, X3s, X4s)
- Compute the Viterbi trellis (Normalize=True: max value = 1.0 for prob, or 0.0 for logprob)
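For reference, the recursion performed by the Viterbi pass can be sketched generically in the log domain; this is an illustrative stand-alone implementation, not the libhmm code:

```python
import numpy as np

def viterbi_log(log_init, log_trans, log_obs):
    """Viterbi pass in the log domain.
    log_init: (S,) initial state log-probs;
    log_trans: (S, S) with log_trans[i, j] = log P(j | i);
    log_obs: (T, S) per-frame observation log-likelihoods.
    Returns the (T, S) trellis and the best state path."""
    T, S = log_obs.shape
    trellis = np.full((T, S), -np.inf)
    backptr = np.zeros((T, S), dtype=int)
    trellis[0] = log_init + log_obs[0]
    for t in range(1, T):
        # scores[i, j]: best log-prob of being in j at t, coming from i at t-1
        scores = trellis[t - 1][:, None] + log_trans
        backptr[t] = np.argmax(scores, axis=0)
        trellis[t] = scores[backptr[t], np.arange(S)] + log_obs[t]
    # backtrace the best path from the final frame
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(trellis[-1])
    for t in range(T - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return trellis, path
```

Each trellis cell thus holds the log-probability of the single best path ending in that state at that frame, which is what the "Trellis Probabilities (Viterbi)" table below displays.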
In [8]:
X = X2
hmm = hmm1
tr = libhmm.Trellis(hmm)
tr.viterbi_pass(X)
tr.print_trellis(what=['obs_probs','probs'])
Observation Probabilities
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.995 | -0.919 | -37.933 | -32.571 | -63.745 | -6.134 | -5.580 | -2.755 | -5.668 | -7.350 | -3.553 | -0.969 |
| 1 | -45.010 | -41.405 | -0.997 | -1.464 | -3.360 | -17.568 | -18.602 | -26.007 | -18.431 | -15.572 | -23.396 | -44.317 |
| 2 | -104.462 | -98.897 | -15.477 | -19.182 | -4.813 | -58.921 | -60.837 | -73.925 | -60.521 | -55.140 | -69.420 | -103.400 |
| 3 | -15.448 | -13.411 | -7.413 | -5.289 | -20.198 | -2.486 | -2.814 | -5.673 | -2.758 | -1.918 | -4.577 | -15.051 |
| 4 | -1.886 | -1.417 | -29.829 | -25.114 | -53.035 | -3.404 | -3.026 | -1.339 | -3.086 | -4.264 | -1.758 | -1.785 |
Trellis Probabilities (Viterbi)
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.995 | -2.137 | -40.294 | -73.088 | -116.227 | -62.199 | -68.002 | -68.793 | -74.643 | -79.411 | -79.878 | -79.052 |
| 1 | -91.061 | -44.010 | -4.744 | -6.431 | -10.014 | -27.805 | -46.630 | -72.861 | -87.406 | -87.633 | -99.721 | -122.400 |
| 2 | -150.514 | -145.944 | -61.096 | -25.535 | -12.853 | -70.544 | -90.251 | -122.165 | -129.496 | -127.201 | -145.745 | -181.482 |
| 3 | -61.499 | -60.458 | -55.602 | -56.085 | -47.342 | -16.948 | -19.986 | -25.882 | -28.864 | -31.005 | -35.805 | -51.079 |
| 4 | -47.937 | -48.464 | -78.018 | -75.910 | -105.518 | -52.356 | -21.584 | -22.923 | -26.009 | -30.273 | -32.031 | -33.816 |
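The plotting cell below passes plot_norm=True. A common way to normalize a log-domain trellis column by column (assumed here, not taken from the pyspch source) is to subtract each column's maximum, so the best state at every frame shows 0.0:

```python
import numpy as np

# Excerpt of the Viterbi trellis above (states x time), values are log-probs
trellis = np.array([[ -0.995,  -2.137, -40.294],
                    [-91.061, -44.010,  -4.744]])
# Column-wise normalization in the log domain: subtract each column's maximum
norm = trellis - trellis.max(axis=0, keepdims=True)
print(norm.round(3))  # the best state per column becomes 0.0
```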
In [9]:
cmap = "OrRd_r"
# column-by-column normalization is applied for the heatmap plotting of the trellis
tr.plot_trellis(xticks=X, plot_norm=True, plot_obs_probs=True, plot_values=True, fmt=".1f", vmin=-220, vmax=0,
                cmap=cmap, plot_backptrs=True, plot_alignment=True, figsize=(16, 8))
Out[9]:
(heatmap plots of the observation probabilities and the Viterbi trellis, with backpointers and the best-path alignment)