HMM EXAMPLES with Continuous Densities¶
#!pip install git+https://github.com/compi1234/pyspch.git
try:
import pyspch
except ModuleNotFoundError:
try:
print(
"""
To enable this notebook on platforms as Google Colab,
install the pyspch package and dependencies by running following code:
!pip install git+https://github.com/compi1234/pyspch.git
"""
)
except ModuleNotFoundError:
raise
%matplotlib inline
import sys, os,time
import numpy as np
import pandas as pd
from IPython.display import display, HTML, clear_output
import ipywidgets as widgets
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
import copy
import pyspch.stats.probdist as Densities
from pyspch.stats import libhmm
from pyspch.core import make_seq1
from pyspch import display as Spd
pd.set_option('display.float_format','{:,.3f}'.format)
pd.set_option('display.precision',3)
# print all variable statements
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# graphical and print preferences
cmap_car = sns.light_palette("caramel",50,input="xkcd")[0:25]
cmap="OrRd_r"
pd.options.display.float_format = '{:,.3f}'.format
mpl.rcParams['figure.figsize'] = [12.0, 6.0]
mpl.rcParams['xtick.labelsize'] = 14
mpl.rcParams['ytick.labelsize'] = 14
mpl.rcParams['axes.titlepad'] = 15
mpl.rcParams['axes.titlesize'] = 'large'
mpl.rcParams['axes.linewidth'] = 2
mpl.rc('lines', linewidth=3, color='k')
# catch future warnings -- occurring in matplotlib
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
%%HTML
<style type="text/css">
table.dataframe td, table.dataframe th {
border: 2px black solid !important;
column-width: 60px;
color: black !important;
}
| prior | mean | var | |
|---|---|---|---|
| 0 | 0.333 | [5.5, 3.5, 4.0, 1.4] | [0.1, 0.1, 0.1, 0.1] |
| 1 | 0.333 | [6.0, 2.5, 4.4, 1.5] | [0.1, 0.1, 0.1, 0.1] |
| 2 | 0.333 | [6.5, 3.0, 4.5, 2.0] | [0.1, 0.1, 0.1, 0.1] |
Likelihoods and Posteriors¶
The Gaussian Densities compute the likelihood that a feature vector is drawn from a given distribution. After application of Bayes formula these likelihoods may be converted to posteriors, i.e. the probability that the feature vector was drawn from a particular class.
If priors are uniform amongst all classes, then the posterior computation is nothing more than a normalization such that the sum of all values are 1.0. It emphasizes that the absolute values of the likelihoods are often meaningless, but it is there relative values that are decisive. In the plot below you may see that the likelihoods for class '0' are very low for in class data (the model is not so good) , but nevertheless thy are still very distinctive from the other classes and recognition of class '0' data is probably perfect.
In the plots below, you observe frame likelihoods 2 times (line plot and heatmap plot) and in the bottom axis the posteriors.
# the predict_prob() method computes the joint likelihood of all features
# the predict_proba() method computes the posteriors
probs=Gauss1.predict_prob(X2s)
posteriors=Gauss1.predict_proba(X2s)
fig = Spd.SpchFig(row_heights=[1.,1.,1.])
fig.add_line_plot(probs.T,iax=0,ylabel='probs')
fig.add_img_plot(probs.T,iax=1,cmap='GnBu',ylabel='probs')
fig.add_line_plot(posteriors.T,iax=2,ylabel='posteriors')
display(fig)
print("Accuracy on X1: %.1f%% " % (100.0*Gauss1.score(X1,y1)) )
print("Accuracy on X2: %.1f%% " % (100.0*Gauss1.score(X2,y2)) )
print("Accuracy on X3: %.1f%% " % (100.0*Gauss1.score(X3,y2)) )
print("Accuracy on X4: %.1f%% " % (100.0*Gauss1.score(X4,y2)) )
Accuracy on X1: 80.0% Accuracy on X2: 82.7% Accuracy on X3: 64.0% Accuracy on X4: 60.0%
Training the Gaussian Models¶
As we are given alignments, we may choose just to train the observation model, i.e. the Gaussian densities in our case. We apply a large variance smoothing for training from such small data set to avoid overtraining as otherwise variance could be unrealistically low.
Comparing with our intuitive model 'Gauss1' we see moderate shifts of means and changes to the variances
pd.set_option('display.precision',1)
Gauss2 = Densities.Gaussian(var_smoothing=.1)
Gauss2.fit(X1, y1)
Gauss2.print_model()
Gauss2.plot_model()
Gaussian(var_smoothing=0.1)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Gaussian(var_smoothing=0.1)
| prior | mean | var | |
|---|---|---|---|
| 0 | 0.333 | [5.055193820019556, 3.477077745891922, 1.43798... | [0.45848341029954665, 0.40628972920029605, 0.3... |
| 1 | 0.333 | [5.983226609712076, 2.757339815169499, 4.29350... | [0.6510144929195472, 0.4357022821781659, 0.516... |
| 2 | 0.333 | [6.498789066161893, 2.9113455734820333, 5.5267... | [0.7087545415501917, 0.4022732835847094, 0.603... |
probs=Gauss2.predict_prob(X2s)
posteriors=Gauss2.predict_proba(X2s)
fig = Spd.SpchFig(row_heights=[1.,1.,1.])
fig.add_line_plot(probs.T,iax=0,ylabel='probs')
fig.add_img_plot(probs.T,iax=1,cmap='GnBu',ylabel='probs')
fig.add_line_plot(posteriors.T,iax=2,ylabel='posteriors')
display(fig)
Frame classification with the trained models¶
We observe that the 'clean' test data scores almost as good as the training data
indicating that we seem to have a properly trained model.
However for the noise test data the frame error rate is still quite high. For the 'very noisy' data we are even stuck at an error rate around 50%.
print("Accuracy on X1: %.1f%% " % (100.0*Gauss2.score(X1,y1)) )
print("Accuracy on X2: %.1f%% " % (100.0*Gauss2.score(X2,y2)) )
print("Accuracy on X3: %.1f%% " % (100.0*Gauss2.score(X3,y2)) )
print("Accuracy on X4: %.1f%% " % (100.0*Gauss1.score(X4,y2)) )
Accuracy on X1: 93.3% Accuracy on X2: 94.7% Accuracy on X3: 81.3% Accuracy on X4: 60.0%
HMM with a Gaussian Observation Model¶
The frame classifiers above did not make use of the inherent sequence property of our data. We will now exploit the fact that we expect the data to be drawn in sequence from states 0-1-2. Though we will not enforce this 100% as occasionally there are random switches between states.
We make two hmm-models with the same init and transition probabilities.
The model has a tendency left-to-right, though a very small probability allows for breaking this topology
- hmm1 : uses Gauss1 (intuitive) as observation model
- hmm2 : uses Gauss2 (trained) as observation model
We use log-prob's throughout to accomodate for small probs and long sequences
eps = 1.e-5
imat = np.array([1.0, eps,eps])
tmat = np.array([[.8,.2,eps],[eps,.8,.2],[eps,eps,1.]])
#hmm1 = libhmm.HMM(prob_style="lin",obs_model=Gauss1,transmat=tmat,initmat=imat)
hmm1 = libhmm.HMM(prob_style="log",obs_model=Gauss1,n_states=3,
transmat=np.log(tmat),initmat=np.log(imat))
hmm1.print_model()
# clone hmm1 and modify the observation model
hmm2 = copy.deepcopy(hmm1)
hmm2.obs_model = Gauss2
HMM STATE MODEL =================
| 0 | 1 | 2 | |
|---|---|---|---|
| Pinit(.) | 0.000 | -11.513 | -11.513 |
| 0 | 1 | 2 | |
|---|---|---|---|
| P(0|.) | -0.223 | -11.513 | -11.513 |
| P(1|.) | -1.609 | -0.223 | -11.513 |
| P(2|.) | -11.513 | -1.609 | 0.000 |
OBSERVATION MODEL =================
| prior | mean | var | |
|---|---|---|---|
| 0 | 0.333 | [5.5, 3.5, 4.0, 1.4] | [0.1, 0.1, 0.1, 0.1] |
| 1 | 0.333 | [6.0, 2.5, 4.4, 1.5] | [0.1, 0.1, 0.1, 0.1] |
| 2 | 0.333 | [6.5, 3.0, 4.5, 2.0] | [0.1, 0.1, 0.1, 0.1] |
Trellis Computatitons¶
In the next cell
- Select model hmm1 or hmm2
- Select Datastream and reference alignment (X1,X2,X3,X4 or X2s,X3s,X4s)
- Compute the Viterbi Trellis (Normalize=True: max value =1.0 for prob or 0.0 for logprob)
X = X4s
y = y2s
hmm = hmm2
tr = libhmm.Trellis(hmm,Normalize=True)
tr.viterbi_pass(X)
tr.print_trellis(what=['obs_probs','probs'])
Observation Probabilities
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -38.614 | -31.242 | -36.314 | -40.900 | -18.758 | -11.300 | -36.129 | -6.599 | -14.656 | -10.157 | -64.763 | -79.271 | -62.633 | -44.675 | -49.488 |
| 1 | -58.490 | -16.348 | -55.005 | -45.995 | -37.916 | -8.774 | -38.935 | -22.771 | -11.482 | -6.018 | -18.770 | -25.395 | -15.364 | -11.830 | -18.204 |
| 2 | -70.704 | -19.175 | -65.232 | -47.348 | -46.490 | -15.462 | -51.086 | -32.985 | -17.343 | -10.480 | -10.272 | -19.639 | -7.423 | -8.613 | -12.298 |
Trellis Probabilities (Viterbi)
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000 | -13.507 | 0.000 | 0.000 | 0.000 | -1.140 | 0.000 | 0.000 | -1.788 | -5.926 | -59.031 | -71.144 | -66.723 | -47.575 | -48.703 |
| 1 | -31.388 | 0.000 | -7.401 | -6.481 | -20.544 | 0.000 | -1.666 | -17.559 | 0.000 | 0.000 | -7.112 | -13.090 | -19.454 | -14.730 | -17.419 |
| 2 | -43.603 | -12.730 | -19.014 | -15.235 | -35.600 | -16.592 | -15.204 | -29.439 | -15.765 | -5.848 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Then,
- evaluate the strength of the acoustic model by itself by performing frame recognition
- evaluate for a given setup the resulting alignment and number of errors vs. frame classificaiton
- observe dependency on
- quality of the model (moderate quality given model vs. properly training model)
- performance difference train/test
- performance difference clean/noisy data
## evaluate on frame based recognition
print('\nFrame Recognition vs. Viterbi Alignment vs. Reference\n')
y_frame = hmm.obs_model.predict(X)
y_viterbi = tr.backtrace()
pd.DataFrame(np.array([y_frame,y_viterbi,y]),index=['FRAME','VITERBI','REF'])
print("\nFrame Recognition Accuracy: %.1f%%" % (100.0*hmm.obs_model.score(X,y)) )
print("Viterbi Alignment Accuracy: %.1f%% "% (100.0*np.sum(y_viterbi == y)/len(y)) )
#print("\nSequence Probability: %.2f\n" % tr.seq_prob)
Frame Recognition vs. Viterbi Alignment vs. Reference
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FRAME | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
| VITERBI | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
| REF | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
Frame Recognition Accuracy: 80.0% Viterbi Alignment Accuracy: 73.3%
y_frame, y_viterbi
(array([0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 2, 2, 2, 2, 2]), array([0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2]))
# column by column normalization is applied in the trellis, so the max-value shown in always 0.0 with logprobs
# hence, the sequence probability is the product of the column-scaling (or sum of log-scalings)
cmap="OrRd_r"
tr.plot_trellis(cmap=cmap,plot_norm=True,plot_obs_probs=True,plot_values=True,fmt=".2f",vmin=-100,vmax=0,
plot_backptrs=True,plot_alignment=True,figsize=(15,5))