Sampling and inference in Hidden Markov models

A finite state hidden Markov model is specified by an initial distribution, a sequence of transition probabilities, and a sequence of observation probabilities:

\[\begin{aligned} &P(x_0 = i) \\ &P(x_t = i \mid x_{t-1} = j), \quad t = 1, \ldots, T \\ &P(y_t = i \mid x_t = j), \quad t = 0, 1, \ldots, T \end{aligned}\]
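Together, these quantities determine the joint distribution of the hidden and observed sequences:

\[P(x_{0:T}, y_{0:T}) = P(x_0) \prod_{t=1}^{T} P(x_t \mid x_{t-1}) \prod_{t=0}^{T} P(y_t \mid x_t).\]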

This tutorial describes how to use MarkovKernels.jl to:

  • Sample from a (finite state) hidden Markov model
  • Compute the a posteriori distribution of the hidden sequence using the backward recursion

using MarkovKernels
using Random, LinearAlgebra
import Plots
rng = Random.Xoshiro(19910215)

Implementing samplers for finite state (hidden) Markov models

function sample(rng::AbstractRNG, init, fw_kernels)
    # sample the initial state
    x = rand(rng, init)

    # allocate output
    n = length(fw_kernels) + 1
    xs = Vector{typeof(x)}(undef, n)
    xs[begin] = x

    # sample the remaining states by conditioning each transition kernel on the current state
    for (m, fw_kernel) in pairs(fw_kernels)
        x = rand(rng, condition(fw_kernel, x))
        xs[begin+m] = x
    end
    return xs
end

function sample(rng::AbstractRNG, init, fw_kernels, obs_kernels)
    # sample initial values
    x = rand(rng, init)
    y = rand(rng, condition(first(obs_kernels), x))

    # allocate output
    n = length(obs_kernels)
    xs = Vector{typeof(x)}(undef, n)
    ys = Vector{typeof(y)}(undef, n)

    xs[begin] = x
    ys[begin] = y

    for (m, fw_kernel) in pairs(fw_kernels)
        obs_kernel = obs_kernels[begin+m]
        x = rand(rng, condition(fw_kernel, x))
        y = rand(rng, condition(obs_kernel, x))
        xs[begin+m] = x
        ys[begin+m] = y
    end
    return xs, ys
end
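Both methods take an initial distribution and a vector of transition kernels; the hidden Markov model sampler additionally takes a vector of observation kernels containing one more element. A minimal sketch of the calling convention, with a hypothetical two-state model that reuses the same kernel for transitions and observations:

init2 = Categorical(ones(2))
K2 = StochasticMatrix([0.9 0.1; 0.1 0.9]) # symmetric two-state transition matrix
xs2 = sample(rng, init2, fill(K2, 4))                    # Markov chain with 5 states
xs2, ys2 = sample(rng, init2, fill(K2, 4), fill(K2, 5))  # hidden and observed sequences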

Defining a finite state hidden Markov model and sampling it

# number of possible hidden states and number of possible observation states
m, n = 10, 10

# probability vector of initial distribution
init = Categorical(ones(m))

# transition probabilities
Pxx = Matrix(Tridiagonal(ones(m - 1), 5 * ones(m), ones(m - 1)))
Kxx = StochasticMatrix(Pxx)

# observation probabilities
Pyx = (ones(m, m) - I)
Kyx = StochasticMatrix(Pyx)

T = 2^8 + 1
fw_kernels = fill(Kxx, T - 1)
obs_kernels = fill(Kyx, T)

# sample hidden and observed states
xs, ys = sample(rng, init, fw_kernels, obs_kernels)
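Given T - 1 transition kernels and T observation kernels, both returned sequences contain one state per time step; a quick check (assuming the samplers defined above):

length(xs) == T # true
length(ys) == T # true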

Plotting the realization

hmm_plt = Plots.scatter(
    layout = (1, 2)
)
Plots.scatter!(
    hmm_plt,
    eachindex(xs),
    xs,
    color = "black",
    subplot = 1,
    title = "hidden states",
    label = "",
)
Plots.scatter!(
    hmm_plt,
    eachindex(ys),
    ys,
    color = "red",
    subplot = 2,
    title = "observation states",
    label = "",
)
[Plot: hidden states (left, black) and observation states (right, red) of the sampled realization.]

Implementing the backward algorithm

The backward algorithm operates on the sequence of likelihoods of future observations:

\[h_{t:T \mid s}(x) = P(y_{t:T} \mid x_s = x).\]

Via a backward recursion, it computes the a posteriori initial distribution, a sequence of a posteriori transition probabilities, and the log-likelihood of the observations. The recursion is given by:

\[\begin{aligned} h_{t:T\mid t-1}(z) &= \sum_x h_{t:T \mid t}(x) P(x_t = x \mid x_{t-1} = z) \\ P(x_t = x \mid x_{t-1} = z, y_{0:T}) &= h_{t:T \mid t}(x) P(x_t = x \mid x_{t-1} = z) / h_{t:T\mid t-1}(z) \\ h_{t-1:T \mid t-1}(x) &= h_{t:T\mid t-1}(x) h_{t-1 \mid t-1}(x) \end{aligned}\]

The first two equations are implemented by htransform and the last equation is implemented by compose. The algorithm terminates by computing the a posteriori initial distribution and the log-likelihood of the observations:

\[\begin{aligned} \log P(y_{0:T}) &= \log \Big(\sum_x h_{0:T \mid 0}(x) P(x_0 = x) \Big) \\ P(x_0 = x \mid y_{0:T}) &= h_{0:T \mid 0}(x) P(x_0 = x) / P(y_{0:T}) \end{aligned}\]

These equations are implemented by posterior_and_loglike. Using MarkovKernels.jl, the code might look something like the following:

function backward_recursion(init, forward_kernels, likelihoods)
    # start from the likelihood of the final observation
    h = last(likelihoods)

    # infer the element type of the a posteriori transition kernels and allocate output
    KT = Base.promote_op(first ∘ htransform, eltype(forward_kernels), typeof(h))
    post_forward_kernels = Vector{KT}(undef, length(forward_kernels))

    # iterate backwards over the transition kernels
    for m in eachindex(forward_kernels)
        fw_kernel = forward_kernels[end-m+1]
        post_fw_kernel, h = htransform(fw_kernel, h)
        post_forward_kernels[end-m+1] = post_fw_kernel

        # multiply in the likelihood of the current observation
        like = likelihoods[end-m]
        h = compose(h, like)
    end

    # terminate with the a posteriori initial distribution and the log-likelihood
    post_init, loglike = posterior_and_loglike(init, h)
    return post_init, post_forward_kernels, loglike
end

Computing the a posteriori distribution of the hidden sequence

likes = [Likelihood(Kobs, y) for (Kobs, y) in zip(obs_kernels, ys)] # compute the likelihoods associated with the observations
post_init, post_fw_kernels, loglike = backward_recursion(init, fw_kernels, likes)
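Written out for the final time step, the first iteration of the loop in backward_recursion corresponds to the equations above: htransform implements the first two (it returns the a posteriori transition kernel together with the pulled-back likelihood) and compose implements the third. A sketch using the likelihoods just constructed, where post_K is simply a local name for the a posteriori kernel:

h = last(likes)                  # h_{T:T | T}, the likelihood of y_T
post_K, h = htransform(Kxx, h)   # P(x_T = x | x_{T-1} = z, y_{0:T}) and h_{T:T | T-1}
h = compose(h, likes[end-1])     # h_{T-1:T | T-1}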

Sampling a posteriori hidden sequences and plotting them

nsample = 10
for _ in 1:nsample
    xs_post = sample(rng, post_init, post_fw_kernels)
    Plots.scatter!(
        hmm_plt,
        eachindex(xs_post),
        xs_post,
        label = "",
        color = "blue",
        alpha = 0.1,
        )
end
hmm_plt
[Plot: a posteriori samples of the hidden sequence overlaid in blue on the previous figure.]