Sampling and inference in Gauss-Markov models
A Gauss-Markov realizable signal, $s_t$, is a signal that can be constructed as a linear transform of a Gauss-Markov process:
\[\begin{aligned} x_0 &\sim \mathcal{N}(\mu_0, \Sigma_0) \\ x_t \mid x_u &\sim \mathcal{N}( \Phi_{t, u} x_u, Q_{t, u}), \quad u < t \\ s_t \mid x_t &\sim \delta(\cdotp - C x_t) \end{aligned}\]
If $x_t$ is equal in distribution to the solution of a linear time-invariant stochastic differential equation, then there exist matrices $A$ and $B$ such that the transition matrix, $\Phi$, and the transition covariance, $Q$, are given by
\[\begin{aligned} \Phi_{t, u} &= e^{A (t - u)}\\ Q_{t, u} &= (t - u) \int_0^1 e^{A(t-u)z} B B^* e^{A^*(t-u)z} \mathrm{d} z \end{aligned}\]
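For a concrete instance (a standard scalar example), consider the Ornstein-Uhlenbeck process $\mathrm{d} x_t = -\lambda x_t \mathrm{d} t + \sigma \mathrm{d} w_t$, so that $A = -\lambda$ and $B = \sigma$. The formulas above then reduce to the familiar closed forms
\[\Phi_{t, u} = e^{-\lambda (t - u)}, \qquad Q_{t, u} = (t - u) \int_0^1 \sigma^2 e^{-2\lambda (t-u) z} \mathrm{d} z = \frac{\sigma^2}{2\lambda}\left(1 - e^{-2\lambda (t - u)}\right)\]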
The signal is frequently only available through noisy observations:
\[y_{t_k} \mid s_{t_k} \sim \mathcal{N}(s_{t_k}, R), \quad k = 1, \ldots, K\]
This tutorial describes how to use MarkovKernels.jl to:
- Sample Gauss-Markov realizable signals
- Sample observations of Gauss-Markov realizable signals
- Compute the path-posterior of the latent state $x$ on a grid that includes all the observations
- Compute time-marginals of the latent state $x$ and the signal $s$
using MarkovKernels
using FiniteHorizonGramians
using Random, LinearAlgebra
import Plots
rng = Random.Xoshiro(19910215)
Implementing samplers for Gauss-Markov realizable signals
The code for sampling from a Gauss-Markov realizable signal is exactly the same as in the hidden Markov model tutorial. For completeness, the code is given below.
function sample(rng::AbstractRNG, init, fw_kernels)
    # sample the initial state
    x = rand(rng, init)
    # allocate output
    n = length(fw_kernels) + 1
    xs = Vector{typeof(x)}(undef, n)
    xs[begin] = x
    # propagate the state through the chain of transition kernels
    for (m, fw_kernel) in pairs(fw_kernels)
        x = rand(rng, condition(fw_kernel, x))
        xs[begin+m] = x
    end
    return xs
end
function sample(rng::AbstractRNG, init, fw_kernels, obs_kernels)
    # sample initial values
    x = rand(rng, init)
    y = rand(rng, condition(first(obs_kernels), x))
    # allocate output
    n = length(obs_kernels)
    xs = Vector{typeof(x)}(undef, n)
    ys = Vector{typeof(y)}(undef, n)
    xs[begin] = x
    ys[begin] = y
    # alternate transition and observation sampling along the chain
    for (m, fw_kernel) in pairs(fw_kernels)
        obs_kernel = obs_kernels[begin+m]
        x = rand(rng, condition(fw_kernel, x))
        y = rand(rng, condition(obs_kernel, x))
        xs[begin+m] = x
        ys[begin+m] = y
    end
    return xs, ys
end
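As a usage sketch, the two methods are called as follows; init, forward_kernels, and output_kernels refer to the objects constructed in the next section:
# usage sketch; init, forward_kernels, and output_kernels are defined below
xs = sample(rng, init, forward_kernels)                     # latent path only
xs, ss = sample(rng, init, forward_kernels, output_kernels) # latent path and signal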
Defining a Gauss-Markov realizable signal and computing transition kernels
Transition kernels in Cholesky parametrization for continuous-time Gauss-Markov processes may be computed using FiniteHorizonGramians.jl like so:
function transition_kernel(A, B, dt)
    alg = FiniteHorizonGramians.ExpAndGram{Float64,13}()
    # Φ = exp(A * dt); U is an upper-triangular factor of the finite-horizon Gramian
    Φ, U = FiniteHorizonGramians.exp_and_gram_chol(A, B, dt, alg)
    Φ = LinearMap(Φ)
    Q = Cholesky(UpperTriangular(U))
    return NormalKernel(Φ, Q)
end
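As an illustrative check of this helper on the scalar Ornstein-Uhlenbeck example from above (only the FiniteHorizonGramians.jl calls already used are assumed), the returned $\Phi$ should match $e^{-\lambda \Delta t}$ and the factor $U$ should square to the transition variance $(1 - e^{-2\lambda \Delta t}) / (2\lambda)$:
# illustrative check on a scalar Ornstein-Uhlenbeck model, A = -λ, B = 1
λ_chk, dt_chk = 2.0, 0.1
alg_chk = FiniteHorizonGramians.ExpAndGram{Float64,13}()
Φ_chk, U_chk = FiniteHorizonGramians.exp_and_gram_chol(
    fill(-λ_chk, 1, 1), fill(1.0, 1, 1), dt_chk, alg_chk
)
Φ_chk[1, 1] ≈ exp(-λ_chk * dt_chk)                              # true
(U_chk' * U_chk)[1, 1] ≈ (1 - exp(-2λ_chk * dt_chk)) / (2λ_chk) # true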
Defining a parametrized continuous-time Gauss-Markov realizable signal model:
function gauss_markov_realizable_model(λ, σ, p)
    A = λ * (I - 2 * tril(ones(p, p)))
    B = sqrt(2λ) * ones(p, 1)
    Cadj =
        σ * factorial(p - 1) / sqrt(factorial(2(p - 1))) .* binomial.(p - 1, 0:p-1) .*
        (-1) .^ (0:p-1)
    C = adjoint(Cadj)
    return A, B, C
end
Computing the initial distribution, transition kernels, and output kernels:
# time grid
n = 2^9 + 1
T = 20.0
ts = LinRange(zero(T), T, n)
# model parameters
λ = 2.0
σ = 1.0
p = 4
# model matrices
A, B, C = gauss_markov_realizable_model(λ, σ, p)
init = Normal(zeros(p), cholesky(diagm(0 => ones(p))))
forward_kernels = [transition_kernel(A, B, dt) for dt in diff(ts)]
output_kernel = DiracKernel(LinearMap(C))
output_kernels = fill(output_kernel, n)
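Note that there is one transition kernel per grid interval and one output kernel per grid point:
length(forward_kernels) == n - 1 # one kernel per interval
length(output_kernels) == n      # one kernel per grid point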
Sampling the latent state and the signal:
xs, ss = sample(rng, init, forward_kernels, output_kernels)
Defining the observation kernel and sampling observations; only every eighth grid point is observed, and the remaining entries are marked as missing:
R = 0.1
observation_kernel = NormalKernel(LinearMap(one(R)), R)
ys = [rand(rng, condition(observation_kernel, s)) for s in ss]
obs_idx = [mod(k, 2^3) == 1 for k in eachindex(ts)]
obs_ts = ts[obs_idx]
ys = Union{eltype(ys), Missing}[obs_idx[k] ? ys[k] : missing for k in eachindex(obs_idx)]
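As a quick check of the decimation: with n = 513 and a stride of 2^3, the observed indices are 1, 9, ..., 513, that is,
count(obs_idx) == length(obs_ts) == 65 # number of observed grid points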
Plotting the realization
gmr_plt = Plots.plot(
    ts,
    ss,
    color = "black",
    label = "",
    xlabel = "t"
)
Plots.scatter!(
    gmr_plt,
    obs_ts,
    filter(!ismissing, ys),
    color = "red",
    label = ""
)
gmr_plt
Implementing the forward algorithm
The forward algorithm operates on a sequence of a posteriori terminal distributions (so-called filtering distributions):
\[p(x_{t_m} \mid y_{t_0:t_m})\]
The algorithm is initialized by:
\[\begin{aligned} p(y_{t_0}) &= \int p(y_{t_0} \mid x_{t_0}) p(x_{t_0}) \mathrm{d} x_{t_0} \\ p(x_{t_0} \mid y_{t_0}) &= p(y_{t_0} \mid x_{t_0}) p(x_{t_0}) / p(y_{t_0}) \end{aligned}\]
These two equations are implemented by posterior_and_loglike. The algorithm then computes a sequence of filtering distributions and reverse-time transition kernels, and accumulates the log-likelihood of the observations according to the following forward recursion:
\[\begin{aligned} p(x_{t_m} \mid y_{t_0:t_{m-1}}) &= \int p(x_{t_m} \mid x_{t_{m-1}}) p(x_{t_{m-1}} \mid y_{t_0:t_{m-1}}) \mathrm{d} x_{t_{m-1}} \\ p(x_{t_{m-1}} \mid x_{t_m}, y_{t_0:t_n}) &= p(x_{t_m} \mid x_{t_{m-1}}) p(x_{t_{m-1}} \mid y_{t_0:t_{m-1}}) / p(x_{t_m} \mid y_{t_0:t_{m-1}}) \\ p(y_{t_m} \mid y_{t_0:t_{m-1}}) &= \int p(y_{t_m} \mid x_{t_m}) p(x_{t_m} \mid y_{t_0:t_{m-1}}) \mathrm{d} x_{t_m} \\ p(x_{t_m} \mid y_{t_0:t_m}) &= p(y_{t_m} \mid x_{t_m}) p(x_{t_m} \mid y_{t_0:t_{m-1}}) / p(y_{t_m} \mid y_{t_0:t_{m-1}}) \\ \log p(y_{t_0:t_m}) &= \log p(y_{t_0:t_{m-1}}) + \log p(y_{t_m} \mid y_{t_0:t_{m-1}}) \end{aligned}\]
The first two equations are implemented by invert, and the next two equations are, again, implemented by posterior_and_loglike. The last equation is simply a cumulative sum of quantities already computed. Using MarkovKernels.jl, the code might look something like the following:
function forward_recursion(init, forward_kernels, likelihoods)
    like = first(likelihoods)
    filt, loglike = posterior_and_loglike(init, like)
    # infer the backward kernel type to allocate the output container
    KT = Base.promote_op(last ∘ invert, typeof(filt), eltype(forward_kernels))
    backward_kernels = Vector{KT}(undef, length(forward_kernels))
    for m in eachindex(forward_kernels)
        fw_kernel = forward_kernels[m]
        # predict and compute the reverse-time transition kernel
        pred, bw_kernel = invert(filt, fw_kernel)
        like = likelihoods[m+1]
        # measurement update and log-likelihood increment
        filt, ll_incr = posterior_and_loglike(pred, like)
        loglike = loglike + ll_incr
        backward_kernels[m] = bw_kernel
    end
    return backward_kernels, filt, loglike
end
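For Gauss-Markov models, each of these operations is available in closed form. As a point of reference, with $\mu$ and $\Sigma$ denoting the predictive mean and covariance and $H$ and $R$ the observation matrix and noise covariance, the measurement update and log-likelihood increment computed by posterior_and_loglike mathematically amount to the standard Kalman-filter formulas:
\[\begin{aligned} S &= H \Sigma H^* + R, \qquad K = \Sigma H^* S^{-1}, \\ \mu^{+} &= \mu + K (y_{t_m} - H \mu), \qquad \Sigma^{+} = \Sigma - K S K^*, \\ \log p(y_{t_m} \mid y_{t_0:t_{m-1}}) &= \log \mathcal{N}(y_{t_m}; H \mu, S) \end{aligned}\]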
Computing a reverse-time Markov a posteriori path-distribution
In order to compute the posterior, likelihood functions for the latent state need to be constructed from the observations. Some observations are of type Missing, but a Likelihood{<:AbstractMarkovKernel,<:Missing} is interpreted as a likelihood of type FlatLikelihood, which turns posterior_and_loglike into a kind of identity mapping.
likelihoods = [Likelihood(compose(observation_kernel, output_kernel), y) for y in ys]
The reverse-time posterior is now computed by:
backward_kernels, term, loglike = forward_recursion(init, forward_kernels, likelihoods)
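The returned loglike is the accumulated log-likelihood $\log p(y_{t_0:t_n})$, which makes the forward recursion directly usable for parameter estimation. A hypothetical sketch (negative_loglike is not part of the tutorial; it reuses the globals p, ts, ys, init, and observation_kernel defined above):
# hypothetical sketch: wrap the forward recursion in a cost function of (λ, σ)
function negative_loglike(θ)
    λ, σ = θ
    A, B, C = gauss_markov_realizable_model(λ, σ, p)
    fw_kernels = [transition_kernel(A, B, dt) for dt in diff(ts)]
    lls = [Likelihood(compose(observation_kernel, DiracKernel(LinearMap(C))), y) for y in ys]
    _, _, ll = forward_recursion(init, fw_kernels, lls)
    return -ll # to be minimized over θ
end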
Computing a posteriori time-marginals
The a posteriori time marginals may be computed according to the following backward recursion:
\[p(x_{t_{m-1}} \mid y_{t_0:t_n}) = \int p(x_{t_{m-1}} \mid x_{t_m}, y_{t_0:t_n}) p(x_{t_m} \mid y_{t_0:t_n}) \mathrm{d} x_{t_m}\]
This equation is implemented by marginalize. Using MarkovKernels.jl, the code might look something like the following:
function reverse_time_marginals(term, bw_kernels)
    dist = term
    dists = Vector{typeof(dist)}(undef, length(bw_kernels) + 1)
    dists[end] = dist
    # traverse the backward kernels from the terminal time to the initial time
    for (m, kernel) in pairs(reverse(bw_kernels))
        dist = marginalize(dist, kernel)
        dists[end-m] = dist
    end
    return dists
end
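In the Gaussian case the marginalization step is also in closed form: if $p(x_{t_m} \mid y_{t_0:t_n}) = \mathcal{N}(\mu, \Sigma)$ and the backward kernel is $\mathcal{N}(F x_{t_m} + b, \bar{Q})$, then
\[p(x_{t_{m-1}} \mid y_{t_0:t_n}) = \mathcal{N}(F \mu + b, F \Sigma F^* + \bar{Q})\]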
The time-marginals are then computed like so:
post_state_dists = reverse_time_marginals(term, backward_kernels)
post_output_dists = [marginalize(dist, output_kernel) for dist in post_state_dists]
Plotting the a posteriori time marginals and samples
Plots.plot!(
    gmr_plt,
    ts,
    post_output_dists,
    color = "blue",
    label = "",
)
nsample = 5
for _ in 1:nsample
    # sample expects an initial distribution and a sequence of forward kernels,
    # so the backward kernels are passed in reverse order,
    # which yields the signal path in reverse order
    _, ss_post = sample(rng, term, reverse(backward_kernels), output_kernels)
    # reverse the sampled signal path so that it is in the correct order
    ss_post = reverse(ss_post)
    Plots.plot!(
        gmr_plt,
        ts,
        ss_post,
        color = "blue",
        label = "",
    )
end
gmr_plt