The Kalman Filter

Summary

The Kalman filter is the closed-form solution to the Bayesian filtering equations for a linear-Gaussian state-space model. Every distribution stays Gaussian, so the filter propagates only a mean $m_{k}$ and covariance $P_{k}$ through a two-step recursion: a prediction step that pushes the state forward through the dynamics, and an update step that corrects it with the new measurement via the Kalman gain $K_{k}$ and the innovation $v_{k}$. Its cost per time step does not grow with $k$, and it is the forward pass that feeds both The RTS Smoother and the marginal likelihood.

Overview

Bayesian filtering computes the filtering distribution $p(x_{k} \mid y_{1:k})$, the posterior of the current state $x_{k}$ given all measurements up to and including time $k$. The general recursion (Särkkä Thm. 4.1) alternates a Chapman–Kolmogorov prediction and a Bayes-rule update. For linear-Gaussian models both steps reduce to matrix algebra on $(m_{k}, P_{k})$.

Main Content

Theorem: Bayesian filtering equations (Särkkä Thm. 4.1)

Starting from the prior $p(x_{0})$, the predicted and filtering distributions obey, for $k = 1, 2, \dots$:

Prediction step (Chapman–Kolmogorov equation): $p(x_{k} \mid y_{1:k-1}) = \int p(x_{k} \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1})\, dx_{k-1}$ (4.11)

Update step (Bayes’ rule): $p(x_{k} \mid y_{1:k}) = \frac{1}{Z_{k}}\, p(y_{k} \mid x_{k})\, p(x_{k} \mid y_{1:k-1})$ (4.12), where $Z_{k} = \int p(y_{k} \mid x_{k})\, p(x_{k} \mid y_{1:k-1})\, dx_{k}$ (4.13).

The normalizer $Z_{k} = p (y_{k} ∣ y_{1 : k - 1})$ is the one-step predictive density of the data — the building block of the marginal likelihood.
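The recursion can also be run numerically without any Gaussian algebra by discretising the state on a grid. This point-mass approximation is a swapped-in technique used only for illustration; it is not part of the theorem. The sketch below assumes a scalar random-walk model, a uniform grid and made-up measurements, and `gauss_pdf` and `grid_filter_step` are hypothetical helper names. Prediction becomes a matrix-vector product, and the returned normaliser is the grid estimate of $Z_{k}$.

```python
import numpy as np

def gauss_pdf(x, mean, var):
    """Scalar Gaussian density N(x | mean, var), vectorised over x and mean."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def grid_filter_step(prior, grid, y, q, r):
    """One step of the Bayesian filtering recursion (4.11)-(4.13) on a 1-D grid.

    Illustrative model (an assumption): x_k = x_{k-1} + q_{k-1}, y_k = x_k + r_k,
    with process-noise variance q and measurement-noise variance r.
    `prior` holds p(x_{k-1} | y_{1:k-1}) evaluated at the grid points.
    """
    dx = grid[1] - grid[0]
    # trans[i, j] = p(x_k = grid[i] | x_{k-1} = grid[j])
    trans = gauss_pdf(grid[:, None], grid[None, :], q)
    # Prediction (4.11): Chapman-Kolmogorov integral as a Riemann sum
    predicted = trans @ prior * dx
    # Update (4.12): likelihood times predicted density, then normalise
    unnormalised = gauss_pdf(y, grid, r) * predicted
    Z = unnormalised.sum() * dx   # (4.13): approximates p(y_k | y_{1:k-1})
    return unnormalised / Z, Z

grid = np.linspace(-8.0, 8.0, 1601)
post = gauss_pdf(grid, 0.0, 1.0)          # prior p(x_0) = N(0, 1)
for y in [0.5, 1.2, 0.9]:                 # made-up measurements
    post, Z = grid_filter_step(post, grid, y, q=0.1, r=0.5)
    mean = (grid * post).sum() * (grid[1] - grid[0])
    print(f"y = {y}: Z = {Z:.4f}, posterior mean = {mean:.4f}")
```

Because this model is linear-Gaussian, the grid can be checked against the Kalman filter. With $m_{0} = 0$, $P_{0} = 1$, $Q = 0.1$, $R = 0.5$ and $y_{1} = 0.5$, the exact posterior has mean $0.34375$ and variance $0.34375$, and the grid reproduces both closely. The grid costs $O(N^{2})$ per step for $N$ points, which is why this approach only scales to very low state dimensions.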

Theorem: Kalman filter (Särkkä Thm. 4.2)

For the linear-Gaussian model $x_{k} = A_{k-1} x_{k-1} + q_{k-1}$, $y_{k} = H_{k} x_{k} + r_{k}$, with independent noises $q_{k-1} \sim N(0, Q_{k-1})$, $r_{k} \sim N(0, R_{k})$ and prior $x_{0} \sim N(m_{0}, P_{0})$, the filtering distributions are Gaussian:
$p(x_{k} \mid y_{1:k-1}) = N(x_{k} \mid m_{k}^{-}, P_{k}^{-})$, $p(x_{k} \mid y_{1:k}) = N(x_{k} \mid m_{k}, P_{k})$, $p(y_{k} \mid y_{1:k-1}) = N(y_{k} \mid H_{k} m_{k}^{-}, S_{k})$ (4.19)
They are computed by the following recursion, started from $m_{0}, P_{0}$.

Prediction step:
$m_{k}^{-} = A_{k-1} m_{k-1}$, $P_{k}^{-} = A_{k-1} P_{k-1} A_{k-1}^{T} + Q_{k-1}$ (4.20)
Update step:
$v_{k} = y_{k} - H_{k} m_{k}^{-}$ (innovation, or measurement residual)
$S_{k} = H_{k} P_{k}^{-} H_{k}^{T} + R_{k}$ (innovation covariance)
$K_{k} = P_{k}^{-} H_{k}^{T} S_{k}^{-1}$ (Kalman gain)
$m_{k} = m_{k}^{-} + K_{k} v_{k}$
$P_{k} = P_{k}^{-} - K_{k} S_{k} K_{k}^{T}$ (4.21)
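A direct transcription of (4.20) and (4.21) into NumPy follows, as a sketch. The function name `kf_step` is hypothetical. It replaces the explicit inverse $S_{k}^{-1}$ with a linear solve, which is a common numerical choice rather than something the theorem prescribes. It also returns $\log p(y_{k} \mid y_{1:k-1})$ from (4.19), written in terms of the innovation.

```python
import numpy as np

def kf_step(m, P, y, A, Q, H, R):
    """One Kalman prediction + update, following (4.20)-(4.21).

    Shapes: m (n,), P (n, n), y (d,), A and Q (n, n), H (d, n), R (d, d).
    Returns m_k, P_k and log p(y_k | y_{1:k-1}) = log N(y_k | H m_k^-, S_k).
    """
    # Prediction step (4.20)
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update step (4.21)
    v = y - H @ m_pred                       # innovation
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T     # K = P^- H^T S^{-1} (S, P^- symmetric)
    m_new = m_pred + K @ v
    P_new = P_pred - K @ S @ K.T
    # Predictive log-density of y_k from (4.19), i.e. log N(v | 0, S)
    _, logdet = np.linalg.slogdet(2.0 * np.pi * S)
    loglik = -0.5 * (logdet + v @ np.linalg.solve(S, v))
    return m_new, P_new, loglik

# Scalar sanity check: m_0 = 0, P_0 = 1, A = H = 1, Q = 0.1, R = 0.5, y_1 = 0.5
I = np.eye(1)
m1, P1, ll = kf_step(np.zeros(1), I, np.array([0.5]), I, 0.1 * I, I, 0.5 * I)
print(m1, P1, ll)   # m1 and P1 are both approximately 0.34375
```

Summing `loglik` over time steps gives the log marginal likelihood, which is why the predictive density is computed inside the step rather than separately.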

Reading the equations

$m_{k}^{-}, P_{k}^{-}$ are the predicted (“prior”) mean and covariance before seeing $y_{k}$; the superscript $-$ marks “one step ahead, no current measurement.” $m_{k}, P_{k}$ are the updated (“posterior”) quantities after $y_{k}$.

The innovation $v_{k} = y_{k} - H_{k} m_{k}^{-}$ is what the measurement tells you beyond the prediction; $S_{k}$ is its covariance.

The Kalman gain $K_{k}$ is the weight on the innovation that minimises the posterior mean-squared error. It is large when the prediction is uncertain ($P_{k}^{-}$ large) or the measurement is precise ($R_{k}$ small), and small otherwise.

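The update never increases the covariance: $P_{k}^{-} - P_{k} = K_{k} S_{k} K_{k}^{T}$ is positive semidefinite, so $P_{k} \preceq P_{k}^{-}$ in the Loewner order. When $P_{k}^{-}$ and $R_{k}$ are invertible, an equivalent “information” form is $P_{k} = \left((P_{k}^{-})^{-1} + H_{k}^{T} R_{k}^{-1} H_{k}\right)^{-1}$.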
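Both claims can be checked numerically, assuming randomly generated matrices. The dimensions, seed and $R_{k}$ below are arbitrary choices for illustration. The covariance and information forms should agree, and $P_{k}^{-} - P_{k}$ should have no negative eigenvalues beyond rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 3, 2                                    # state / measurement dimensions (arbitrary)
L = rng.standard_normal((n, n))
P_pred = L @ L.T + n * np.eye(n)               # random symmetric positive definite P_k^-
H = rng.standard_normal((d, n))
R = np.diag([0.5, 2.0])                        # assumed measurement-noise covariance

S = H @ P_pred @ H.T + R
K = P_pred @ H.T @ np.linalg.inv(S)
P_cov = P_pred - K @ S @ K.T                   # covariance form (4.21)
P_info = np.linalg.inv(np.linalg.inv(P_pred) + H.T @ np.linalg.inv(R) @ H)  # information form

print(np.allclose(P_cov, P_info))                           # True
print(np.linalg.eigvalsh(P_pred - P_cov).min() >= -1e-9)    # True
```

The agreement is an instance of the Woodbury matrix identity. The covariance form only inverts a $d \times d$ matrix, while the information form inverts $n \times n$ ones, so the cheaper choice depends on which dimension is smaller.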

Derivation (Särkkä §4.3): apply the Gaussian joint/conditioning lemmas (Lemmas A.1–A.2) to $p (x_{k - 1}, x_{k} ∣ y_{1 : k - 1})$ for the prediction and to $p (x_{k}, y_{k} ∣ y_{1 : k - 1})$ for the update.
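The update half of this derivation can be reproduced directly. The sketch forms the joint Gaussian of $(x_{k}, y_{k})$ given $y_{1:k-1}$, conditions it on $y_{k}$ with the standard Gaussian conditioning formula (the result the text cites as Lemma A.2), and compares the answer with (4.21). The helper name `condition_gaussian` and all numerical values are hypothetical.

```python
import numpy as np

def condition_gaussian(mu, Sigma, n, y):
    """Condition a joint Gaussian N(mu, Sigma) over (x, y) on the value y.

    The first n coordinates are x, the rest are y:
        x | y ~ N(mu_x + S_xy S_yy^{-1} (y - mu_y), S_xx - S_xy S_yy^{-1} S_yx).
    """
    mu_x, mu_y = mu[:n], mu[n:]
    S_xx, S_xy, S_yy = Sigma[:n, :n], Sigma[:n, n:], Sigma[n:, n:]
    G = np.linalg.solve(S_yy, S_xy.T).T        # S_xy S_yy^{-1}
    return mu_x + G @ (y - mu_y), S_xx - G @ S_xy.T

# Illustrative predicted moments and measurement model (assumed values)
n = 2
m_pred = np.array([1.0, -0.5])
P_pred = np.array([[2.0, 0.3], [0.3, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
y = np.array([1.7])

# Joint p(x_k, y_k | y_{1:k-1}) implied by y_k = H x_k + r_k
mu = np.concatenate([m_pred, H @ m_pred])
Sigma = np.block([[P_pred, P_pred @ H.T],
                  [H @ P_pred, H @ P_pred @ H.T + R]])
m_cond, P_cond = condition_gaussian(mu, Sigma, n, y)

# Kalman update (4.21) for comparison
S = H @ P_pred @ H.T + R
K = P_pred @ H.T @ np.linalg.inv(S)
print(np.allclose(m_cond, m_pred + K @ (y - H @ m_pred)))   # True
print(np.allclose(P_cond, P_pred - K @ S @ K.T))            # True
```

In this view, the cross-covariance block $P_{k}^{-} H_{k}^{T}$ and the marginal block $S_{k}$ are exactly the two factors of the Kalman gain.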

Algorithm (per time step)
given m_{k-1}, P_{k-1}:
  # predict
  m⁻ = A m_{k-1}
  P⁻ = A P_{k-1} Aᵀ + Q
  # update with y_k
  v  = y_k − H m⁻
  S  = H P⁻ Hᵀ + R
  K  = P⁻ Hᵀ S⁻¹
  m_k = m⁻ + K v
  P_k = P⁻ − K S Kᵀ
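The cost per step is constant in $k$: a few products of $n \times n$, $n \times d$ and $d \times d$ matrices plus one $d \times d$ inverse (or linear solve), where $n$ is the state dimension and $d$ the measurement dimension. This is the key practical advantage of the recursive form over batch Bayes, whose cost grows with the number of measurements.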
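To make the comparison with batch Bayes concrete, the sketch below computes $p(x_{T} \mid y_{1:T})$ in two ways for an assumed 2-D position/velocity model with scalar measurements. The first runs the per-step recursion above. The second conditions the full joint Gaussian of $(x_{1:T}, y_{1:T})$ at once. The function names and model parameters are illustrative. Both give the same mean and covariance, but the batch version solves a $Td \times Td$ system, so its cost grows with $T$.

```python
import numpy as np

def kalman_filter(ys, m0, P0, A, Q, H, R):
    """Recursive filter: each step costs the same, whatever k is."""
    m, P = m0, P0
    for y in ys:
        m, P = A @ m, A @ P @ A.T + Q                 # predict (4.20)
        S = H @ P @ H.T + R
        K = np.linalg.solve(S, H @ P).T               # P^- H^T S^{-1}
        m, P = m + K @ (y - H @ m), P - K @ S @ K.T   # update (4.21)
    return m, P

def batch_posterior_last(ys, m0, P0, A, Q, H, R):
    """Batch Bayes: condition the joint Gaussian of (x_{1:T}, y_{1:T}) at once
    and return the marginal of x_T. The T*d x T*d solve makes this O((T d)^3)."""
    T, n = len(ys), len(m0)
    mus, covs = [], []                                # prior mean/cov of x_1..x_T
    m, P = m0, P0
    for _ in range(T):
        m, P = A @ m, A @ P @ A.T + Q
        mus.append(m)
        covs.append(P)
    C = np.zeros((T * n, T * n))                      # prior Cov(x_{1:T})
    for i in range(T):
        block = covs[i]                               # Cov(x_j, x_i) = A^{j-i} Cov(x_i)
        for j in range(i, T):
            C[j*n:(j+1)*n, i*n:(i+1)*n] = block
            C[i*n:(i+1)*n, j*n:(j+1)*n] = block.T
            block = A @ block
    Hb = np.kron(np.eye(T), H)                        # stacked measurement model
    Rb = np.kron(np.eye(T), R)
    mu = np.concatenate(mus)
    y = np.concatenate(ys)
    S_xy = C @ Hb.T
    S_yy = Hb @ C @ Hb.T + Rb
    G = np.linalg.solve(S_yy, S_xy.T).T               # S_xy S_yy^{-1}
    mean = mu + G @ (y - Hb @ mu)
    cov = C - G @ S_xy.T
    return mean[-n:], cov[-n:, -n:]

# Illustrative model: position/velocity state, noisy position measurements.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
Q = np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
m0, P0 = np.zeros(2), np.eye(2)
ys = [np.array([v]) for v in np.random.default_rng(0).normal(size=8)]

m_kf, P_kf = kalman_filter(ys, m0, P0, A, Q, H, R)
m_b, P_b = batch_posterior_last(ys, m0, P0, A, Q, H, R)
print(np.allclose(m_kf, m_b), np.allclose(P_kf, P_b))   # True True
```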

Examples

Kalman filter for the Gaussian random walk (Särkkä Ex. 4.2)

For the local-level model ($A = 1$, $H = 1$, scalar $Q, R$) the recursion collapses to scalars:
$m_{k}^{-} = m_{k-1}$, $P_{k}^{-} = P_{k-1} + Q$,
$m_{k} = m_{k}^{-} + \frac{P_{k}^{-}}{P_{k}^{-} + R}\,(y_{k} - m_{k}^{-})$, $P_{k} = P_{k}^{-} - \frac{(P_{k}^{-})^{2}}{P_{k}^{-} + R}$ (4.31)
The scalar gain $K_{k} = P_{k}^{-} / (P_{k}^{-} + R)$ interpolates between trusting the prediction ($R$ large $\Rightarrow K \to 0$) and trusting the measurement ($R$ small $\Rightarrow K \to 1$). This same filter is smoothed in The RTS Smoother.
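A plain-Python sketch of (4.31) follows, with the hypothetical function name `local_level_filter` and made-up data. The gain sequence does not depend on the measurements and settles to a steady state. Setting $P_{k}^{-} = P_{k-1}^{-}$ in the variance recursion gives $(P^{-})^{2} - Q P^{-} - QR = 0$, so $P^{-}_{\infty} = \tfrac{1}{2}\left(Q + \sqrt{Q^{2} + 4QR}\right)$. For $Q = R = 1$ this yields $P^{-}_{\infty} = (1 + \sqrt{5})/2$ and $K_{\infty} = (\sqrt{5} - 1)/2 \approx 0.618$.

```python
import math

def local_level_filter(ys, m0, P0, Q, R):
    """Scalar Kalman filter for x_k = x_{k-1} + q, y_k = x_k + r, following (4.31)."""
    m, P = m0, P0
    means, variances, gains = [], [], []
    for y in ys:
        m_pred, P_pred = m, P + Q                # prediction
        K = P_pred / (P_pred + R)                # scalar Kalman gain
        m = m_pred + K * (y - m_pred)            # mean update
        P = P_pred - P_pred ** 2 / (P_pred + R)  # variance update
        means.append(m)
        variances.append(P)
        gains.append(K)
    return means, variances, gains

ys = [0.3, -0.1, 0.8, 1.1, 0.9] * 10             # made-up data; the gains ignore it
_, _, gains = local_level_filter(ys, m0=0.0, P0=1.0, Q=1.0, R=1.0)
P_inf = (1.0 + math.sqrt(5.0)) / 2.0             # steady-state P^- for Q = R = 1
print(gains[-1], P_inf / (P_inf + 1.0))          # both about 0.618
```

Re-running with a very large or very small `R` shows the two limits from the paragraph above: the first gain drops towards 0 or rises towards 1.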

Kalman filter for car tracking (Särkkä Ex. 4.3)

Running the 4-D constant-velocity model (see Linear-Gaussian State-Space Models) on noisy position measurements estimates the full position and velocity trajectory, even though velocity is never measured directly. In Särkkä’s simulation the position RMSE drops from $0.77$ (raw measurements) to $0.43$ (filter estimate); the filter exploits the dynamics to denoise. The RTS smoother lowers it further to $0.27$.
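A simulation in the same spirit is sketched below. The parameter values (time step $0.1$, unit process spectral densities, measurement standard deviation $0.5$), the seed, and the RMSE definition (root mean squared 2-D position error) are assumptions for illustration. The helper names are hypothetical. The numbers it prints will not match $0.77$ and $0.43$ exactly, because the random draws differ, but the filter RMSE should come out clearly below the raw-measurement RMSE.

```python
import numpy as np

def cv_model(dt=0.1, q1=1.0, q2=1.0, sigma=0.5):
    """2-D constant-velocity model with state (x, y, vx, vy); values are assumptions."""
    A = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    Qc = np.diag([q1, q2])
    Q = np.block([[dt**3 / 3 * Qc, dt**2 / 2 * Qc],
                  [dt**2 / 2 * Qc, dt * Qc]])
    H = np.hstack([np.eye(2), np.zeros((2, 2))])   # measure position only
    R = sigma**2 * np.eye(2)
    return A, Q, H, R

def simulate(A, Q, H, R, x0, T, rng):
    """Draw a state trajectory and noisy measurements from the model."""
    xs, ys, x = [], [], x0
    for _ in range(T):
        x = A @ x + rng.multivariate_normal(np.zeros(len(x)), Q)
        xs.append(x)
        ys.append(H @ x + rng.multivariate_normal(np.zeros(H.shape[0]), R))
    return np.array(xs), np.array(ys)

def kalman_filter(ys, m0, P0, A, Q, H, R):
    """Return the filtered means m_1..m_T."""
    m, P, means = m0, P0, []
    for y in ys:
        m, P = A @ m, A @ P @ A.T + Q
        S = H @ P @ H.T + R
        K = np.linalg.solve(S, H @ P).T
        m, P = m + K @ (y - H @ m), P - K @ S @ K.T
        means.append(m)
    return np.array(means)

def rmse(est, truth):
    """Root mean squared Euclidean error over time."""
    return np.sqrt(np.mean(np.sum((est - truth) ** 2, axis=1)))

rng = np.random.default_rng(42)
A, Q, H, R = cv_model()
x0 = np.array([0.0, 0.0, 1.0, -1.0])
xs, ys = simulate(A, Q, H, R, x0, T=100, rng=rng)
ms = kalman_filter(ys, x0, np.eye(4), A, Q, H, R)
print("raw RMSE   :", rmse(ys, xs[:, :2]))
print("filter RMSE:", rmse(ms[:, :2], xs[:, :2]))
```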

Connections

Linear-Gaussian State-Space Models — the model being filtered; defines $A, H, Q, R$
The RTS Smoother — backward pass that reuses $m_{k}, P_{k}, m_{k + 1}^{-}, P_{k + 1}^{-}$ from this filter
Marginal Likelihood via the Kalman Filter — the innovations $v_{k}$ and covariances $S_{k}$ produced here factor the likelihood
State-Space Models and the Kalman Filter - Overview — pipeline context
