4 The bootstrap filter
This chapter assumes fixed and known values of the model parameters. Chapter 5 considers the estimation of \(\theta\).
4.1 Background
The goal of the bootstrap filter is to use observed data \(y_{1:T}\) to learn about the hidden-states \(X_t\) given a pre-determined parameter vector \(\theta\).
The fundamental idea of the bootstrap filter (and particle filters in general) is to produce collections of samples from the target posterior distribution of a hidden-state model (Chapter 3) at each time-step \(t\). The result of this algorithm takes the form:
\[ \{x_t^{(i)}\}_{i=1}^N \sim P(X_t | y_{1:t}, \theta) \tag{4.1}\]
These samples are often called particles, hence the term particle filtering.
We can use these particles to find the posterior mean, for example:
\[ E[X_t|y_{1:t}, \theta] \approx \frac{1}{N} \sum_{i=1}^N x_t^{(i)} \]
or 95% credible intervals (by finding the 2.5th and 97.5th percentile values), or any other quantity we might care about. As these samples are Bayesian, they can be naturally used in downstream models.
Side-note: the particles in Equation 4.1 represent the filtering distribution (our knowledge about \(X_t\) using data availabile up-to time \(t\), see Section 3.1.1). As we will soon see, we can also use the bootstrap filter to find particles representing the smoothing distribution (our knowledge about \(X_t\) using all available data).
4.1.1 Overview
The bootstrap filter repeats three steps for each time step.
Assume we are at time step \(t\) and have a collection of samples from the filtering distribution at time \(t\!-\!1\), denoted \(\{x_{t-1}^{(i)}\}\), representing our guesses of the value of \(X_{t-1}\). We update these samples to represent our guesses of the value of \(X_t\) by:
- Projecting them forward according to the state-space transition distribution.
- Weighting these projected particles according to the observation distribution.
- Resampling these particles according to their weights.
This is visualised in Figure 4.1:

Step one creates a set of particles representing equally-likely one-step-ahead projections of \(X_t\) given data up to time \(t-1\):
\[ \{\tilde{x}_t^{(i)}\}_{i=1}^N \sim P(X_t | y_{1:t-1}, \theta) \]
Step two weights these projections according to how likely they are under the observation model:
\[ w_t^{(i)} = P(y_t | \tilde{x}_t^{(i)}, \theta) \]
And the final step resamples according to these weights to ensure the resulting \(\{x_t^{(i)}\}\) are equally-likely samples from \(P(X_t|y_{1:t}, \theta)\).
4.1.2 Simulation-based inference
In order to use SMC methods to fit a hidden-state model, we only need to:
- Sample from the state-space transition distribution
- Evaluate the observation distribution
Crucially, we do not need to evalute the state-space transition distribution, allowing for easy simulation-based inference.
Simulation-based inference methods rely on simulations rather than analytical solutions and are very popular in statistical epidemiology. In this case, sampling from the state-space transition distribtion is the same as running the “simulator” one-step forward into the future. This simulator implies a likelihood even though we never need to find its precise form.
4.2 The bootstrap filter
Like before, assume that we have a collection of samples from the filtering distribution at time \(t\!-\!1\), denoted \(\{x_{t-1}^{(i)}\} \sim P(X_{t-1}|y_{1:t-1}, \theta)\). For now, also assume that our state-space transitions are Markovian (i.e., \(X_t\) depends only on \(X_{t-1}\)).
Step 1: Projection
By coupling these particles with the state-space transition model (Equation 3.1), we can make one-step-ahead predictions. Mathematically we want to obtain the predictive distribution: \[ \underbrace{P(X_t | y_{1:t-1}, \theta)}_{\text{predictive dist.}} = \int \underbrace{P(X_t | X_{t-1}, \theta)}_{\text{state-space transition dist.}} \underbrace{P(X_{t-1}|y_{1:t-1}, \theta)}_{\text{prev. filtering dist.}} \ dX_{t-1} \]
Which is represented by complicated and potentially high-dimensional integral of two things we know. Fortunately, all we need to do is sample from the state-space transition model while conditioning on our particles:
\[ \tilde{x}_t^{(i)} \sim P(X_t | x_{t-1}^{(i)}, \theta) \]
Step 2: Weighting
If we treat this predictive distribution as the prior distribution for \(X_t|y_{1:t}\), we can apply Bayes’ formula:
\[ \underbrace{P(X_t | y_{1:t}, \theta)}_{\text{new filtering dist}} \propto \underbrace{P(y_t|X_t, y_{1:t-1}, \theta)}_{\text{observation dist.}} \underbrace{P(X_t | y_{1:t-1}, \theta)}_{\text{predictive dist.}} \]
Since we already have samples from the predictive distribution \(P(X_t | y_{1:t-1}, \theta)\), all we need to do is assign them weights \(w_t^{(i)}\) according to the observation distribution:
\[ w_t^{(i)} = P(y_t | \tilde{x}_t^{(i)}, y_{1:t-1}, \theta) \]
Step 3: Resampling
The final set of particles \(\{x_t^{(i)}\}\) is constructed by sampling (with replacement) from \(\{\tilde{x}_t^{(i)}\}\) according to the corresponding weights, resulting in:
\[ x_t^{(i)} \sim P(X_t | y_{1:t}, \theta) \]
Repeat
Starting with an initial set of particles \(\{x_0^{(i)}\} \sim P(X_0)\), all we need to do is repeat this procedue for each \(t = 1, 2, \ldots, T\), storing the particle values as we go. This concludes the bootstrap filter.
The bootstrap filter is a special case of a sequential importance sampling algorithm (Section 4.6).
4.3 Fixed-lag smoothing
There are two problems with the bootstrap filter above:
- It assumes the state-space transition model is Markovian (\(X_t\) depends only on \(X_{t-1}\))
- \(X_t\) is informed by past data \(y_{1:t}\) only (the filtering posterior), when we may want to leverage all available data (the smoothing posterior).
Both of these problems are solved using fixed-lag resampling.
4.4 Algorithm
We present the algorithm as a Julia code-snippet. For a pseudocode implementation, please see our preprint #TODO: add link.
4.5 Example
For demonstration, we consider a simple reproduction number estimator. First assume that \(\log R_t\) follows a Gaussian random walk:
[#TODO]
4.6 Additional resources
Bootstrap filters (and SMC methods more generally) have found use in many fields. Each field has their own motivation for and notation describing these methods. We provide an overview of other resources here.
Sequential Importance Sampling
Sequential imporance sampling is the name given to the general class of algorithms, of which the bootstrap filter is one. #TODO: finish writing this
Bayesian Filtering and Smoothing (Särkkä 2013)
Those with an engineering background may be familiar with “filtering and smoothing”, where the state of a time-varying system is tracked through the observation of noisy measurements. Classical examples include GPS position tracking or audio signal processing.
The Kalman filter, which provides an analytical solution when the state-space transition and observation models are linear Gaussian and Markovian, is perhaps the best-known example of a filtering method from engineering.
Chapters 7 and 11 of Särkkä (2013) introduce SMC methods under the headings “particle filtering” and “particle smoothing”. We also recommend chapters 1 (What are Bayesian filtering and smoothing?), 4 (Bayesian filtering equations and exact solutions), and 8 (Bayesian smoothing equations and exact solutions).
A survey of Sequential Monte Carlo Methods for Economics and Finance (Creal 2012)
Those with an econometrics background may find this extensive review helpful, although the author focusses on Markovian models. Examples employed in this review include a stochastic volatility model and a nonlinear dynamic stochastic general equilibrium model.
This list is incomplete. If you know of any additional resources that may be helpful, please get in touch!
Data Assimilation Fundamentals: … (Evensen, Vossepoel, and Van Leeuwen 2022)
Those with a background in atmospheric science, oceanography, metereology, or other environmental sciences may be familiar with “data assimilation”, where focus is placed on combining model predictions with observational data. Chapter 9 of this book introduces particle filters as a method for solving the “fully nonlinear data assimilation” problem.