2 Background of \(R_t\) estimation

Approaches to reproduction number estimation span a wide range of statistical philosophies, model structures, estimation methods, data sources, and computational techniques. We provide a brief overview of these here.

Note

This section provides general context and can be safely skipped if you just want to fit a model! We first outline our method for \(R_t\) estimation in Section 1.2.

With perfect knowledge we could calculate \(R_t\) by counting the number of secondary cases generated by each primary case. In practice this is impossible, so we must estimate this quantity from data. Cori and Kucharski (2024) highlight that \(R_t\) can be estimated by multiplying estimates of contact rates and transmission probabilities, or empirically from contact tracing data. Most often, however, \(R_t\) is estimated from time-series data such as reported case counts.

Most \(R_t\) estimators are statistical: they seek to estimate \(R_t\) and the associated uncertainty using a statistical model (Steyn and Parag 2024). Purely mathematical methods, i.e. those that do not provide uncertainty estimates, also exist (typically coupling a mathematical model of disease transmission with an optimisation routine to fit the model to data). Robust uncertainty quantification is a key focus of this work, thus our methods fall into the statistical category.

Statistical estimators of \(R_t\) can generally be categorised as either Bayesian (where \(R_t\) and other parameters are treated as random variables with associated prior distributions) or frequentist (where \(R_t\) is treated as a fixed quantity). Bayesian methods are currently more popular than frequentist methods for two reasons: they provide a natural way to quantify uncertainty in \(R_t\) estimates (whereas frequentist methods typically rely upon bootstrapping or large-sample arguments), and highly effective simulation-based techniques exist for fitting them. Sequential Monte Carlo (SMC), as used in our context, is an example of a Bayesian simulation-based method.

Bayesian estimators are constructed by assuming a data generating process (inducing a model likelihood) and a prior distribution over the parameters of this process. The likelihood and prior distribution are together termed the model. Alongside the model, a method is required to find the posterior distribution. This can be done analytically (when the prior is conjugate to the likelihood) or computationally (for example, using Markov chain Monte Carlo (MCMC) or approximate Bayesian computation (ABC)). While some methods are better suited to certain models, we encourage a clear separation of the two, a distinction that is often blurred in the literature.
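To make the separation between model and method concrete, the following minimal sketch fits one model with two different methods. The model is a toy one chosen for illustration (Poisson counts with a Gamma prior on the rate, not an \(R_t\) model), and the function names, data, and prior values are all hypothetical. The first method uses conjugacy to obtain the posterior analytically; the second approximates the same posterior on a grid.

```python
import math


def conjugate_posterior(counts, prior_shape, prior_rate):
    """Method 1: analytical update for a Gamma(shape, rate) prior and Poisson counts."""
    return prior_shape + sum(counts), prior_rate + len(counts)


def grid_posterior_mean(counts, prior_shape, prior_rate, upper=20.0, n_grid=20000):
    """Method 2: approximate the posterior mean of the same model on a grid."""
    step = upper / n_grid
    grid = [(i + 0.5) * step for i in range(n_grid)]  # midpoints, so x > 0
    total, n = sum(counts), len(counts)
    # Unnormalised log posterior = log prior + log likelihood (constants dropped).
    log_post = [
        (prior_shape - 1 + total) * math.log(x) - (prior_rate + n) * x for x in grid
    ]
    peak = max(log_post)  # subtract the peak for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    return sum(x * w for x, w in zip(grid, weights)) / sum(weights)


counts = [3, 5, 4, 6, 2]  # hypothetical data
shape, rate = conjugate_posterior(counts, prior_shape=1.0, prior_rate=1.0)
analytic_mean = shape / rate
grid_mean = grid_posterior_mean(counts, prior_shape=1.0, prior_rate=1.0)
print(analytic_mean, grid_mean)
```

The two posterior means should agree up to the grid's discretisation error, since only the method differs and the model is unchanged.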

A multitude of data generating processes have been proposed for \(R_t\) estimation, including renewal models, compartmental differential equation models, network models, and agent-based models. Renewal models target \(R_t\) directly, requiring the fewest assumptions about the underlying disease dynamics.
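As an illustration of a renewal model as a data generating process, the sketch below simulates cases under one common formulation: cases at time \(t\) are Poisson distributed with mean \(R_t\) multiplied by a weighted sum of past cases, where the weights are a generation-time distribution. The weights, \(R_t\) values, seed cases, and function names are assumptions made for this example only.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw from Poisson(mean) using Knuth's algorithm (suitable for modest means only)."""
    if mean <= 0:
        return 0
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def infection_pressure(cases, gen_dist):
    """Past cases weighted by the generation-time distribution (gen_dist[0] is lag 1)."""
    t = len(cases)  # index of the time step about to be simulated
    return sum(
        w * cases[t - s] for s, w in enumerate(gen_dist, start=1) if t - s >= 0
    )


def simulate_renewal(rt, gen_dist, seed_cases, rng):
    """Simulate one case series; rt[0] is unused because day 0 is the seed."""
    cases = [seed_cases]
    for r in rt[1:]:
        cases.append(poisson_sample(rng, r * infection_pressure(cases, gen_dist)))
    return cases


gen_dist = [0.2, 0.5, 0.3]        # hypothetical generation-time weights (sum to 1)
rt = [1.5] * 10 + [0.7] * 10      # hypothetical step change in R_t
cases = simulate_renewal(rt, gen_dist, seed_cases=10, rng=random.Random(1))
print(cases)
```

Note that \(R_t\) appears directly as a parameter of the simulation, which is the sense in which renewal models target it directly.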

While the renewal model can be employed as is, daily data are typically noisy and incomplete, so some smoothing assumptions must be made. EpiEstim uses a trailing window over which \(R_t\) is assumed to be fixed (Cori et al. 2013), EpiFilter assumes \(R_t\) follows a Gaussian random walk (Parag 2021), while EpiNow2 assumes \(R_t\) follows a Gaussian process (Abbott et al. 2020). In addition to the renewal model itself, the smoothing method also forms part of the statistical model.

Finally, once the model has been set, a method must be chosen to estimate \(R_t\) and other parameters. For example, EpiEstim uses a Gamma prior distribution for \(R_t\) that is conjugate to the likelihood function, thus producing an analytical posterior distribution. EpiFilter uses a grid-based approximation to the Bayesian filtering and smoothing equations, while EpiNow2 fits an approximation to the assumed Gaussian process using MCMC methods. We use a sequential Monte Carlo approach.
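To give a flavour of what a sequential Monte Carlo method looks like, here is a minimal sketch of a generic bootstrap particle filter. It is not the implementation used in this work. It assumes a toy model chosen for illustration: \(\log R_t\) follows a Gaussian random walk with standard deviation `sigma`, and cases are Poisson distributed with mean \(R_t\) multiplied by the weighted sum of past cases. The initial uniform distribution for \(R_t\), the parameter values, and the synthetic data are all hypothetical.

```python
import math
import random


def poisson_logpmf(k, mean):
    """Log of the Poisson probability mass function (requires mean > 0)."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def bootstrap_filter(cases, gen_dist, n_particles=2000, sigma=0.1, seed=0):
    """Return the filtering mean of R_t for t = 1, ..., len(cases) - 1."""
    rng = random.Random(seed)
    # Assumed initial distribution for R_t: Uniform(0.1, 5).
    particles = [rng.uniform(0.1, 5.0) for _ in range(n_particles)]
    filtered_means = []
    for t in range(1, len(cases)):
        pressure = sum(
            w * cases[t - s] for s, w in enumerate(gen_dist, start=1) if t - s >= 0
        )
        # 1. Propagate: Gaussian random walk on log R_t.
        particles = [r * math.exp(rng.gauss(0.0, sigma)) for r in particles]
        # 2. Weight and 3. resample (skipped when the data carry no information).
        if pressure > 0:
            log_w = [poisson_logpmf(cases[t], r * pressure) for r in particles]
            peak = max(log_w)  # subtract the peak for numerical stability
            weights = [math.exp(lw - peak) for lw in log_w]
            particles = rng.choices(particles, weights=weights, k=n_particles)
        filtered_means.append(sum(particles) / n_particles)
    return filtered_means


# Synthetic data: deterministic growth with a true reproduction number of 1.5.
gen_dist = [0.2, 0.5, 0.3]
cases = [20]
for t in range(1, 20):
    pressure = sum(w * cases[t - s] for s, w in enumerate(gen_dist, start=1) if t - s >= 0)
    cases.append(round(1.5 * pressure))

means = bootstrap_filter(cases, gen_dist)
print(means[-1])
```

Each step propagates the particles through the assumed dynamics, weights them by the likelihood of the new observation, and resamples. The particles after resampling approximate the filtering distribution of \(R_t\) given data up to time \(t\), from which the mean (or any other summary) can be read off.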