Temporally aggregated incidence data (such as weekly instead of daily reported cases) can make \(R_t\) estimation using tools like EpiEstim and EpiFilter difficult, particularly when the duration of the aggregation period exceeds the scale at which transmission occurs. The fundamental issue is methods like EpiEstim and EpiFilter assume that reported cases at the current time-step are caused only by reported cases on previous time-steps, whereas in aggregated data, we must also account for within-period transmission.
Existing approaches to estimating \(R_t\) from temporally aggregated data all reconstruct finer-grained incidence data from the aggregated data, before applying more standard methods. For example, Nash et al. (2023) use an Expectation-Maximisation algorithm to infer daily incidence and then apply an EpiEstim-type method, while I. Ogi-Gittins et al. (2024) pair an exact form of approximate Bayesian computation with a simulation based approach to infer sub-daily incidence. One method that does not require substantial adaption is EpiNow2 Abbott et al. (2020), which models the latent infections already.
Given an appropriate inference method, there are also cases in which it makes sense to model temporally aggregated data. If day-of-the-week effects are present, large, and inconsistent, then modelling weekly-aggregated data can decrease observation variance. From a practical modelling perspective, in the SMC context, aggregated data feature fewer observations, thus requiring fewer particles to estimate the model log-likelihood with the same precision.
10.1 Modelling temporally aggregated data
We can use any model which separates latent infections \(I_t\) from reported cases \(C_t\), for example, the model introduced in ?sec-models-obsnoise-obsnoisenounder (where subscript \(t\) still denotes time in days):
Algorithmically, we just skip the weighting and resampling steps on the time steps for which we have no observations.
10.1.1 Example: Influenza in Wales
Borrowing the example data and generation time distribution from I. Ogi-Gittins et al. (2024), we estimate \(R_t\) in Wales over two influenza seaons (2019-2020 and 2022-2023). The data are reported as weekly case counts, which as above, we assume reflect all reported cases over the preceding week. We edit the bootstrap filter to only weight and resample on days on which data are observed:
for tt =2:T# Project according to the state-space model I[:,tt] =...# Weight according to the observation model,# ... but only on the days that we observe dataif tt %7==0# Fetch expected reported cases (sum over the preceding 7 days) μt =sum(I[:,(tt-6):1:tt], dims=2)# Calculate weights W[:,tt] =...# Some function of expected reported cases# Resample...endend
Fitting the model and plotting the estiamtes of \(R_t\) and daily incidence \(I_t\) produces:
Code
usingPlots, Measuresinclude("../src/LoadData.jl")include("../src/PMMH.jl")include("../src/FitModel.jl")include("../src/MarginalPosterior.jl")include("../src/Support.jl")functiontemporallyAggregatedModel(θ, Y::DataFrame, opts::Dict)# Extract frequently used options T = opts["T"] N = opts["N"] L = opts["L"] ω = opts["ω"]# Initialise output matrices R =zeros(N, T) I =zeros(N, T) W =ones(N, T)# Sample from initial distributions R[:,1] =rand(opts["pR0"], N) I[:,1] =rand(opts["pI0"], N)# Run the filterfor tt =2:T# Project according to the state-space model R[:,tt] =exp.(rand.(Normal.(log.(R[:,tt-1]), θ[1]))) Λ =sum(I[:, (tt-1):-1:1] .* ω[1:(tt-1)]', dims=2) ./ sum(ω[1:(tt-1)]) I[:,tt] =rand.(Poisson.(R[:,tt] .* Λ))# Weight according to the observation model, but only on the day that we observe dataif tt %7==0# Fetch expected reported cases μt =sum(I[:,(tt-6):1:tt], dims=2)# Calculate weights r =max.(μt, 1)/θ[2] p =1/ (1+ θ[2]) W[:,tt] =pdf.(NegativeBinomial.(r, p), Y.Ct[tt])# Resample inds =wsample(1:N, W[:,tt], N; replace=true) R[:, max(tt - L, 1):tt] = R[inds, max(tt - L, 1):tt] I[:, max(tt - L, 1):tt] = I[inds, max(tt - L, 1):tt]endend# Store output as three-dimensional array X =zeros(N, T, 2) X[:,:,1] = R X[:,:,2] = I# Forecastreturn(X, W)end# Specify generation time distribution parametersgamma_mean =2.6gamma_sd =1.3gamma_shape = gamma_mean^2/ gamma_sd^2gamma_scale = gamma_sd^2/ gamma_meanopts =Dict(# Bootstrap filter options"T"=>missing, # Number of time-steps"N"=>1000, # Number of particles"L"=>50, # Fixed-lag resampling length"pR0"=>Uniform(0.5, 3), # Prior on Rt at t = 0"pI0"=>DiscreteUniform(1, 5), # Prior on It at t = 0# Generation time distribution"ω"=>pdf.(Gamma(gamma_shape, gamma_scale), 1:100) /sum(pdf.(Gamma(gamma_shape, gamma_scale), 1:100)), # Serial interval: Gamma with mean 2.6 and sd 1.3 days# PMMH options"nChains"=>3,"chunkSize"=>100,"maxChunks"=>50,"maxRhat"=>1.05,"minESS"=>100,"showChunkProgress"=>false,"propStdDevInit"=> [0.05, 0.5],"paramPriors"=> [Uniform(0, 1), Uniform(0, 10)],"initialParamSamplers"=> [Uniform(0.1, 0.3), Uniform(4, 6)],"paramNames"=> ["σ", "k"],# Marginal posterior options"posteriorNumberOfParticles"=>5000,"posteriorParamSamples"=>100,"stateNames"=> ["Rt", "It"])# Load dataY1 =loadData("Ogi-Gittins-201920");opts["T"] =size(Y1)[1]opts["pI0"] =DiscreteUniform(5, 10)# Fit the model(df_states1, df_params1, θ1, diag1) =fitModel(temporallyAggregatedModel, Y1, opts; verbose=false)df_states1.Date= [Y1.Date; Y1.Date]# Fit the second set of dataY2 =loadData("Ogi-Gittins-202223");opts["T"] =size(Y2)[1]opts["pI0"] =DiscreteUniform(20, 40)# Fit the model to the second set of data(df_states2, df_params2, θ2, diag2) =fitModel(temporallyAggregatedModel, Y2, opts; verbose=false)df_states2.Date= [Y2.Date; Y2.Date]# Plot the data and the resultsdf_R_1 = df_states1[df_states1.variable .=="Rt", :]df_I_1 = df_states1[df_states1.variable .=="It", :]df_R_2 = df_states2[df_states2.variable .=="Rt", :]df_I_2 = df_states2[df_states2.variable .=="It", :]plot_R_1 =plot(df_R_1.Date, df_R_1.mean, ribbon=(df_R_1.mean .- df_R_1.lower, df_R_1.upper .- df_R_1.mean), xlabel="Date", ylabel="Reproduction number", legend=false)plot_I_1 =plot(df_I_1.Date, df_I_1.mean, ribbon=(df_I_1.mean .- df_I_1.lower, df_I_1.upper .- df_I_1.mean), xlabel="Date", ylabel="Infection incidence", legend=false)plot_R_2 =plot(df_R_2.Date, df_R_2.mean, ribbon=(df_R_2.mean .- df_R_2.lower, df_R_2.upper .- df_R_2.mean), xlabel="Date", ylabel="Reproduction number", legend=false)plot_I_2 =plot(df_I_2.Date, df_I_2.mean, ribbon=(df_I_2.mean .- df_I_2.lower, df_I_2.upper .- df_I_2.mean), xlabel="Date", ylabel="Infection incidence", legend=false)plot_C_1 =bar(Y1.Date, Y1.Ct, xlabel="Date", ylabel="Weekly reported cases", legend=false, bar_width=3)plot_C_2 =bar(Y2.Date, Y2.Ct, xlabel="Date", ylabel="Weekly reported cases", legend=false, bar_width=3)plot_all_1 =plot(plot_R_1, plot_I_1, plot_C_1, layout=(3, 1), size=(800, 800))plot_all_2 =plot(plot_R_2, plot_I_2, plot_C_2, layout=(3, 1), size=(800, 800))p =plot(plot_all_1, plot_all_2, layout=(1, 2), size=(1600, 800), titlefontsize=16, margins=6mm)p =plot!(p[1], title="Influenza in Wales, 2019-20")p =plot!(p[4], title="Influenza in Wales, 2022-23")display(p)
10.2 Concluding remarks
The approach highlighted above can be viewed as a generalisation of the two Ogi-Gittins methods for modelling temporally aggregated data. If \(R_t\) is assumed to be weekly-constant and independent, and there is assumed to be no observation noise (i.e. the observation distribution for total infection incidence is dirac at the number of observed cases), then the model of (I. Ogi-Gittins et al. 2024) is recovered. Alternatively, if binomial underreporting is assumed, then the model of (Isaac Ogi-Gittins et al. 2025) is recovered.
We also highlight the flexibility of this method: the observation model can be modified in a variety of ways. The weekly reporting window is arbitrary, and even irregular reporting windows can be implemented. Furthermore, the daily time steps are also arbitrary, and a finer discretisation could be used (for infectious diseases with very short serial intervals) or a coarser discretiation could be used (for infectious diseases with long serial intervals, to save computation time).
Abbott, Sam, Joel Hellewell, Robin N. Thompson, Katharine Sherratt, Hamish P. Gibbs, Nikos I. Bosse, James D. Munday, et al. 2020. “Estimating the Time-Varying Reproduction Number of SARS-CoV-2 Using National and Subnational Case Counts.”Wellcome Open Research 5 (December): 112. https://doi.org/10.12688/wellcomeopenres.16006.2.
Nash, Rebecca K., Samir Bhatt, Anne Cori, and Pierre Nouvellet. 2023. “Estimating the Epidemic Reproduction Number from Temporally Aggregated Incidence Data: A Statistical Modelling Approach and Software Tool.” Edited by Eric Hy Lau. PLOS Computational Biology 19 (8): e1011439. https://doi.org/10.1371/journal.pcbi.1011439.
Ogi-Gittins, I., W. S. Hart, J. Song, R. K. Nash, J. Polonsky, A. Cori, E. M. Hill, and R. N. Thompson. 2024. “A Simulation-Based Approach for Estimating the Time-Dependent Reproduction Number from Temporally Aggregated Disease Incidence Time Series Data.”Epidemics 47: 100773. https://doi.org/10.1016/j.epidem.2024.100773.
Ogi-Gittins, Isaac, Nicholas Steyn, Jonathan Polonsky, William S. Hart, Mory Keita, Steve Ahuka-Mundeke, Edward M. Hill, and Robin N. Thompson. 2025. “Simulation-Based Inference of the Time-Dependent Reproduction Number from Temporally Aggregated and Under-Reported Disease Incidence Time Series Data.”Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 383 (2293): 20240412. https://doi.org/10.1098/rsta.2024.0412.