10 Temporally aggregated data

Temporally aggregated incidence data (such as weekly instead of daily reported cases) can make \(R_t\) estimation using tools like EpiEstim and EpiFilter difficult, particularly when the duration of the aggregation period exceeds the scale at which transmission occurs. The fundamental issue is methods like EpiEstim and EpiFilter assume that reported cases at the current time-step are caused only by reported cases on previous time-steps, whereas in aggregated data, we must also account for within-period transmission.

Existing approaches to estimating \(R_t\) from temporally aggregated data all reconstruct finer-grained incidence data from the aggregated data, before applying more standard methods. For example, Nash et al. (2023) use an Expectation-Maximisation algorithm to infer daily incidence and then apply an EpiEstim-type method, while I. Ogi-Gittins et al. (2024) pair an exact form of approximate Bayesian computation with a simulation based approach to infer sub-daily incidence. One method that does not require substantial adaption is EpiNow2 Abbott et al. (2020), which models the latent infections already.

Given an appropriate inference method, there are also cases in which it makes sense to model temporally aggregated data. If day-of-the-week effects are present, large, and inconsistent, then modelling weekly-aggregated data can decrease observation variance. From a practical modelling perspective, in the SMC context, aggregated data feature fewer observations, thus requiring fewer particles to estimate the model log-likelihood with the same precision.

10.1 Modelling temporally aggregated data

We can use any model which separates latent infections \(I_t\) from reported cases \(C_t\), for example, the model introduced in ?sec-models-obsnoise-obsnoisenounder (where subscript \(t\) still denotes time in days):

\[ \begin{align} \log R_t | \log R_{t-1} &\sim \text{Normal}(\log R_{t-1}, \sigma) \\ I_t | R_t, I_{1:t-1} &\sim \text{Poisson}(R_t \Lambda_t) \end{align} \tag{10.1}\]

where the observation model is modified slightly to reflect the assumed aggregation (using weekly as an example):

\[ C_t | I_{1:t} \sim \text{Negative binomial}\left(r = \frac{\sum_{s=0}^6 I_{t-s}}{k}, p = \frac{1}{1+k} \right), \ \ \text{if} \ \text{mod}(t, 7) = 0 \]

Algorithmically, we just skip the weighting and resampling steps on the time steps for which we have no observations.

10.1.1 Example: Influenza in Wales

Borrowing the example data and generation time distribution from I. Ogi-Gittins et al. (2024), we estimate \(R_t\) in Wales over two influenza seaons (2019-2020 and 2022-2023). The data are reported as weekly case counts, which as above, we assume reflect all reported cases over the preceding week. We edit the bootstrap filter to only weight and resample on days on which data are observed:

for tt = 2:T
        
    # Project according to the state-space model
    I[:,tt] = ...
        
    # Weight according to the observation model,
    # ... but only on the days that we observe data
    if tt % 7 == 0
            
        # Fetch expected reported cases (sum over the preceding 7 days)
        μt = sum(I[:,(tt-6):1:tt], dims=2)
            
        # Calculate weights
        W[:,tt] = ... # Some function of expected reported cases
            
        # Resample
        ...
        
    end
        
end

Fitting the model and plotting the estiamtes of \(R_t\) and daily incidence \(I_t\) produces:

Code

using Plots, Measures

include("../src/LoadData.jl")
include("../src/PMMH.jl")
include("../src/FitModel.jl")
include("../src/MarginalPosterior.jl")
include("../src/Support.jl")

function temporallyAggregatedModel(θ, Y::DataFrame, opts::Dict)
    
    # Extract frequently used options
    T = opts["T"]
    N = opts["N"]
    L = opts["L"]
    ω = opts["ω"]
    
    # Initialise output matrices
    R = zeros(N, T)
    I = zeros(N, T)
    W = ones(N, T)
        
    # Sample from initial distributions
    R[:,1] = rand(opts["pR0"], N)
    I[:,1] = rand(opts["pI0"], N)
    
    # Run the filter
    for tt = 2:T
        
        # Project according to the state-space model
        R[:,tt] = exp.(rand.(Normal.(log.(R[:,tt-1]), θ[1])))
        Λ = sum(I[:, (tt-1):-1:1] .* ω[1:(tt-1)]', dims=2) ./ sum(ω[1:(tt-1)])
        I[:,tt] = rand.(Poisson.(R[:,tt] .* Λ))
        
        # Weight according to the observation model, but only on the day that we observe data
        if tt % 7 == 0
            
            # Fetch expected reported cases
            μt = sum(I[:,(tt-6):1:tt], dims=2)
            
            # Calculate weights
            r = max.(μt, 1)/θ[2]
            p = 1 / (1 + θ[2])
            W[:,tt] = pdf.(NegativeBinomial.(r, p), Y.Ct[tt])
            
            # Resample
            inds = wsample(1:N, W[:,tt], N; replace=true)
            R[:, max(tt - L, 1):tt] = R[inds, max(tt - L, 1):tt]
            I[:, max(tt - L, 1):tt] = I[inds, max(tt - L, 1):tt]
            
        end
        
    end
    
    # Store output as three-dimensional array
    X = zeros(N, T, 2)
    X[:,:,1] = R
    X[:,:,2] = I
    
    # Forecast
    return(X, W)
    
end


# Specify generation time distribution parameters
gamma_mean = 2.6
gamma_sd = 1.3
gamma_shape = gamma_mean^2 / gamma_sd^2
gamma_scale = gamma_sd^2 / gamma_mean

opts = Dict(

    # Bootstrap filter options
    "T" => missing, # Number of time-steps
    "N" => 1000, # Number of particles
    "L" => 50, # Fixed-lag resampling length
    "pR0" => Uniform(0.5, 3), # Prior on Rt at t = 0
    "pI0" => DiscreteUniform(1, 5), # Prior on It at t = 0

    # Generation time distribution
    "ω" => pdf.(Gamma(gamma_shape, gamma_scale), 1:100) / sum(pdf.(Gamma(gamma_shape, gamma_scale), 1:100)), # Serial interval: Gamma with mean 2.6 and sd 1.3 days

    # PMMH options
    "nChains" => 3,
    "chunkSize" => 100,
    "maxChunks" => 50,
    "maxRhat" => 1.05,
    "minESS" => 100,
    "showChunkProgress" => false,
    "propStdDevInit" => [0.05, 0.5],
    "paramPriors" => [Uniform(0, 1), Uniform(0, 10)],
    "initialParamSamplers" => [Uniform(0.1, 0.3), Uniform(4, 6)],
    "paramNames" => ["σ", "k"],

    # Marginal posterior options
    "posteriorNumberOfParticles" => 5000,
    "posteriorParamSamples" => 100,
    "stateNames" => ["Rt", "It"]

)


# Load data
Y1 = loadData("Ogi-Gittins-201920");
opts["T"] = size(Y1)[1]
opts["pI0"] = DiscreteUniform(5, 10)

# Fit the model
(df_states1, df_params1, θ1, diag1) = fitModel(temporallyAggregatedModel, Y1, opts; verbose=false)
df_states1.Date = [Y1.Date; Y1.Date]

# Fit the second set of data
Y2 = loadData("Ogi-Gittins-202223");
opts["T"] = size(Y2)[1]
opts["pI0"] = DiscreteUniform(20, 40)

# Fit the model to the second set of data
(df_states2, df_params2, θ2, diag2) = fitModel(temporallyAggregatedModel, Y2, opts; verbose=false)
df_states2.Date = [Y2.Date; Y2.Date]

# Plot the data and the results
df_R_1 = df_states1[df_states1.variable .== "Rt", :]
df_I_1 = df_states1[df_states1.variable .== "It", :]
df_R_2 = df_states2[df_states2.variable .== "Rt", :]
df_I_2 = df_states2[df_states2.variable .== "It", :]

plot_R_1 = plot(df_R_1.Date, df_R_1.mean, ribbon=(df_R_1.mean .- df_R_1.lower, df_R_1.upper .- df_R_1.mean), xlabel="Date", ylabel="Reproduction number", legend=false)
plot_I_1 = plot(df_I_1.Date, df_I_1.mean, ribbon=(df_I_1.mean .- df_I_1.lower, df_I_1.upper .- df_I_1.mean), xlabel="Date", ylabel="Infection incidence", legend=false)
plot_R_2 = plot(df_R_2.Date, df_R_2.mean, ribbon=(df_R_2.mean .- df_R_2.lower, df_R_2.upper .- df_R_2.mean), xlabel="Date", ylabel="Reproduction number", legend=false)
plot_I_2 = plot(df_I_2.Date, df_I_2.mean, ribbon=(df_I_2.mean .- df_I_2.lower, df_I_2.upper .- df_I_2.mean), xlabel="Date", ylabel="Infection incidence", legend=false)
plot_C_1 = bar(Y1.Date, Y1.Ct, xlabel="Date", ylabel="Weekly reported cases", legend=false, bar_width=3)
plot_C_2 = bar(Y2.Date, Y2.Ct, xlabel="Date", ylabel="Weekly reported cases", legend=false, bar_width=3)

plot_all_1 = plot(plot_R_1, plot_I_1, plot_C_1, layout=(3, 1), size=(800, 800))
plot_all_2 = plot(plot_R_2, plot_I_2, plot_C_2, layout=(3, 1), size=(800, 800))
p = plot(plot_all_1, plot_all_2, layout=(1, 2), size=(1600, 800), titlefontsize=16, margins=6mm)
p = plot!(p[1], title="Influenza in Wales, 2019-20")
p = plot!(p[4], title="Influenza in Wales, 2022-23")
display(p)

10.2 Concluding remarks

The approach highlighted above can be viewed as a generalisation of the two Ogi-Gittins methods for modelling temporally aggregated data. If \(R_t\) is assumed to be weekly-constant and independent, and there is assumed to be no observation noise (i.e. the observation distribution for total infection incidence is dirac at the number of observed cases), then the model of (I. Ogi-Gittins et al. 2024) is recovered. Alternatively, if binomial underreporting is assumed, then the model of (Isaac Ogi-Gittins et al. 2025) is recovered.

We also highlight the flexibility of this method: the observation model can be modified in a variety of ways. The weekly reporting window is arbitrary, and even irregular reporting windows can be implemented. Furthermore, the daily time steps are also arbitrary, and a finer discretisation could be used (for infectious diseases with very short serial intervals) or a coarser discretiation could be used (for infectious diseases with long serial intervals, to save computation time).