Excess deaths due to COVID-19

import pandas as pd

import causalpy as cp

WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.

%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'retina'
seed = 42

Load data

df = (
    cp.load_data("covid")
    .assign(date=lambda x: pd.to_datetime(x["date"]))
    .set_index("date")
)

treatment_time = pd.to_datetime("2020-01-01")
df.head()

	temp	deaths	year	month	t	pre
date
2006-01-01	3.8	49124	2006	1	0	True
2006-02-01	3.4	42664	2006	2	1	True
2006-03-01	3.9	49207	2006	3	2	True
2006-04-01	7.4	40645	2006	4	3	True
2006-05-01	10.7	42425	2006	5	4	True

The columns are:

date + year: self explanatory
month: month, numerically encoded. Needs to be treated as a categorical variable
temp: average UK temperature (Celcius)
t: time
pre: boolean flag indicating pre or post intervention

Run the analysis

In this example we are going to standardize the data. So we have to be careful in how we interpret the inferred regression coefficients, and the posterior predictions will be in this standardized space.

Note

The random_seed keyword argument for the PyMC sampler is not neccessary. We use it here so that the results are reproducible.

result = cp.pymc_experiments.InterruptedTimeSeries(
    df,
    treatment_time,
    formula="standardize(deaths) ~ 0 + standardize(t) + C(month) + standardize(temp)",
    model=cp.pymc_models.LinearRegression(sample_kwargs={"random_seed": seed}),
)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta, sigma]

100.00% [8000/8000 00:01<00:00 Sampling 4 chains, 0 divergences]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
Sampling: [beta, sigma, y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]

fig, ax = result.plot()

../_images/88934bc098ab8dbf10f92b969cada1e0ca98e26fc78ba77fbf3be3d0a251b004.png

result.summary()

==================================Pre-Post Fit==================================
Formula: standardize(deaths) ~ 0 + standardize(t) + C(month) + standardize(temp)
Model coefficients:
C(month)[1]                   1.6, 94% HDI [1.1, 2]
C(month)[2]                   -0.2, 94% HDI [-0.65, 0.27]
C(month)[3]                   0.27, 94% HDI [-0.11, 0.64]
C(month)[4]                   -0.037, 94% HDI [-0.3, 0.25]
C(month)[5]                   -0.15, 94% HDI [-0.46, 0.15]
C(month)[6]                   -0.21, 94% HDI [-0.62, 0.19]
C(month)[7]                   -0.024, 94% HDI [-0.55, 0.5]
C(month)[8]                   -0.42, 94% HDI [-0.9, 0.066]
C(month)[9]                   -0.44, 94% HDI [-0.84, -0.051]
C(month)[10]                  -0.061, 94% HDI [-0.35, 0.23]
C(month)[11]                  -0.36, 94% HDI [-0.71, -0.0094]
C(month)[12]                  0.073, 94% HDI [-0.36, 0.52]
standardize(t)                0.23, 94% HDI [0.15, 0.31]
standardize(temp)             -0.44, 94% HDI [-0.75, -0.14]
sigma                         0.55, 94% HDI [0.5, 0.62]