The effects of Brexit

The aim of this notebook is to estimate the causal impact of Brexit on the UK’s GDP using the synthetic control approach. As such, it is similar to the policy brief “What can we know about the cost of Brexit so far?” [Springford, 2022] from the Centre for European Reform. That approach did not, however, use Bayesian estimation methods.

I did not use the GDP data from that report, however, as it had been scaled in a way that made it hard to relate back to absolute GDP figures. Instead, GDP data was obtained courtesy of Prof. Dooruj Rambaccussing. The raw data are in units of billions of USD.

Warning

This is an experimental, in-progress notebook! While the results are reasonable, there is still some work to be done on the inference side of things. There are high correlations between countries, and the Dirichlet prior over the country weightings could do with some attention. That said, the results here represent a reasonable first pass at this dataset.
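The synthetic control idea that the weighting prior encodes can be illustrated with a minimal NumPy sketch (invented toy data, not CausalPy’s actual WeightedSumFitter implementation): a Dirichlet draw gives non-negative donor-country weights that sum to one, and the synthetic counterfactual is the weighted sum of the donor series.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy donor matrix: 30 time points x 5 donor countries (invented data)
X = rng.normal(loc=100, scale=5, size=(30, 5))

# A Dirichlet draw lives on the simplex: weights >= 0 and sum to 1,
# the constraint the model places on the country weightings
w = rng.dirichlet(np.ones(5))

# Synthetic control: weighted sum of the donor series
synthetic = X @ w

print(w.min() >= 0, np.isclose(w.sum(), 1.0), synthetic.shape)
```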

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

import causalpy as cp
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'retina'
seed = 42

Load data

df = (
    cp.load_data("brexit")
    .assign(Time=lambda x: pd.to_datetime(x["Time"]))
    .set_index("Time")
    .loc[lambda x: x.index >= "2009-01-01"]
    # manual exclusion of some countries
    .drop(["Japan", "Italy", "US", "Spain", "Portugal"], axis=1)
)

# specify date of the Brexit vote announcement
treatment_time = pd.to_datetime("2016 June 24")
# get useful country lists
target_country = "UK"
all_countries = df.columns
other_countries = all_countries.difference({target_country})
all_countries = list(all_countries)
other_countries = list(other_countries)

Data visualization

az.style.use("arviz-white")
# Plot the time series normalised so that intervention point (Q3 2016) is equal to 100
gdp_at_intervention = df.loc[pd.to_datetime("2016 July 01"), :]
df_normalised = (df / gdp_at_intervention) * 100.0

# plot
fig, ax = plt.subplots()
for col in other_countries:
    ax.plot(df_normalised.index, df_normalised[col], color="grey", alpha=0.2)

ax.plot(df_normalised.index, df_normalised[target_country], color="red", lw=3)
# ax = df_normalised.plot(legend=False)

# formatting
ax.set(title="Normalised GDP")
ax.axvline(x=treatment_time, color="r", ls=":");
[Figure: Normalised GDP time series, UK highlighted, with the intervention date marked]
# Examine how correlated the pre-intervention time series are

pre_intervention_data = df.loc[df.index < treatment_time, :]

corr = pre_intervention_data.corr()

f, ax = plt.subplots(figsize=(8, 6))
ax = sns.heatmap(
    corr,
    mask=np.triu(np.ones_like(corr, dtype=bool)),
    cmap=sns.diverging_palette(230, 20, as_cmap=True),
    vmin=-0.2,
    vmax=1.0,
    center=0,
    square=True,
    linewidths=0.5,
    cbar_kws={"shrink": 0.8},
)
ax.set(title="Correlations for pre-intervention GDP data");
[Figure: Correlations for pre-intervention GDP data (heatmap)]

Run the analysis

Note: The analysis is (and should be) run on the raw GDP data. We do not use the normalised data shown above, which was just for ease of visualization.

# build a model formula
formula = target_country + " ~ " + "0 + " + " + ".join(other_countries)

print(formula)
UK ~ 0 + Australia + Austria + Belgium + Canada + Denmark + Finland + France + Germany + Iceland + Luxemburg + Netherlands + New_Zealand + Norway + Sweden + Switzerland

Note

The random_seed keyword argument for the PyMC sampler is not necessary. We use it here so that the results are reproducible.

sample_kwargs = {"tune": 4000, "target_accept": 0.99, "random_seed": seed}

result = cp.pymc_experiments.SyntheticControl(
    df,
    treatment_time,
    formula=formula,
    model=cp.pymc_models.WeightedSumFitter(sample_kwargs=sample_kwargs),
)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta, sigma]
100.00% [20000/20000 04:50<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 4_000 tune and 1_000 draw iterations (16_000 + 4_000 draws total) took 291 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
Chain 0 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.
Chain 1 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.
Chain 2 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.
Chain 3 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.
Sampling: [beta, sigma, y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]

We currently get some divergences, but these are mostly dealt with by increasing the tune and target_accept sampling parameters. Nevertheless, sampling for this dataset/model combination feels a little brittle.
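Given the tree-depth warnings in the sampler output above, one further option is to also raise max_treedepth alongside tune and target_accept. A sketch (keyword names follow PyMC’s pm.sample API; the specific values are illustrative, not tuned for this model):

```python
# Illustrative sampler settings; max_treedepth defaults to 10 in PyMC,
# and deeper trees cost more computation per draw
sample_kwargs = {
    "tune": 4000,           # longer warm-up for step-size adaptation
    "target_accept": 0.99,  # smaller steps reduce divergences
    "max_treedepth": 12,    # allow longer NUTS trajectories
    "random_seed": 42,      # reproducibility
}
print(sorted(sample_kwargs))
```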

Check the MCMC chain mixing via the Rhat statistic.

az.summary(result.idata, var_names=["~mu"])
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
beta[Australia] 0.122 0.074 0.001 0.245 0.003 0.002 547.0 584.0 1.00
beta[Austria] 0.042 0.039 0.000 0.113 0.002 0.001 391.0 401.0 1.00
beta[Belgium] 0.050 0.043 0.000 0.128 0.001 0.001 722.0 646.0 1.00
beta[Canada] 0.040 0.023 0.000 0.079 0.001 0.001 321.0 426.0 1.00
beta[Denmark] 0.092 0.062 0.000 0.201 0.002 0.002 504.0 517.0 1.01
beta[Finland] 0.043 0.037 0.000 0.114 0.001 0.001 898.0 784.0 1.00
beta[France] 0.030 0.027 0.000 0.080 0.001 0.001 705.0 730.0 1.00
beta[Germany] 0.026 0.025 0.000 0.071 0.001 0.001 612.0 808.0 1.01
beta[Iceland] 0.152 0.041 0.075 0.231 0.002 0.002 396.0 232.0 1.01
beta[Luxemburg] 0.052 0.046 0.000 0.138 0.001 0.001 671.0 352.0 1.00
beta[Netherlands] 0.050 0.044 0.000 0.131 0.001 0.001 740.0 782.0 1.01
beta[New_Zealand] 0.061 0.054 0.000 0.160 0.002 0.001 425.0 319.0 1.01
beta[Norway] 0.079 0.044 0.000 0.151 0.002 0.001 422.0 298.0 1.01
beta[Sweden] 0.096 0.031 0.039 0.157 0.002 0.001 403.0 259.0 1.02
beta[Switzerland] 0.064 0.054 0.000 0.166 0.001 0.001 3850.0 2342.0 1.00
sigma 0.031 0.005 0.022 0.040 0.000 0.000 557.0 619.0 1.00
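Since az.summary returns a pandas DataFrame, the r_hat column can also be screened programmatically rather than by eye. A self-contained toy with invented values standing in for the table above:

```python
import pandas as pd

# Stand-in for az.summary(result.idata) with invented r_hat values
summary = pd.DataFrame(
    {"r_hat": [1.00, 1.02, 1.00]},
    index=["beta[Australia]", "beta[Sweden]", "sigma"],
)

# Flag parameters above the conventional 1.01 convergence threshold
suspect = summary[summary["r_hat"] > 1.01].index.tolist()
print(suspect)
```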

You can inspect the traces in more detail with:

az.plot_trace(result.idata, var_names="~mu", compact=False);
az.style.use("arviz-darkgrid")

fig, ax = result.plot(plot_predictors=False)

for i in [0, 1, 2]:
    ax[i].set(ylabel="Billion USD")
[Figure: synthetic control fit and causal impact panels, y-axes in billion USD]
result.summary()
==================================Pre-Post Fit==================================
Formula: UK ~ 0 + Australia + Austria + Belgium + Canada + Denmark + Finland + France + Germany + Iceland + Luxemburg + Netherlands + New_Zealand + Norway + Sweden + Switzerland
Model coefficients:
Australia                     0.12, 94% HDI [0.0095, 0.27]
Austria                       0.042, 94% HDI [0.0016, 0.14]
Belgium                       0.05, 94% HDI [0.0019, 0.15]
Canada                        0.04, 94% HDI [0.0035, 0.087]
Denmark                       0.092, 94% HDI [0.0069, 0.23]
Finland                       0.043, 94% HDI [0.0022, 0.14]
France                        0.03, 94% HDI [0.0012, 0.098]
Germany                       0.026, 94% HDI [0.001, 0.089]
Iceland                       0.15, 94% HDI [0.067, 0.23]
Luxemburg                     0.052, 94% HDI [0.0016, 0.16]
Netherlands                   0.05, 94% HDI [0.0024, 0.16]
New_Zealand                   0.061, 94% HDI [0.0013, 0.19]
Norway                        0.079, 94% HDI [0.0059, 0.17]
Sweden                        0.096, 94% HDI [0.034, 0.15]
Switzerland                   0.064, 94% HDI [0.0027, 0.2]
sigma                         0.031, 94% HDI [0.023, 0.041]
ax = az.plot_forest(result.idata, var_names="beta", figsize=(6, 5))
ax[0].set(title="Estimated weighting coefficients");
[Figure: Estimated weighting coefficients (forest plot)]

References

[1]

John Springford. What can we know about the cost of Brexit so far? 2022. URL: https://www.cer.eu/publications/archive/policy-brief/2022/cost-brexit-so-far.