The effects of Brexit#

The aim of this notebook is to estimate the causal impact of Brexit upon the UK’s GDP, using the synthetic control approach. As such, it is similar to the policy brief “What can we know about the cost of Brexit so far?” [Springford, 2022] from the Centre for European Reform. However, that approach did not use Bayesian estimation methods.
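In outline, synthetic control builds a counterfactual “synthetic UK” as a weighted combination of untreated donor countries, with the weights learned from the pre-intervention period:

$$\hat{y}^{\mathrm{UK}}_t = \sum_{j \in \mathrm{donors}} \beta_j \, y^{(j)}_t, \qquad \beta_j \ge 0, \quad \sum_j \beta_j = 1.$$

The estimated causal impact at each post-intervention time point is then the observed UK GDP minus this counterfactual, $y^{\mathrm{UK}}_t - \hat{y}^{\mathrm{UK}}_t$.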

However, I did not use the GDP data from that report, as it had been scaled in a way that made its relationship to the absolute GDP figures unclear. Instead, GDP data was obtained courtesy of Prof. Dooruj Rambaccussing. The raw data is in units of billions of USD.

Warning

This is an experimental and in-progress notebook! While the results are reasonable, there is still some perfecting to be done on the inference side of things. There are high correlations between countries, and the prior for the Dirichlet distribution over country weightings could do with some attention. That said, the results here represent a ‘reasonable’ first approach to this dataset.

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

import causalpy as cp
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'retina'
seed = 42

Load data#

df = (
    cp.load_data("brexit")
    .assign(Time=lambda x: pd.to_datetime(x["Time"]))
    .set_index("Time")
    .loc[lambda x: x.index >= "2009-01-01"]
    # manual exclusion of some countries
    .drop(["Japan", "Italy", "US", "Spain", "Portugal"], axis=1)
)

# date the Brexit referendum result was announced
treatment_time = pd.to_datetime("2016 June 24")
# get useful country lists
target_country = "UK"
all_countries = df.columns
other_countries = all_countries.difference({target_country})
all_countries = list(all_countries)
other_countries = list(other_countries)

Data visualization#

az.style.use("arviz-white")
# Plot the time series normalised so that intervention point (Q3 2016) is equal to 100
gdp_at_intervention = df.loc[pd.to_datetime("2016 July 01"), :]
df_normalised = (df / gdp_at_intervention) * 100.0

# plot
fig, ax = plt.subplots()
for col in other_countries:
    ax.plot(df_normalised.index, df_normalised[col], color="grey", alpha=0.2)

ax.plot(df_normalised.index, df_normalised[target_country], color="red", lw=3)

# formatting
ax.set(title="Normalised GDP")
ax.axvline(x=treatment_time, color="r", ls=":");
# Examine how correlated the pre-intervention time series are

pre_intervention_data = df.loc[df.index < treatment_time, :]

corr = pre_intervention_data.corr()

f, ax = plt.subplots(figsize=(8, 6))
ax = sns.heatmap(
    corr,
    mask=np.triu(np.ones_like(corr, dtype=bool)),
    cmap=sns.diverging_palette(230, 20, as_cmap=True),
    vmin=-0.2,
    vmax=1.0,
    center=0,
    square=True,
    linewidths=0.5,
    cbar_kws={"shrink": 0.8},
)
ax.set(title="Correlations for pre-intervention GDP data");

Run the analysis#

Note: The analysis is (and should be) run on the raw GDP data. We do not use the normalised data shown above, which was for ease of visualization only.

# build a model formula
formula = target_country + " ~ " + "0 + " + " + ".join(other_countries)

print(formula)
UK ~ 0 + Australia + Austria + Belgium + Canada + Denmark + Finland + France + Germany + Iceland + Luxemburg + Netherlands + New_Zealand + Norway + Sweden + Switzerland
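The WeightedSumFitter model used below treats the UK series as a weighted sum of the donor countries, with a Dirichlet prior ensuring the weights are non-negative and sum to one. As a minimal sketch of the kind of generative model involved (illustrative only, not CausalPy’s exact implementation):

import pymc as pm

# pre-intervention outcome (UK) and donor-country predictors
X = df.loc[df.index < treatment_time, other_countries].to_numpy()
y = df.loc[df.index < treatment_time, target_country].to_numpy()

with pm.Model():
    # Dirichlet prior: weights are non-negative and sum to one
    beta = pm.Dirichlet("beta", a=np.ones(len(other_countries)))
    sigma = pm.HalfNormal("sigma", 1)
    mu = pm.math.dot(X, beta)
    pm.Normal("y_hat", mu=mu, sigma=sigma, observed=y)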

Note

The random_seed keyword argument for the PyMC sampler is not necessary. We use it here so that the results are reproducible.

sample_kwargs = {"tune": 4000, "target_accept": 0.99, "random_seed": seed}

result = cp.SyntheticControl(
    df,
    treatment_time,
    formula=formula,
    model=cp.pymc_models.WeightedSumFitter(sample_kwargs=sample_kwargs),
)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta, sigma]


Sampling 4 chains for 4_000 tune and 1_000 draw iterations (16_000 + 4_000 draws total) took 277 seconds.
Chain 0 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.
Chain 1 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.
Chain 2 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.
Chain 3 reached the maximum tree depth. Increase `max_treedepth`, increase `target_accept` or reparameterize.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
Sampling: [beta, sigma, y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]

We currently get some sampling warnings (maximum tree depth reached, and elevated R-hat for some parameters), but these are mostly dealt with by increasing the tune and target_accept sampling parameters. Nevertheless, the sampling of this dataset/model combination feels a little brittle.
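If the maximum tree depth warnings persist, another option is to raise max_treedepth as well. The sketch below assumes that extra keyword arguments in sample_kwargs are forwarded to PyMC’s NUTS sampler, as tune and target_accept are:

# hypothetical alternative sampler settings (max_treedepth defaults to 10)
sample_kwargs = {
    "tune": 4000,  # longer warm-up for step size and mass matrix adaptation
    "target_accept": 0.99,  # smaller steps, fewer divergences
    "max_treedepth": 12,  # allow longer NUTS trajectories
    "random_seed": seed,
}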

Check the MCMC chain mixing via the Rhat statistic.

az.summary(result.idata, var_names=["~mu"])
                    mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
beta[Australia]    0.119  0.071   0.001    0.239      0.003    0.002     676.0     636.0   1.01
beta[Austria]      0.043  0.040   0.000    0.116      0.001    0.001     932.0     768.0   1.00
beta[Belgium]      0.050  0.046   0.000    0.137      0.001    0.001     790.0     880.0   1.00
beta[Canada]       0.039  0.023   0.000    0.077      0.001    0.001     475.0     605.0   1.01
beta[Denmark]      0.091  0.064   0.000    0.202      0.002    0.002     690.0     569.0   1.01
beta[Finland]      0.041  0.039   0.000    0.112      0.001    0.001     898.0     941.0   1.00
beta[France]       0.030  0.027   0.000    0.078      0.001    0.001     939.0     795.0   1.00
beta[Germany]      0.026  0.025   0.000    0.070      0.001    0.001     689.0     476.0   1.01
beta[Iceland]      0.153  0.041   0.074    0.230      0.001    0.001     958.0     941.0   1.00
beta[Luxemburg]    0.056  0.048   0.000    0.145      0.001    0.001    1073.0    1561.0   1.00
beta[Netherlands]  0.047  0.044   0.000    0.130      0.001    0.001     745.0     581.0   1.00
beta[New_Zealand]  0.059  0.053   0.000    0.160      0.002    0.001     573.0     315.0   1.01
beta[Norway]       0.080  0.045   0.001    0.152      0.002    0.001     432.0     389.0   1.01
beta[Sweden]       0.099  0.031   0.043    0.162      0.001    0.001     516.0     312.0   1.01
beta[Switzerland]  0.065  0.057   0.000    0.169      0.001    0.001    4589.0    2511.0   1.00
sigma              0.031  0.005   0.022    0.041      0.000    0.000     710.0     691.0   1.01

You can inspect the traces in more detail with:

az.plot_trace(result.idata, var_names="~mu", compact=False);
az.style.use("arviz-darkgrid")

fig, ax = result.plot(plot_predictors=False)

for i in [0, 1, 2]:
    ax[i].set(ylabel="Billion USD")
result.summary()
==================================Pre-Post Fit==================================
Formula: UK ~ 0 + Australia + Austria + Belgium + Canada + Denmark + Finland + France + Germany + Iceland + Luxemburg + Netherlands + New_Zealand + Norway + Sweden + Switzerland
Model coefficients:
    Australia    0.12, 94% HDI [0.012, 0.27]
    Austria      0.043, 94% HDI [0.0013, 0.14]
    Belgium      0.05, 94% HDI [0.0016, 0.16]
    Canada       0.039, 94% HDI [0.0035, 0.086]
    Denmark      0.091, 94% HDI [0.0044, 0.23]
    Finland      0.041, 94% HDI [0.0015, 0.13]
    France       0.03, 94% HDI [0.0011, 0.097]
    Germany      0.026, 94% HDI [0.00097, 0.089]
    Iceland      0.15, 94% HDI [0.076, 0.23]
    Luxemburg    0.056, 94% HDI [0.0032, 0.17]
    Netherlands  0.047, 94% HDI [0.0013, 0.15]
    New_Zealand  0.059, 94% HDI [0.0013, 0.19]
    Norway       0.08, 94% HDI [0.0068, 0.17]
    Sweden       0.099, 94% HDI [0.036, 0.16]
    Switzerland  0.065, 94% HDI [0.0027, 0.2]
    sigma        0.031, 94% HDI [0.023, 0.042]
ax = az.plot_forest(result.idata, var_names="beta", figsize=(6, 5))
ax[0].set(title="Estimated weighting coefficients");

References#

[Springford, 2022]
John Springford. What can we know about the cost of Brexit so far? Centre for European Reform, 2022. URL: https://www.cer.eu/publications/archive/policy-brief/2022/cost-brexit-so-far.