Synthetic control with scikit-learn models#

from sklearn.linear_model import LinearRegression

import causalpy as cp

Load data#

df = cp.load_data("sc")
treatment_time = 70

Analyse with the WeightedProportion model#

# Note: the `0 +` in the formula drops the intercept, which we do not want in this model
result = cp.SyntheticControl(
    df,
    treatment_time,
    formula="actual ~ 0 + a + b + c + d + e + f + g",
    model=cp.skl_models.WeightedProportion(),
)
fig, ax = result.plot(plot_predictors=True)
../_images/2e143019de97ebe2b5055320ac146d0338208c8254633c4aed30d0d8a09cb900.png
result.summary(round_to=3)
==================================Pre-Post Fit==================================
Formula: actual ~ 0 + a + b + c + d + e + f + g
Model coefficients:
  a	     0.385
  b	     0.172
  c	     0.443
  d	         0
  e	  5.39e-18
  f	         0
  g	         0
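
The coefficients above are all non-negative and sum to 1. As a minimal sketch of what a fit under those two constraints can look like, the following solves a constrained least-squares problem with SciPy. This is an illustration only, not CausalPy's implementation: the data (`X`, `y`, `true_w`) are made up, and the choice of the SLSQP solver is an assumption.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: 70 pre-treatment observations of 3 control units.
rng = np.random.default_rng(0)
X = rng.normal(size=(70, 3))
true_w = np.array([0.5, 0.3, 0.2])  # made-up weights that sum to 1
y = X @ true_w + rng.normal(scale=0.01, size=70)


def sse(w):
    """Sum of squared errors of the weighted combination of controls."""
    resid = y - X @ w
    return resid @ resid


n_controls = X.shape[1]
res = minimize(
    sse,
    x0=np.full(n_controls, 1.0 / n_controls),  # start from equal weights
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n_controls,  # each weight lies in [0, 1]
    constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},  # sum to 1
)
weights = res.x
print(weights.round(3), weights.sum().round(3))
```

Because the weights are forced onto the simplex, the fitted series is always a convex combination of the control series.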

For this dataset, however, these estimates are quite poor. We can therefore lift the “sum to 1” constraint and use the LinearRegression model instead, while still constraining the weights to be non-negative. You could equally experiment with the Ridge model, e.g. Ridge(positive=True, alpha=100).

Analyse with the LinearRegression model#

# Note: the `0 +` in the formula drops the intercept, which we do not want in this model
result = cp.SyntheticControl(
    df,
    treatment_time,
    formula="actual ~ 0 + a + b + c + d + e + f + g",
    model=LinearRegression(positive=True),
)
fig, ax = result.plot(plot_predictors=True)
../_images/e76dbc47991908b6ec1e3812ec3439e352c8989d56bbf134efbe838c9b332603.png
result.summary(round_to=3)
==================================Pre-Post Fit==================================
Formula: actual ~ 0 + a + b + c + d + e + f + g
Model coefficients:
  a	     0.322
  b	    0.0581
  c	     0.288
  d	    0.0561
  e	   0.00418
  f	     0.229
  g	    0.0378
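
As a quick check on the two sets of coefficients reported on this page, the snippet below sums each of them. The values are copied from the summaries above, so they are rounded and the totals are approximate. The dictionary names are illustrative only.

```python
# Coefficients copied from the two result.summary(round_to=3) outputs.
weighted_proportion = {
    "a": 0.385, "b": 0.172, "c": 0.443, "d": 0.0,
    "e": 5.39e-18, "f": 0.0, "g": 0.0,
}
linear_regression = {
    "a": 0.322, "b": 0.0581, "c": 0.288, "d": 0.0561,
    "e": 0.00418, "f": 0.229, "g": 0.0378,
}

wp_total = sum(weighted_proportion.values())
lr_total = sum(linear_regression.values())

print(round(wp_total, 3))  # 1.0
print(round(lr_total, 3))  # 0.995
```

The WeightedProportion weights sum to 1 by construction, whereas the LinearRegression weights are only constrained to be non-negative, so their total is free to differ from 1.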