causalpy.skl_experiments

Experiments for Scikit-Learn models

  • ExperimentalDesign: base class for scikit-learn experiments

  • PrePostFit: base class for synthetic control and interrupted time series

  • SyntheticControl

  • InterruptedTimeSeries

  • DifferenceInDifferences

  • RegressionDiscontinuity

class causalpy.skl_experiments.DifferenceInDifferences

Note

There is no pre/post intervention data distinction for DiD; the model is fitted to all available data.

Parameters:
  • data (DataFrame) – A pandas data frame

  • formula (str) – A statistical model formula

  • time_variable_name (str) – Name of the data column for the time variable

  • group_variable_name (str) – Name of the data column for the group variable

  • model – A scikit-learn model for difference in differences

Example

>>> import causalpy as cp
>>> from sklearn.linear_model import LinearRegression
>>> df = cp.load_data("did")
>>> result = cp.skl_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     treated=1,
...     untreated=0,
...     model=LinearRegression(),
... )
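In a formula such as `y ~ 1 + group*post_treatment`, the coefficient on the interaction term is the difference-in-differences estimate. The following is a minimal sketch of that idea, not CausalPy's implementation: the data are simulated, the effect size and cell sizes are assumptions, and plain NumPy least squares stands in for the scikit-learn model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 design with 50 observations per group/period cell.
group = np.repeat([0, 0, 1, 1], 50)   # 0 = untreated, 1 = treated
post = np.repeat([0, 1, 0, 1], 50)    # 0 = before, 1 = after the intervention
true_effect = 0.5                     # assumed effect used to simulate the data
y = (
    1.0 + 0.3 * group + 0.2 * post + true_effect * group * post
    + rng.normal(scale=0.1, size=group.size)
)

# Design matrix equivalent to "1 + group*post_treatment":
# intercept, group, post_treatment, and the group:post_treatment interaction.
X = np.column_stack([np.ones_like(y), group, post, group * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
did_from_regression = coef[3]


def cell_mean(g, p):
    """Mean outcome for one group/period cell."""
    return y[(group == g) & (post == p)].mean()


# The same quantity computed directly from the four cell means.
did_from_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (
    cell_mean(0, 1) - cell_mean(0, 0)
)
```

Because the model is saturated (one parameter per cell), the interaction coefficient and the difference of the four cell means agree up to floating-point error.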
__init__(data, formula, time_variable_name, group_variable_name, treated, untreated, model=None, **kwargs)
Parameters:
  • data (DataFrame)

  • formula (str)

  • time_variable_name (str)

  • group_variable_name (str)

  • treated (str)

  • untreated (str)

plot(round_to=None)

Plot results

Parameters:

round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.

class causalpy.skl_experiments.ExperimentalDesign

Base class for experiment designs

__init__(model=None, **kwargs)
model = None
outcome_variable_name = None
class causalpy.skl_experiments.InterruptedTimeSeries

Interrupted time series analysis, a wrapper around the PrePostFit class

Parameters:
  • data – A pandas data frame

  • treatment_time – The index or time value of when treatment begins

  • formula – A statistical model formula

  • model – An sklearn model object

Example

>>> from sklearn.linear_model import LinearRegression
>>> import pandas as pd
>>> import causalpy as cp
>>> df = (
...     cp.load_data("its")
...     .assign(date=lambda x: pd.to_datetime(x["date"]))
...     .set_index("date")
... )
>>> treatment_time = pd.to_datetime("2017-01-01")
>>> result = cp.skl_experiments.InterruptedTimeSeries(
...     df,
...     treatment_time,
...     formula="y ~ 1 + t + C(month)",
...     model=LinearRegression(),
... )
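To make the role of the formula `y ~ 1 + t + C(month)` concrete, here is a hedged sketch of an interrupted time series fit written with NumPy only. It is an illustration under stated assumptions rather than the library's code: the series is simulated, the month dummies are built by hand instead of through the formula machinery, and `treatment_index` is a hypothetical integer stand-in for `treatment_time`.

```python
import numpy as np

rng = np.random.default_rng(1)

n_months, treatment_index = 60, 48    # hypothetical series length and cut point
t = np.arange(n_months)
month = t % 12                        # 0..11, plays the role of C(month)
seasonal = np.sin(2 * np.pi * month / 12)
y = 10 + 0.1 * t + seasonal + rng.normal(scale=0.05, size=n_months)
y[treatment_index:] += 2.0            # simulated treatment effect (assumed)

# Design matrix: intercept, linear trend t, and 11 month dummies
# (month 0 is the reference level, as with treatment coding).
dummies = (month[:, None] == np.arange(1, 12)).astype(float)
X = np.column_stack([np.ones(n_months), t, dummies])

# Estimate parameters on pre-intervention rows only.
pre = t < treatment_index
coef, *_ = np.linalg.lstsq(X[pre], y[pre], rcond=None)

# Project the fitted model forward to obtain the counterfactual,
# then compare it with what was observed after the intervention.
counterfactual = X[~pre] @ coef
impact = y[~pre] - counterfactual
mean_impact = impact.mean()
```

The post-intervention gap between the observed series and the projected counterfactual is what the experiment reports as the causal impact.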
expt_type = 'Interrupted Time Series'
class causalpy.skl_experiments.PrePostFit

A class to analyse quasi-experiments where parameter estimation is based on just the pre-intervention data.

Parameters:
  • data – A pandas data frame

  • treatment_time – The index or time value of when treatment begins

  • formula – A statistical model formula

  • model – A scikit-learn model object

Example

>>> from sklearn.linear_model import LinearRegression
>>> import causalpy as cp
>>> df = cp.load_data("sc")
>>> treatment_time = 70
>>> result = cp.skl_experiments.PrePostFit(
...     df,
...     treatment_time,
...     formula="actual ~ 0 + a + b + c + d + e + f + g",
...     model=cp.skl_models.WeightedProportion(),
... )
>>> result.get_coeffs()
array(...)
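The pre/post logic can be sketched without the library: estimate parameters on rows before `treatment_time`, predict the whole series, and read the impact as observed minus predicted. The example below is an assumption-laden illustration, not the class's internals. The control series are simulated, and unconstrained least squares is swapped in for `WeightedProportion`, so the fitted weights are not forced to be non-negative or to sum to one.

```python
import numpy as np

rng = np.random.default_rng(2)

n_obs, treatment_time = 100, 70       # same cut point as the example above
# Three hypothetical control units, simulated as random walks.
controls = rng.normal(size=(n_obs, 3)).cumsum(axis=0)
true_weights = np.array([0.5, 0.3, 0.2])   # assumed for the simulation
actual = controls @ true_weights + rng.normal(scale=0.05, size=n_obs)
actual[treatment_time:] -= 1.5        # simulated treatment effect (assumed)

# Fit on pre-intervention rows only; no intercept, matching "actual ~ 0 + ...".
pre = np.arange(n_obs) < treatment_time
weights, *_ = np.linalg.lstsq(controls[pre], actual[pre], rcond=None)

# Predict the full series; post-intervention predictions are the counterfactual.
counterfactual = controls @ weights
impact = actual - counterfactual
post_impact = impact[~pre].mean()
```

Before the intervention the impact should hover around zero; a sustained departure from zero afterwards is the estimated effect.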
__init__(data, treatment_time, formula, model=None, **kwargs)
get_coeffs()

Returns model coefficients

plot(counterfactual_label='Counterfactual', round_to=None, **kwargs)

Plot experiment results

Parameters:

round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.

plot_coeffs()

Plot a bar chart of the model coefficients

class causalpy.skl_experiments.RegressionDiscontinuity

A class to analyse sharp regression discontinuity experiments.

Parameters:
  • data – A pandas data frame

  • formula – A statistical model formula

  • treatment_threshold – A scalar threshold value at which the treatment is applied

  • model – A scikit-learn model object

  • running_variable_name – The name of the predictor variable that the treatment threshold is based upon

  • epsilon (float) – A small scalar value which determines how far above and below the treatment threshold to evaluate the causal impact.

  • bandwidth (Optional[float]) – Data outside of the bandwidth (relative to the discontinuity) is not used to fit the model.
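The roles of `epsilon` and `bandwidth` can be illustrated with a short NumPy sketch. This is a hedged illustration, not the class's implementation: the data are simulated, the treatment rule `x >= threshold` and the bandwidth value of 0.3 are assumptions, and least squares stands in for the scikit-learn model.

```python
import numpy as np

rng = np.random.default_rng(3)

threshold, epsilon = 0.5, 0.001
bandwidth = 0.3                       # hypothetical value for illustration
x = rng.uniform(0, 1, size=500)
treated = (x >= threshold).astype(float)   # assumed sharp treatment rule
y = 1.2 * x + 0.2 * treated + rng.normal(scale=0.02, size=x.size)

# Only observations within `bandwidth` of the threshold are used for fitting.
keep = np.abs(x - threshold) <= bandwidth


def design(x_values):
    """Design matrix for "y ~ 1 + x + treated"."""
    x_values = np.asarray(x_values, dtype=float)
    return np.column_stack(
        [np.ones_like(x_values), x_values, x_values >= threshold]
    )


coef, *_ = np.linalg.lstsq(design(x[keep]), y[keep], rcond=None)

# Evaluate the fitted model just below and just above the threshold;
# the difference between the two predictions is the discontinuity.
pred_below, pred_above = design([threshold - epsilon, threshold + epsilon]) @ coef
discontinuity = pred_above - pred_below
```

A smaller `epsilon` keeps the slope's contribution to the estimate negligible, while a narrower `bandwidth` trades sample size for a more local fit.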

Example

>>> import causalpy as cp
>>> from sklearn.linear_model import LinearRegression
>>> data = cp.load_data("rd")
>>> result = cp.skl_experiments.RegressionDiscontinuity(
...     data,
...     formula="y ~ 1 + x + treated",
...     model=LinearRegression(),
...     treatment_threshold=0.5,
... )
>>> result.summary()
Difference in Differences experiment
Formula: y ~ 1 + x + treated
Running variable: x
Threshold on running variable: 0.5
Results:
Discontinuity at threshold = 0.19
Model coefficients:
    Intercept               0.0
    treated[T.True]         0.19
    x               1.23
__init__(data, formula, treatment_threshold, model=None, running_variable_name='x', epsilon=0.001, bandwidth=None, **kwargs)
plot(round_to=None)

Plot results

Parameters:

round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.

summary()

Print text output summarising the results

class causalpy.skl_experiments.SyntheticControl

Synthetic control analysis, a wrapper around the PrePostFit class

Parameters:
  • data – A pandas data frame

  • treatment_time – The index or time value of when treatment begins

  • formula – A statistical model formula

  • model – An sklearn model object

Example

>>> from sklearn.linear_model import LinearRegression
>>> import causalpy as cp
>>> df = cp.load_data("sc")
>>> treatment_time = 70
>>> result = cp.skl_experiments.SyntheticControl(
...     df,
...     treatment_time,
...     formula="actual ~ 0 + a + b + c + d + e + f + g",
...     model=cp.skl_models.WeightedProportion(),
... )
plot(plot_predictors=False, round_to=None, **kwargs)

Plot the results

Parameters:

round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.