`causalpy.skl_experiments`

Experiments for Scikit-Learn models

ExperimentalDesign: base class for scikit-learn experiments
PrePostFit: base class for synthetic control and interrupted time series
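SyntheticControl: synthetic control analysis, a wrapper around PrePostFit
InterruptedTimeSeries: interrupted time series analysis, a wrapper around PrePostFit
DifferenceInDifferences: difference in differences analysis
RegressionDiscontinuity: sharp regression discontinuity analysis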

class causalpy.skl_experiments.DifferenceInDifferences

Note

There is no pre/post intervention data distinction for DiD; the model is fit to all available data.

Parameters:

data (DataFrame) – A pandas data frame
formula (str) – A statistical model formula
time_variable_name (str) – Name of the data column for the time variable
group_variable_name (str) – Name of the data column for the group variable
model – A scikit-learn model for difference in differences

Example

>>> import causalpy as cp
>>> from sklearn.linear_model import LinearRegression
>>> df = cp.load_data("did")
>>> result = cp.skl_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     treated=1,
...     untreated=0,
...     model=LinearRegression(),
... )
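In a two-group, two-period design, the formula "y ~ 1 + group*post_treatment" expands to an intercept, group, post_treatment and their interaction. The interaction coefficient then equals the difference in differences of the four cell means. The sketch below checks this with plain NumPy least squares on simulated data rather than CausalPy. The data and variable names are illustrative assumptions. Like the note above, it fits every row with no pre/post split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated 2x2 design: group (0 = untreated, 1 = treated) x post_treatment (0/1)
n_per_cell = 50
group = np.repeat([0, 0, 1, 1], n_per_cell)
post = np.tile(np.repeat([0, 1], n_per_cell), 2)
true_effect = 0.5
y = 1.0 + 0.3 * group + 0.2 * post + true_effect * group * post
y = y + rng.normal(scale=0.1, size=y.size)

# Columns for "y ~ 1 + group*post_treatment": intercept, group, post, group:post
X = np.column_stack([np.ones_like(y), group, post, group * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # fit on all rows
interaction = beta[3]


def cell_mean(g, p):
    """Mean outcome for one group/period cell."""
    return y[(group == g) & (post == p)].mean()


# Difference in differences computed directly from the four cell means
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(f"interaction coefficient = {interaction:.4f}")
print(f"difference of differences = {did:.4f}")
```

Because this model has one parameter per cell, it reproduces the cell means exactly, so the two printed numbers agree up to floating-point error.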

__init__(data, formula, time_variable_name, group_variable_name, treated, untreated, model=None, **kwargs)

Parameters:

data (DataFrame)
formula (str)
time_variable_name (str)
group_variable_name (str)
treated (str)
untreated (str)

plot(round_to=None)

Plot results

Parameters: round_to – Number of decimals used to round results. Defaults to 2. Use None to return raw numbers.

class causalpy.skl_experiments.ExperimentalDesign

Base class for experiment designs

__init__(model=None, **kwargs)

model = None

outcome_variable_name = None

class causalpy.skl_experiments.InterruptedTimeSeries

Interrupted time series analysis, a wrapper around the PrePostFit class

Parameters:

data – A pandas data frame
treatment_time – The index or time value of when treatment begins
formula – A statistical model formula
model – An sklearn model object

Example

>>> from sklearn.linear_model import LinearRegression
>>> import pandas as pd
>>> import causalpy as cp
>>> df = (
...     cp.load_data("its")
...     .assign(date=lambda x: pd.to_datetime(x["date"]))
...     .set_index("date")
... )
>>> treatment_time = pd.to_datetime("2017-01-01")
>>> result = cp.skl_experiments.InterruptedTimeSeries(
...     df,
...     treatment_time,
...     formula="y ~ 1 + t + C(month)",
...     model=LinearRegression(),
... )
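The interrupted time series workflow can be sketched without CausalPy. The steps are: build the columns that "y ~ 1 + t + C(month)" describes, fit on rows before treatment_time, and extrapolate into the post period. This NumPy sketch uses simulated monthly data. It assumes that rows at or after treatment_time count as post-intervention and that January is the reference month. An ordinary least-squares solve stands in for LinearRegression.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monthly observations from 2010-01 to 2019-12; treatment begins 2017-01
dates = np.arange("2010-01", "2020-01", dtype="datetime64[M]")
treatment_time = np.datetime64("2017-01", "M")

t = np.arange(dates.size)                 # linear time trend
month = dates.astype("int64") % 12        # 0 = January, ..., 11 = December
post = dates >= treatment_time            # assumed: treatment_time itself is post

# Simulated outcome: trend + seasonality + a step change of 2.0 after treatment
y = 10 + 0.05 * t + np.sin(2 * np.pi * month / 12) + 2.0 * post
y = y + rng.normal(scale=0.2, size=t.size)

# Design matrix for "y ~ 1 + t + C(month)": intercept, trend, 11 month dummies
# (January is the reference level, as with treatment coding)
dummies = (month[:, None] == np.arange(1, 12)).astype(float)
X = np.column_stack([np.ones(t.size), t, dummies])

# Fit on pre-intervention rows only, then predict the post period
beta, *_ = np.linalg.lstsq(X[~post], y[~post], rcond=None)
counterfactual = X[post] @ beta
impact = y[post] - counterfactual
print(f"mean post-intervention impact = {impact.mean():.2f}")
```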

expt_type = 'Interrupted Time Series'

class causalpy.skl_experiments.PrePostFit

A class to analyse quasi-experiments where parameter estimation is based on just the pre-intervention data.

Parameters:

data – A pandas data frame
treatment_time – The index or time value of when treatment begins
formula – A statistical model formula
model – An scikit-learn model object

Example

>>> import causalpy as cp
>>> df = cp.load_data("sc")
>>> treatment_time = 70
>>> result = cp.skl_experiments.PrePostFit(
...     df,
...     treatment_time,
...     formula="actual ~ 0 + a + b + c + d + e + f + g",
...     model=cp.skl_models.WeightedProportion(),
... )
>>> result.get_coeffs()
array(...)
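The core PrePostFit idea is to estimate parameters from pre-intervention rows only, then predict the post-intervention period as a counterfactual. The NumPy sketch below illustrates it. It assumes an integer index like treatment_time = 70 above and uses unconstrained least squares with no intercept in place of WeightedProportion. The simulated data and names are illustrative, not CausalPy internals.

```python
import numpy as np

rng = np.random.default_rng(2)
n_time, n_controls = 100, 7
treatment_time = 70  # integer index at which treatment begins

# Control units a..g share a common signal; "actual" is a mix of a, b and c
signal = np.sin(np.arange(n_time) / 5)
controls = 10 + signal[:, None] + rng.normal(size=(n_time, n_controls))
mix = np.array([0.4, 0.3, 0.3, 0.0, 0.0, 0.0, 0.0])
actual = controls @ mix + rng.normal(scale=0.1, size=n_time)
actual[treatment_time:] += 3.0  # simulated treatment effect

# Split on the index: rows before treatment_time are pre-intervention
pre = np.arange(n_time) < treatment_time
post = ~pre

# "actual ~ 0 + a + ... + g": no intercept, fitted on pre-intervention rows only
coeffs, *_ = np.linalg.lstsq(controls[pre], actual[pre], rcond=None)

# Project the fitted model forward to get the counterfactual, then the impact
counterfactual = controls[post] @ coeffs
impact = actual[post] - counterfactual
print("coefficients:", np.round(coeffs, 2))
print(f"mean impact = {impact.mean():.2f}")
```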

__init__(data, treatment_time, formula, model=None, **kwargs)

get_coeffs(): Returns model coefficients

plot(counterfactual_label='Counterfactual', round_to=None, **kwargs)

Plot experiment results

Parameters: round_to – Number of decimals used to round results. Defaults to 2. Use None to return raw numbers.

plot_coeffs(): Plots coefficient bar plot

class causalpy.skl_experiments.RegressionDiscontinuity

A class to analyse sharp regression discontinuity experiments.

Parameters:

data – A pandas data frame
formula – A statistical model formula
treatment_threshold – A scalar threshold value at which the treatment is applied
model – A scikit-learn model object
running_variable_name – The name of the predictor variable that the treatment threshold is based upon
epsilon (float) – A small scalar value which determines how far above and below the treatment threshold to evaluate the causal impact.
bandwidth (Optional[float]) – Data outside of the bandwidth (relative to the discontinuity) is not used to fit the model.
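Plain least squares can illustrate the roles of bandwidth and epsilon. This NumPy sketch assumes a sharp design in which treated means x >= treatment_threshold. It drops rows farther than bandwidth from the threshold, fits "y ~ 1 + x + treated", and reports the difference between predictions at threshold + epsilon and threshold - epsilon. The data are simulated, the inclusive bandwidth rule is an assumption, and the predict helper is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
threshold, epsilon, bandwidth = 0.5, 0.001, 0.3

x = rng.uniform(0, 1, size=500)
treated = (x >= threshold).astype(float)  # sharp design (assumed rule)
y = 1.2 * x + 0.2 * treated + rng.normal(scale=0.05, size=x.size)

# Drop rows outside the bandwidth around the threshold (assumed inclusive)
keep = np.abs(x - threshold) <= bandwidth
X = np.column_stack([np.ones(keep.sum()), x[keep], treated[keep]])
beta, *_ = np.linalg.lstsq(X, y[keep], rcond=None)


def predict(x_new):
    """Predict y from the fitted 'y ~ 1 + x + treated' model (hypothetical helper)."""
    x_new = np.asarray(x_new, dtype=float)
    t_new = (x_new >= threshold).astype(float)
    return beta[0] + beta[1] * x_new + beta[2] * t_new


# Evaluate just below and just above the threshold
below, above = predict([threshold - epsilon, threshold + epsilon])
discontinuity = above - below
print(f"discontinuity at threshold = {discontinuity:.2f}")
```

With this purely linear formula the discontinuity equals the treated coefficient plus 2 × epsilon × the slope on x, so a small epsilon shifts the result only slightly.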

Example

>>> import causalpy as cp
>>> from sklearn.linear_model import LinearRegression
>>> data = cp.load_data("rd")
>>> result = cp.skl_experiments.RegressionDiscontinuity(
...     data,
...     formula="y ~ 1 + x + treated",
...     model=LinearRegression(),
...     treatment_threshold=0.5,
... )
>>> result.summary() 
Difference in Differences experiment
Formula: y ~ 1 + x + treated
Running variable: x
Threshold on running variable: 0.5

Results:
Discontinuity at threshold = 0.19
Model coefficients:
    Intercept               0.0
    treated[T.True]         0.19
    x               1.23

__init__(data, formula, treatment_threshold, model=None, running_variable_name='x', epsilon=0.001, bandwidth=None, **kwargs)

Parameters:

epsilon (float)
bandwidth (float | None)

plot(round_to=None)

Plot results

Parameters: round_to – Number of decimals used to round results. Defaults to 2. Use None to return raw numbers.

summary(): Print text output summarising the results

class causalpy.skl_experiments.SyntheticControl

Synthetic control analysis, a wrapper around the PrePostFit class

Parameters:

data – A pandas data frame
treatment_time – The index or time value of when treatment begins
formula – A statistical model formula
model – An sklearn model object

Example

>>> import causalpy as cp
>>> df = cp.load_data("sc")
>>> treatment_time = 70
>>> result = cp.skl_experiments.SyntheticControl(
...     df,
...     treatment_time,
...     formula="actual ~ 0 + a + b + c + d + e + f + g",
...     model=cp.skl_models.WeightedProportion(),
... )
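As we understand it, WeightedProportion fits the treated series as a combination of the control columns whose weights are non-negative and sum to one. This is the usual synthetic control restriction. The sketch below solves that constrained least-squares problem in NumPy with projected gradient descent onto the probability simplex. The solver, the function names (project_to_simplex, fit_weighted_proportion) and the simulated data are hypothetical and may differ from CausalPy's implementation.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = np.sort(v)[::-1]                      # sort in descending order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1) / k > 0)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def fit_weighted_proportion(X, y, n_iter=20000):
    """Minimise ||X @ w - y||^2 subject to w >= 0 and sum(w) == 1."""
    w = np.full(X.shape[1], 1.0 / X.shape[1])  # start from equal weights
    # Step size 1/L, with L = 2 * sigma_max(X)^2 the gradient's Lipschitz constant
    lipschitz = 2 * np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = 2 * X.T @ (X @ w - y)
        w = project_to_simplex(w - grad / lipschitz)
    return w


rng = np.random.default_rng(4)
n_time, n_controls, treatment_time = 100, 7, 70

# Control units a..g: a shared seasonal signal plus unit-level noise around 10
season = np.sin(np.arange(n_time) / 5)
controls = 10 + season[:, None] + rng.normal(size=(n_time, n_controls))
true_weights = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0])
actual = controls @ true_weights + rng.normal(scale=0.05, size=n_time)
actual[treatment_time:] += 3.0  # simulated treatment effect

# Fit weights on pre-intervention rows, then measure the post-period gap
pre = np.arange(n_time) < treatment_time
weights = fit_weighted_proportion(controls[pre], actual[pre])
impact = actual[~pre] - controls[~pre] @ weights
print("weights:", np.round(weights, 2))
print(f"mean impact = {impact.mean():.2f}")
```

Projected gradient descent was chosen only because it needs nothing beyond NumPy. A general constrained optimiser would solve the same problem.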

plot(plot_predictors=False, round_to=None, **kwargs)

Plot the results

Parameters: round_to – Number of decimals used to round results. Defaults to 2. Use None to return raw numbers.
