causalpy.skl_experiments
Experiments for Scikit-Learn models
ExperimentalDesign: base class for scikit-learn experiments
PrePostFit: base class for synthetic control and interrupted time series
SyntheticControl
InterruptedTimeSeries
DifferenceInDifferences
RegressionDiscontinuity
- class causalpy.skl_experiments.DifferenceInDifferences
Note
DiD makes no distinction between pre- and post-intervention data; the model is fit to all available data.
- Parameters:
Example
>>> import causalpy as cp
>>> from sklearn.linear_model import LinearRegression
>>> df = cp.load_data("did")
>>> result = cp.skl_experiments.DifferenceInDifferences(
...     df,
...     formula="y ~ 1 + group*post_treatment",
...     time_variable_name="t",
...     group_variable_name="group",
...     treated=1,
...     untreated=0,
...     model=LinearRegression(),
... )
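The note above says the DiD model is fit to all of the data at once. As a rough illustration of what the interaction term in a formula such as "y ~ 1 + group*post_treatment" captures, the sketch below computes the difference in differences directly from cell means. For a saturated two-group, two-period design fitted by ordinary least squares, the interaction coefficient equals this quantity. The rows are hypothetical toy values, not the "did" dataset, and the helper `cell_mean` is introduced here for illustration only.

```python
from statistics import mean

# Hypothetical toy rows: (group, post_treatment, y). Not the "did" dataset.
rows = [
    (0, 0, 1.0), (0, 0, 1.2),
    (0, 1, 2.0), (0, 1, 2.2),
    (1, 0, 1.5), (1, 0, 1.7),
    (1, 1, 3.5), (1, 1, 3.7),
]


def cell_mean(group, post):
    """Mean outcome for one (group, post_treatment) cell."""
    return mean(y for g, p, y in rows if g == group and p == post)


# Change over time within each group, using every row in the data.
treated_change = cell_mean(1, 1) - cell_mean(1, 0)
control_change = cell_mean(0, 1) - cell_mean(0, 0)

# In a saturated 2x2 OLS model, the group:post_treatment coefficient
# equals this difference of differences.
did_estimate = treated_change - control_change
print(round(did_estimate, 2))
```

With more covariates or a different model, the fitted interaction term will generally differ from this simple cell-mean calculation.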
- __init__(data, formula, time_variable_name, group_variable_name, treated, untreated, model=None, **kwargs)
- plot(round_to=None)
Plot results
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- class causalpy.skl_experiments.ExperimentalDesign
Base class for experiment designs
- __init__(model=None, **kwargs)
- model = None
- outcome_variable_name = None
- class causalpy.skl_experiments.InterruptedTimeSeries
Interrupted time series analysis, a wrapper around the PrePostFit class
- Parameters:
data – A pandas data frame
treatment_time – The index or time value of when treatment begins
formula – A statistical model formula
model – An sklearn model object
Example
>>> from sklearn.linear_model import LinearRegression
>>> import pandas as pd
>>> import causalpy as cp
>>> df = (
...     cp.load_data("its")
...     .assign(date=lambda x: pd.to_datetime(x["date"]))
...     .set_index("date")
... )
>>> treatment_time = pd.to_datetime("2017-01-01")
>>> result = cp.skl_experiments.InterruptedTimeSeries(
...     df,
...     treatment_time,
...     formula="y ~ 1 + t + C(month)",
...     model=LinearRegression(),
... )
- expt_type = 'Interrupted Time Series'
- class causalpy.skl_experiments.PrePostFit
A class to analyse quasi-experiments where parameter estimation is based on just the pre-intervention data.
- Parameters:
data – A pandas data frame
treatment_time – The index or time value of when treatment begins
formula – A statistical model formula
model – A scikit-learn model object
Example
>>> from sklearn.linear_model import LinearRegression
>>> import causalpy as cp
>>> df = cp.load_data("sc")
>>> treatment_time = 70
>>> result = cp.skl_experiments.PrePostFit(
...     df,
...     treatment_time,
...     formula="actual ~ 0 + a + b + c + d + e + f + g",
...     model=cp.skl_models.WeightedProportion(),
... )
>>> result.get_coeffs()
array(...)
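The class description says that parameters are estimated from the pre-intervention data only. A minimal sketch of that idea, in plain Python, is shown below: split the series at the treatment time, fit on the earlier part, and compare the later observations with what the fitted model predicts. The data, the helper `fit_line`, and the choice of a strict `<` for the pre-period are assumptions made for illustration; they are not taken from the library's implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical series: a linear trend with a +5 shift from t = 6 onwards.
treatment_time = 6
t = list(range(10))
y = [2.0 * ti + 1.0 + (5.0 if ti >= treatment_time else 0.0) for ti in t]

# Split at the treatment time (assumed convention: pre is strictly before).
pre = [(ti, yi) for ti, yi in zip(t, y) if ti < treatment_time]
post = [(ti, yi) for ti, yi in zip(t, y) if ti >= treatment_time]

# Only the pre-intervention rows are used to estimate the parameters.
intercept, slope = fit_line([p[0] for p in pre], [p[1] for p in pre])

# Counterfactual: what the pre-period model predicts for the post period.
counterfactual = [intercept + slope * ti for ti, _ in post]

# Estimated impact: observed minus counterfactual at each post-period point.
impact = [yi - cf for (_, yi), cf in zip(post, counterfactual)]
print(impact)
```

Because the model never sees post-intervention rows, any gap between the observations and the counterfactual is attributed to the intervention, which is only reasonable if the pre-period relationship would otherwise have continued.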
- __init__(data, treatment_time, formula, model=None, **kwargs)
- get_coeffs()
Returns model coefficients
- plot(counterfactual_label='Counterfactual', round_to=None, **kwargs)
Plot experiment results
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- plot_coeffs()
Plots coefficient bar plot
- class causalpy.skl_experiments.RegressionDiscontinuity
A class to analyse sharp regression discontinuity experiments.
- Parameters:
data – A pandas dataframe
formula – A statistical model formula
treatment_threshold – A scalar threshold value at which the treatment is applied
model – A scikit-learn model object
running_variable_name – The name of the predictor variable that the treatment threshold is based upon
epsilon (float) – A small scalar value which determines how far above and below the treatment threshold to evaluate the causal impact.
bandwidth (Optional[float]) – Data outside of the bandwidth (relative to the discontinuity) is not used to fit the model.
Example
>>> import causalpy as cp
>>> from sklearn.linear_model import LinearRegression
>>> data = cp.load_data("rd")
>>> result = cp.skl_experiments.RegressionDiscontinuity(
...     data,
...     formula="y ~ 1 + x + treated",
...     model=LinearRegression(),
...     treatment_threshold=0.5,
... )
>>> result.summary()
Difference in Differences experiment
Formula: y ~ 1 + x + treated
Running variable: x
Threshold on running variable: 0.5
Results:
Discontinuity at threshold = 0.19
Model coefficients:
  Intercept        0.0
  treated[T.True]  0.19
  x                1.23
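To make the roles of epsilon and bandwidth more concrete, the sketch below applies both to a stand-in model. It assumes that bandwidth keeps rows whose running variable lies within that distance of the threshold, and that the discontinuity is the difference between predictions at threshold + epsilon and threshold - epsilon; neither detail is confirmed by the text above. The `predict` function is hypothetical and simply reuses the rounded coefficients printed in the summary, and the x values are made up.

```python
threshold = 0.5
epsilon = 0.001
bandwidth = 0.3

# Hypothetical running-variable values (not the "rd" dataset).
xs = [0.05, 0.15, 0.25, 0.45, 0.55, 0.75, 0.85, 0.95]

# bandwidth: rows further than `bandwidth` from the threshold are dropped
# before fitting (assumed convention: the interval is inclusive).
kept_x = [x for x in xs if abs(x - threshold) <= bandwidth]


def predict(x):
    """Stand-in for a fitted 'y ~ 1 + x + treated' model.

    Coefficients are the rounded values from the summary output above,
    used here purely for illustration.
    """
    treated = 1.0 if x >= threshold else 0.0
    return 0.0 + 1.23 * x + 0.19 * treated


# epsilon: evaluate the model just above and just below the threshold.
discontinuity = predict(threshold + epsilon) - predict(threshold - epsilon)

print(kept_x)
print(round(discontinuity, 2))
```

A smaller epsilon shrinks the contribution of the slope on x to the estimate, leaving mostly the jump attributed to treatment.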
- __init__(data, formula, treatment_threshold, model=None, running_variable_name='x', epsilon=0.001, bandwidth=None, **kwargs)
- plot(round_to=None)
Plot results
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- summary()
Print text output summarising the results
- class causalpy.skl_experiments.SyntheticControl
A wrapper around the PrePostFit class
- Parameters:
data – A pandas data frame
treatment_time – The index or time value of when treatment begins
formula – A statistical model formula
model – An sklearn model object
Example
>>> from sklearn.linear_model import LinearRegression
>>> import causalpy as cp
>>> df = cp.load_data("sc")
>>> treatment_time = 70
>>> result = cp.skl_experiments.SyntheticControl(
...     df,
...     treatment_time,
...     formula="actual ~ 0 + a + b + c + d + e + f + g",
...     model=cp.skl_models.WeightedProportion(),
... )
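The example formula has no intercept and regresses the treated unit on the control units, so the synthetic control is a weighted combination of controls. The sketch below illustrates that idea with two hypothetical control series and a simple grid search for a single weight. It assumes the weights are constrained to lie between 0 and 1 and to sum to one; this is an assumption about what a model like WeightedProportion does, not a description of its implementation, and the grid search is a stand-in for whatever optimiser the library uses.

```python
# Hypothetical pre-treatment series for two control units.
control_a = [1.0, 2.0, 3.0, 4.0]
control_b = [3.0, 2.0, 5.0, 6.0]

# Hypothetical treated unit, built as 25% of a plus 75% of b.
actual = [0.25 * a + 0.75 * b for a, b in zip(control_a, control_b)]


def sse(w):
    """Sum of squared errors when weighting a by w and b by (1 - w)."""
    return sum(
        (y - (w * a + (1 - w) * b)) ** 2
        for y, a, b in zip(actual, control_a, control_b)
    )


# Grid search over weights in [0, 1]; the two weights sum to one by design.
candidates = [i / 100 for i in range(101)]
best_w = min(candidates, key=sse)

# Apply the pre-period weights to post-treatment control values to obtain
# the counterfactual for the treated unit.
post_a = [5.0, 6.0]
post_b = [7.0, 8.0]
counterfactual = [best_w * a + (1 - best_w) * b for a, b in zip(post_a, post_b)]

print(best_w)
print(counterfactual)
```

With seven control units, as in the formula above, the same idea applies but the weights would be found by a constrained optimiser rather than a one-dimensional grid.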
- plot(plot_predictors=False, round_to=None, **kwargs)
Plot the results
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.