causalpy.pymc_experiments
Experiment routines for PyMC models.
ExperimentalDesign base class
Pre-Post Fit
Interrupted Time Series
Synthetic Control
Difference in differences
Regression Discontinuity
Pretest/Posttest Nonequivalent Group Design
- class causalpy.pymc_experiments.DifferenceInDifferences
A class to analyse data from Difference in Difference settings.
Note
There is no pre/post intervention data distinction for DiD, we fit all the data available.
- Parameters:
Example
>>> import causalpy as cp >>> df = cp.load_data("did") >>> seed = 42 >>> result = cp.pymc_experiments.DifferenceInDifferences( ... df, ... formula="y ~ 1 + group*post_treatment", ... time_variable_name="t", ... group_variable_name="group", ... model=cp.pymc_models.LinearRegression( ... sample_kwargs={ ... "target_accept": 0.95, ... "random_seed": seed, ... "progressbar": False, ... } ... ) ... )
- __init__(data, formula, time_variable_name, group_variable_name, model=None, **kwargs)
- expt_type = None
- property idata
Access to the models InferenceData object
- model = None
- plot(round_to=None)
Plot the results.
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- print_coefficients(round_to=None)
Prints the model coefficients
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- Return type:
Example
>>> import causalpy as cp >>> df = cp.load_data("did") >>> seed = 42 >>> result = cp.pymc_experiments.DifferenceInDifferences( ... df, ... formula="y ~ 1 + group*post_treatment", ... time_variable_name="t", ... group_variable_name="group", ... model=cp.pymc_models.LinearRegression( ... sample_kwargs={ ... "draws": 2000, ... "random_seed": seed, ... "progressbar": False ... }), ... ) >>> result.print_coefficients(round_to=1) Model coefficients: Intercept 1, 94% HDI [1, 1] post_treatment[T.True] 1, 94% HDI [0.9, 1] group 0.2, 94% HDI [0.09, 0.2] group:post_treatment[T.True] 0.5, 94% HDI [0.4, 0.6] sigma 0.08, 94% HDI [0.07, 0.1]
- class causalpy.pymc_experiments.ExperimentalDesign
Base class for other experiment types
See subclasses for examples of most methods
- __init__(model=None, **kwargs)
- expt_type = None
- property idata
Access to the models InferenceData object
- model = None
- print_coefficients(round_to=None)
Prints the model coefficients
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- Return type:
Example
>>> import causalpy as cp >>> df = cp.load_data("did") >>> seed = 42 >>> result = cp.pymc_experiments.DifferenceInDifferences( ... df, ... formula="y ~ 1 + group*post_treatment", ... time_variable_name="t", ... group_variable_name="group", ... model=cp.pymc_models.LinearRegression( ... sample_kwargs={ ... "draws": 2000, ... "random_seed": seed, ... "progressbar": False ... }), ... ) >>> result.print_coefficients(round_to=1) Model coefficients: Intercept 1, 94% HDI [1, 1] post_treatment[T.True] 1, 94% HDI [0.9, 1] group 0.2, 94% HDI [0.09, 0.2] group:post_treatment[T.True] 0.5, 94% HDI [0.4, 0.6] sigma 0.08, 94% HDI [0.07, 0.1]
- class causalpy.pymc_experiments.InstrumentalVariable
A class to analyse instrumental variable style experiments.
- Parameters:
instruments_data (
DataFrame
) – A pandas dataframe of instruments for our treatment variable. Should contain instruments Z, and treatment tdata (
DataFrame
) – A pandas dataframe of covariates for fitting the focal regression of interest. Should contain covariates X including treatment t and outcome yinstruments_formula (
str
) – A statistical model formula for the instrumental stage regression e.g. t ~ 1 + z1 + z2 + z3formula (
str
) –A statistical model formula for the
focal regression e.g. y ~ 1 + t + x1 + x2 + x3
model – A PyMC model
priors –
An optional dictionary of priors for the mus and sigmas of both regressions. If priors are not specified we will substitue MLE estimates for the beta coefficients. Greater control can be achieved by specifying the priors directly e.g. priors = {
”mus”: [0, 0], “sigmas”: [1, 1], “eta”: 2, “lkj_sd”: 2, }
Example
>>> import pandas as pd >>> import causalpy as cp >>> from causalpy.pymc_experiments import InstrumentalVariable >>> from causalpy.pymc_models import InstrumentalVariableRegression >>> import numpy as np >>> N = 100 >>> e1 = np.random.normal(0, 3, N) >>> e2 = np.random.normal(0, 1, N) >>> Z = np.random.uniform(0, 1, N) >>> ## Ensure the endogeneity of the the treatment variable >>> X = -1 + 4 * Z + e2 + 2 * e1 >>> y = 2 + 3 * X + 3 * e1 >>> test_data = pd.DataFrame({"y": y, "X": X, "Z": Z}) >>> sample_kwargs = { ... "tune": 1, ... "draws": 5, ... "chains": 1, ... "cores": 4, ... "target_accept": 0.95, ... "progressbar": False, ... } >>> instruments_formula = "X ~ 1 + Z" >>> formula = "y ~ 1 + X" >>> instruments_data = test_data[["X", "Z"]] >>> data = test_data[["y", "X"]] >>> iv = InstrumentalVariable( ... instruments_data=instruments_data, ... data=data, ... instruments_formula=instruments_formula, ... formula=formula, ... model=InstrumentalVariableRegression(sample_kwargs=sample_kwargs), ... )
- __init__(instruments_data, data, instruments_formula, formula, model=None, priors=None, **kwargs)
- expt_type = None
- get_2SLS_fit()
Two Stage Least Squares Fit
This function is called by the experiment, results are used for priors if none are provided.
- get_naive_OLS_fit()
Naive Ordinary Least Squares
This function is called by the experiment.
- property idata
Access to the models InferenceData object
- model = None
- print_coefficients(round_to=None)
Prints the model coefficients
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- Return type:
Example
>>> import causalpy as cp >>> df = cp.load_data("did") >>> seed = 42 >>> result = cp.pymc_experiments.DifferenceInDifferences( ... df, ... formula="y ~ 1 + group*post_treatment", ... time_variable_name="t", ... group_variable_name="group", ... model=cp.pymc_models.LinearRegression( ... sample_kwargs={ ... "draws": 2000, ... "random_seed": seed, ... "progressbar": False ... }), ... ) >>> result.print_coefficients(round_to=1) Model coefficients: Intercept 1, 94% HDI [1, 1] post_treatment[T.True] 1, 94% HDI [0.9, 1] group 0.2, 94% HDI [0.09, 0.2] group:post_treatment[T.True] 0.5, 94% HDI [0.4, 0.6] sigma 0.08, 94% HDI [0.07, 0.1]
- class causalpy.pymc_experiments.InterruptedTimeSeries
A wrapper around PrePostFit class
- Parameters:
Example
>>> import causalpy as cp >>> df = ( ... cp.load_data("its") ... .assign(date=lambda x: pd.to_datetime(x["date"])) ... .set_index("date") ... ) >>> treatment_time = pd.to_datetime("2017-01-01") >>> seed = 42 >>> result = cp.pymc_experiments.InterruptedTimeSeries( ... df, ... treatment_time, ... formula="y ~ 1 + t + C(month)", ... model=cp.pymc_models.LinearRegression( ... sample_kwargs={ ... "target_accept": 0.95, ... "random_seed": seed, ... "progressbar": False, ... } ... ) ... )
- __init__(data, treatment_time, formula, model=None, **kwargs)
- expt_type = 'Interrupted Time Series'
- property idata
Access to the models InferenceData object
- model = None
- plot(counterfactual_label='Counterfactual', round_to=None, **kwargs)
Plot the results
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- print_coefficients(round_to=None)
Prints the model coefficients
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- Return type:
Example
>>> import causalpy as cp >>> df = cp.load_data("did") >>> seed = 42 >>> result = cp.pymc_experiments.DifferenceInDifferences( ... df, ... formula="y ~ 1 + group*post_treatment", ... time_variable_name="t", ... group_variable_name="group", ... model=cp.pymc_models.LinearRegression( ... sample_kwargs={ ... "draws": 2000, ... "random_seed": seed, ... "progressbar": False ... }), ... ) >>> result.print_coefficients(round_to=1) Model coefficients: Intercept 1, 94% HDI [1, 1] post_treatment[T.True] 1, 94% HDI [0.9, 1] group 0.2, 94% HDI [0.09, 0.2] group:post_treatment[T.True] 0.5, 94% HDI [0.4, 0.6] sigma 0.08, 94% HDI [0.07, 0.1]
- class causalpy.pymc_experiments.InversePropensityWeighting
A class to analyse inverse propensity weighting experiments.
- Parameters:
data (
DataFrame
) – A pandas dataframeformula (
str
) – A statistical model formula for the propensity model
- :param outcome_variable
A string denoting the outcome variable in datq to be reweighted
- Parameters:
weighting_scheme (
str
) – A string denoting which weighting scheme to use among: ‘raw’, ‘robust’, ‘doubly robust’ or ‘overlap’. See Aronow and Miller “Foundations of Agnostic Statistics” for discussion and computation of these weighting schemes.model – A PyMC model
Example
>>> import causalpy as cp >>> df = cp.load_data("nhefs") >>> seed = 42 >>> result = cp.pymc_experiments.InversePropensityWeighting( ... df, ... formula="trt ~ 1 + age + race", ... outcome_variable ="outcome", ... weighting_scheme="robust", ... model=cp.pymc_models.PropensityScore( ... sample_kwargs={ ... "draws": 100, ... "target_accept": 0.95, ... "random_seed": seed, ... "progressbar": False, ... }, ... ), ... )
- __init__(data, formula, outcome_variable, weighting_scheme, model=None, **kwargs)
- expt_type = None
- get_ate(i, idata, method='doubly_robust')
- property idata
Access to the models InferenceData object
- make_doubly_robust_adjustment(ps)
The doubly robust weighting scheme is also discussed in Aronow and Miller, but a bit more generally than our implementation here. Here we have specified the outcome model to be a simple OLS model. In this way the compromise between the outcome model and the propensity model is always done with OLS.
- make_overlap_adjustments(ps)
This weighting scheme was adapted from Lucy D’Agostino McGowan’s blog on Propensity Score Weights referenced in the primary CausalPy explainer notebook
- make_raw_adjustments(ps)
This estimator is discussed in Aronow and Miller as the simplest of base form of inverse propensity weighting schemes
- make_robust_adjustments(ps)
This estimator is discussed in Aronow and Miller’s book as being related to the Horvitz Thompson method
- model = None
- plot_ATE(idata=None, method=None, prop_draws=100, ate_draws=300)
- plot_balance_ecdf(covariate, idata=None, weighting_scheme=None)
Plotting function takes a single covariate and shows the differences in the ECDF between the treatment and control groups before and after weighting. It provides a visual check on the balance achieved by using the different weighting schemes
- print_coefficients(round_to=None)
Prints the model coefficients
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- Return type:
Example
>>> import causalpy as cp >>> df = cp.load_data("did") >>> seed = 42 >>> result = cp.pymc_experiments.DifferenceInDifferences( ... df, ... formula="y ~ 1 + group*post_treatment", ... time_variable_name="t", ... group_variable_name="group", ... model=cp.pymc_models.LinearRegression( ... sample_kwargs={ ... "draws": 2000, ... "random_seed": seed, ... "progressbar": False ... }), ... ) >>> result.print_coefficients(round_to=1) Model coefficients: Intercept 1, 94% HDI [1, 1] post_treatment[T.True] 1, 94% HDI [0.9, 1] group 0.2, 94% HDI [0.09, 0.2] group:post_treatment[T.True] 0.5, 94% HDI [0.4, 0.6] sigma 0.08, 94% HDI [0.07, 0.1]
- weighted_percentile(data, weights, perc)
perc : percentile in [0-1]!
- class causalpy.pymc_experiments.PrePostFit
A class to analyse quasi-experiments where parameter estimation is based on just the pre-intervention data.
- Parameters:
Example
>>> import causalpy as cp >>> sc = cp.load_data("sc") >>> treatment_time = 70 >>> seed = 42 >>> result = cp.pymc_experiments.PrePostFit( ... sc, ... treatment_time, ... formula="actual ~ 0 + a + g", ... model=cp.pymc_models.WeightedSumFitter( ... sample_kwargs={ ... "draws": 400, ... "target_accept": 0.95, ... "random_seed": seed, ... "progressbar": False ... } ... ), ... ) >>> result.summary(round_to=1) ==================================Pre-Post Fit================================== Formula: actual ~ 0 + a + g Model coefficients: a 0.6, 94% HDI [0.6, 0.6] g 0.4, 94% HDI [0.4, 0.4] sigma 0.8, 94% HDI [0.6, 0.9]
- __init__(data, treatment_time, formula, model=None, **kwargs)
- expt_type = None
- property idata
Access to the models InferenceData object
- model = None
- plot(counterfactual_label='Counterfactual', round_to=None, **kwargs)
Plot the results
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- print_coefficients(round_to=None)
Prints the model coefficients
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- Return type:
Example
>>> import causalpy as cp >>> df = cp.load_data("did") >>> seed = 42 >>> result = cp.pymc_experiments.DifferenceInDifferences( ... df, ... formula="y ~ 1 + group*post_treatment", ... time_variable_name="t", ... group_variable_name="group", ... model=cp.pymc_models.LinearRegression( ... sample_kwargs={ ... "draws": 2000, ... "random_seed": seed, ... "progressbar": False ... }), ... ) >>> result.print_coefficients(round_to=1) Model coefficients: Intercept 1, 94% HDI [1, 1] post_treatment[T.True] 1, 94% HDI [0.9, 1] group 0.2, 94% HDI [0.09, 0.2] group:post_treatment[T.True] 0.5, 94% HDI [0.4, 0.6] sigma 0.08, 94% HDI [0.07, 0.1]
- class causalpy.pymc_experiments.PrePostNEGD
A class to analyse data from pretest/posttest designs
- Parameters:
data (
DataFrame
) – A pandas dataframeformula (
str
) – A statistical model formulagroup_variable_name (
str
) – Name of the column in data for the group variable, should be either binary or booleanpretreatment_variable_name (
str
) – Name of the column in data for the pretreatment variablemodel – A PyMC model
Example
>>> import causalpy as cp >>> df = cp.load_data("anova1") >>> seed = 42 >>> result = cp.pymc_experiments.PrePostNEGD( ... df, ... formula="post ~ 1 + C(group) + pre", ... group_variable_name="group", ... pretreatment_variable_name="pre", ... model=cp.pymc_models.LinearRegression( ... sample_kwargs={ ... "target_accept": 0.95, ... "random_seed": seed, ... "progressbar": False, ... } ... ) ... ) >>> result.summary(round_to=1) ==================Pretest/posttest Nonequivalent Group Design=================== Formula: post ~ 1 + C(group) + pre Results: Causal impact = 2, $CI_{94%}$[2, 2] Model coefficients: Intercept -0.5, 94% HDI [-1, 0.2] C(group)[T.1] 2, 94% HDI [2, 2] pre 1, 94% HDI [1, 1] sigma 0.5, 94% HDI [0.5, 0.6]
- __init__(data, formula, group_variable_name, pretreatment_variable_name, model=None, **kwargs)
- expt_type = None
- property idata
Access to the models InferenceData object
- model = None
- plot(round_to=None)
Plot the results
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- print_coefficients(round_to=None)
Prints the model coefficients
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- Return type:
Example
>>> import causalpy as cp >>> df = cp.load_data("did") >>> seed = 42 >>> result = cp.pymc_experiments.DifferenceInDifferences( ... df, ... formula="y ~ 1 + group*post_treatment", ... time_variable_name="t", ... group_variable_name="group", ... model=cp.pymc_models.LinearRegression( ... sample_kwargs={ ... "draws": 2000, ... "random_seed": seed, ... "progressbar": False ... }), ... ) >>> result.print_coefficients(round_to=1) Model coefficients: Intercept 1, 94% HDI [1, 1] post_treatment[T.True] 1, 94% HDI [0.9, 1] group 0.2, 94% HDI [0.09, 0.2] group:post_treatment[T.True] 0.5, 94% HDI [0.4, 0.6] sigma 0.08, 94% HDI [0.07, 0.1]
- class causalpy.pymc_experiments.RegressionDiscontinuity
A class to analyse sharp regression discontinuity experiments.
- Parameters:
data (
DataFrame
) – A pandas dataframeformula (
str
) – A statistical model formulatreatment_threshold (
float
) – A scalar threshold value at which the treatment is appliedmodel – A PyMC model
running_variable_name (
str
) – The name of the predictor variable that the treatment threshold is based uponepsilon (
float
) – A small scalar value which determines how far above and below the treatment threshold to evaluate the causal impact.bandwidth (
float
) – Data outside of the bandwidth (relative to the discontinuity) is not used to fit the model.
Example
>>> import causalpy as cp >>> df = cp.load_data("rd") >>> seed = 42 >>> result = cp.pymc_experiments.RegressionDiscontinuity( ... df, ... formula="y ~ 1 + x + treated + x:treated", ... model=cp.pymc_models.LinearRegression( ... sample_kwargs={ ... "draws": 100, ... "target_accept": 0.95, ... "random_seed": seed, ... "progressbar": False, ... }, ... ), ... treatment_threshold=0.5, ... )
- __init__(data, formula, treatment_threshold, model=None, running_variable_name='x', epsilon=0.001, bandwidth=inf, **kwargs)
- expt_type = None
- property idata
Access to the models InferenceData object
- model = None
- plot(round_to=None)
Plot the results
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- print_coefficients(round_to=None)
Prints the model coefficients
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- Return type:
Example
>>> import causalpy as cp >>> df = cp.load_data("did") >>> seed = 42 >>> result = cp.pymc_experiments.DifferenceInDifferences( ... df, ... formula="y ~ 1 + group*post_treatment", ... time_variable_name="t", ... group_variable_name="group", ... model=cp.pymc_models.LinearRegression( ... sample_kwargs={ ... "draws": 2000, ... "random_seed": seed, ... "progressbar": False ... }), ... ) >>> result.print_coefficients(round_to=1) Model coefficients: Intercept 1, 94% HDI [1, 1] post_treatment[T.True] 1, 94% HDI [0.9, 1] group 0.2, 94% HDI [0.09, 0.2] group:post_treatment[T.True] 0.5, 94% HDI [0.4, 0.6] sigma 0.08, 94% HDI [0.07, 0.1]
- class causalpy.pymc_experiments.RegressionKink
A class to analyse sharp regression kink experiments.
- Parameters:
data (
DataFrame
) – A pandas dataframeformula (
str
) – A statistical model formulakink_point (
float
) – A scalar threshold value at which there is a change in the first derivative of the assignment functionmodel – A PyMC model
running_variable_name (
str
) – The name of the predictor variable that the kink_point is based uponepsilon (
float
) – A small scalar value which determines how far above and below the kink point to evaluate the causal impact.bandwidth (
float
) – Data outside of the bandwidth (relative to the discontinuity) is not used to fit the model.
- __init__(data, formula, kink_point, model=None, running_variable_name='x', epsilon=0.001, bandwidth=inf, **kwargs)
- expt_type = None
- property idata
Access to the models InferenceData object
- model = None
- plot(round_to=None)
Plot the results
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- print_coefficients(round_to=None)
Prints the model coefficients
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- Return type:
Example
>>> import causalpy as cp >>> df = cp.load_data("did") >>> seed = 42 >>> result = cp.pymc_experiments.DifferenceInDifferences( ... df, ... formula="y ~ 1 + group*post_treatment", ... time_variable_name="t", ... group_variable_name="group", ... model=cp.pymc_models.LinearRegression( ... sample_kwargs={ ... "draws": 2000, ... "random_seed": seed, ... "progressbar": False ... }), ... ) >>> result.print_coefficients(round_to=1) Model coefficients: Intercept 1, 94% HDI [1, 1] post_treatment[T.True] 1, 94% HDI [0.9, 1] group 0.2, 94% HDI [0.09, 0.2] group:post_treatment[T.True] 0.5, 94% HDI [0.4, 0.6] sigma 0.08, 94% HDI [0.07, 0.1]
- class causalpy.pymc_experiments.SyntheticControl
A wrapper around the PrePostFit class
- Parameters:
Example
>>> import causalpy as cp >>> df = cp.load_data("sc") >>> treatment_time = 70 >>> seed = 42 >>> result = cp.pymc_experiments.SyntheticControl( ... df, ... treatment_time, ... formula="actual ~ 0 + a + b + c + d + e + f + g", ... model=cp.pymc_models.WeightedSumFitter( ... sample_kwargs={ ... "target_accept": 0.95, ... "random_seed": seed, ... "progressbar": False, ... } ... ), ... )
- __init__(data, treatment_time, formula, model=None, **kwargs)
- expt_type = 'Synthetic Control'
- property idata
Access to the models InferenceData object
- model = None
- plot(plot_predictors=False, **kwargs)
Plot the results
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- print_coefficients(round_to=None)
Prints the model coefficients
- Parameters:
round_to – Number of decimals used to round results. Defaults to 2. Use “None” to return raw numbers.
- Return type:
Example
>>> import causalpy as cp >>> df = cp.load_data("did") >>> seed = 42 >>> result = cp.pymc_experiments.DifferenceInDifferences( ... df, ... formula="y ~ 1 + group*post_treatment", ... time_variable_name="t", ... group_variable_name="group", ... model=cp.pymc_models.LinearRegression( ... sample_kwargs={ ... "draws": 2000, ... "random_seed": seed, ... "progressbar": False ... }), ... ) >>> result.print_coefficients(round_to=1) Model coefficients: Intercept 1, 94% HDI [1, 1] post_treatment[T.True] 1, 94% HDI [0.9, 1] group 0.2, 94% HDI [0.09, 0.2] group:post_treatment[T.True] 0.5, 94% HDI [0.4, 0.6] sigma 0.08, 94% HDI [0.07, 0.1]