
CausalPy - causal inference for quasi-experiments

A Python package focusing on causal inference for quasi-experiments. The package supports different model types: sophisticated Bayesian methods can be used, harnessing the power of PyMC and ArviZ, but users can also use more traditional Ordinary Least Squares estimation via scikit-learn models.

Installation

To get the latest release you can use pip:

   pip install CausalPy

or conda:

   conda install causalpy -c conda-forge

Alternatively, to get the latest development version of the package, install it directly from GitHub:

   pip install git+https://github.com/pymc-labs/CausalPy.git

Quickstart


   import causalpy as cp
   import matplotlib.pyplot as plt


   # Import and process data
   df = (
      cp.load_data("drinking")
      .rename(columns={"agecell": "age"})
      .assign(treated=lambda df_: df_.age > 21)
      )

   # Run the analysis
   result = cp.RegressionDiscontinuity(
      df,
      formula="all ~ 1 + age + treated",
      running_variable_name="age",
      model=cp.pymc_models.LinearRegression(),
      treatment_threshold=21,
      )

   # Visualize outputs
   fig, ax = result.plot()

   # Get a results summary
   result.summary()

   plt.show()
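In the quickstart formula `all ~ 1 + age + treated`, the model has an intercept, a single linear slope in the running variable `age`, and a step term for `treated`; the coefficient on `treated` is the estimated jump at the threshold. The sketch below illustrates that idea with plain NumPy least squares on synthetic data. It is an assumption-laden illustration only: the data-generating values (slope 0.5, jump 2.0, noise level) are made up, it does not use the drinking dataset, and it is not how CausalPy fits the model (the quickstart uses a Bayesian PyMC regression).

```python
import numpy as np

# Synthetic (made-up) data: a linear trend in the running variable plus a
# jump of 2.0 for units above the threshold. This is NOT the drinking data.
threshold = 21.0
age = np.linspace(19.0, 23.0, 41)
treated = (age > threshold).astype(float)
rng = np.random.default_rng(0)
y = 10.0 + 0.5 * age + 2.0 * treated + rng.normal(0.0, 0.1, age.size)

# Design matrix mirroring the formula "y ~ 1 + age + treated":
# intercept column, running variable, and treatment indicator.
X = np.column_stack([np.ones_like(age), age, treated])

# Ordinary least squares fit; the third coefficient is the discontinuity.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope, jump = coef
print(f"estimated discontinuity at age {threshold}: {jump:.2f}")
```

Because this formula forces a common slope on both sides of the cutoff, the whole discontinuity is absorbed by the `treated` coefficient. Letting the slope differ on each side would require an interaction term such as `age:treated`.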


Features

CausalPy supports a broad range of quasi-experimental methods for causal inference:

Synthetic control: Constructs a synthetic version of the treated unit from a weighted combination of control units. Used for causal inference in comparative case studies when a single unit is treated and there are multiple control units.

Geographical lift: Measures the impact of an intervention in a specific geographic area by comparing it to similar areas without the intervention. Commonly used in marketing to assess regional campaigns.

ANCOVA: Analysis of covariance combines ANOVA and regression to control for the effects of one or more quantitative covariates. Used when comparing group means while controlling for other variables.

Difference-in-differences: Compares the change in outcomes over time in a treatment group with the change over the same period in a control group. Used in observational studies to estimate causal effects, assuming that without the intervention both groups would have followed parallel trends.

Regression discontinuity: Identifies causal effects by exploiting a sharp cutoff or threshold in an assignment variable. Used when treatment is assigned according to whether an observed variable crosses a threshold value, allowing comparison of units just above and just below the cutoff.

Regression kink design: Focuses on changes in the slope (kinks) of the relationship between variables rather than jumps at the cutoff. Used to identify causal effects when the rate at which treatment intensity changes with the assignment variable shifts at a threshold.

Interrupted time series: Analyzes the effect of an intervention by comparing time series data before and after the intervention. Used when data is collected over time and the intervention occurs at a known point, allowing assessment of changes in level or trend.

Instrumental variable regression: Addresses endogeneity by using an instrumental variable that is correlated with the endogenous explanatory variable but uncorrelated with the error term. Used when explanatory variables are correlated with the error term; under these conditions it provides consistent estimates of causal effects.

Inverse propensity score weighting: Weights each observation by the inverse of its probability of receiving the treatment it actually received (the propensity score for treated units, one minus the propensity score for control units). Used in observational studies to create a pseudo-population in which treatment assignment is independent of the measured covariates, adjusting for confounding by those covariates.
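In its simplest two-group, two-period form, the difference-in-differences comparison reduces to arithmetic on four group means: the change in the treated group minus the change in the control group. The sketch below shows only that arithmetic. The outcome numbers and the helper name `did_estimate` are hypothetical, and this is not CausalPy's implementation, which fits a regression model.

```python
from statistics import fmean

# Hypothetical outcomes for each (group, period) cell; illustrative only.
outcomes = {
    ("control", "pre"): [10.0, 11.0, 12.0],
    ("control", "post"): [12.0, 13.0, 14.0],
    ("treated", "pre"): [9.0, 10.0, 11.0],
    ("treated", "post"): [14.0, 15.0, 16.0],
}


def did_estimate(data):
    """Classic 2x2 difference-in-differences computed from cell means."""
    means = {key: fmean(values) for key, values in data.items()}
    treated_change = means[("treated", "post")] - means[("treated", "pre")]
    control_change = means[("control", "post")] - means[("control", "pre")]
    # Under parallel trends, the control group's change stands in for the
    # change the treated group would have seen without the intervention.
    return treated_change - control_change


print(did_estimate(outcomes))
```

With these made-up numbers, the treated mean rises by 5 and the control mean by 2, so the estimated effect is 3.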
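For inverse propensity score weighting, the most basic (Horvitz-Thompson style) estimate of the average treatment effect weights treated outcomes by 1/e and control outcomes by 1/(1 - e), where e is the propensity score. The sketch below assumes the propensity scores are already known. In practice they would be estimated, for example with a logistic regression of treatment on covariates. The units, scores, and the helper name `ipw_ate` are hypothetical, and CausalPy's implementation may use a different weighting scheme.

```python
# Hypothetical units: (treated flag, observed outcome, propensity score).
# Propensity scores are assumed known here; normally they are estimated.
units = [
    (1, 5.0, 0.5),
    (0, 3.0, 0.5),
    (1, 6.0, 0.8),
    (0, 2.0, 0.2),
]


def ipw_ate(data):
    """Horvitz-Thompson style inverse propensity weighted ATE estimate."""
    n = len(data)
    # Treated units contribute y / e; control units contribute zero here.
    treated_mean = sum(t * y / e for t, y, e in data) / n
    # Control units contribute y / (1 - e); treated units contribute zero.
    control_mean = sum((1 - t) * y / (1 - e) for t, y, e in data) / n
    return treated_mean - control_mean


print(ipw_ate(units))
```

This unnormalized form can be unstable when some propensity scores are close to 0 or 1. Normalized weights, or doubly robust estimators, are common alternatives.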

Support

This repository is supported by PyMC Labs.

For companies that want to use CausalPy in production, PyMC Labs is available for consulting and training. We can help you build and deploy your models in production. We have experience with cutting-edge Bayesian and causal modelling techniques, which we have applied to a range of business domains.
