causalpy.pymc_models

Defines generic PyMC ModelBuilder class and subclasses for

  • WeightedSumFitter model for Synthetic Control experiments

  • LinearRegression model

Models are intended to be used from inside an experiment class (see PyMC experiments). This is why the examples require some extra manipulation input data, often to ensure y has the correct shape.

class causalpy.pymc_models.InstrumentalVariableRegression

Custom PyMC model for instrumental linear regression

Example

>>> import causalpy as cp
>>> import numpy as np
>>> from causalpy.pymc_models import InstrumentalVariableRegression
>>> N = 10
>>> e1 = np.random.normal(0, 3, N)
>>> e2 = np.random.normal(0, 1, N)
>>> Z = np.random.uniform(0, 1, N)
>>> ## Ensure the endogeneity of the the treatment variable
>>> X = -1 + 4 * Z + e2 + 2 * e1
>>> y = 2 + 3 * X + 3 * e1
>>> t = X.reshape(10,1)
>>> y = y.reshape(10,1)
>>> Z = np.asarray([[1, Z[i]] for i in range(0,10)])
>>> X = np.asarray([[1, X[i]] for i in range(0,10)])
>>> COORDS = {'instruments': ['Intercept', 'Z'], 'covariates': ['Intercept', 'X']}
>>> sample_kwargs = {
...     "tune": 5,
...     "draws": 10,
...     "chains": 2,
...     "cores": 2,
...     "target_accept": 0.95,
...     "progressbar": False,
... }
>>> iv_reg = InstrumentalVariableRegression(sample_kwargs=sample_kwargs)
>>> iv_reg.fit(X, Z,y, t, COORDS, {
...                  "mus": [[-2,4], [0.5, 3]],
...                  "sigmas": [1, 1],
...                  "eta": 2,
...                  "lkj_sd": 2,
...              })
Inference data...
build_model(X, Z, y, t, coords, priors)

Specify model with treatment regression and focal regression data and priors

Parameters:
  • X – A pandas dataframe used to predict our outcome y

  • Z – A pandas dataframe used to predict our treatment variable t

  • y – An array of values representing our focal outcome y

  • t – An array of values representing the treatment t of which we’re interested in estimating the causal impact

  • coords – A dictionary with the coordinate names for our instruments and covariates

  • priors – An optional dictionary of priors for the mus and sigmas of both regressions priors = {"mus": [0, 0], "sigmas": [1, 1], "eta": 2, "lkj_sd": 2}

fit(X, Z, y, t, coords, priors)

Draw samples from posterior, prior predictive, and posterior predictive distributions.

class causalpy.pymc_models.LinearRegression

Custom PyMC model for linear regression

Defines the PyMC model

\[ \begin{align}\begin{aligned}\beta &\sim \mathrm{Normal}(0, 50)\\\sigma &\sim \mathrm{HalfNormal}(1)\\\mu &= X * \beta\\y &\sim \mathrm{Normal}(\mu, \sigma)\end{aligned}\end{align} \]

Example

>>> import causalpy as cp
>>> import numpy as np
>>> from causalpy.pymc_models import LinearRegression
>>> rd = cp.load_data("rd")
>>> X = rd[["x", "treated"]]
>>> y = np.asarray(rd["y"]).reshape((rd["y"].shape[0],1))
>>> lr = LinearRegression(sample_kwargs={"progressbar": False})
>>> lr.fit(X, y, coords={
...                 'coeffs': ['x', 'treated'],
...                 'obs_indx': np.arange(rd.shape[0])
...                },
... )
Inference data...
build_model(X, y, coords)

Defines the PyMC model

class causalpy.pymc_models.ModelBuilder

This is a wrapper around pm.Model to give scikit-learn like API.

Public Methods

  • build_model: must be implemented by subclasses

  • fit: populates idata attribute

  • predict: returns predictions on new data

  • score: returns Bayesian \(R^2\)

Example

>>> import causalpy as cp
>>> import numpy as np
>>> import pymc as pm
>>> from causalpy.pymc_models import ModelBuilder
>>> class MyToyModel(ModelBuilder):
...     def build_model(self, X, y, coords):
...         with self:
...             X_ = pm.Data(name="X", value=X)
...             y_ = pm.Data(name="y", value=y)
...             beta = pm.Normal("beta", mu=0, sigma=1, shape=X_.shape[1])
...             sigma = pm.HalfNormal("sigma", sigma=1)
...             mu = pm.Deterministic("mu", pm.math.dot(X_, beta))
...             pm.Normal("y_hat", mu=mu, sigma=sigma, observed=y_)
>>> rng = np.random.default_rng(seed=42)
>>> X = rng.normal(loc=0, scale=1, size=(20, 2))
>>> y = rng.normal(loc=0, scale=1, size=(20,))
>>> model = MyToyModel(
...             sample_kwargs={
...                 "chains": 2,
...                 "draws": 2000,
...                 "progressbar": False,
...                 "random_seed": rng,
...             }
... )
>>> model.fit(X, y)
Inference data...
>>> X_new = rng.normal(loc=0, scale=1, size=(20,2))
>>> model.predict(X_new)
Inference data...
>>> model.score(X, y)
r2        0.390344
r2_std    0.081135
dtype: float64
__init__(sample_kwargs=None)
Parameters:

sample_kwargs (Optional[Dict[str, Any]]) – A dictionary of kwargs that get unpacked and passed to the pymc.sample() function. Defaults to an empty dictionary.

build_model(X, y, coords)

Build the model, must be implemented by subclass.

Return type:

None

fit(X, y, coords=None)

Draw samples from posterior, prior predictive, and posterior predictive distributions, placing them in the model’s idata attribute.

Return type:

None

Parameters:

coords (Dict[str, Any] | None)

predict(X)

Predict data given input data X

Caution

Results in KeyError if model hasn’t been fit.

score(X, y)

Score the Bayesian \(R^2\) given inputs X and outputs y. :rtype: Series

Caution

The Bayesian \(R^2\) is not the same as the traditional coefficient of determination, https://en.wikipedia.org/wiki/Coefficient_of_determination.

Return type:

Series

class causalpy.pymc_models.PropensityScore

Custom PyMC model for inverse propensity score models

Defines the PyMC model

\[ \begin{align}\begin{aligned}\beta &\sim \mathrm{Normal}(0, 1)\\\sigma &\sim \mathrm{HalfNormal}(1)\\\mu &= X * \beta\\p &= logit^{-1}(mu)\\t &\sim \mathrm{Bernoulli}(p)\end{aligned}\end{align} \]

Example

>>> import causalpy as cp
>>> import numpy as np
>>> from causalpy.pymc_models import PropensityScore
>>> df = cp.load_data('nhefs')
>>> X = df[["age", "race"]]
>>> t = np.asarray(df["trt"])
>>> ps = PropensityScore(sample_kwargs={"progressbar": False})
>>> ps.fit(X, t, coords={
...                 'coeffs': ['age', 'race'],
...                 'obs_indx': np.arange(df.shape[0])
...                },
... )
Inference...
build_model(X, t, coords)

Defines the PyMC propensity model

fit(X, t, coords)

Draw samples from posterior, prior predictive, and posterior predictive distributions. We overwrite the base method because the base method assumes a variable y and we use t to indicate the treatment variable here.

class causalpy.pymc_models.WeightedSumFitter

Used for synthetic control experiments

Note

Generally, the .fit() method should be used rather than calling .build_model() directly.

Defines the PyMC model:

\[ \begin{align}\begin{aligned}\sigma &\sim \mathrm{HalfNormal}(1)\\\beta &\sim \mathrm{Dirichlet}(1,...,1)\\\mu &= X * \beta\\y &\sim \mathrm{Normal}(\mu, \sigma)\end{aligned}\end{align} \]

Example

>>> import causalpy as cp
>>> import numpy as np
>>> from causalpy.pymc_models import WeightedSumFitter
>>> sc = cp.load_data("sc")
>>> X = sc[['a', 'b', 'c', 'd', 'e', 'f', 'g']]
>>> y = np.asarray(sc['actual']).reshape((sc.shape[0], 1))
>>> wsf = WeightedSumFitter(sample_kwargs={"progressbar": False})
>>> wsf.fit(X,y)
Inference data...
build_model(X, y, coords)

Defines the PyMC model