causalpy.pymc_models
Defines generic PyMC ModelBuilder class and subclasses for
WeightedSumFitter model for Synthetic Control experiments
LinearRegression model
Models are intended to be used from inside an experiment class (see PyMC experiments). This is why the examples require some extra manipulation input data, often to ensure y has the correct shape.
- class causalpy.pymc_models.InstrumentalVariableRegression
Custom PyMC model for instrumental linear regression
Example
>>> import causalpy as cp >>> import numpy as np >>> from causalpy.pymc_models import InstrumentalVariableRegression >>> N = 10 >>> e1 = np.random.normal(0, 3, N) >>> e2 = np.random.normal(0, 1, N) >>> Z = np.random.uniform(0, 1, N) >>> ## Ensure the endogeneity of the the treatment variable >>> X = -1 + 4 * Z + e2 + 2 * e1 >>> y = 2 + 3 * X + 3 * e1 >>> t = X.reshape(10,1) >>> y = y.reshape(10,1) >>> Z = np.asarray([[1, Z[i]] for i in range(0,10)]) >>> X = np.asarray([[1, X[i]] for i in range(0,10)]) >>> COORDS = {'instruments': ['Intercept', 'Z'], 'covariates': ['Intercept', 'X']} >>> sample_kwargs = { ... "tune": 5, ... "draws": 10, ... "chains": 2, ... "cores": 2, ... "target_accept": 0.95, ... "progressbar": False, ... } >>> iv_reg = InstrumentalVariableRegression(sample_kwargs=sample_kwargs) >>> iv_reg.fit(X, Z,y, t, COORDS, { ... "mus": [[-2,4], [0.5, 3]], ... "sigmas": [1, 1], ... "eta": 2, ... "lkj_sd": 2, ... }) Inference data...
- build_model(X, Z, y, t, coords, priors)
Specify model with treatment regression and focal regression data and priors
- Parameters:
X – A pandas dataframe used to predict our outcome y
Z – A pandas dataframe used to predict our treatment variable t
y – An array of values representing our focal outcome y
t – An array of values representing the treatment t of which we’re interested in estimating the causal impact
coords – A dictionary with the coordinate names for our instruments and covariates
priors – An optional dictionary of priors for the mus and sigmas of both regressions
priors = {"mus": [0, 0], "sigmas": [1, 1], "eta": 2, "lkj_sd": 2}
- fit(X, Z, y, t, coords, priors)
Draw samples from posterior, prior predictive, and posterior predictive distributions.
- class causalpy.pymc_models.LinearRegression
Custom PyMC model for linear regression
Defines the PyMC model
\[ \begin{align}\begin{aligned}\beta &\sim \mathrm{Normal}(0, 50)\\\sigma &\sim \mathrm{HalfNormal}(1)\\\mu &= X * \beta\\y &\sim \mathrm{Normal}(\mu, \sigma)\end{aligned}\end{align} \]Example
>>> import causalpy as cp >>> import numpy as np >>> from causalpy.pymc_models import LinearRegression >>> rd = cp.load_data("rd") >>> X = rd[["x", "treated"]] >>> y = np.asarray(rd["y"]).reshape((rd["y"].shape[0],1)) >>> lr = LinearRegression(sample_kwargs={"progressbar": False}) >>> lr.fit(X, y, coords={ ... 'coeffs': ['x', 'treated'], ... 'obs_indx': np.arange(rd.shape[0]) ... }, ... ) Inference...
- build_model(X, y, coords)
Defines the PyMC model
- class causalpy.pymc_models.ModelBuilder
This is a wrapper around pm.Model to give scikit-learn like API.
Public Methods
build_model: must be implemented by subclasses
fit: populates idata attribute
predict: returns predictions on new data
score: returns Bayesian \(R^2\)
Example
>>> import causalpy as cp >>> import numpy as np >>> import pymc as pm >>> from causalpy.pymc_models import ModelBuilder >>> class MyToyModel(ModelBuilder): ... def build_model(self, X, y, coords): ... with self: ... X_ = pm.Data(name="X", value=X) ... y_ = pm.Data(name="y", value=y) ... beta = pm.Normal("beta", mu=0, sigma=1, shape=X_.shape[1]) ... sigma = pm.HalfNormal("sigma", sigma=1) ... mu = pm.Deterministic("mu", pm.math.dot(X_, beta)) ... pm.Normal("y_hat", mu=mu, sigma=sigma, observed=y_) >>> rng = np.random.default_rng(seed=42) >>> X = rng.normal(loc=0, scale=1, size=(20, 2)) >>> y = rng.normal(loc=0, scale=1, size=(20,)) >>> model = MyToyModel( ... sample_kwargs={ ... "chains": 2, ... "draws": 2000, ... "progressbar": False, ... "random_seed": rng, ... } ... ) >>> model.fit(X, y) Inference... >>> X_new = rng.normal(loc=0, scale=1, size=(20,2)) >>> model.predict(X_new) Inference... >>> model.score(X, y) r2 0.3 r2_std 0.0 dtype: float64
- __init__(sample_kwargs=None)
- Parameters:
sample_kwargs (
Optional
[Dict
[str
,Any
]]) – A dictionary of kwargs that get unpacked and passed to thepymc.sample()
function. Defaults to an empty dictionary.
- fit(X, y, coords=None)
Draw samples fromposterior, prior predictive, and posterior predictive distributions, placing them in the model’s idata attribute.
- predict(X)
Predict data given input data X
Caution
Results in KeyError if model hasn’t been fit.
- score(X, y)
Score the Bayesian \(R^2\) given inputs
X
and outputsy
. :rtype:Series
Caution
The Bayesian \(R^2\) is not the same as the traditional coefficient of determination, https://en.wikipedia.org/wiki/Coefficient_of_determination.
- Return type:
Series
- class causalpy.pymc_models.WeightedSumFitter
Used for synthetic control experiments
Note
Generally, the .fit() method should be used rather than calling .build_model() directly.
Defines the PyMC model:
\[ \begin{align}\begin{aligned}\sigma &\sim \mathrm{HalfNormal}(1)\\\beta &\sim \mathrm{Dirichlet}(1,...,1)\\\mu &= X * \beta\\y &\sim \mathrm{Normal}(\mu, \sigma)\end{aligned}\end{align} \]Example
>>> import causalpy as cp >>> import numpy as np >>> from causalpy.pymc_models import WeightedSumFitter >>> sc = cp.load_data("sc") >>> X = sc[['a', 'b', 'c', 'd', 'e', 'f', 'g']] >>> y = np.asarray(sc['actual']).reshape((sc.shape[0], 1)) >>> wsf = WeightedSumFitter(sample_kwargs={"progressbar": False}) >>> wsf.fit(X,y) Inference ...
- build_model(X, y, coords)
Defines the PyMC model