Drinking age with a scikit-learn model
This example uses a regression discontinuity design to estimate the causal effect of the minimum legal drinking age (21 in the USA) on all-cause mortality rates. The dataset is from a study by Carpenter and Dobkin [2009].
from sklearn.linear_model import LinearRegression
import causalpy as cp
%config InlineBackend.figure_format = 'retina'
df = (
    cp.load_data("drinking")
    .rename(columns={"agecell": "age"})
    .assign(treated=lambda df_: df_.age > 21)
    .dropna(axis=0)
)
result = cp.skl_experiments.RegressionDiscontinuity(
    df,
    formula="all ~ 1 + age + treated",
    running_variable_name="age",
    model=LinearRegression(),
    treatment_threshold=21,
)
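To make the formula `all ~ 1 + age + treated` more concrete, the following is a minimal sketch of the same kind of regression on synthetic data. It is not CausalPy's internal implementation and does not use the drinking dataset: the data, the true jump of 8, and all variable names are assumptions made for illustration. It fits an intercept, a common slope in the running variable, and a step at the threshold, then reads the estimated jump off the coefficient on the treatment indicator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration, NOT the drinking dataset):
# a linear trend in age plus a jump of 8 for units above the threshold of 21.
threshold = 21.0
true_jump = 8.0
age = np.linspace(19, 23, 200)
treated = (age > threshold).astype(float)
y = 100.0 - 1.0 * age + true_jump * treated + rng.normal(0, 0.2, size=age.size)

# Design matrix for "y ~ 1 + age + treated": columns are intercept, age, treated.
X = np.column_stack([np.ones_like(age), age, treated])

# Ordinary least squares fit.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope, jump = beta

# With a common slope on both sides of the threshold, the fitted
# discontinuity is the coefficient on the treatment indicator.
print(f"estimated jump at threshold: {jump:.2f}")
```

Because this specification has no interaction between `age` and `treated`, it forces the same slope on both sides of the threshold. Allowing different slopes would require an additional interaction term.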
Examine results
result.plot();
result.summary()
Difference in Differences experiment
Formula: all ~ 1 + age + treated
Running variable: age
Threshold on running variable: 21
Results:
Discontinuity at threshold = 7.66
Model coefficients:
Intercept 0.0
treated[T.True] 7.662711631820946
age -0.9746855447910273
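The reported discontinuity agrees with the `treated[T.True]` coefficient, which is expected for this formula: the intercept and the `age` term are identical on both sides of the threshold, so they cancel when the two predictions at age 21 are subtracted. The sketch below checks that arithmetic using the coefficients printed above. The helper `linear_part` is hypothetical and is not part of CausalPy.

```python
# Coefficients copied from the summary output above.
coef_treated = 7.662711631820946
coef_age = -0.9746855447910273
threshold = 21


def linear_part(age, treated):
    """Hypothetical helper: the fitted linear predictor without the intercept.

    The intercept is omitted because it is the same on both sides of the
    threshold and cancels in the difference.
    """
    return coef_age * age + coef_treated * treated


# Prediction just above the threshold minus prediction just below it,
# both evaluated at the threshold itself.
jump = linear_part(threshold, 1) - linear_part(threshold, 0)
print(round(jump, 2))  # 7.66
```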
References
[1] Christopher Carpenter and Carlos Dobkin. The effect of alcohol consumption on mortality: regression discontinuity evidence from the minimum drinking age. American Economic Journal: Applied Economics, 1(1):164–182, 2009.