Drinking age with a scikit-learn model

This example uses the regression discontinuity design to make claims about the causal effects of the minimum legal drinking age (21 in the USA) upon all cause mortality rates. The dataset is from a study by Carpenter and Dobkin [2009].

from sklearn.linear_model import LinearRegression

import causalpy as cp
WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
%config InlineBackend.figure_format = 'retina'
df = (
    cp.load_data("drinking")
    .rename(columns={"agecell": "age"})
    .assign(treated=lambda df_: df_.age > 21)
    .dropna(axis=0)
)
result = cp.skl_experiments.RegressionDiscontinuity(
    df,
    formula="all ~ 1 + age + treated",
    running_variable_name="age",
    model=LinearRegression(),
    treatment_threshold=21,
)

Examine results

result.plot();
../_images/2fddc3680384b148a59e0f01d1c0992b84052d506134ea1e23a3ab505d2e62b6.png
result.summary()
Difference in Differences experiment
Formula: all ~ 1 + age + treated
Running variable: age
Threshold on running variable: 21

Results:
Discontinuity at threshold = 7.66
Model coefficients:
	Intercept		0.0
	treated[T.True]		7.662711631820946
	age		-0.9746855447910273

References

[1]

Christopher Carpenter and Carlos Dobkin. The effect of alcohol consumption on mortality: regression discontinuity evidence from the minimum drinking age. American Economic Journal: Applied Economics, 1(1):164–182, 2009.