StaggeredDifferenceInDifferences

class causalpy.experiments.staggered_did.StaggeredDifferenceInDifferences

A class to analyse data from staggered adoption Difference-in-Differences settings.

This class implements the Borusyak, Jaravel, and Spiess (BJS, 2024) imputation estimator for staggered adoption settings. It fits a model on untreated observations only (pre-treatment periods for eventually-treated units plus all periods for never-treated units), then predicts counterfactual outcomes for all observations. Treatment effects are computed as the difference between observed and predicted outcomes for treated observations.
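The imputation idea can be sketched outside CausalPy with NumPy and pandas. This is a minimal sketch, not the class's own code: the function name `impute_untreated_outcomes`, the noise-free toy panel, and the plain least-squares fit are illustrative assumptions (the class fits whatever `model` it is given). The column names `y_hat0` and `tau_hat` mirror the `data_` attribute documented further down.

```python
import numpy as np
import pandas as pd


def impute_untreated_outcomes(df: pd.DataFrame) -> pd.DataFrame:
    """Fit unit and time fixed effects on untreated rows, then impute y(0) for all rows."""
    # Intercept plus unit and time dummies; the first level of each is dropped
    # so the design matrix is not perfectly collinear.
    X = pd.concat(
        [
            pd.Series(1.0, index=df.index, name="intercept"),
            pd.get_dummies(df["unit"], prefix="unit", drop_first=True, dtype=float),
            pd.get_dummies(df["time"], prefix="time", drop_first=True, dtype=float),
        ],
        axis=1,
    )
    untreated = df["treated"] == 0

    # Step 1: estimate the model on untreated observations only.
    beta, *_ = np.linalg.lstsq(
        X[untreated].to_numpy(), df.loc[untreated, "y"].to_numpy(), rcond=None
    )

    out = df.copy()
    # Step 2: predict the untreated (counterfactual) outcome for every row.
    out["y_hat0"] = X.to_numpy() @ beta
    # Step 3: effect = observed minus imputed, kept for treated rows only.
    out["tau_hat"] = np.where(untreated, np.nan, out["y"] - out["y_hat0"])
    return out


# Toy panel: unit 0 is never treated, unit 1 adopts at t=2, unit 2 at t=3.
# The true effect is 2.0 and there is no noise.
rows = []
for unit, g in [(0, np.inf), (1, 2), (2, 3)]:
    for t in range(4):
        treated = int(t >= g)
        rows.append(
            {"unit": unit, "time": t, "treated": treated,
             "y": 1.0 * unit + 0.5 * t + 2.0 * treated}
        )
toy = pd.DataFrame(rows)
imputed = impute_untreated_outcomes(toy)
```

Because the toy outcome is exactly additive in unit and time effects, the imputed effects on the treated rows should recover the true effect up to floating-point error.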

Assumptions

This estimator requires the following identifying assumptions:

  1. Absorbing treatment: Once a unit receives treatment, it must remain treated in all subsequent periods. Treatment cannot be reversed or temporarily suspended. This is validated at runtime.

  2. Parallel trends: In the absence of treatment, treated and control units would have followed parallel outcome trajectories.

  3. No anticipation: Units do not change their behavior in anticipation of future treatment.

  4. Untreated support at each calendar period: The time fixed effect \(\gamma_t\) for calendar period \(t\) is identified only if at least one unit is untreated in that period. Without never-treated units, post-treatment effects for the last-treated cohort (and any calendar periods where every unit is already treated) are not identified. CausalPy warns when this condition fails and marks the affected ATT(g, t) and ATT(e) cells as non-identified in the output tables.
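Assumption 1 is the one that can be checked mechanically from the data. The following is a hedged sketch of such a check, not CausalPy's validation code; the helper name `find_non_absorbing_units` and the default column names are assumptions made for illustration.

```python
import pandas as pd


def find_non_absorbing_units(df, unit="unit", time="time", treated="treated"):
    """Return the units whose treatment indicator ever switches from 1 back to 0."""
    ordered = df.sort_values([unit, time])
    # Within each unit, a negative first difference marks a 1 -> 0 reversal.
    # The first row of each unit has a NaN difference, which compares as False.
    reverted = ordered.groupby(unit)[treated].diff().lt(0)
    return sorted(ordered.loc[reverted, unit].unique().tolist())


panel = pd.DataFrame(
    {
        "unit": ["a"] * 4 + ["b"] * 4,
        "time": [0, 1, 2, 3] * 2,
        # Unit "a" stays treated once treated; unit "b" reverts at time 2.
        "treated": [0, 1, 1, 1, 0, 1, 0, 1],
    }
)
violations = find_non_absorbing_units(panel)
```

An empty list means every unit satisfies the absorbing-treatment requirement; any unit in the list would need to be dropped or recoded before estimation.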

Parameters:

  data (DataFrame): A pandas DataFrame with panel data (unit x time observations).

  formula (str): A statistical model formula. Recommended: "y ~ 1 + C(unit) + C(time)" for unit and time fixed effects.

  unit_variable_name (str): Name of the column identifying units.

  time_variable_name (str): Name of the column identifying time periods.

  treated_variable_name (str): Name of the column indicating treatment status (0/1). Defaults to "treated".

  treatment_time_variable_name (str | None): Name of the column containing unit-level treatment time (G_i). If None, treatment time is inferred from the treated_variable_name column.

  never_treated_value (Any): Value indicating never-treated units in the treatment_time column. Defaults to np.inf.

  model (PyMCModel | RegressorMixin | None): A model for the untreated outcome. Defaults to LinearRegression.

  event_window (tuple[int, int] | None): Tuple (min_event_time, max_event_time) to restrict event-time aggregation. If None, uses all available event-times.

  reference_event_time (int): Event-time index associated with plots (reserved for future use). Defaults to -1.

  **kwargs (Any): Additional keyword arguments forwarded to BaseExperiment.
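When no treatment-time column is supplied, the treatment time G_i has to come from the 0/1 treatment indicator. One plausible way to do that is sketched below; the helper `infer_treatment_time` is hypothetical, and the rule it uses (first treated period per unit, with `never_treated_value` for units that are never treated) is an assumption about the inference step rather than a description of CausalPy's internals.

```python
import numpy as np
import pandas as pd


def infer_treatment_time(
    df, unit="unit", time="time", treated="treated", never_treated_value=np.inf
):
    """Map each unit to its first treated period, or to never_treated_value."""
    # Earliest period in which each eventually-treated unit is treated.
    first_treated = df.loc[df[treated] == 1].groupby(unit)[time].min()
    # Units that are never treated are missing from first_treated; reindex
    # adds them back as NaN, which is then replaced by the sentinel value.
    all_units = pd.Index(df[unit].unique(), name=unit)
    return first_treated.reindex(all_units).fillna(never_treated_value)


panel = pd.DataFrame(
    {
        "unit": ["a"] * 4 + ["b"] * 4,
        "time": [0, 1, 2, 3] * 2,
        "treated": [0, 0, 1, 1, 0, 0, 0, 0],
    }
)
G = infer_treatment_time(panel)
```

Using `np.inf` as the sentinel has the convenient property that `time >= G` is False in every period for never-treated units.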

data_

Augmented data with G (treatment time), event_time, y_hat0 (counterfactual), and tau_hat (treatment effect) columns.

Type:

pd.DataFrame

att_group_time_

Group-time ATT estimates: ATT(g, t) for each cohort g and calendar time t. Includes an identified column; non-identified cells have NaN estimates.

Type:

pd.DataFrame
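As an illustration of what a group-time table contains, the sketch below averages observation-level effects within each (cohort, calendar time) cell. It is a simplified sketch under stated assumptions: the function name, the toy values, and the output column name `att` are hypothetical, and the `identified` flag of the real table is not reproduced.

```python
import numpy as np
import pandas as pd


def att_group_time(df: pd.DataFrame) -> pd.DataFrame:
    """Average tau_hat within each (cohort G, calendar time) cell of treated rows."""
    # A row is treated when its calendar time has reached the unit's cohort G.
    # Never-treated units (G = inf) never satisfy this condition.
    treated_rows = df[df["time"] >= df["G"]]
    return (
        treated_rows.groupby(["G", "time"], as_index=False)["tau_hat"]
        .mean()
        .rename(columns={"tau_hat": "att"})
    )


# Toy effects: units 1 and 2 belong to cohort 2, unit 3 to cohort 3,
# unit 4 is never treated. Untreated rows carry NaN effects.
nan = np.nan
toy = pd.DataFrame(
    {
        "unit": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
        "G": [2.0] * 6 + [3.0] * 3 + [np.inf] * 3,
        "time": [1, 2, 3] * 4,
        "tau_hat": [nan, 1.0, 2.0, nan, 3.0, 4.0, nan, nan, 5.0, nan, nan, nan],
    }
)
att = att_group_time(toy)
```

Each row of `att` is one ATT(g, t) cell; cohort 2 contributes two cells (times 2 and 3) and cohort 3 contributes one.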

att_event_time_

Event-time ATT estimates: ATT(e) for each event-time e = t - G. Includes an identified column; non-identified cells have NaN estimates.

Type:

pd.DataFrame
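The event-time aggregation, including the effect of an `event_window`, can be illustrated in the same spirit. This is a sketch only: the function `att_event_time` below is a hypothetical helper, it averages whatever non-missing `tau_hat` values it is given, and it does not reproduce the `identified` column or any uncertainty estimates.

```python
import numpy as np
import pandas as pd


def att_event_time(df, event_window=None):
    """Average tau_hat by event time e = time - G, optionally within a window."""
    # Keep eventually-treated units (finite G) with an available effect.
    rows = df[np.isfinite(df["G"]) & df["tau_hat"].notna()].copy()
    rows["event_time"] = (rows["time"] - rows["G"]).astype(int)
    if event_window is not None:
        lo, hi = event_window
        # Both window bounds are treated as inclusive in this sketch.
        rows = rows[rows["event_time"].between(lo, hi)]
    return rows.groupby("event_time")["tau_hat"].mean()


nan = np.nan
toy = pd.DataFrame(
    {
        "unit": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
        "G": [2.0] * 6 + [3.0] * 3 + [np.inf] * 3,
        "time": [1, 2, 3] * 4,
        "tau_hat": [nan, 1.0, 2.0, nan, 3.0, 4.0, nan, nan, 8.0, nan, nan, nan],
    }
)
att_e = att_event_time(toy)
att_e_windowed = att_event_time(toy, event_window=(0, 0))
```

Event time 0 pools the adoption period of all three treated units even though they adopt in different calendar periods, which is what distinguishes ATT(e) from ATT(g, t).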

non_identified_periods_

Calendar periods with no untreated observations.

Type:

set

non_identified_cohorts_

Treatment cohorts with at least one non-identified post-treatment ATT(g, t).

Type:

set
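To make the two sets concrete, here is a hedged sketch of how they could be derived from a panel with no never-treated units. The function `non_identified` and the column names are assumptions for illustration; CausalPy's own bookkeeping may differ in detail.

```python
import pandas as pd


def non_identified(df):
    """Return (periods with no untreated rows, cohorts treated in any such period)."""
    # For each calendar period, is at least one observation untreated?
    has_untreated = (df["treated"] == 0).groupby(df["time"]).any()
    periods = set(has_untreated[~has_untreated].index.tolist())
    # A cohort is affected if it has a treated observation in such a period.
    in_bad_period = (df["treated"] == 1) & df["time"].isin(periods)
    cohorts = set(df.loc[in_bad_period, "G"].tolist())
    return periods, cohorts


# Two units and no never-treated group: unit 1 adopts at t=1, unit 2 at t=3.
toy = pd.DataFrame(
    {
        "unit": [1] * 4 + [2] * 4,
        "time": [0, 1, 2, 3] * 2,
        "G": [1] * 4 + [3] * 4,
    }
)
toy["treated"] = (toy["time"] >= toy["G"]).astype(int)
periods, cohorts = non_identified(toy)
```

In this toy panel every unit is treated at time 3, so that period has no untreated support, and both cohorts have a post-treatment cell that falls in it.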

Notes

Panel Balance: This implementation supports both balanced and unbalanced panel data. While balanced panels (where each unit is observed in every time period) are common in staggered DiD applications, the imputation-based approach of Borusyak et al. (2024) can accommodate unbalanced panels. The key requirement is that treatment timing is well-defined for each unit, not that all units are observed in all periods. Unit and observation counts in the summary output are computed without assuming balanced panels.
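Whether a given panel is balanced can be checked before fitting. The helper below is a small illustrative sketch (the name `is_balanced` is not part of CausalPy); it also shows unit and observation counts that do not rely on balance.

```python
import pandas as pd


def is_balanced(df, unit="unit", time="time"):
    """True when every unit is observed exactly once in every time period."""
    counts = df.groupby([unit, time]).size()
    n_cells = df[unit].nunique() * df[time].nunique()
    return len(counts) == n_cells and bool((counts == 1).all())


balanced = pd.DataFrame(
    {"unit": ["a", "a", "a", "b", "b", "b"], "time": [0, 1, 2, 0, 1, 2]}
)
# Dropping one unit-period observation makes the panel unbalanced.
unbalanced = balanced.drop(index=4)

# Counts that are valid for balanced and unbalanced panels alike.
n_units = unbalanced["unit"].nunique()
n_obs = len(unbalanced)
```

For an unbalanced panel, `n_obs` is smaller than `n_units` times the number of periods, which is why counts should be taken from the data rather than multiplied out.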

Example

>>> import causalpy as cp
>>> from causalpy.data.simulate_data import generate_staggered_did_data
>>> df = generate_staggered_did_data(n_units=30, n_time_periods=15, seed=42)
>>> result = cp.StaggeredDifferenceInDifferences(
...     df,
...     formula="y ~ 1 + C(unit) + C(time)",
...     unit_variable_name="unit",
...     time_variable_name="time",
...     treated_variable_name="treated",
...     treatment_time_variable_name="treatment_time",
...     model=cp.pymc_models.LinearRegression(
...         sample_kwargs={
...             "tune": 100,
...             "draws": 200,
...             "chains": 2,
...             "progressbar": False,
...         }
...     ),
... )

References

Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event Study Designs: Robust and Efficient Estimation. Review of Economic Studies.

Methods

StaggeredDifferenceInDifferences.algorithm()

Run the experiment algorithm: fit model, predict counterfactuals, and aggregate effects.

StaggeredDifferenceInDifferences.effect_summary(*)

Generate a decision-ready summary of causal effects for Staggered Difference-in-Differences.

StaggeredDifferenceInDifferences.fit(*args, ...)

Fit the underlying model.

StaggeredDifferenceInDifferences.generate_report(*)

Generate a self-contained HTML report for this experiment.

StaggeredDifferenceInDifferences.get_plot_data(...)

Recover the data of an experiment along with the prediction and causal impact information.

StaggeredDifferenceInDifferences.get_plot_data_bayesian([...])

Get plotting data for Bayesian model.

StaggeredDifferenceInDifferences.get_plot_data_ols()

Get plotting data for OLS model.

StaggeredDifferenceInDifferences.input_validation()

Validate the input data and parameters.

StaggeredDifferenceInDifferences.plot(*[, ...])

Plot the staggered difference-in-differences event study.

StaggeredDifferenceInDifferences.plot_group_time(*)

Plot cohort-specific ATT(g, t) trajectories.

StaggeredDifferenceInDifferences.print_coefficients([...])

Ask the model to print its coefficients.

StaggeredDifferenceInDifferences.set_maketables_options(*)

Set optional maketables rendering options for this experiment.

StaggeredDifferenceInDifferences.summary([...])

Print summary of main results.

Attributes

idata

Return the InferenceData object of the model.

supports_bayes

supports_ols

labels

data

__init__(data, formula, unit_variable_name, time_variable_name, treated_variable_name='treated', treatment_time_variable_name=None, never_treated_value=inf, model=None, event_window=None, reference_event_time=-1, **kwargs)
Return type:

None

classmethod __new__(*args, **kwargs)