generate_staggered_did_data

causalpy.data.simulate_data.generate_staggered_did_data(n_units=50, n_time_periods=20, treatment_cohorts=None, treatment_effects=None, unit_fe_scale=2.0, time_fe_scale=1.0, sigma=0.5, seed=None)

Generate synthetic panel data with staggered treatment adoption.

Creates a balanced panel dataset in which units are grouped into cohorts that adopt treatment at different times, alongside a group of never-treated units. Treatment effects can be dynamic: they vary with event-time, the number of periods since a unit adopted treatment.
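The event-time bookkeeping can be illustrated with a minimal stdlib sketch. It computes the event-time e = t - G and the treated flag for a single unit. The helper event_time_and_treated is hypothetical and is not part of CausalPy. It rests on two assumptions:

- Treatment is absorbing: once a unit adopts, it stays treated. This is the usual reading of staggered adoption, but this page does not state it.
- Never-treated units use math.inf, mirroring the np.inf convention of the treatment_time column.

```python
import math


def event_time_and_treated(t, treatment_time):
    """Hypothetical helper: (event-time, treated flag) for period t.

    Assumes absorbing treatment (once adopted, always treated) and encodes
    never-treated units with math.inf, like the np.inf used in treatment_time.
    """
    if math.isinf(treatment_time):
        return None, 0  # never treated: no event-time is defined
    e = t - treatment_time  # e = t - G; e = 0 is the first treated period
    return e, int(e >= 0)


# A unit that adopts treatment at G = 5, observed in periods 3..7
for t in range(3, 8):
    print(t, event_time_and_treated(t, 5))
# 3 (-2, 0)
# 4 (-1, 0)
# 5 (0, 1)
# 6 (1, 1)
# 7 (2, 1)
```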

Parameters:
  • n_units (int) – Total number of units in the panel.

  • n_time_periods (int) – Number of time periods in the panel.

  • treatment_cohorts (dict[int, int] | None) – Dictionary mapping treatment time to the number of units in that cohort. Units not assigned to any cohort are never-treated. Default: {5: 10, 10: 10, 15: 10} (3 cohorts of 10 units each, which leaves 20 never-treated units with the default n_units=50).

  • treatment_effects (dict[int, float] | None) – Dictionary mapping event-time (t - G) to treatment effect. Event-time 0 is the first treated period. Default: {0: 1.0, 1: 1.5, 2: 2.0, 3: 2.5}, with the effect held constant at 2.5 for all later event-times.

  • unit_fe_scale (float) – Scale of unit fixed effects (drawn from Normal(0, unit_fe_scale)).

  • time_fe_scale (float) – Scale of time fixed effects (drawn from Normal(0, time_fe_scale)).

  • sigma (float) – Standard deviation of idiosyncratic noise.

  • seed (int | None) – Random seed for reproducibility.
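The treatment_cohorts argument turns cohort sizes into per-unit adoption times, with leftover units becoming never-treated. This sketch shows that idea; the helper assign_cohorts is hypothetical, not CausalPy's API. Three of its behaviours are assumptions, not documented behaviour of generate_staggered_did_data:

- It shuffles units with a seeded random.Random before filling cohorts.
- It fills cohorts in order of treatment time.
- It raises ValueError when the cohorts request more units than n_units.

```python
import math
import random
from collections import Counter


def assign_cohorts(n_units, treatment_cohorts, seed=None):
    """Hypothetical helper: map each unit id to its adoption time G.

    Units left over after filling every cohort get math.inf, i.e. they are
    never treated. Shuffling units before filling cohorts is an assumption.
    """
    if sum(treatment_cohorts.values()) > n_units:
        raise ValueError("treatment_cohorts requests more units than n_units")
    units = list(range(n_units))
    random.Random(seed).shuffle(units)  # seeded, so assignment is reproducible
    assignment = {}
    start = 0
    for g, size in sorted(treatment_cohorts.items()):
        for u in units[start:start + size]:
            assignment[u] = g
        start += size
    for u in units[start:]:
        assignment[u] = math.inf  # remaining units form the never-treated group
    return assignment


default_cohorts = {5: 10, 10: 10, 15: 10}
g_by_unit = assign_cohorts(50, default_cohorts, seed=42)
counts = Counter(g_by_unit.values())
print(counts[math.inf])  # 20 never-treated units
```

With the documented defaults (n_units=50 and three cohorts of 10), 20 units remain untreated, matching the parameter description above.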

Returns:

Panel data with columns:

  • unit – Unit identifier.

  • time – Time period.

  • treated – Binary indicator (1 if the unit is treated at time t, 0 otherwise).

  • treatment_time – Time of treatment adoption (np.inf for never-treated units).

  • y – Observed outcome.

  • y0 – Counterfactual (untreated) outcome, for validation.

  • tau – True treatment effect, for validation.

Return type:

pd.DataFrame

Examples

>>> from causalpy.data.simulate_data import generate_staggered_did_data
>>> df = generate_staggered_did_data(n_units=30, n_time_periods=15, seed=42)
>>> df.head()
   unit  time  treated  treatment_time  ...

Notes

The data generating process is:

\[Y_{it} = \alpha_i + \lambda_t + \tau_{it} \cdot D_{it} + \varepsilon_{it}\]

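where \(\alpha_i\) is the unit fixed effect, \(\lambda_t\) is the time fixed effect, \(D_{it}\) is the treatment indicator, \(\varepsilon_{it}\) is idiosyncratic noise with standard deviation sigma, and \(\tau_{it}\) is the dynamic treatment effect. \(\tau_{it}\) depends on the event-time \(e = t - G_i\), where \(G_i\) is the treatment time of unit \(i\).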
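The data generating process above can be made concrete with a minimal stdlib sketch. It is not CausalPy's implementation: the helpers effect_at and simulate_panel are hypothetical, and the sketch makes these assumptions:

- The fixed-effect scales are standard deviations.
- Time periods run from 0 to n_time_periods - 1.
- tau is 0 on untreated rows.
- Event-times missing from treatment_effects carry the last specified effect forward. This is consistent with the documented default holding 2.5 after event-time 3.

Rows are plain dicts using the column names from the Returns section rather than a pd.DataFrame.

```python
import math
import random


def effect_at(e, treatment_effects):
    """Effect for event-time e (assumed carry-forward rule).

    Before adoption (e < 0) the effect is 0. Otherwise use the entry for the
    largest specified event-time <= e, so later periods keep the last effect.
    """
    if e < 0:
        return 0.0
    known = [k for k in treatment_effects if k <= e]
    return treatment_effects[max(known)] if known else 0.0


def simulate_panel(g_by_unit, n_time_periods, treatment_effects,
                   unit_fe_scale=2.0, time_fe_scale=1.0, sigma=0.5, seed=None):
    """Hypothetical stdlib version of Y_it = alpha_i + lambda_t + tau_it * D_it + eps_it."""
    rng = random.Random(seed)
    # Scales are treated as standard deviations (an assumption).
    alpha = {u: rng.gauss(0.0, unit_fe_scale) for u in g_by_unit}
    lam = [rng.gauss(0.0, time_fe_scale) for _ in range(n_time_periods)]
    rows = []
    for u, g in g_by_unit.items():
        for t in range(n_time_periods):
            d = 0 if math.isinf(g) else int(t >= g)  # D_it
            tau = effect_at(t - g, treatment_effects) if d else 0.0
            y0 = alpha[u] + lam[t] + rng.gauss(0.0, sigma)  # untreated outcome
            rows.append({"unit": u, "time": t, "treated": d,
                         "treatment_time": g, "y": y0 + tau * d,
                         "y0": y0, "tau": tau})
    return rows


effects = {0: 1.0, 1: 1.5, 2: 2.0, 3: 2.5}
rows = simulate_panel({0: 5, 1: math.inf}, n_time_periods=10,
                      treatment_effects=effects, seed=0)
# y - y0 recovers tau * D on every row (up to floating-point rounding)
assert all(math.isclose(r["y"] - r["y0"], r["tau"] * r["treated"], abs_tol=1e-9)
           for r in rows)
print([r["tau"] for r in rows if r["unit"] == 0])
# [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.5, 2.0, 2.5, 2.5]
```

The check on y - y0 is the kind of validation the y0 and tau columns are meant to support: the gap between observed and counterfactual outcomes equals \(\tau_{it} \cdot D_{it}\).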