generate_staggered_did_data
- causalpy.data.simulate_data.generate_staggered_did_data(n_units=50, n_time_periods=20, treatment_cohorts=None, treatment_effects=None, unit_fe_scale=2.0, time_fe_scale=1.0, sigma=0.5, seed=None)
Generate synthetic panel data with staggered treatment adoption.
Creates a balanced panel dataset where different cohorts of units receive treatment at different times. Supports dynamic treatment effects that vary by event-time (time relative to treatment).
- Parameters:
  - n_units (int) – Total number of units in the panel.
  - n_time_periods (int) – Number of time periods in the panel.
  - treatment_cohorts (dict[int, int] | None) – Dictionary mapping treatment time to number of units in that cohort. Units not assigned to any cohort are never-treated. Default: {5: 10, 10: 10, 15: 10} (3 cohorts of 10 units each, leaving 20 never-treated units).
  - treatment_effects (dict[int, float] | None) – Dictionary mapping event-time (t - G) to treatment effect. Event-time 0 is the first treated period. Default: {0: 1.0, 1: 1.5, 2: 2.0, 3: 2.5} with constant effect of 2.5 for all subsequent periods.
  - unit_fe_scale (float) – Scale of unit fixed effects (drawn from Normal(0, unit_fe_scale)).
  - time_fe_scale (float) – Scale of time fixed effects (drawn from Normal(0, time_fe_scale)).
  - sigma (float) – Standard deviation of idiosyncratic noise.
- Returns:
Panel data with columns:
  - unit: Unit identifier
  - time: Time period
  - treated: Binary indicator (1 if treated at time t, 0 otherwise)
  - treatment_time: Time of treatment adoption (np.inf for never-treated)
  - y: Observed outcome
  - y0: Counterfactual outcome (for validation)
  - tau: True treatment effect (for validation)
- Return type:
pd.DataFrame
Examples
>>> from causalpy.data.simulate_data import generate_staggered_did_data
>>> df = generate_staggered_did_data(n_units=30, n_time_periods=15, seed=42)
>>> df.head()
   unit  time  treated  treatment_time  ...
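To make the two dictionary parameters concrete, the following pure-Python sketch works through the arithmetic described under Parameters. It does not call CausalPy. The helper names `never_treated_count` and `effect_at` are hypothetical, and two behaviours are assumptions drawn from the documented default rather than confirmed library behaviour: the last listed effect is carried forward, and an event-time missing from the dictionary contributes no effect.

```python
import math

# Documented defaults for the two dictionary parameters.
treatment_cohorts = {5: 10, 10: 10, 15: 10}            # treatment time -> cohort size
treatment_effects = {0: 1.0, 1: 1.5, 2: 2.0, 3: 2.5}   # event-time -> effect


def never_treated_count(n_units, cohorts):
    """Units left over once every cohort is filled (hypothetical helper)."""
    return n_units - sum(cohorts.values())


def effect_at(t, treatment_time, effects):
    """Treatment effect for a unit at time t (hypothetical helper).

    Assumes no effect before adoption, and that the effect at the largest
    listed event-time persists for all later periods.
    """
    if math.isinf(treatment_time) or t < treatment_time:
        return 0.0
    event_time = t - treatment_time      # e = t - G
    last = max(effects)
    if event_time >= last:
        return effects[last]
    # Assumption: an event-time missing from the dict contributes no effect.
    return effects.get(event_time, 0.0)


print(never_treated_count(50, treatment_cohorts))   # units outside all cohorts
print([effect_at(t, 5, treatment_effects) for t in range(4, 10)])
```

With the default `n_units=50`, the three default cohorts account for 30 units, which is where the 20 never-treated units in the parameter description come from.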
Notes
The data generating process is:
\[Y_{it} = \alpha_i + \lambda_t + \tau_{it} \cdot D_{it} + \varepsilon_{it}\]
where \(\alpha_i\) is the unit fixed effect, \(\lambda_t\) is the time fixed effect, \(D_{it}\) is the treatment indicator, and \(\tau_{it}\) is the dynamic treatment effect that depends on event-time \(e = t - G_i\).
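The equation above can be sketched in plain Python using only the standard library. This is an illustrative re-implementation and not CausalPy's code: the function name `simulate_staggered_panel`, the small default sizes, the zero-based time index, the order in which units are assigned to cohorts, and the carry-forward of the last listed effect are all assumptions made for the example.

```python
import math
import random


def simulate_staggered_panel(n_units=6, n_time_periods=8, cohorts=None,
                             effects=None, unit_fe_scale=2.0,
                             time_fe_scale=1.0, sigma=0.5, seed=0):
    """Illustrative sketch of the data generating process (hypothetical)."""
    cohorts = cohorts if cohorts is not None else {3: 2, 5: 2}
    effects = effects if effects is not None else {0: 1.0, 1: 1.5, 2: 2.0}
    rng = random.Random(seed)

    # Adoption time G_i per unit: fill cohorts in order, rest never treated.
    adoption = [g for g, size in sorted(cohorts.items()) for _ in range(size)]
    adoption += [math.inf] * (n_units - len(adoption))

    alpha = [rng.gauss(0.0, unit_fe_scale) for _ in range(n_units)]       # alpha_i
    lam = [rng.gauss(0.0, time_fe_scale) for _ in range(n_time_periods)]  # lambda_t
    last = max(effects)

    rows = []
    for i in range(n_units):
        for t in range(n_time_periods):
            d = 1 if t >= adoption[i] else 0                   # D_it
            # tau_it depends on event-time e = t - G_i, capped at the last key.
            tau = effects.get(min(t - adoption[i], last), 0.0) if d else 0.0
            y0 = alpha[i] + lam[t] + rng.gauss(0.0, sigma)     # untreated outcome
            rows.append({"unit": i, "time": t, "treated": d,
                         "treatment_time": adoption[i],
                         "y": y0 + tau * d, "y0": y0, "tau": tau})
    return rows


rows = simulate_staggered_panel()
# The observed and counterfactual outcomes differ only by tau * D.
assert all(abs(r["y"] - r["y0"] - r["tau"] * r["treated"]) < 1e-9 for r in rows)
```

Keeping `y0` and `tau` next to `y`, as the returned DataFrame does, lets an estimator's output be compared directly against the known effect at each event-time.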