# Causal DAGs for Quasi-Experiments

This page provides an overview of causal Directed Acyclic Graphs (DAGs) for some of the most common quasi-experiments. It takes inspiration from a paper by Steiner *et al.* [2017] and the books by Cunningham [2021] and Huntington-Klein [2021]; readers are encouraged to consult these sources for more details.

Before we take a look at randomized controlled trials (RCTs) and quasi-experiments, let’s first consider the concept of confounding. Confounding occurs when one or more variables causally influence both the treatment and the outcome, and it is very common in observational studies. This can lead to biased estimates of the treatment effect (the causal effect of \(Z \rightarrow Y\)). The following causal DAG illustrates the concept of confounding. Note that the confounder is written as a vector because there may be multiple confounding variables, \(\mathbf{X} = (x_1, x_2, x_3)\).

One way to tell that our estimate of the causal relationship \(Z \rightarrow Y\) may be biased is the presence of a backdoor path, \(Z \leftarrow \mathbf{X} \rightarrow Y\). This path type is known as a “fork”. Because \(\mathbf{X}\) is a common cause of \(Z\) and \(Y\), any observed statistical relation between \(Z\) and \(Y\) may be due to the confounding effect of \(\mathbf{X}\).

Backdoor paths are problematic because they introduce *statistical associations* between variables that do not reflect the true causal relationships, potentially leading to biased causal estimates. For example, if we ran a regression of the form `y ~ z` and observed a main effect of \(Z\) on \(Y\), we would have no way of knowing whether this represents a true causal impact of \(Z\) on \(Y\) or the confounding effect of \(\mathbf{X}\).

One approach is to “close the backdoor path” by conditioning on the confounding variables. Practically, this could involve including the confounders \(\mathbf{X}\) as covariates in a regression model such as `y ~ z + x₁ + x₂ + x₃`. Without going into the reasoning here, the coefficient for the main effect of \(Z\) would now be an unbiased estimate of the *causal* effect of \(Z \rightarrow Y\), provided the model is correctly specified.

However, unless we are very sure that we have accurate measures of *all* confounding variables (maybe there is an \(x_4\) that we don’t know about or couldn’t measure), it is still possible that our estimate of the causal effect is biased.
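To make the idea of closing a backdoor path concrete, here is a minimal simulation sketch. It assumes a single binary confounder `x` and a constant treatment effect of 2, and it conditions on `x` by stratification rather than by fitting the regression `y ~ z + x`. The data-generating numbers and function names are illustrative assumptions, not part of the original presentation.

```python
import random
from statistics import fmean


def simulate(n=20_000, effect=2.0, seed=0):
    """Simulate (x, z, y) rows where x confounds the Z -> Y relationship."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)                   # binary confounder
        z = int(rng.random() < (0.8 if x else 0.2))   # X -> Z
        y = effect * z + 3.0 * x + rng.gauss(0, 1)    # Z -> Y and X -> Y
        rows.append((x, z, y))
    return rows


def diff_in_means(rows):
    """Mean outcome of treated units minus mean outcome of untreated units."""
    treated = [y for _, z, y in rows if z == 1]
    control = [y for _, z, y in rows if z == 0]
    return fmean(treated) - fmean(control)


def backdoor_adjusted(rows):
    """Condition on x: average the within-stratum differences, weighted by P(x)."""
    total = 0.0
    for level in (0, 1):
        stratum = [row for row in rows if row[0] == level]
        total += diff_in_means(stratum) * len(stratum) / len(rows)
    return total


rows = simulate()
naive = diff_in_means(rows)         # biased: the backdoor path is open
adjusted = backdoor_adjusted(rows)  # close to the simulated effect of 2
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

In this simulated setting the naive comparison overstates the effect because treated units are more likely to have `x = 1`, while the stratified estimate should land near the simulated value.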

This leads us to Randomized Controlled Trials (RCTs), which are considered the gold standard for estimating causal effects. One reason for this is that we (as experimenters) intervene in the system by assigning units to treatment at random. Because of this intervention, any causal influence of the confounders upon the treatment, \(\mathbf{X} \rightarrow Z\), is broken: treatment is now solely determined by the randomisation process, \(R \rightarrow Z\). The following causal DAG illustrates the structure of an RCT.

The new variable \(R\) represents the random assignment of units to the treatment group. This means that the treatment effect \(Z \rightarrow Y\) can be estimated without bias.
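As a rough illustration, the sketch below simulates an RCT in which a variable `x` still affects the outcome, but treatment is assigned by a coin flip that ignores `x`. The effect size, sample size, and variable names are assumptions chosen for the example.

```python
import random
from statistics import fmean

rng = random.Random(42)
true_effect = 2.0
treated, control = [], []

for _ in range(20_000):
    x = rng.gauss(0, 1)          # x still influences the outcome (X -> Y)
    z = rng.random() < 0.5       # R -> Z: a coin flip that does not depend on x
    y = true_effect * z + 3.0 * x + rng.gauss(0, 1)
    (treated if z else control).append(y)

# With X -> Z removed, a simple difference in means targets the causal effect.
rct_estimate = fmean(treated) - fmean(control)
print(f"RCT estimate: {rct_estimate:.2f}")
```

Because `x` is balanced across arms on average, no adjustment is needed here, although adjusting for `x` could still reduce the variance of the estimate.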

## Instrumental Variables

In quasi-experiments, we cannot randomly assign subjects to treatment groups, so confounders \(\mathbf{X}\) will still influence treatment assignment. In the instrumental variable (IV) approach, the causal effect of \(Z \rightarrow Y\) is identifiable if we have an IV that causally influences the treatment \(Z\) and influences the outcome \(Y\) only through \(Z\).

Let’s try to get some intuition of why having the \(IV\) helps:

\(\mathbf{X}\) is a confounder because it influences both \(Z\) and \(Y\).

But the \(IV\) helps overcome this confounding because it is not influenced by \(\mathbf{X}\).

Any association between the \(IV\) and \(Y\) must be through the treatment \(Z\).

This means that the \(IV\) can be used to estimate the causal effect of \(Z \rightarrow Y\), without being confounded by \(\mathbf{X}\). Informally, the \(IV\) causes some variation in the treatment \(Z\) that is not due to \(\mathbf{X}\), and this variation can be used to estimate the causal effect of \(Z \rightarrow Y\).
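The intuition above can be sketched with a small simulation. The example below assumes a binary instrument, an unmeasured confounder `x`, and a constant treatment effect of 2, and it uses the simple ratio-of-covariances (Wald-type) IV estimator. All numbers and names are illustrative assumptions.

```python
import random
from statistics import fmean


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    a_bar, b_bar = fmean(a), fmean(b)
    return fmean([(ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)])


rng = random.Random(11)
true_effect = 2.0
iv, z, y = [], [], []

for _ in range(50_000):
    x = rng.gauss(0, 1)              # confounder, treated as unmeasured
    instrument = rng.randint(0, 1)   # not influenced by x
    treatment = 1.0 * instrument + 1.0 * x + rng.gauss(0, 1)
    outcome = true_effect * treatment + 3.0 * x + rng.gauss(0, 1)
    iv.append(instrument)
    z.append(treatment)
    y.append(outcome)

# Regressing y on z alone is confounded by x.
naive_slope = covariance(z, y) / covariance(z, z)

# The IV estimator uses only the variation in z that is driven by the instrument.
iv_estimate = covariance(iv, y) / covariance(iv, z)

print(f"naive slope: {naive_slope:.2f}, IV estimate: {iv_estimate:.2f}")
```

The ratio works in this sketch because the instrument is associated with the outcome only through the treatment, so dividing by the instrument's effect on the treatment recovers the effect of the treatment on the outcome.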

Readers are referred to Steiner *et al.* [2017], Cunningham [2021] or Huntington-Klein [2021] for a more in-depth discussion of the IV approach from the causal DAG perspective.

## Interrupted Time Series

A causal DAG for the interrupted time series quasi-experiment is given in Chapter 17 of Huntington-Klein [2021], where such designs are labelled Event Studies. These kinds of studies are suited to situations where an intervention is made at a given point in time, at which we move from untreated to treated. Typically, we consider situations where there are a ‘decent’ number of observations over time. Here’s the causal DAG. Note that \(\text{time}\) represents all the things changing over time, such as the time index as well as time-varying predictor variables.

What we want to understand is the causal effect of the treatment upon the outcome, \(Z \rightarrow Y\). But we have a back door path between \(Z\) and \(Y\) which will make this hard, \(Z \leftarrow \text{after treatment} \leftarrow \text{time} \rightarrow Y\).

Note

Below is an attempt to explain one way that we can deal with this, though it is a bit of a brain-twister and can take some time to get your head around. Thanks to Nick Huntington-Klein for some clarification in this Twitter thread.

One approach we can use is:

We want to close the backdoor path, and one way to do this is to split the dataset into two parts: pre-treatment and post-treatment. By fitting a model only to the pre-treatment data, we have removed any variation in \(\text{after treatment}\) (all values are \(0\)), so there is now no variation in \(Z\) caused by \(\text{time}\). This is one way to close a backdoor path, and means that a model fitted to this data (e.g. \(Y_{\text{pre}} \sim f(\text{time}_{\text{pre}})\)) will not be biased by the backdoor path.

However, our goal is to estimate the causal effect of the treatment \(Z \rightarrow Y\), but we have just removed any variation in \(Z\) and it does not appear in the aforementioned model, \(Y_{\text{pre}} \sim f(\text{time}_{\text{pre}})\), so our work is not done. One way to deal with this is to use the model to predict what would have happened in the post-treatment era if no treatment had been given. If we assume that the pre-treatment relationship between \(\text{time}\) and the outcome would have continued unchanged in the absence of treatment, then this prediction is an unbiased estimate of the counterfactual. By focussing only on the post-treatment data we are looking at empirical outcomes \(Y_\text{post}\) which are affected by treatment \(Z = 1\), but have closed the back door because all \(\text{after treatment} = 1\). The final comparison (subtraction) between the observed post-treatment data and the counterfactual estimate gives us the estimated treatment effect \(Z \rightarrow Y\).
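The three steps (fit on pre-treatment data, predict the counterfactual, subtract) can be sketched as follows. This is a minimal example that assumes a linear trend, a constant treatment effect of 5, and a simple least-squares line for \(f\); the simulated series and all names are assumptions made for illustration.

```python
import random
from statistics import fmean


def fit_line(t, y):
    """Ordinary least squares for y = intercept + slope * t."""
    t_bar, y_bar = fmean(t), fmean(y)
    slope = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
             / sum((ti - t_bar) ** 2 for ti in t))
    return y_bar - slope * t_bar, slope


rng = random.Random(0)
treatment_time, true_effect = 100, 5.0
time = list(range(120))
after_treatment = [int(t >= treatment_time) for t in time]
y = [10 + 0.5 * t + true_effect * a + rng.gauss(0, 0.5)
     for t, a in zip(time, after_treatment)]

# Step 1: fit the model to pre-treatment data only (after_treatment == 0).
pre = [(t, yi) for t, yi, a in zip(time, y, after_treatment) if a == 0]
intercept, slope = fit_line([t for t, _ in pre], [yi for _, yi in pre])

# Step 2: predict the untreated counterfactual for the post-treatment period.
post = [(t, yi) for t, yi, a in zip(time, y, after_treatment) if a == 1]
counterfactual = [intercept + slope * t for t, _ in post]

# Step 3: observed minus counterfactual gives the estimated impact per time point.
impact = [yi - cf for (_, yi), cf in zip(post, counterfactual)]
estimated_effect = fmean(impact)
print(f"estimated average effect: {estimated_effect:.2f}")
```

The estimate is only as good as the extrapolation: if the pre-treatment trend would not have continued, the counterfactual and therefore the estimated effect will be off.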

## Propensity Score Weighting

In this exposition we follow the presentation of Steiner *et al.* [2017]. The idea they discuss is that we should conceive of propensity score adjustment techniques primarily as an offset aimed at balancing the existing degree of confounding. The focus is on recovering the condition of **strong ignorability** such that \(Y(1), Y(0) \perp\!\!\!\!\perp Z | X\). This constraint is phrased in terms of potential outcomes \(Y(0), Y(1)\), which we won’t define here, but basically we’re saying the outcomes are independent of the treatment when we condition on the covariates \(X\) which determine selection effects. Achieving this status removes the backdoor path between the measured covariates \(X\) and the treatment \(Z\), thereby giving us license to draw causal conclusions. They emphasise that the PS (propensity score) is a collider variable we can use to disentangle the confounding influence of the covariates \(X\) on selection into the treatment.

“This general result is obtained because the PS *itself* is a collider variable and, thus, conditioning on the PS offsets the confounding relation \(X \rightarrow Z\) regardless of the choice of a specific PS design— matching, stratification, or weighting” (p. 176, “Graphical Models for Quasi-experimental Designs”)

However, the design assumes that all relevant variables are measured in \(X\); it cannot account for unmeasured confounding. In this way, PS methods try to recover the conditions of an RCT, but only with respect to the measured covariates.

One nice feature of this setup is that we can evaluate the claim of **strong ignorability**, because it implies that \(Z \perp\!\!\!\perp X | PS(X)\), which ensures the covariate profiles are balanced across the treatment branches conditional on the propensity score. This is a testable implication of the postulated design! Balance plots and measures are ways to evaluate whether the offset achieved by your propensity score has worked. It is crucial that the PS serves as a balancing score: if it cannot, the collision effect can add to the confounding bias rather than remove it.
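A minimal sketch of inverse propensity weighting and the accompanying balance check is given below. It assumes a single discrete covariate `x`, estimates the propensity score as the treated share within each level of `x` (in practice a model such as logistic regression would usually be used), and applies normalised weights. The simulated data and all names are illustrative assumptions.

```python
import random
from collections import defaultdict


def simulate(n=30_000, effect=2.0, seed=1):
    """Simulate (x, z, y) rows where x drives both selection and the outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.choice((0, 1, 2))
        z = int(rng.random() < (0.2, 0.5, 0.8)[x])   # selection depends on x
        y = effect * z + 1.5 * x + rng.gauss(0, 1)
        data.append((x, z, y))
    return data


def estimate_propensity(data):
    """Estimate PS(x) = P(Z = 1 | X = x) as the treated share in each level of x."""
    treated, total = defaultdict(int), defaultdict(int)
    for x, z, _ in data:
        treated[x] += z
        total[x] += 1
    return {x: treated[x] / total[x] for x in total}


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def ipw_summary(data):
    """Return (covariate imbalance after weighting, weighted effect estimate)."""
    ps = estimate_propensity(data)
    arms = {}
    for arm in (1, 0):
        rows = [(x, y) for x, z, y in data if z == arm]
        # Treated units get weight 1 / PS, untreated units 1 / (1 - PS).
        w = [1 / ps[x] if arm == 1 else 1 / (1 - ps[x]) for x, _ in rows]
        arms[arm] = (weighted_mean([x for x, _ in rows], w),
                     weighted_mean([y for _, y in rows], w))
    x_imbalance = arms[1][0] - arms[0][0]
    effect = arms[1][1] - arms[0][1]
    return x_imbalance, effect


data = simulate()
x_imbalance, ipw_effect = ipw_summary(data)
print(f"weighted imbalance in x: {x_imbalance:.4f}, IPW effect: {ipw_effect:.2f}")
```

Checking that the weighted covariate means agree across the two arms is a simple numeric counterpart to a balance plot. If the weighted imbalance were large, that would suggest the estimated score is not acting as a balancing score.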

## Difference in Differences

Difference-in-differences studies involve comparing the change in outcomes over time between a treatment and a control group. The causal DAG for this is given in Chapter 18 of Huntington-Klein [2021]:

Note

For our explanation below, we will assume we are dealing with the simplest case of a two-group, two-time-period design, the so-called “classical” 2\(\times\)2 difference-in-differences design.

Our goal is to estimate the causal effect of the treatment on the outcome, \(Z \rightarrow Y\), but now we have *two* backdoor paths:

\(Z \leftarrow \text{time} \rightarrow Y\)

\(Z \leftarrow \text{group} \rightarrow Y\)

From a regression point of view, both \(\text{time}\) and \(\text{group}\) are binary variables. In this situation, treatment is given to the treatment group (\(\text{group}=1\)) at time \(\text{time}=1\).

The causal effect of the treatment upon the outcome is typically estimated by fitting a regression model of the form `y ~ time + group + time:group`. The interaction term `time:group` captures the causal effect of \(Z \rightarrow Y\).

We can note that this interaction term \(\text{time} \times \text{group}\) encodes the values of \(Z\), which as we said above, is equal to 1 for only the treatment group at time 1. So another way to think about the inclusion of an interaction effect is that we are simply conditioning on all the observed data (\(Z\), \(\text{time}\), \(\text{group}\), \(Y\)) to estimate the causal effect of \(Z \rightarrow Y\).
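For the 2\(\times\)2 case, the interaction coefficient in the saturated model `y ~ time + group + time:group` equals the difference of the two before-after changes in the cell means. The sketch below computes it that way from simulated data. The simulated effect of 3 and the other coefficients are assumptions for illustration, and the simulation builds in parallel trends by construction.

```python
import random
from statistics import fmean

rng = random.Random(3)
true_effect = 3.0
cells = {(g, t): [] for g in (0, 1) for t in (0, 1)}

for _ in range(10_000):
    group = rng.randint(0, 1)
    time = rng.randint(0, 1)
    z = group * time   # treated only if in the treatment group at time 1
    y = 1.0 + 2.0 * group + 1.5 * time + true_effect * z + rng.gauss(0, 1)
    cells[(group, time)].append(y)

mean = {cell: fmean(ys) for cell, ys in cells.items()}

# Before-after change within each group; each one alone is confounded by time.
change_treated = mean[(1, 1)] - mean[(1, 0)]
change_control = mean[(0, 1)] - mean[(0, 0)]

# Difference of the differences: removes the shared time effect and the
# fixed group difference, leaving the treatment effect.
did_estimate = change_treated - change_control
print(f"treated change: {change_treated:.2f}, DiD estimate: {did_estimate:.2f}")
```

The before-after change in the treated group alone mixes the treatment effect with the time effect, which is why the control group's change is subtracted.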

Warning

Achieving an unbiased estimate is strongly dependent upon the parallel trends assumption. That is, we assume that the treatment and control groups would have followed the same trajectory (over time) in the absence of treatment. This is a strong assumption and should be carefully considered when interpreting the results of a difference-in-differences study. In the case of the classic 2\(\times\)2 design we cannot assess the validity of this assumption empirically, so it is important to consider the plausibility of this assumption in the context of the particular example.

## Synthetic Control

Warning

While many texts cover the synthetic control method, they typically do not provide a causal DAG-based treatment. So this section is pending - we hope to update it soon.

## Regression Discontinuity

The regression discontinuity design is similar to the interrupted time series design, but rather than the treatment occurring at a specific point in time, treatment is based on a cutoff value \(\lambda\) along some running variable \(RV\). This running variable could be a test score, age, spatial location, etc. The running variable may also influence the outcome \(RV \rightarrow Y\). The running variable may also be associated with a set of variables \(\mathbf{X}\) that influence the outcome, \(RV - - - - \mathbf{X} \rightarrow Y\).

We can see from the data generating graph (left) that the \(RV\) is a confounding variable as it influences both the treatment \(Z\) and the outcome \(Y\).

If we tried to identify the causal effect of \(Z \rightarrow Y\) by conditioning on the running variable (\(RV=rv\)), we would eliminate any variation in \(Z\) or \(Y\) caused by \(RV\). And because \(Z\) is constant for any given value of \(RV\), then the \(Z \rightarrow Y\) path would disappear and we could not estimate the causal effect.

Identification of the causal effect of \(Z \rightarrow Y\) is done with a limiting graph (right). The \(RV\) node is replaced by a subset of the data where \(RV\) is close to the cutoff value \(\lambda\), hence the name “limiting graph” and the symbol \(RV \rightarrow \lambda\).

In the limit, this eliminates variation in the running variable and so breaks the \(RV \rightarrow Y\) path. The causal effect of \(Z \rightarrow Y\) can be estimated by comparing the outcomes of units just above and just below the cutoff value \(\lambda\).
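One common way to approximate this limit in practice is to fit a separate line on each side of the cutoff within a narrow bandwidth and compare the two fitted values at the cutoff. The sketch below does this on simulated data. The bandwidth of 0.2, the linear outcome model, the simulated effect of 2, and all names are assumptions for illustration.

```python
import random
from statistics import fmean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


rng = random.Random(7)
cutoff, true_effect, bandwidth = 0.0, 2.0, 0.2
data = []
for _ in range(20_000):
    rv = rng.uniform(-1, 1)
    z = int(rv >= cutoff)   # treatment is fully determined by the cutoff
    y = 1.0 + 2.0 * rv + true_effect * z + rng.gauss(0, 0.5)   # RV -> Y as well
    data.append((rv, z, y))

# Comparing all treated with all untreated units is confounded by RV.
naive = (fmean([y for _, z, y in data if z == 1])
         - fmean([y for _, z, y in data if z == 0]))

# Keep only units close to the cutoff, centring RV so intercepts sit at the cutoff.
below = [(rv - cutoff, y) for rv, _, y in data if cutoff - bandwidth <= rv < cutoff]
above = [(rv - cutoff, y) for rv, _, y in data if cutoff <= rv <= cutoff + bandwidth]

intercept_below, _ = fit_line([r for r, _ in below], [y for _, y in below])
intercept_above, _ = fit_line([r for r, _ in above], [y for _, y in above])

# The jump in the fitted outcome at the cutoff estimates Z -> Y at RV = cutoff.
rdd_estimate = intercept_above - intercept_below
print(f"naive: {naive:.2f}, RDD estimate: {rdd_estimate:.2f}")
```

Narrower bandwidths reduce the influence of \(RV \rightarrow Y\) but leave fewer observations, so the choice of bandwidth trades bias against variance.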

Readers are referred to Steiner *et al.* [2017] and Chapter 6 of Cunningham [2021], who discuss limiting graphs in more detail. Chapter 20 of Huntington-Klein [2021] also covers regression discontinuity designs, but presents a simplified (and non-kosher, in his own words) causal DAG.