---
title: "Lecture 17 Case Methods"
author: "Nick Huntington-Klein"
date: "`r Sys.Date()`"
output:
  revealjs::revealjs_presentation:
    theme: solarized
    transition: slide
    self_contained: true
    smart: true
    fig_caption: true
    reveal_options:
---
```{r setup, include=FALSE}
library(tidyverse)
knitr::opts_chunk$set(echo = FALSE, warning=FALSE, message=FALSE)
theme_set(theme_gray(base_size = 15))
```
## Recap
- So far, the methods we've used have relied on access to a bunch of treated individuals, so we can average over them to get an idea of the outcome conditional on treatment
- But many treatments we might be interested in are only applied to *one* group in *one case*
- What can we do?
- We can use *case methods*
## The Problem
- Why is this likely to be a special problem?
- For one, we have very little data! Only one treated group means that you only have a handful of pre- and post-treatment observations
- It's harder to believe we can abstract away "what's different" about that group when we can't average over a few groups
- Can't control for group-specific stuff
- Can aggregate multiple case effect estimates
## Approaches
- Event studies (the case alone)
- Synthetic control (the case vs. control groups)
## Event Studies
- In an event study, you are really asking the question "what changed when treatment went into effect?"
- At its core it's just a before/after comparison
- With some bells and whistles
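A minimal sketch of that bare before/after comparison, in base R. The data, the seed, and the names `time`, `after`, and `true_effect` are all made up for illustration. Because the outcome here also trends upward on its own, the raw difference in means should come out well above the true effect.

```{r, echo = TRUE}
set.seed(17)  # arbitrary seed, only for reproducibility
# Hypothetical data: 100 periods, treatment begins at period 80
time  <- 1:100
after <- time >= 80
true_effect <- 4
# The outcome drifts upward over time, then jumps by true_effect
y <- 0.1 * time + true_effect * after + rnorm(100)
# Naive event-study estimate: mean after minus mean before
naive <- mean(y[after]) - mean(y[!after])
naive
```

The gap between `naive` and `true_effect` is the part of the before/after difference that comes from the time trend rather than from treatment.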
## The Basic Problem
- Our diagram looks like difference-in-differences but without the control group
- If anything else is changing over time, we have a back door!
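```{r, dev='CairoPNG', echo=FALSE, fig.width=6,fig.height=3.75}
library(ggdag)
# The DAG definition was lost; these edges are assumed from the slide text
dag <- dagify(Outcome ~ Treatment + Time, Treatment ~ Time) %>% tidy_dagitty()
ggdag_classic(dag,node_size=20) + theme_dag_blank()
```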
## Hmm...
- Our actual goal in identifying an effect using before/after data is to figure out *what after would have looked like* for the treated group if no treatment had occurred
- DID says "let's see how a different group changed and assume the treated group would have changed in the same way"
- Event studies say "let's see how the treated group was changing before and use that to predict how it would have continued to change"
## Another way to think about it
- Rather than thinking of event studies as being like DID but without a control group, we can think of them as an RDD with time as the running variable and a cutoff at the moment treatment was introduced
- One simple form of event study estimation actually uses the same regression equation
$$ Y = \beta_0 + \beta_1Time + \beta_2After + \beta_3Time\times After $$
with $\beta_2$ as the event study estimate
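A rough sketch of estimating that equation with `lm()` on made-up data. It assumes `Time` is centered at the treatment date (the hypothetical `time_c` below), so that $\beta_2$ reads as the jump right at the cutoff; the data-generating values are illustrative only.

```{r, echo = TRUE}
set.seed(17)  # arbitrary seed
# Hypothetical data: treatment starts at period 80 with a true jump of 4
time   <- 1:100
cutoff <- 80
after  <- as.numeric(time >= cutoff)
y <- 0.1 * time + 4 * after + rnorm(100)
# Center time at the cutoff so the After coefficient is the jump at the cutoff
time_c <- time - cutoff
# time_c * after expands to time_c + after + time_c:after
fit <- lm(y ~ time_c * after)
# beta_2 is the coefficient on after
beta2 <- unname(coef(fit)["after"])
beta2
```

The interaction term lets the slope differ before and after treatment, which is the same structure as a linear RDD with different slopes on either side of the cutoff.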
## The Time Issue
- RDD works pretty well, so event studies seem pretty solid too, right? Ehhh...
- The assumption that nothing else changes at the cutoff is a bit harder to believe for time as a running variable. Things change over time!
- Some studies, especially in high-frequency finance data, make this more believable by looking at *really tiny time intervals*
- If you have second-to-second data, and your bandwidth is like 10 minutes on either side, then yeah, probably nothing else changed at the cutoff
- Of course this also requires that the effect of treatment has to kick in real quick!
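To illustrate the bandwidth idea, here is a small base R sketch with hypothetical second-by-second data. The drift, the effect size, and the names `bandwidth` and `in_window` are assumptions for the example, not from any real study. It compares a before/after difference inside a 10-minute window against the same difference over the full hour.

```{r, echo = TRUE}
set.seed(17)  # arbitrary seed
# Hypothetical one-hour series at one-second frequency; event at second 1800
second <- 1:3600
event  <- 1800
after  <- second >= event
true_effect <- 0.5
# Slow background drift plus an immediate jump at the event
y <- 0.0002 * second + true_effect * after + rnorm(3600, sd = 0.2)
# Keep only observations within 10 minutes (600 seconds) of the event
bandwidth <- 600
in_window <- abs(second - event) <= bandwidth
est_narrow <- mean(y[in_window & after]) - mean(y[in_window & !after])
# Same comparison using the full hour
est_wide <- mean(y[after]) - mean(y[!after])
c(narrow = est_narrow, wide = est_wide)
```

The narrow window leaves less room for the background drift to contaminate the estimate, but it only works here because the simulated effect shows up instantly.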
## Forecasting
- More often in areas where event studies are used to identify causal effects (and not just, say, check the plausibility of a DID design), they use *forecasting* tools
- We want to predict the counterfactual after treatment as if no treatment had occurred
- So... let's look at the trend before treatment and assume that continues!
- (a) estimate a time-series model using pre-treatment data, (b) forecast post-treatment data, (c) compare to outcomes
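Steps (a) through (c) can be sketched in base R as follows. This is a simplification: a plain linear trend fit with `lm()` stands in for a proper time-series model, and the data and variable names are made up.

```{r, echo = TRUE}
set.seed(17)  # arbitrary seed
# Hypothetical data: linear trend, treatment effect of 4 from period 80 on
d <- data.frame(time = 1:100)
d$after <- d$time >= 80
d$y <- 0.1 * d$time + 4 * d$after + rnorm(100)
pre  <- d[!d$after, ]
post <- d[d$after, ]
# (a) estimate a model of the trend using pre-treatment data only
trend <- lm(y ~ time, data = pre)
# (b) forecast the post-treatment periods as if nothing had changed
post$forecast <- predict(trend, newdata = post)
# (c) compare the forecast to what actually happened
effect <- mean(post$y - post$forecast)
effect
```

Everything rides on the pre-treatment model extrapolating well, which is why the choice of forecasting model matters so much in practice.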
## Forecasting
- Plus, we know how to calculate confidence intervals for forecasts, so that's an easy way to see if the outcomes we observe are statistically significant
- Doing this properly requires effectively using forecasting tools in time series analysis, which is not something I'm going to delve into super deeply in this class
- So let's just see a simple example simulation and an example study
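As a quick illustration of the confidence-interval point, the sketch below uses `arima()` and `predict()` from base R's `stats` package on made-up data. The AR coefficient, the effect size, and the 1.96 multiplier for an approximate 95% interval are assumptions of the example, and the standard errors returned by `predict()` do not account for uncertainty in the estimated coefficients.

```{r, echo = TRUE}
set.seed(17)  # arbitrary seed
time  <- 1:100
after <- time >= 80
# Hypothetical outcome: linear trend, AR(1) noise, effect of 4 after period 80
noise <- as.numeric(arima.sim(list(ar = 0.3), n = 100))
y <- 0.1 * time + 4 * after + noise
# Fit an AR(1) model with a linear time trend on pre-treatment data
fit <- arima(y[!after], order = c(1, 0, 0), xreg = time[!after])
# Forecast the post-treatment periods with standard errors
fc   <- predict(fit, n.ahead = sum(after), newxreg = time[after])
pred <- as.numeric(fc$pred)
se   <- as.numeric(fc$se)
# Approximate 95% forecast interval
lower <- pred - 1.96 * se
upper <- pred + 1.96 * se
# Share of post-treatment outcomes falling outside the interval
outside <- y[after] < lower | y[after] > upper
mean(outside)
```

If treatment did nothing, only about 5% of post-treatment observations should land outside the interval, so a much larger share points toward an effect.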
## Simulation
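```{r, echo = TRUE}
library(tidyverse); library(fable); library(tsibble)
tb <- tibble(Time = 1:100) %>%
  mutate(After = Time >= 80) %>%
  # Increasing overall trend, plus treatment effect
  mutate(Y = .1*Time - .001*(Time-20)^2 + 4*After + rnorm(100))
# Add an AR(1) process to the data
# (the original coefficient was lost; .5 is an assumed value)
for (i in 2:100) {
  tb$Y[i] <- tb$Y[i] + .5*tb$Y[i-1]
}
# fable models need a tsibble indexed by time
tb <- as_tsibble(tb, index = Time)
# Fit the time-series model on pre-treatment data only
m <- tb %>%
  filter(!After) %>%
  model(AR(Y ~ Time + I(Time^2) + order(1)))
# Forecast the post-treatment periods as if no treatment had occurred
predictions <- forecast(m, new_data = tb %>% filter(After))
# Effect estimate: average gap between observed and forecast outcomes
effect <- mean(tb %>% filter(After) %>% pull(Y)) - mean(predictions$.mean)
```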
## Simulation
- Clearly the time-series modeling could already use some work here but you get the idea!
```{r, echo = FALSE}