title |
author |
date |
output |
Lecture 19 Treatment Effect Methods and Review |
Nick Huntington-Klein |
`r Sys.Date()` |
revealjs::revealjs_presentation |
theme |
transition |
self_contained |
smart |
fig_caption |
reveal_options |
solarized |
slide |
true |
true |
true |
|
|
|
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, warning=FALSE, message=FALSE)
library(tidyverse)
library(dagitty)
library(ggdag)
library(gganimate)
library(ggthemes)
library(ggpubr)
library(modelsummary)
library(fixest)
library(Cairo)
theme_set(theme_gray(base_size = 15))
```
## Some Pointers
- Last time we talked about heterogeneous treatment effects and how our methods produce different averages of those effects
- But we don't need to be limited to that!
- There are plenty of methods - many of them new - that let us estimate a *distribution* of treatment effects
- We won't be going super far into detail with them, but I'll mostly just be letting you know they exist and some pointers for looking further
- I'll favor pointers to packages over papers, but if you look in the help files you'll generally find paper citations
## Sorted Effects
- The *sorted effects* method uses covariates to look at variation in the treatment effect, and produces a distribution of treatment effects
- It also lets you see *who* is at each part of the distribution
```{r, echo = TRUE, results='hide'}
library(SortedEffects)
# Data on being denied for a mortgage.
data(mortgage)
# Save the formula to reuse later
fm %
as_tibble() %>%
mutate(Group = row.names(summary(classify)),
Ratio = Most/Least) %>%
select(Group, Most, Least, Ratio) %>%
arrange(Ratio)
```
## Sorted Effects
- Those who were denied for insurance (`denpmi`) had smallest effects of `black`, those who were `single` had the biggest
```{r, echo = FALSE}
knitr::kable(results)
```
## Bayesian Hierarchical Modeling
- A very old method! But it works. An extension of random effects
- Instead of just letting the *constant* vary, let *any* coefficient vary, and give each its own function to vary over controls! Those controls can let the effect vary
$$ Y = \beta_0 + \beta_1X + \varepsilon $$
$$ \beta_0 = \gamma_{00} + \nu_{00} $$
$$ \beta_1 = \gamma_{10} + \gamma_{11}W + \nu_{01} $$
- Terminology difference: "fixed effects" means "coefficients that don't vary"
## Bayesian Hierarchical Modeling
```{r, echo = TRUE}
library(lme4)
# The whole thing would be super slow so for now let's just do a few effects
m % mutate(holdout = runif(n()) > .5)
holdout % filter(holdout)
training % filter(!holdout)
W = training %>% pull(black) %>% as.matrix()
X = training %>% select(p_irat, hse_inc, ccred, mcred, pubrec,
denpmi, selfemp, single, hischl, ltv_med, ltv_high) %>% as.matrix()
Y = training %>% pull(deny) %>% as.matrix()
m % select(p_irat, hse_inc, ccred, mcred, pubrec,
denpmi, selfemp, single, hischl, ltv_med, ltv_high) %>% as.matrix()
indiv_effects % mutate(effect = indiv_effects$predictions)
```
## Causal Forest
```{r, echo = FALSE}
ggplot(as_tibble(holdout), aes(x = effect)) + geom_density() +
theme_pubr() +
labs(x = 'Individual Effect of Black on Denial Probability',
y = 'Density')
```
## Causal Forest
- Who is affected? Let's do a similar test to what **SortedEffects** did (although we could look at it plenty of other ways)
```{r}
holdout %>%
mutate(Range = case_when(
effect <= quantile(effect, .05) ~ 'Bottom',
effect >= quantile(effect, .95) ~ 'Top',
TRUE ~ NA_character_
)) %>%
filter(!is.na(Range)) %>%
group_by(Range) %>%
select(Range, p_irat, black, hse_inc, ccred, mcred, pubrec,
denpmi, selfemp, single, hischl, ltv_med, ltv_high) %>%
summarize(across(.fns = mean)) %>%
pivot_longer(cols = 2:13) %>%
pivot_wider(id_cols = name, values_from = value, names_from = Range) %>%
mutate(Ratio = Top/Bottom) %>%
arrange(Ratio) %>%
knitr::kable()
```
## Treatment Effect Methods
- Anyway, there's some stuff for you to check out!
- Obviously there are zillions of causal-inference methods we don't have time to cover
- Bartik instruments, matrix completion, causal discovery, and so on and so on and so on
- Consider this a good starting place
## Exam Review
- Just a reminder of some stuff we've covered
## Fixed Effects
- If we have data where we observe the same people over and over, we can implement *fixed effects* by controlling for *individual*
- This accounts for everything that's constant within individual. If, for example, "individual" was city, that would include geography, state, founding year, etc.
- Doesn't account for things that vary within individual over time, like `Laws`
## Difference-in-Difference
- Difference-in-Difference applies when you have a group that you can observe both before and after the policy
- You worry that `time` is a confounder, but you can't control for it
- Unless you add a control group that DIDN'T get the policy
- We must be careful to check that parallel trends holds
## Difference-in-Difference
```{r, dev='CairoPNG', echo=FALSE, fig.width=7, fig.height=5.5}
dag % tidy_dagitty()
ggdag_classic(dag,node_size=20) +
theme_dag_blank()
```
## Difference-in-Difference
- Get the before-after difference for both groups
- Then subtract out the difference for the control
```{r, echo=TRUE}
diddata %
mutate(Treated = (Group == "T") & Time == "After") %>%
mutate(Y = 2*(Group == "T") + 1.5*(Time == "After") + 3*Treated + rnorm(5000))
did % group_by(Group,Time) %>% summarize(Y = mean(Y))
before.after.control % tidy_dagitty()
ggdag_classic(dag,node_size=20) +
theme_dag_blank()
```
## Regression Discontinuity
- Estimate by fitting a line that jumps at the cutoff and estimating the jump
- Use local regression and bandwidths to avoid being affected by far-away observations
- "Fuzzy" designs where treatment only jumps partially scale the effect using IV
## Regression Discontinuity
- Expressed well in graphs! Treatment should jump at cutoff. If not perfectly from 0% to 100%, use IV too
```{r, dev='CairoPNG', echo=FALSE, fig.width=7, fig.height=5}
rdddata %
mutate(run = runif(10000)+.03*W) %>%
mutate(treated = run >= .6) %>%
mutate(Y = 2+.01*run+.5*treated+W+rnorm(10000))
ggplot(rdddata,aes(x=run,y=Y,color=treated)) + geom_point()+
geom_vline(aes(xintercept=.6))+
geom_smooth(aes(group = treated), size = 2, color = 'black') +
labs(x='Running Variable',
y='Outcome') +
theme_pubr() +
guides(color = FALSE)
```
## Regression Discontinuity
- Variables other than `Y` and treatment shouldn't jump at cutoff - they should be balanced
```{r, dev='CairoPNG', echo=FALSE, fig.width=7, fig.height=5}
ggplot(rdddata,aes(x=run,y=W,color=treated)) + geom_point()+
geom_vline(aes(xintercept=.6))+
geom_smooth(aes(group = treated), size = 2, color = 'black') +
labs(x='Running Variable',
y='W') +
theme_pubr() +
guides(color = FALSE)
```
## Instrumental Variables
- An instrumental variable affects treatment (relevant) but has no back doors itself or paths to $Y$ except through $X$ (valid)
- We move the no-open-back-doors assumption to the IV rather than the treatment
- We isolate JUST the variation that comes from `Z`. No back doors in that variation! We have a causal effect
- Can conceptually think of it as (or literally apply it to) an experiment where randomization doesn't work perfectly
## Instrumental Variables
```{r, dev='CairoPNG', echo=FALSE, fig.width=7, fig.height=5.5}
dag % tidy_dagitty()
ggdag_classic(dag,node_size=20) +
theme_dag_blank()
```
## Treatment Effects
- There isn't *a* treatment effect. They vary across time, space, individual
- Our methods give us averages - ATE (experiment), ATT (DID), LATE (IV, RDD), variance-weighted (regression w/ controls), etc.
- We must pay close attention to what our design *and estimator* gives us
## That's it!
- In a very condensed way, that's the material we covered!
- I recommend looking back over slides, notes, homeworks