---
title: "Lecture 8: Fixed Effects"
author: "Nick Huntington-Klein"
date: "`r Sys.Date()`"
output:
  revealjs::revealjs_presentation:
    theme: solarized
    transition: slide
    self_contained: true
    smart: true
    fig_caption: true
    reveal_options:
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, warning=FALSE, message=FALSE)
library(tidyverse)
library(stringr)
library(dagitty)
library(ggdag)
library(gganimate)
library(ggthemes)
library(Cairo)
library(gapminder)
library(modelsummary)
library(fixest)
library(ggpubr)
theme_set(theme_gray(base_size = 15))
```
## Recap
- Last time we talked about how controlling is a common way of blocking back doors to identify an effect
- We can control for a variable `W` by using `W` to explain our other variables and then taking the residuals (sketched in code below)
- Another form of controlling is using a sample that has only observations with similar values of `W`
- Some variables you want to be careful NOT to control for - you don't want to close front doors, or open back doors by controlling for colliders
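- In code, that residual-based approach looks roughly like this sketch (with placeholder names `d`, `X`, `Y`, and `W`):

```{r, echo = TRUE, eval = FALSE}
# Control for W by using W to explain X and Y, then keeping what's left over
d$X.r <- residuals(lm(X ~ W, data = d))
d$Y.r <- residuals(lm(Y ~ W, data = d))
# The relationship between the residuals is the relationship "controlling for W"
cor(d$X.r, d$Y.r)
```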
## Today
- Today we'll be starting on our path for the rest of the class, where we'll be talking about standard *methods* for performing causal inference
- Different ways of getting identification once we have a diagram!
- Our goal here will be to understand these methods *conceptually* and to also figure out some good statistical practices for their use
- Our goal is to *understand* these methods and be able to apply a straightforward version of them
## Today
- In particular we'll be talking about a method that is commonly used to identify causal effects, called fixed effects
- We'll be discussing the *kind* of causal diagram that fixed effects can identify
- All of the methods we'll be discussing are like this - they'll only apply to particular diagrams
- And so knowing our diagrams will be key to knowing when to use a given method
## The Problem
- One problem we ran into last time is that we can't really control for things if we can't measure them
- And there are lots of things we can't measure or don't have data for!
- So what can we do?
## The Solution
- If we observe each person/firm/country *multiple times*, then we can forget about controlling for the actual back-door variable we're interested in
- And just control for *person/firm/country identity* instead!
- This will control for EVERYTHING unique to that individual, whether we can measure it or not!
## In Practice
- Let's do this on the data from the "gapminder" package
- This data tracks life expectancy and GDP per capita in many countries over time
```{r, echo=TRUE, eval=FALSE}
library(gapminder)
data(gapminder)
cor(gapminder$lifeExp,log(gapminder$gdpPercap))
```
```{r, echo=FALSE, eval=TRUE}
data(gapminder)
cor(gapminder$lifeExp,log(gapminder$gdpPercap))
```
```{r, echo=TRUE}
gapminder <- gapminder %>% group_by(country) %>%
  mutate(lifeExp.r = lifeExp - mean(lifeExp),
         logGDP.r = log(gdpPercap) - mean(log(gdpPercap))) %>% ungroup()
cor(gapminder$lifeExp.r,gapminder$logGDP.r)
```
## So What?
- This isn't any different, mechanically, from any other time we've controlled for something
- So what's different here?
- Let's think about what we're doing conceptually
## What's the Diagram?
- Why are we controlling for things in this gapminder analysis?
- Because there are LOTS of things that might be back doors between GDP per capita and life expectancy
- War, disease, political institutions, trade relationships, health of the population, economic institutions...
## What's the Diagram?
```{r, dev='CairoPNG', echo=FALSE, fig.width=5, fig.height=6}
# Back doors named in the text: war, disease, political institutions,
# trade relationships, health, economic institutions
dag <- dagify(LifeExp ~ GDPpc + War + Disease + PolInst + Trade + Health + EconInst,
              GDPpc ~ War + Disease + PolInst + Trade + Health + EconInst) %>% tidy_dagitty()
ggdag_classic(dag,node_size=20) +
theme_dag_blank()
```
## What's the Diagram?
- There's no way we can identify this
- The list of back doors is very long
- And likely includes some things we can't measure!
## What's the Diagram?
- HOWEVER! If we think that these things are likely to be constant within country...
- Then we don't really have a big long list of back doors, we just have one: "country"
```{r, dev='CairoPNG', echo=FALSE, fig.width=5, fig.height=3.5}
# Just one back door now: everything constant about the country, bundled as "Country"
dag <- dagify(LifeExp ~ GDPpc + Country,
              GDPpc ~ Country) %>% tidy_dagitty()
ggdag_classic(dag,node_size=20) +
theme_dag_blank()
```
## What We Get
- So what we get out of this is that we can identify our effect even if some of our back doors include variables that we can't actually measure
- When we do this, we're basically comparing countries *to themselves* at different time periods!
- Pretty good way to do an apples-to-apples comparison!
## Graphically
```{r, dev='CairoPNG', echo=FALSE, fig.width=7, fig.height=5}
# Simulated four-person panel for the de-meaning animation. The data-generating
# and rbind() steps survive from the original; the data frame setup, correlation
# labels, and animation call are reconstructed.
df <- data.frame(Person = rep(1:4,50)) %>%
  mutate(X = .5+.5*(Person-2.5) + rnorm(200)) %>%
  mutate(Y = -.5*X + (Person-2.5) + 1 + rnorm(200),time="1") %>%
  group_by(Person) %>%
  mutate(mean_X=mean(X),mean_Y=mean(Y)) %>%
  ungroup()
#Calculate correlations
before_cor <- paste("1. Start with raw data. Correlation between X and Y: ",round(cor(df$X,df$Y),3),sep='')
after_cor <- paste("6. Analyze what's left! Correlation between X and Y: ",round(cor(df$X-df$mean_X,df$Y-df$mean_Y),3),sep='')
dffull <- rbind(
  #Step 1: Raw data only
  df %>% mutate(mean_X=NA,mean_Y=NA,time=before_cor),
  #Step 2: Add x-lines
  df %>% mutate(mean_Y=NA,time='2. Figure out between-Individual differences in X'),
  #Step 3: X de-meaned
  df %>% mutate(X = X - mean_X,mean_X=0,mean_Y=NA,time="3. Remove all between-Individual differences in X"),
  #Step 4: Remove X lines, add Y
  df %>% mutate(X = X - mean_X,mean_X=NA,time="4. Figure out between-Individual differences in Y"),
  #Step 5: Y de-meaned
  df %>% mutate(X = X - mean_X,Y = Y - mean_Y,mean_X=NA,mean_Y=0,time="5. Remove all between-Individual differences in Y"),
  #Step 6: Raw demeaned data only
  df %>% mutate(X = X - mean_X,Y = Y - mean_Y,mean_X=NA,mean_Y=NA,time=after_cor))
p <- ggplot(dffull,aes(y=Y,x=X,color=as.factor(Person)))+geom_point()+
  geom_vline(aes(xintercept=mean_X,color=as.factor(Person)))+
  geom_hline(aes(yintercept=mean_Y,color=as.factor(Person)))+
  guides(color=guide_legend(title="Individual"))+
  scale_color_colorblind()+
  labs(title='{closest_state}')+
  transition_states(time,wrap=FALSE)+
  ease_aes('sine-in-out')+exit_fade()+enter_fade()
animate(p,nframes=200)
```
## Varying Over Time
- But what if some back doors *do* vary over time within country, like War?
```{r, dev='CairoPNG', echo=FALSE, fig.width=5, fig.height=3.5}
# Reconstructed diagram: War as a time-varying back door alongside Country
dag <- dagify(LifeExp ~ GDPpc + Country + War,
              GDPpc ~ Country + War) %>% tidy_dagitty()
ggdag_classic(dag,node_size=20) +
  theme_dag_blank()
```
## Varying Over Time
- Of course, in this case, we could control for War as well and be good!
- Time-varying back doors don't mean that fixed effects doesn't work, it just means you need to control for that stuff too
- It always comes down to thinking carefully about your diagram
- Fixed effects mainly works as a convenient way of combining together lots of different constant-within-country back doors into something that lets us identify the model even if we can't measure them all
## Fixed Effects in Regression
- We can do fixed effects just as we did: subtract out the group means and analyze (perhaps with regression) what's left
- We can also include *dummy variables for each group/individual*, which accomplishes the same thing (see the quick check below the equations)
$$ Y = \beta_0 + \beta_1 Group_1 + \beta_2 Group_2 + \dots + $$
$$ \beta_N Group_N + \beta_{N+1} X + \varepsilon $$

which we typically write more compactly as

$$ Y = \beta_i + \beta_1 X + \varepsilon $$
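- As a quick check of that equivalence, here's a sketch using the de-meaned gapminder columns from earlier - both regressions return the same slope on log GDP per capita:

```{r, echo = TRUE, eval = FALSE}
# Dummy-variable ("least squares dummy variable") version
m_dummies <- lm(lifeExp ~ log(gdpPercap) + factor(country), data = gapminder)
# De-meaned ("within") version, using the lifeExp.r and logGDP.r columns from earlier
m_demeaned <- lm(lifeExp.r ~ logGDP.r, data = gapminder)
# The slope on log GDP per capita is identical in both
coef(m_dummies)['log(gdpPercap)']
coef(m_demeaned)['logGDP.r']
```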
## Fixed Effects in Regression
- Why does that work?
- We want to "control for group/individual" right? So... just... put in a control for group/individual
- Of course, like all categorical variables as predictors, we leave out a reference group
- But here, unlike with, say, a binary predictor, we're rarely interested in the FE coefficients themselves. Most software works with the mean-subtraction approach (or a variant) and doesn't even report them! (Though you can recover them if you want - see below)
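- A sketch of how, where `m1` stands in for a `feols()` model like the one on the upcoming "Coding up Fixed Effects" slide:

```{r, echo = TRUE, eval = FALSE}
library(fixest)
# Pull the estimated fixed-effects coefficients back out of a fitted feols() model
fixef(m1)
```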
## Fixed Effects in Regression: Variation
- Remember we are isolating *within variation*
- If an individual *has* no within variation, say their treatment never changes, they basically get washed out entirely! (see the sketch below)
- A fixed-effects regression wouldn't represent them, and we can't use FE to study things that are fixed over time
- And in general if there's not a lot of within variation, FE is going to be very noisy. Make sure there's variation to study!
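- Here's a tiny made-up example of that washing-out: person B is treated in every period, so after de-meaning there is no treatment variation left for B at all

```{r, echo = TRUE, eval = FALSE}
# Two-person toy panel: A's treatment switches on halfway through, B's never changes
toy <- tibble(person = rep(c("A", "B"), each = 4),
              treated = c(0, 0, 1, 1,   1, 1, 1, 1)) %>%
  group_by(person) %>%
  mutate(treated.r = treated - mean(treated)) %>%
  ungroup()
# B's within variation in treatment is exactly zero
toy %>% group_by(person) %>% summarize(within_var = var(treated.r))
```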
## Fixed Effects in Regression: Notes
- It's common to *cluster standard errors* at the level of the fixed effects, since it seems likely that errors would be correlated over time (autocorrelated errors)
- It's possible to have *more than one set* of fixed effects. $Y = \beta_i + \beta_j + \beta_1X + \varepsilon$
- But interpretation gets tricky - think through what variation in $X$ you're looking at, at that point! (a two-way example is sketched below)
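- A sketch of what that looks like with **fixest** (placeholder names `y`, `x`, `id`, `year`, `panel_data`): two sets of fixed effects, clustering by individual

```{r, echo = TRUE, eval = FALSE}
library(fixest)
# Individual and year fixed effects; standard errors clustered by individual
m_twoway <- feols(y ~ x | id + year, data = panel_data, cluster = ~id)
summary(m_twoway)
```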
## Coding up Fixed Effects
- We will use the **fixest** package
- It's very fast, and can be easily adjusted to do FE with other regression methods like logit, or combined with instrumental variables
- Clusters at the first listed fixed effect by default
```{r, echo = TRUE, eval = FALSE}
library(fixest)
# Regression formula, then |, then the fixed effects variable(s)
# (outcome, predictors, FEs, and the_data are placeholder names)
m1 <- feols(outcome ~ predictors | FEs, data = the_data)
```
```{r, echo = FALSE, eval = FALSE}
# Data prep for the sentencing-reform example. The code that read the raw data
# and began these pipelines is omitted, so only the tail ends are kept here:
#   ... %>% rename(mm1 = `mm[rep(c(TRUE, FALSE), 2100/2), ]`)
#   ... %>% rename(mm2 = `mm[rep(c(FALSE, TRUE), 2100/2), ]`)
#   ... %>% mutate(sentreform = ceiling(sentreform)) %>% na.omit
```
- I've omitted code reading in the data
- But in our data we have multiple observations per state
```{r, echo=TRUE}
head(mmdata)
mmdata <- mmdata %>% mutate(assaultper1000 = assault/pop1000,
                            robberyper1000 = robbery/pop1000)
```
## Fixed Effects
- We can see how robbery rates evolve in each state over time as states implement reform
```{r, dev='CairoPNG', echo=FALSE, fig.width=8, fig.height=4.5}
ggplot(mmdata,aes(x=year,y=robberyper1000,
group=state,color=factor(sentreform)))+
geom_line(size=1)+scale_color_colorblind(name="Reform")+
labs(x="Year",y="Robberies per 1000 Population") +
theme_pubr() +
theme(legend.position = c(.8,.9))
```
## Fixed Effects
- You can tell that states are more or less likely to implement reform in a way that's correlated with the level of robbery they already had
- So SOMETHING about the state is driving both the level of robberies AND the decision to implement reform
- Who knows what!
- Our diagram has `reform -> robberies` and also the back door `reform <- state -> robberies`, which is something we can address with fixed effects (sketched below)
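- That diagram as a quick sketch (node names are mine):

```{r, dev='CairoPNG', echo=FALSE, fig.width=5, fig.height=3.5}
# reform -> robberies, plus a back door reform <- state -> robberies
dag <- dagify(robberies ~ reform + state,
              reform ~ state) %>% tidy_dagitty()
ggdag_classic(dag, node_size = 20) +
  theme_dag_blank()
```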
## Fixed Effects
```{r, echo=TRUE}
sentencing_ols