---
title: "High-Dimension and Endogeneity"
header-includes:
  - \usepackage{bbm, lmodern,amsmath,amssymb,enumitem,listings,enumerate}
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Introduction
This document presents empirical applications of the linear instrumental variables (IV) model with many covariates $(p^x \gg n)$ and many instruments $(p^z \gg n)$, based on the estimators analysed in Belloni et al. (2012b) and Chernozhukov et al. (2015b). The main package is the hdm R package, available at https://cran.r-project.org/web/packages/hdm/index.html. In particular, we strongly encourage reading the vignette at https://cran.r-project.org/web/packages/hdm/vignettes/hdm.pdf.
## Simulation Study
These simulations illustrate two points:
* the naive post-selection estimator suffers from a large regularization bias;
* the cross-fitting estimator trades off a large bias for a smaller MSE compared to the immunized estimator that uses the whole sample.
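
The sample splitting behind a cross-fitting estimator can be sketched as follows. This is only a minimal illustration, assuming random folds of equal size; the names `fold_id`, `main_idx` and `aux_idx` are hypothetical, and the estimator used in this document may split the sample differently.

```r
set.seed(1)   # arbitrary seed, for this illustration only
n = 202       # sample size used in the simulation
K = 2         # number of folds used in the simulation

# Random fold labels of equal size (101 observations per fold here)
fold_id = sample(rep(1:K, length.out = n))

for (k in 1:K) {
  main_idx = which(fold_id == k)  # fold on which the estimating equation is evaluated
  aux_idx  = which(fold_id != k)  # remaining data, used to fit the nuisance parameters
  cat("fold", k, ": main =", length(main_idx),
      ", auxiliary =", length(aux_idx), "\n")
}
```

Each observation is used once to evaluate the estimating equation and never to fit the nuisance parameters applied to it, which is the point of cross-fitting.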
```{r}
library("ggplot2")
library("gridExtra")
library("MASS")
library("mnormt")
library(hdm)
library(AER)
library(car)
library("Rcpp")
```
We reproduce the DGP of Chernozhukov, Hansen and Spindler (2015): i.i.d. observations $(Y_i,D_i,Z_i,X_i)^n_{i=1}$, where the number of controls is set to 200, the number of instruments to 150, and the number of observations to 202.
```{r}
### Simulation parameters
set.seed(135711)
p_x = 200 ## number of controls
p_z = 150 ## number of instruments
n = 202 ## total sample size
K = 2 ## number of folds
```
$$\begin{aligned}
Y_i &= \tau_0 D_i + X_i' \beta_0 + 2 \varepsilon_i, \\
D_i &= X_i' \gamma_0 + Z_i' \delta_0 + U_i, \\
Z_i &= \Pi X_i + 0.125 \zeta_i,
\end{aligned}$$
where
$$ \left(\begin{array}{c} \varepsilon_i \\ U_i \\ \zeta_i \\ X_i \end{array} \right) \sim \mathcal{N} \left( 0 , \left(
\begin{array}{cccc}1 & 0.6 & 0 & 0\\ 0.6 & 1 & 0 & 0\\ 0 & 0 & I_{p^{z}} & 0\\ 0 & 0 & 0& \Sigma \end{array} \right) \right),
$$
where:

* $\Sigma$ is a $p^{x} \times p^{x}$ matrix with entries $\Sigma_{kj} = (0.5)^{|j-k|}$, and $I_{p^{z}}$ is the $p^{z} \times p^{z}$ identity matrix.
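
One way to draw a sample from this design in base R is sketched below. The coefficient values `tau0`, `beta0`, `gamma0`, `delta0` and the matrix `Pi` are not given in the text above, so the values used here are placeholders chosen purely for illustration; only the dimensions, the error correlation of 0.6, the Toeplitz form of $\Sigma$ and the 0.125 scaling come from the DGP.

```r
set.seed(1)                       # arbitrary seed, for this illustration only
n = 202; p_x = 200; p_z = 150     # dimensions stated in the text

# Sigma_{kj} = 0.5^{|j-k|}
Sigma = 0.5^abs(outer(1:p_x, 1:p_x, "-"))

# X_i ~ N(0, Sigma): standard normal draws times the (upper) Cholesky factor
X = matrix(rnorm(n * p_x), n, p_x) %*% chol(Sigma)

# (eps_i, U_i): unit variances, correlation 0.6
eps = rnorm(n)
U   = 0.6 * eps + sqrt(1 - 0.6^2) * rnorm(n)

# zeta_i ~ N(0, I_{p_z}), independent of everything else
zeta = matrix(rnorm(n * p_z), n, p_z)

# ASSUMPTION: placeholder parameter values, not taken from the source
tau0   = 1.5
beta0  = 1 / (1:p_x)^2
gamma0 = 1 / (1:p_x)^2
delta0 = 1 / (1:p_z)^2
Pi     = cbind(diag(p_z), matrix(0, p_z, p_x - p_z))  # p_z x p_x

# Structural equations, in the order Z -> D -> Y
Z = X %*% t(Pi) + 0.125 * zeta
D = as.vector(X %*% gamma0 + Z %*% delta0 + U)
Y = as.vector(tau0 * D + X %*% beta0 + 2 * eps)
```

Because `eps` and `U` are correlated, `D` is endogenous in the outcome equation, while `Z` affects `Y` only through `D` once `X` is controlled for.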
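```{r}
### GENERATE DATA
## [The data-generating code and the first-step fits were lost in extraction;
##  only the fragments below survive. Elided parts are shown as "...".]

### SELECTION STEP (surviving fragments, kept as comments)
## sel counts how many first-step fits give a control a coefficient above 1e-6;
## a count of 2 is recoded to 1, so sel == 1 marks every selected control.
## sel = (... > 10^(-6))*1 + (rY_x$coefficients[2:(dim(x)[2]+1)] > 10^(-6))*1
## sel[sel == 2] = 1
## sel_z = (... > 10^(-6))*1

## Do TSLS on the selected controls and instruments
x_sel = x[, sel == 1]
z_sel = z[, sel_z == 1]
if (sum(sel) > 0 && sum(sel_z) > 0) {
  ## at least one control and one instrument selected
  ivfit.lm = ivreg(y ~ d + x_sel | z_sel + x_sel)
} else if (sum(sel) == 0 && sum(sel_z) > 0) {
  ## no control selected: instrument d with the selected instruments only
  ivfit.lm = ivreg(y ~ d | z_sel)
}

## [Right-hand sides lost in extraction:]
## se = ...
## Result[1,] = ...  through  Result[6,] = ...
```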