1 Star 0 Fork 76

liguozhu/causalinferencebook_stata

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
文件
该仓库未声明开源许可证文件(LICENSE),使用请关注具体项目描述及其代码上游依赖。
克隆/下载
chapter15_2019.do 6.80 KB
一键复制 编辑 原始数据 按行查看 历史
/***************************************************************
Stata code for Causal Inference: What If by Miguel Hernan & Jamie Robins
Date: 10/10/2019
Author: Eleanor Murray
For errors contact: ejmurray@bu.edu
***************************************************************/
/***************************************************************
PROGRAM 15.1
Estimating the average causal effect within levels of confounders
under the assumption of effect-measure modification by smoking
intensity ONLY
Data from NHEFS
Section 15.1
***************************************************************/
clear
use "nhefs.dta"
/*generate smoking intensity among smokers product term*/
gen qsmkintensity = qsmk*smokeintensity
* regression on covariates, allowing for some effect modfication
regress wt82_71 qsmk qsmkintensity c.smokeintensity##c.smokeintensity sex race c.age##c.age ///
ib(last).education c.smokeyrs##c.smokeyrs ib(last).exercise ib(last).active c.wt71##c.wt71
*Output the estimated mean difference between quitting and not quitting value when smoke intensity = 5 cigarettes/ day*
lincom 1*_b[qsmk] + 5*1*_b[qsmkintensity]
*Output the estimated mean difference between quitting and not quitting value when smoke intensity = 40 cigarettes/ day*
lincom 1*_b[qsmk] + 40*1*_b[qsmkintensity]
/* regression on covariates, with no product terms*/
regress wt82_71 qsmk c.smokeintensity##c.smokeintensity sex race c.age##c.age ///
ib(last).education c.smokeyrs##c.smokeyrs ib(last).exercise ib(last).active c.wt71##c.wt71
/***************************************************************
PROGRAM 15.2
Estimating and plotting the propensity score
Data from NHEFS
Section 15.2
***************************************************************/
/*Fit a model for the exposure, quitting smoking*/
logit qsmk sex race c.age##c.age ib(last).education c.smokeintensity##c.smokeintensity ///
c.smokeyrs##c.smokeyrs ib(last).exercise ib(last).active c.wt71##c.wt71
/*Estimate the propensity score, P(Qsmk|Covariates)*/
predict ps, pr
/*Check the distribution of the propensity score*/
bys qsmk: summarize ps
/*Return extreme values of propensity score: note, for Stata versions 15 and above, start by installing extremes*/
/*
ssc install extremes
*/
extremes ps seqn
bys qsmk: extremes ps seqn
/*Plotting the estimated propensity score*/
histogram ps, width(0.05) start(0.025) frequency fcolor(none) lcolor(black) lpattern(solid) addlabel addlabopts(mlabcolor(black) mlabposition(12) mlabangle(zero)) ///
ytitle(No. Subjects) ylabel(#4) xtitle(Estimated Propensity Score) xlabel(#15) ///
by(, title(Estimated Propensity Score Distribution) subtitle(By Quit Smoking Status)) ///
by(, legend(off)) by(qsmk, style(compact) colfirst) subtitle(, size(small) box bexpand)
/***************************************************************************
PROGRAM 15.3
Stratification and outcome regression using deciles of the propensity score
Data from NHEFS
Section 15.3
NOTE: Stata decides borderline cutpoints differently from SAS,
so, despite identically distributed propensity scores,
the results of regression using deciles are not an exact match with the book.
***************************************************************************/
/*Calculation of deciles of ps*/
xtile ps_dec = ps, nq(10)
by ps_dec, sort: summarize ps
/*Stratification on PS deciles, allowing for effect modification*/
/*Note: stata compares qsmk 0 vs qsmk 1, so the coefficients are reversed relative to the book*/
by ps_dec: ttest wt82_71, by(qsmk)
/*Regression on PS deciles, with no product terms*/
regress wt82_71 qsmk ib(last).ps_dec
/***************************************************************
PROGRAM 15.4
Standardization and outcome regression using the propensity score
Data from NHEFS
Section 15.3
***************************************************************/
clear
use "nhefs.dta"
/*Estimate the propensity score*/
logit qsmk sex race c.age##c.age ib(last).education c.smokeintensity##c.smokeintensity ///
c.smokeyrs##c.smokeyrs ib(last).exercise ib(last).active c.wt71##c.wt71
predict ps, pr
/*Expand the dataset for standardization*/
expand 2, generate(interv)
expand 2 if interv == 0, generate(interv2)
replace interv = -1 if interv2 ==1
drop interv2
tab interv
replace wt82_71 = . if interv != -1
replace qsmk = 0 if interv == 0
replace qsmk = 1 if interv == 1
by interv, sort: summarize qsmk
/*Regression on the propensity score, allowing for effect modification*/
regress wt82_71 qsmk##c.ps
predict predY, xb
by interv, sort: summarize predY
quietly summarize predY if(interv == -1)
matrix input observe = (-1,`r(mean)')
quietly summarize predY if(interv == 0)
matrix observe = (observe \0,`r(mean)')
quietly summarize predY if(interv == 1)
matrix observe = (observe \1,`r(mean)')
matrix observe = (observe \., observe[3,2]-observe[2,2])
matrix rownames observe = observed E(Y(a=0)) E(Y(a=1)) difference
matrix colnames observe = interv value
matrix list observe
/*bootstrap program*/
drop if interv != -1
gen meanY_b =.
save nhefs_std, replace
capture program drop bootstdz
program define bootstdz, rclass
u nhefs_std, clear
preserve
bsample
/*Create 2 new copies of the data. Set the outcome AND the exposure to missing in the copies*/
expand 2, generate(interv_b)
expand 2 if interv_b == 0, generate(interv2_b)
replace interv_b = -1 if interv2_b ==1
drop interv2_b
replace wt82_71 = . if interv_b != -1
replace qsmk = . if interv_b != -1
/*Fit the propensity score in the original data (where qsmk is not missing) and generate predictions for everyone*/
logit qsmk sex race c.age##c.age ib(last).education c.smokeintensity##c.smokeintensity ///
c.smokeyrs##c.smokeyrs ib(last).exercise ib(last).active c.wt71##c.wt71
predict ps_b, pr
/*Set the exposure to 0 for everyone in copy 0, and 1 to everyone for copy 1*/
replace qsmk = 0 if interv_b == 0
replace qsmk = 1 if interv_b == 1
/*Fit the outcome regression in the original data (where wt82_71 is not missing) and generate predictions for everyone*/
regress wt82_71 qsmk##c.ps
predict predY_b, xb
/*Summarize the predictions in each set of copies*/
summarize predY_b if interv_b == 0
return scalar boot_0 = r(mean)
summarize predY_b if interv_b == 1
return scalar boot_1 = r(mean)
return scalar boot_diff = return(boot_1) - return(boot_0)
drop meanY_b
restore
end
*Then we use the 'simulate' command to run the bootstraps as many times as we want*
*Start with reps(10) to make sure your code runs, and then change to reps(1000) to generate your final CIs*
simulate EY_a0=r(boot_0) EY_a1 = r(boot_1) difference = r(boot_diff), reps(500) seed(1): bootstdz /
matrix pe = observe[2..4, 2]'
matrix list pe
bstat, stat(pe) n(1629)
estat bootstrap, p
马建仓 AI 助手
尝试更多
代码解读
代码找茬
代码优化
1
https://gitee.com/liguozhu813/causalinferencebook_stata.git
git@gitee.com:liguozhu813/causalinferencebook_stata.git
liguozhu813
causalinferencebook_stata
causalinferencebook_stata
master

搜索帮助