
[Python Econometrics] Endogeneity: Instrumental Variables and Two-Stage Least Squares (2SLS)

Date: 2022-10-02 13:01:07


Taking Example 15.5 from Chapter 15, "Instrumental Variables Estimation and Two Stage Least Squares", of Wooldridge's Introductory Econometrics: A Modern Approach, we use the MROZ data on the returns to education for US women to learn the Python implementation of the instrumental variables method.

Variables: the dependent variable log(wage) is the logarithm of the wage; the explanatory variables are educ (years of formal education) and exper (work experience).

The model is:

$$\log(wage)=\beta_0+\beta_1 educ+\beta_2 exper+\beta_3 exper^2+u$$

The equation above accounts only for a working woman's own years of formal education, so important variables are omitted, which creates an endogeneity problem. We therefore take the father's years of education fatheduc and the mother's years of education motheduc as instrumental variables: fatheduc and motheduc should be correlated with educ but uncorrelated with u.

1. Endogeneity and Two-Stage Least Squares

What is endogeneity?

Endogeneity means that an explanatory variable is correlated with the error term ε, which makes the OLS estimates of the parameters β biased and inconsistent.

When does endogeneity arise?

(1) omission of an important explanatory variable;
(2) reverse causality between the explanatory variable and the dependent variable;
(3) measurement error in the variables, so that the observed x and y differ from the true x and y.
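The first of these causes can be seen in a small simulation (illustrative only, not from the book): when an unobserved "ability" term drives both education and wages, OLS attributes ability's effect to education and overstates the return.

```python
import numpy as np

# Illustrative simulation: omitted "ability" correlates with the regressor,
# so the OLS slope on educ is biased away from the true value of 0.5.
rng = np.random.default_rng(0)
n = 100_000
ability = rng.normal(size=n)                       # unobserved
educ = 12 + 2 * ability + rng.normal(size=n)       # correlated with ability
wage = 1.0 + 0.5 * educ + 1.0 * ability + rng.normal(size=n)

X = np.column_stack([np.ones(n), educ])
beta_ols = np.linalg.lstsq(X, wage, rcond=None)[0]
print(beta_ols[1])  # noticeably above the true 0.5, near 0.9 here
```

The probability limit of the OLS slope is 0.5 + Cov(educ, ability)/Var(educ) = 0.5 + 2/5 = 0.9, so the bias does not vanish even in large samples.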

Requirements for an instrumental variable

As an example, consider the simple model $y=\alpha+\beta x+\varepsilon$.

If the error term is correlated with $x$, we can introduce an instrumental variable $z$ that satisfies the following two conditions:

(1) Relevance: $z$ is correlated with $x$, i.e. $Cov(x,z)\neq 0$;

(2) Exogeneity: $z$ is uncorrelated with the error term, i.e. $Cov(\varepsilon,z)=0$.

Implementing the instrumental variables method

The instrumental variables method is usually implemented through two-stage least squares (2SLS, Two-Stage Least Squares). The two stages are:

(1) regress x on z to obtain fitted values of x;

(2) regress y on the fitted values of x to obtain $\beta$. Because the fitted values of x are uncorrelated with the error term in this regression (by the orthogonality property of OLS), the resulting estimator is consistent.

In short, the first stage of 2SLS splits x into two parts: the fitted value of x, and a part that is correlated with the error term. The second stage then regresses y on the fitted value of x, that is, on x with its endogenous part removed, and therefore yields a consistent estimate.
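The two stages can be sketched with plain numpy on simulated data (all parameters here are assumed for illustration, not the book's example); 2SLS recovers the true coefficient while OLS does not:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
z = rng.normal(size=n)                        # instrument: relevant, exogenous
u = rng.normal(size=n)                        # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)    # endogenous: Cov(x, u) != 0
y = 2.0 + 1.5 * x + u                         # true beta = 1.5

# Stage 1: regress x on z, keep the fitted values
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Stage 2: regress y on the fitted values of x
X_hat = np.column_stack([np.ones(n), x_hat])
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# plain OLS for comparison
X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_ols[1], beta_2sls[1])  # OLS biased upward; 2SLS close to 1.5
```

Because $z$ is uncorrelated with $u$, the fitted values inherit that exogeneity, which is exactly why the second-stage slope is consistent.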

2. Python Implementation of the Instrumental Variables Method

(1) Preparing the data

import wooldridge as woo
import pandas as pd

mroz = woo.dataWoo('mroz')
# drop observations with missing lwage
mroz = mroz.dropna(subset=['lwage'])

(2) The instrumental variables method

1. 2SLS regression with statsmodels

(1) Regress the endogenous variable on the instruments to obtain its fitted values

$$educ=\pi_0+\pi_1 exper+\pi_2 exper^2+\pi_3 motheduc+\pi_4 fatheduc+v$$

import statsmodels.formula.api as smf

# first-stage regression
reg_1st = smf.ols(formula='educ ~ exper + expersq + motheduc + fatheduc', data=mroz)
results_1st = reg_1st.fit()
mroz['educ_fitted'] = results_1st.fittedvalues
print(results_1st.summary())

The output is:

                            OLS Regression Results
==============================================================================
Dep. Variable:                   educ   R-squared:                       0.211
Model:                            OLS   Adj. R-squared:                  0.204
Method:                 Least Squares   F-statistic:                     28.36
Date:                     Sun, 17 Jul   Prob (F-statistic):           6.87e-21
Time:                        16:34:39   Log-Likelihood:                -909.72
No. Observations:                 428   AIC:                             1829.
Df Residuals:                     423   BIC:                             1850.
Df Model:                           4
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      9.1026      0.427     21.340      0.000       8.264       9.941
exper          0.0452      0.040      1.124      0.262      -0.034       0.124
expersq       -0.0010      0.001     -0.839      0.402      -0.003       0.001
motheduc       0.1576      0.036      4.391      0.000       0.087       0.228
fatheduc       0.1895      0.034      5.615      0.000       0.123       0.256
==============================================================================
Omnibus:                       10.903   Durbin-Watson:                   1.940
Prob(Omnibus):                  0.004   Jarque-Bera (JB):               20.371
Skew:                          -0.013   Prob(JB):                     3.77e-05
Kurtosis:                       4.068   Cond. No.                     1.55e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.55e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

(2) Regress the dependent variable on the fitted values of the endogenous variable

$$\log(wage)=\beta_0+\beta_1\widehat{educ}+\beta_2 exper+\beta_3 exper^2+u$$

# second-stage regression
reg_2nd = smf.ols(formula='lwage ~ educ_fitted + exper + expersq', data=mroz)
results_2nd = reg_2nd.fit()
print(results_2nd.summary())

The output is:

                            OLS Regression Results
==============================================================================
Dep. Variable:                  lwage   R-squared:                       0.050
Model:                            OLS   Adj. R-squared:                  0.043
Method:                 Least Squares   F-statistic:                     7.405
Date:                     Sun, 17 Jul   Prob (F-statistic):           7.62e-05
Time:                        16:40:02   Log-Likelihood:                -457.17
No. Observations:                 428   AIC:                             922.3
Df Residuals:                     424   BIC:                             938.6
Df Model:                           3
Covariance Type:            nonrobust
===============================================================================
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept       0.0481      0.420      0.115      0.909      -0.777       0.873
educ_fitted     0.0614      0.033      1.863      0.063      -0.003       0.126
exper           0.0442      0.014      3.136      0.002       0.016       0.072
expersq        -0.0009      0.000     -2.134      0.033      -0.002   -7.11e-05
==============================================================================
Omnibus:                       53.587   Durbin-Watson:                   1.959
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              168.354
Skew:                          -0.551   Prob(JB):                     2.77e-37
Kurtosis:                       5.868   Cond. No.                     4.41e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 4.41e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

2. 2SLS regression with linearmodels

To use the IV2SLS tool from the linearmodels package, first import it:

from linearmodels.iv import IV2SLS

Then specify the formula and run the 2SLS regression:

IV2SLS.from_formula(formula, data)
formula: the regression equation, of the form dep ~ exog + [endog ~ instr], where exog denotes the exogenous variables, endog the endogenous variables, and instr the instruments.

For this example, the code is:

from linearmodels.iv import IV2SLS

reg_iv = IV2SLS.from_formula(formula='lwage ~ 1 + exper + expersq + [educ ~ motheduc + fatheduc]', data=mroz)
results_iv = reg_iv.fit(cov_type='unadjusted', debiased=True)
print(results_iv)

The output is:

                          IV-2SLS Estimation Summary
==============================================================================
Dep. Variable:                  lwage   R-squared:                      0.1357
Estimator:                    IV-2SLS   Adj. R-squared:                 0.1296
No. Observations:                 428   F-statistic:                    8.1407
Date:                     Sun, Jul 17   P-value (F-stat)                0.0000
Time:                        16:42:41   Distribution:                 F(3,424)
Cov. Estimator:            unadjusted

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      0.0481     0.4003     0.1202     0.9044     -0.7388      0.8350
exper          0.0442     0.0134     3.2883     0.0011      0.0178      0.0706
expersq       -0.0009     0.0004    -2.2380     0.0257     -0.0017     -0.0001
educ           0.0614     0.0314     1.9530     0.0515     -0.0004      0.1232
==============================================================================

Endogenous: educ
Instruments: fatheduc, motheduc
Unadjusted Covariance (Homoskedastic)
Debiased: True
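Note that the coefficient estimates from IV2SLS match the manual two-step procedure exactly, but the standard errors differ slightly (0.0314 vs 0.033 on educ). The manual second stage computes residuals from educ_fitted, whereas correct IV standard errors use residuals evaluated at the actual educ. A numpy sketch on simulated data (all parameters assumed for illustration) shows the discrepancy:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
z = rng.normal(size=n)                        # instrument
u = rng.normal(size=n)                        # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)    # endogenous regressor
y = 2.0 + 1.5 * x + u

# manual 2SLS: regress x on z, then y on the fitted values
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
Xh = np.column_stack([np.ones(n), x_hat])
beta = np.linalg.lstsq(Xh, y, rcond=None)[0]

XtX_inv = np.linalg.inv(Xh.T @ Xh)
# naive residuals: what a plain OLS of y on x_hat reports
resid_naive = y - Xh @ beta
# correct IV residuals: evaluate the fitted equation at the actual x
X = np.column_stack([np.ones(n), x])
resid_iv = y - X @ beta

se_naive = np.sqrt(resid_naive @ resid_naive / (n - 2) * XtX_inv[1, 1])
se_iv = np.sqrt(resid_iv @ resid_iv / (n - 2) * XtX_inv[1, 1])
print(se_naive, se_iv)  # the naive second-stage SE is too large here
```

This is why a dedicated IV routine such as linearmodels' IV2SLS is preferable to running the two OLS stages by hand when inference matters.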

3. Tests Related to the Instrumental Variables

The model is:

$$\log(wage)=\beta_0+\beta_1 educ+\beta_2 exper+\beta_3 exper^2+u$$

(1) Testing a variable for endogeneity

1. Regress the suspected endogenous variable educ on the exogenous variables and the instruments to obtain the residual $v$.

$$educ=\pi_0+\pi_1 exper+\pi_2 exper^2+\pi_3 motheduc+\pi_4 fatheduc+v$$

import statsmodels.formula.api as smf

# first-stage regression
reg_1st = smf.ols(formula='educ ~ exper + expersq + motheduc + fatheduc', data=mroz)
results_1st = reg_1st.fit()
mroz['resid'] = results_1st.resid  # obtain the residuals

2. Add the residual $v$ to the original equation as an additional regressor and estimate by OLS, then test the coefficient on $v$ for significance. If the coefficient on $v$ is significantly different from zero, the variable educ is endogenous.

$$\log(wage)=\beta_0+\beta_1 v+\beta_2 educ+\beta_3 exper+\beta_4 exper^2+u$$

# augmented regression with the first-stage residual included
reg_2 = smf.ols(formula='lwage ~ resid + educ + exper + expersq', data=mroz)
results_2 = reg_2.fit()
print(results_2.summary())

The output is:

                            OLS Regression Results
==============================================================================
Dep. Variable:                  lwage   R-squared:                       0.162
Model:                            OLS   Adj. R-squared:                  0.154
Method:                 Least Squares   F-statistic:                     20.50
Date:                     Sun, 17 Jul   Prob (F-statistic):           1.89e-15
Time:                        17:02:34   Log-Likelihood:                -430.19
No. Observations:                 428   AIC:                             870.4
Df Residuals:                     423   BIC:                             890.7
Df Model:                           4
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0481      0.395      0.122      0.903      -0.727       0.824
resid          0.0582      0.035      1.671      0.095      -0.010       0.127
educ           0.0614      0.031      1.981      0.048       0.000       0.122
exper          0.0442      0.013      3.336      0.001       0.018       0.070
expersq       -0.0009      0.000     -2.271      0.024      -0.002      -0.000
==============================================================================
Omnibus:                       74.968   Durbin-Watson:                   1.931
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              278.059
Skew:                          -0.736   Prob(JB):                     4.17e-61
Kurtosis:                       6.664   Cond. No.                     4.42e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 4.42e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

The coefficient on resid has a p-value of 0.095: significant at the 10% level but not at 5%, so there is only weak evidence that educ is endogenous.

(2) Overidentification test

Steps:

1. Estimate the equation by 2SLS and obtain the residuals $\hat{u}$.

2. Regress $\hat{u}$ on all exogenous variables and instruments, and record the $R^2$.

3. Under the null hypothesis that all instruments are uncorrelated with $u$, $nR^2 \sim \chi^2_q$, where $q$ is the number of instruments minus the number of endogenous variables. If $nR^2$ exceeds the critical value of $\chi^2_q$ at the chosen significance level, reject the null hypothesis that all instruments are exogenous.

from linearmodels.iv import IV2SLS
import statsmodels.formula.api as smf
import scipy.stats as stats

# Step 1: estimate the equation by 2SLS and obtain the residuals
reg_iv = IV2SLS.from_formula(formula='lwage ~ 1 + exper + expersq + [educ ~ motheduc + fatheduc]', data=mroz)
results_iv = reg_iv.fit(cov_type='unadjusted', debiased=True)
mroz['resid_iv'] = results_iv.resids

# Step 2: regress the residuals on all exogenous variables and instruments
reg_aux = smf.ols(formula='resid_iv ~ exper + expersq + motheduc + fatheduc', data=mroz)
results_aux = reg_aux.fit()

# Step 3: compute the test statistic and p-value
r2 = results_aux.rsquared
n = results_aux.nobs
q = 2 - 1  # number of instruments minus number of endogenous variables
teststat = n * r2
pval = 1 - stats.chi2.cdf(teststat, q)
print(f'r2: {r2}')
print(f'n: {n}')
print(f'teststat: {teststat}')
print(f'pval: {pval}')

The output is:

r2: 0.0008833442569250449
n: 428.0
teststat: 0.3780713419639192
pval: 0.5386372330714363

With n = 428 observations the p-value is 0.539, so at the 5% significance level we cannot reject the null hypothesis: the parents' education passes the overidentification test and can serve as instruments.
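The same $nR^2$ procedure can be replayed end to end in plain numpy on simulated data with two valid instruments and one endogenous regressor (q = 1); all names and parameters here are illustrative. When the instruments are genuinely exogenous, the statistic is typically small relative to the $\chi^2_1$ critical value:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)        # two valid instruments
u = rng.normal(size=n)
x = 0.6 * z1 + 0.6 * z2 + 0.5 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 1.5 * x + u

# Step 1: 2SLS using both instruments, then IV residuals
Z = np.column_stack([np.ones(n), z1, z2])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
Xh = np.column_stack([np.ones(n), x_hat])
beta = np.linalg.lstsq(Xh, y, rcond=None)[0]
resid = y - np.column_stack([np.ones(n), x]) @ beta

# Step 2: regress the residuals on all instruments and compute R^2
fitted = Z @ np.linalg.lstsq(Z, resid, rcond=None)[0]
r2 = 1 - ((resid - fitted) ** 2).sum() / ((resid - resid.mean()) ** 2).sum()

# Step 3: n * R^2 ~ chi2(q) with q = 2 - 1 = 1 under the null
teststat = n * r2
print(teststat)  # compare with the 5% critical value of chi2(1), 3.84
```

The hard-coded 3.84 is the 5% critical value of $\chi^2_1$, used here to avoid a scipy dependency in the sketch.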
