Heteroskedasticity (Chapter 8)

As usual, set your working directory, clear the workspace and load the foreign package.

setwd('...')
rm(list=ls())
library('foreign')

Download the files we are going to use into the working directory.

download.file('http://fmwww.bc.edu/ec-p/data/wooldridge/wage1.dta','wage1.dta',mode='wb')
download.file('http://fmwww.bc.edu/ec-p/data/wooldridge/gpa3.dta','gpa3.dta',mode='wb')
download.file('http://fmwww.bc.edu/ec-p/data/wooldridge/crime1.dta','crime1.dta',mode='wb')
download.file('http://fmwww.bc.edu/ec-p/data/wooldridge/hprice1.dta','hprice1.dta',mode='wb')
download.file('http://fmwww.bc.edu/ec-p/data/wooldridge/401ksubs.dta','401ksubs.dta',mode='wb')
download.file('http://fmwww.bc.edu/ec-p/data/wooldridge/smoke.dta','smoke.dta',mode='wb')
download.file('http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta','mroz.dta',mode='wb')
download.file('http://fmwww.bc.edu/ec-p/data/wooldridge/gpa1.dta','gpa1.dta',mode='wb')

Example 8.1
After loading the data and generating the dummy variables from Chapter 7, estimate the model in the already familiar way.

wage1<-read.dta('wage1.dta')
marrmale<-as.numeric(wage1$female==0 & wage1$married==1)
marrfem<-as.numeric(wage1$female==1 & wage1$married==1)
singfem<-as.numeric(wage1$female==1 & wage1$married==0)

lm.8.1<-lm(lwage ~ marrmale + marrfem + singfem + educ + 
             exper + expersq + tenure + tenursq, data=wage1)
summary(lm.8.1)

To test for heteroskedasticity we use functions from the lmtest and sandwich packages. Download, install and load them. Then use the command coeftest() to obtain heteroskedasticity robust estimates of the standard errors. The first argument of coeftest() is the result of the familiar estimation. The second argument tells R how to calculate the heteroskedasticity robust standard errors. Basically, you could just enter the first argument and R would do the rest. However, since the default of coeftest() would not give the same results as in the book, we have to specify the covariance estimator explicitly. Thus, we use vcovHC() which, again, requires the result of the basic model and a specification of the type of robust standard errors that should be calculated. In our case we want simple White standard errors, which are indicated by “HC0”. Other, more sophisticated methods are described in the documentation.

  # To test for heteroskedasticity we use the lmtest and sandwich package
  # install.packages('lmtest')
  # install.packages('sandwich')
library('lmtest')
library('sandwich')

coeftest(lm.8.1, vcov=vcovHC(lm.8.1,type='HC0'))
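As mentioned, vcovHC() also offers more sophisticated variants of the robust covariance estimator; as an illustration (which variant to prefer is up to you and the documentation), “HC1” applies a small degrees-of-freedom correction and “HC3” is often suggested for smaller samples:

  # Other variants of the heteroskedasticity robust covariance estimator
coeftest(lm.8.1, vcov=vcovHC(lm.8.1,type='HC1'))
coeftest(lm.8.1, vcov=vcovHC(lm.8.1,type='HC3'))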

Example 8.2

gpa3<-read.dta('gpa3.dta')
lm.8.2<-lm(cumgpa ~ sat + hsperc + tothrs + female + black + white,
           data=gpa3,subset=(term==2))
summary(lm.8.2)
coeftest(lm.8.2, vcov=vcovHC(lm.8.2,type='HC0'))

For a heteroskedasticity robust F-test we perform a Wald test. The procedure is similar to obtaining the coefficients’ standard errors. For the usual F-test, estimate the restricted and unrestricted models and pass their results to the anova() command, which prints the F-statistic.
For the Wald test you do the same, but pass the estimation results to the waldtest() command and, as a third argument, specify how the robust covariance matrix should be calculated.

lm.8.2.2<-lm(cumgpa ~ sat + hsperc + tothrs + female,
             data=gpa3,subset=(term==2))

  # Usual F-statistic
anova(lm.8.2,lm.8.2.2)
  # Heteroskedasticity robust F-statistic
waldtest(lm.8.2,lm.8.2.2,vcov=vcovHC(lm.8.2,type='HC0'))
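As a side note, waldtest() without the vcov argument should reproduce the usual F-statistic from anova(), and its test argument can switch to the chi-square form of the robust statistic (both options are described in the lmtest documentation):

  # Without a robust covariance matrix waldtest() reproduces the usual F-test
waldtest(lm.8.2,lm.8.2.2)
  # Chi-square form of the heteroskedasticity robust Wald test
waldtest(lm.8.2,lm.8.2.2,vcov=vcovHC(lm.8.2,type='HC0'),test='Chisq')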

Example 8.3
The first part of the example is similar to above. Estimate the model and print its summary and the heteroskedasticity robust standard errors.

# Load data and calculate the squared variable
crime1<-read.dta('crime1.dta')
avgsensq<-crime1$avgsen^2

lm.8.3<-lm(narr86 ~ pcnv + avgsen + avgsensq + ptime86 + qemp86 +
             inc86 + black + hispan, data=crime1)
summary(lm.8.3)
coeftest(lm.8.3, vcov=vcovHC(lm.8.3,type='HC0'))

For the LM test, estimate the restricted model and regress its residuals on the variables of the unrestricted model. Obtain the R-squared and multiply it by the number of observations used to get the LM statistic. Finally, get the p-value.

lm.8.3.2<-lm(narr86 ~ pcnv + ptime86 + qemp86 +
             inc86 + black + hispan, data=crime1)
lm.8.3.2u<-lm(lm.8.3.2$residuals ~ pcnv + avgsen + avgsensq + ptime86 + qemp86 +
                inc86 + black + hispan, data=crime1)
summary(lm.8.3.2u)$r.squared*2725
1-pchisq(3.4626,2)
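To avoid typing the numbers by hand, the same LM statistic and p-value can also be computed programmatically; the following sketch uses nobs() to get the number of observations actually used:

  # The same LM-test without hard-coded numbers
lm.stat<-summary(lm.8.3.2u)$r.squared*nobs(lm.8.3.2u)
lm.stat
1-pchisq(lm.stat,2)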

Example 8.4
For the Breusch-Pagan test estimate the model and obtain its squared residuals. Regress the latter on the variables of the model.

hprice1<-read.dta('hprice1.dta')
lm.e8.17<-lm(price ~ lotsize + sqrft + bdrms, data=hprice1)
summary(lm.e8.17)

  # Squared residuals
usq.17<-(summary(lm.e8.17)$residuals)^2
lm.e8.17u<-lm(usq.17 ~ lotsize + sqrft + bdrms, data=hprice1)
summary(lm.e8.17u)

  # LM-test
lm.e8.17u.2<-lm(usq.17 ~ 1, data=hprice1)
lm.e8.17u.2u<-lm(summary(lm.e8.17u.2)$residuals ~ lotsize + sqrft + bdrms, data=hprice1)
summary(lm.e8.17u.2u)$r.squared*88
1-pchisq(14.09239,3)

  # Using logarithms
lm.e8.18<-lm(lprice ~ llotsize + lsqrft + bdrms, data=hprice1)
summary(lm.e8.18)
  # Breusch-Pagan test
usq.18<-(summary(lm.e8.18)$residuals)^2
lm.e8.18u<-lm(usq.18 ~ llotsize + lsqrft + bdrms, data=hprice1)
summary(lm.e8.18u)

  # LM-test
summary(lm.e8.18u)$r.squared*88
1-pchisq(4.223248,3)
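As a cross-check, the lmtest package loaded above also provides bptest(); its default (studentized) version is the n times R-squared form of the Breusch-Pagan test computed by hand here, so it should reproduce both LM values:

  # Breusch-Pagan test via bptest() from the lmtest package
bptest(lm.e8.17)
bptest(lm.e8.18)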

Example 8.5
For the White test, estimate the model, obtain its squared residuals, fitted values and squared fitted values, and regress the squared residuals on the fitted values and their squares. Then, get the R-squared and multiply it by the number of observations used to obtain the LM statistic and the p-value.

hprice1<-read.dta('hprice1.dta')

lm.e8.18<-lm(lprice ~ llotsize + lsqrft + bdrms, data=hprice1)
ressq<-lm.e8.18$residuals^2
fitted<-lm.e8.18$fitted.values
fittedsq<-lm.e8.18$fitted.values^2
rsq<-summary(lm(ressq ~ fitted + fittedsq))$r.squared
rsq*88
1-pchisq(3.447286,2)
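The special form of the White test can presumably also be obtained from bptest() by passing the fitted values and their squares through the varformula argument (a sketch, assuming the vectors fitted and fittedsq created above are still in the workspace):

  # White test (special form) via bptest()
bptest(lm.e8.18, ~ fitted + fittedsq)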

Example 8.6
For weighted least squares you can still use R’s lm() command, but you have to tell R how to weight the observations with respect to the variance of the error term. This is achieved by specifying “weights” as 1/h, where in our case h is income. Now the model can be estimated and, since the weighting accounts for the assumed form of heteroskedasticity, we can use the usual summary() command and do not need coeftest().

ksubs<-read.dta('401ksubs.dta')

lm.8.6.1<-lm(nettfa ~ inc, data=ksubs, subset=(fsize==1))
coeftest(lm.8.6.1, vcov=vcovHC(lm.8.6.1,type='HC0'))

lm.8.6.2<-lm(nettfa ~ inc, weights=1/inc, data=ksubs, subset=(fsize==1))
summary(lm.8.6.2)
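To see what the “weights” argument actually does, here is a minimal sketch of WLS by hand: divide every variable, including the intercept, by the square root of inc and run OLS without an intercept on the transformed data. The coefficients and standard errors should match lm.8.6.2 (the model name lm.8.6.2check is only used for this illustration):

  # WLS by hand: transform the data with 1/sqrt(inc) and run OLS without intercept
lm.8.6.2check<-lm(I(nettfa/sqrt(inc)) ~ 0 + I(1/sqrt(inc)) + I(sqrt(inc)),
                  data=ksubs, subset=(fsize==1))
summary(lm.8.6.2check)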

age.25sq<-(ksubs$age-25)^2
lm.8.6.3<-lm(nettfa ~ inc + age.25sq + male + e401k, data=ksubs, subset=(fsize==1))
coeftest(lm.8.6.3, vcov=vcovHC(lm.8.6.3,type='HC0'))

lm.8.6.4<-lm(nettfa ~ inc + age.25sq + male + e401k, 
             weights=1/inc, data=ksubs, subset=(fsize==1))
summary(lm.8.6.4)

Example 8.7

smoke<-read.dta('smoke.dta')

lm.8.7<-lm(cigs ~ lincome + lcigpric + educ + age + agesq + restaurn, data=smoke)
summary(lm.8.7)

  # Breusch-Pagan
lm.8.7u<-lm(lm.8.7$residuals^2 ~ lincome + lcigpric + educ + 
              age + agesq + restaurn, data=smoke)
summary(lm.8.7u)
summary(lm.8.7u)$r.squared*807
1-pchisq(summary(lm.8.7u)$r.squared*807,6)

For feasible GLS, obtain the logarithm of the squared residuals from the basic model, regress it on the model’s independent variables and take the exponential of the fitted values from that regression, which gives hhat. Then estimate the basic model again, but with weights 1/hhat.

lres.u<-log(lm.8.7$residuals^2)
lm.8.7gls.u<-lm(lres.u ~ lincome + lcigpric + educ + age + agesq + restaurn, data=smoke)
hhat<-exp(lm.8.7gls.u$fitted.values)
lm.8.7gls<-lm(cigs ~ lincome + lcigpric + educ + age + agesq + restaurn,
           weights=1/hhat, data=smoke)
summary(lm.8.7gls)

Table 8.2

  # Recall the first column of the table
summary(lm.8.6.4)
  # Column 2
coeftest(lm.8.6.4, vcov=vcovHC(lm.8.6.4,type='HC0'))

Example 8.8
For linear probability models the procedure is rather similar to the first examples: estimate the model and use coeftest() to get heteroskedasticity robust estimates of the standard errors.

mroz<-read.dta('mroz.dta')

lm.8.8<-lm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6,
           data=mroz)
summary(lm.8.8)
coeftest(lm.8.8, vcov=vcovHC(lm.8.8,type='HC0'))

Example 8.9
Since linear probability models suffer from heteroskedasticity by construction, we have to control for it. The first part of the estimation is already familiar.

gpa1<-read.dta('gpa1.dta')

  # Generate a dummy variable which is 1 when the father, the mother
  # or both attended college.
parcoll<-as.numeric(gpa1$fathcoll==1 | gpa1$mothcoll==1)

lm.8.9<-lm(PC ~ hsGPA + ACT + parcoll, data=gpa1)
summary(lm.8.9)
coeftest(lm.8.9, vcov=vcovHC(lm.8.9,type='HC0'))

To deal with the heteroskedasticity, obtain the fitted values from the basic model and check that none of them is negative or above unity. I plotted a histogram for that. You might also look at min() and max() or similar functions. Then calculate hhat as described in the book and use its inverse (1/hhat) for the “weights” specification in the final regression.

hist(lm.8.9$fitted.values) # Fitted values are neither negative nor above unity.
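  # As mentioned above, min() and max() (or range()) give the same check numerically
range(lm.8.9$fitted.values)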
hhat<-lm.8.9$fitted.values*(1-lm.8.9$fitted.values) # Calculate hhat (h=yhat*(1-yhat))
lm.8.9wls<-lm(PC ~ hsGPA + ACT + parcoll, weights=1/hhat, data=gpa1)
summary(lm.8.9wls)

For a further interesting article on dealing with heteroskedasticity in R, I recommend this page.
