DEMO FOR ESTIMATION METHODS IN COUNTIES DATASET using auxiliary data given in Exercises 5-7 in Ch.3 ===================================================== 9/28/05 (NB 5 was assigned in HW2, 6 and 7 were not.) > counties <- read.table("LOHR/ASCII/counties.dat", header=T, sep=",") > dim(counties) [1] 100 18 > names(counties) [1] "RN" "STATE" "COUNTY" "LANDAREA" "TOTPOP" "PHYSICIA" [7] "ENROLL" "PERCPUB" "CIVLABOR" "UNEMP" "FARMPOP" "NUMFARM" [13] "FARMACRE" "FEDGRANT" "FEDCIV" "MILIT" "VETERANS" "PERCVIET" ### This corresponds to SRS with n=41 from N=3141 counties. ### We are going to take Y = FARMPOP (col. 11), X = LANDAREA (col.4) ### and we are told that t_x = 3,536,278 sq. mi. ### To begin: here are the three estimators (ybar.s, ybar.r, ybar.reg) ### we have been considering: > xbar = 3536278/3141 regmod <- lm(FARMPOP ~ LANDAREA, data=counties) Bhat <- mean(counties[,11])/mean(counties[,4]) > ybar.vec <- c(ybar.s = mean(counties[,11]), ybar.r = Bhat*xbar, ybar.reg = c(regmod$coef %*% c(1, xbar))) ybar.s ybar.r ybar.reg 1146.870 1366.462 1137.205 ### They differ, but now let's produce estimated Standard Errors ### from SRS formulas > SEvec <- sqrt(c(SE.ys = var(counties[,11]) , SE.yr = var(counties[,11]-Bhat*counties[,4]), SE.yreg = var(regmod$residuals ))*(1-100/3141)/100) SE.ys SE.yr SE.yreg 104.9943 178.6889 104.8168 ## Puzzling that SE is actually worse for the ratio estimator. ## To see why, first check correlation: > cor(counties[,4],counties[,11]) [1] -0.05811006 ## On the other hand, we recall that the factor by which ## (SE.ys)^2 is multiplied to get (SE.yreg)^2 is > 1-summary(regmod)$r.squared [1] 0.9966232 ## RECALL THAT THE THEORY OF THE CHAPTER SAYS THAT THE REGRESSION ESTIMATOR HAS MSE THAT IS ALWAYS LESS THAN OR EQUAL EITHER THAT OF YBAR.S OR YBAR.R . ## Now let's get the 95% CI's for Ybar by the three ## methods, to examine the overlap: > CImat <- matrix(rbind(ybar.vec - 1.96*SEvec,ybar.vec + 1.96*SEvec), nr=2, dimnames=list(c("Lower","Upper"),c("s","r","reg"))) > CImat s r reg Lower 941.0813 1016.232 931.7636 Upper 1352.6587 1716.693 1342.6456 ## So the 1st and 3rd intervals are essentially identical, ## and the ratio-estimated intervals strictly contains them. SUMMARY: > ybar.vec ### These are the 3 estimators ybar.s ybar.r ybar.reg ### for the avg FARMPOP attribute 1146.870 1366.462 1137.205 > SEvec ### These are the 3 SE estimators SE.ys SE.yr SE.yreg 104.9943 178.6889 104.8168 > CImat ### These are the 95% CI's s r reg ## for the target FARMPOP avg. Lower 941.0813 1016.232 931.7636 Upper 1352.6587 1716.693 1342.6456