Solution to the Take-Home Make-up, 11/20/07
===========================================

(1) Here we have

> Nh = c(5500, 3500, 1000)
> ph = c(.03, .06, .10)
> pqh = ph*(1-ph)*Nh/(Nh-1)
> ybar = sum(Nh*ph)/sum(Nh)
[1] .0475

(a)
> 2*1.96*sqrt((1-300/10000)*ybar*(1-ybar)/300)
[1] 0.04741225      ## SRS CI width

## in (b), allocation is: 300*Nh/1e4 = 165 105 30

(b)
> 2*1.96*sqrt(sum((1-300/10000)*(Nh/1e4)^2*pqh/(300*Nh/1e4)))
[1] 0.04715945      ## proportional alloc stratified sample CI width

(c)
> nopt = 300*Nh*sqrt(pqh)/sum(Nh*sqrt(pqh))

## in (c), allocation is: nopt = 136.00 120.49 43.50

> 2*1.96*sqrt(sum((1-nopt/Nh)*(Nh/1e4)^2*pqh/nopt))
[1] 0.04610368      ## Neyman alloc stratified sample CI width

(2) S_t^2 = sum_i (M_i*ybar_{Ui} - (K/N)*ybar_U)^2/(N-1)
          = sum_i M_i^2*ybar_{Ui}^2/(N-1) - Mbar*K*ybar_U^2/(N-1)

and S_{yU}^2 = SST/(K-1). Now we are given Mbar = K/N = 3.5,

    SSW/((K-N)*S_{yU}^2) = (SSW/SST)*(K-1)/(K-N) = 0.3

and

    sum_i M_i*(M_i-Mbar)*ybar_{Ui}^2 = 1.44*SST*(N-1)/(K-1).

Therefore SSB = SST*(1-0.3*(K-N)/(K-1)), and by definition

    SSB = sum_i M_i*(ybar_{Ui}-ybar_U)^2 = sum_i M_i*ybar_{Ui}^2 - K*ybar_U^2

To solve the problem, note that

    (N-1)*S_t^2 = sum_i M_i*(M_i-Mbar)*ybar_{Ui}^2
                  + Mbar*sum_i M_i*(ybar_{Ui}-ybar_U)^2

and we can now fill in the information:

    (N-1)*S_t^2 = 1.44*SST*(N-1)/(K-1) + Mbar*SST*(1-0.3*(K-N)/(K-1))
                = SST*(1.44*(N-1)/(K-1) + (K/N)*(1-0.3*(K-N)/(K-1)))

or

(a) S_t^2/S_{yU}^2 = (K-1)*S_t^2/SST
                   = 1.44 + (K/N)*(K-1-.3*(K-N))/(N-1)
                   = 1.44 + (K/N)*(.7*K+.3*N-1)/(N-1)

## After removing -1 terms due to large N, we obtain the
## approximate ratio S_t^2/S_{yU}^2 = 1.44 + 3.5*(.7*3.5+.3) = 11.065

For (b): For a single-stage cluster sample with N1 clusters, the variance in estimating the population mean is N^2*(1-N1/N)*S_t^2/(N1*K^2), which is approximately S_t^2/(N1*Mbar^2) = S_t^2/(N1*3.5^2). For an SRSWOR sample of N2 units, the approximate variance is S_{yU}^2/N2.
To equalize the two variances, we set

    S_t^2/(N1*3.5^2) = S_{yU}^2/N2

and, because of the calculation done in part (a), we conclude that

    11.065 = S_t^2/S_{yU}^2 = 3.5^2 * N1/N2

Since each of the N1 clusters sampled costs 25, and each of the N2 persons sampled costs 10, we find that the ratio of costs needed for the two plans is

    25*N1/(10*N2) = 2.5*11.065/3.5^2 = 2.258163

### This means that the plan to sample clusters is more costly by a factor of 2.26.

(3) Formula is: [ (300/20) * sum over i in S1 of 3*that_i plus (1200/30) * sum over i in S2 of sampled incomes y_i ] divided by the total number 3000 of units.

The variance of this estimator is obtained using formula (5.22) [using S_t^2 = MSB*M, or (5.34)] for the first stratum and our standard SRS formula for the second, as

    [ 300^2*(1-20/300)*MSB_{y,1}*6/20
      + (300/20)*(1-2/6)*(6^2/2)*(1/5)*SSW_{y,1}
      + 1200^2*(1-30/1200)*S_{y,2}^2/30 ] / 3000^2

We note that SST_{y,1} = 1799*S_{y,1}^2 = 16191, and

    SSW_{y,1}/SST_{y,1} = (1800-300)*MSW_{y,1}/(1799*S_{y,1}^2)
                        = (1500/1799)*0.2 = 0.16676

so that SSB_{y,1}/SST_{y,1} = 1 - 0.16676 = 0.83324 and

    MSB_{y,1} = 0.83324*16191/299 = 45.120
    SSW_{y,1} = 0.16676*16191 = 2700.011

Plugging everything in, we get

    SE = (1/3000)*sqrt(300^2*(1-20/300)*45.120*6/20
                       + (300/20)*(1-2/6)*(6^2/2)*.2*2700.011
                       + 1200^2*(1-30/1200)*16/30)
[1] 0.4693997

### Recalling that the units are $10000, this might be reasonable.

(4) NOTE: in this problem the sample proportion of counties with >500 farms is 160/300 = .53333, while we are told that the national proportion of such counties is 1484/3078 = .48213.

As in all small-domain problems, we re-define the attribute to be multiplied by the indicator of domain membership: W_i = Y_i*I(i in D), and we use a ratio estimator whose denominator is the total of V_i = I(i in D). The target is the ratio of frame totals, sum W_i / sum V_i.
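The redefinition just described can be sketched in R. This is only an illustrative helper, not part of the original solution: the function name domain_ratio, its arguments, and the four-point toy data are all hypothetical. The argument pD stands for the domain proportion used for 1/Vbar; it defaults to the sample proportion, and a known frame proportion can be passed instead.

```r
## Hypothetical helper: domain mean as the ratio sum(W)/sum(V),
## with the linearization standard error for a ratio estimator.
domain_ratio <- function(y, inD, N, pD = mean(inD)) {
  n <- length(y)
  W <- y * inD            # attribute set to zero outside the domain
  V <- as.numeric(inD)    # domain-membership indicator
  Bhat <- sum(W) / sum(V)
  resid <- W - Bhat * V   # residuals have sample mean exactly zero
  se <- sqrt((1/n - 1/N) * var(resid) / pD^2)
  list(estimate = Bhat, se = se)
}

## Toy data (made up): 4 sampled units out of a frame of 100,
## two of which fall in the domain.
toy <- domain_ratio(y = c(10, 20, 30, 40),
                    inD = c(TRUE, FALSE, TRUE, FALSE),
                    N = 100)
print(toy)
```

Because the residuals W_i - Bhat*V_i sum to zero, var(resid) equals the sum of squared residuals divided by n-1, which is the SampVar term in the standard-error formula.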
Estimator = 50468635/160 = 315429

Standard Error = sqrt((1/n - 1/N)*(1/Vbar)^2*SampVar(W_i-Bhat*V_i))
               = sqrt((1/300-1/3078)*(3078/1484)^2*(1/299)*
                      (2.733721e+13-160*315429^2))
               = 22231.26

So CI = 315429 + 1.96*c(-1,1)*22231.26 = (271856, 359002)
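The arithmetic for problem (4) can be replayed in R as a quick check. This is a minimal sketch that uses only the summary figures quoted above; the variable names are labels introduced here, and 2.733721e+13 is assumed to be the sample sum of W_i^2, as its role in the formula suggests.

```r
## Summary figures quoted in the solution to problem (4)
n     <- 300           # counties sampled
N     <- 3078          # counties in the frame
nD    <- 160           # sampled counties in the domain (>500 farms)
ND    <- 1484          # frame counties in the domain
sumW  <- 50468635      # sample total of W_i = Y_i*I(i in D)
sumW2 <- 2.733721e+13  # assumed: sample sum of W_i^2

Bhat <- sumW / nD
## Sum of squared residuals (W_i - Bhat*V_i); since W_i*V_i = W_i and
## sum(W_i) = nD*Bhat, it reduces to sumW2 - nD*Bhat^2.
ssr <- sumW2 - nD * Bhat^2
## 1/Vbar is taken from the known frame proportion, N/ND.
se <- sqrt((1/n - 1/N) * (N/ND)^2 * ssr / (n - 1))
ci <- Bhat + 1.96 * c(-1, 1) * se
print(round(c(estimate = Bhat, se = se, lower = ci[1], upper = ci[2]), 2))
```

Using the unrounded Bhat instead of 315429 changes the standard error only far beyond the digits reported.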