LOG FOR DISCUSSION/DEMONSTRATION of GRAPHICS AND REJECTION-SAMPLING 2/11/08 ===================================================== GRAPHICS: First consider the case of a data-frame which we called "classfram" giving the names, HWgrade, Final (exam grade), and course letter-grade for 25 members of an undergraduate class. The data-frame, generated as follows: > HWgrade = rbeta(25,3,1)*100 > Final = pmax(0, pmin(rnorm(25, HWgrade + 10, 10),100)) > Glab <- c("A","B","C","D","F") ### NOTE: pmin and pmax are the versions of min and max ### that respectively apply componentwise to pairs of vectors > classfram = data.frame(names=paste("Student",1:25,sep=""), HWgrade = round(HWgrade,1), Final = round(Final,1), Grade= Glab[pmin(floor((110-0.5*(HWgrade+Final))/10),5)]) ### We discussed the steps needed to plot a graph containing ### a scatterplot of Final vs HW grades, with plotting ### character equal to letter-grade. Here they are ### for reference: try them yourself ! > plot(classfram$HWgrade, classfram$Final, xlab="HW avg", ylab="Final Exam grade", type="n", main= "Final Exam vs HW score, with Letter-grade Plotted") > for(i in 1:5) { inds = classfram$Grade == Glab[i] points(classfram$HWgrade[inds], classfram$Final[inds], pch=Glab[i]) } legend(locator(), legend=c(">=90", "80:89", "70:79", "60:69", "<60"), pch=paste(Glab, collapse="")) =========================================================== ACCEPTANCE-REJECTION SAMPLING ### Illustration of sampling of Normal rv generation ### from Cauchy (standard, taken from Robert & Casella): > xv = -3 + (0:600)/6 > cc = max(dnorm(xv)/dcauchy(xv)) [1] 1.520347 > (2/exp(0.5))*sqrt(pi/2) ### max attained at +1, -1 [1] 1.520347 ### So far, we have checked that the max of the #### ratio of densities is bounded by cc ### Now generate Cauchy variates via simply-implemented ### probability integral transform: > ccvec = tan(pi*(runif(10000)-0.5)); uvec <- runif(10000) rvec = dnorm(ccvec)/(cc*dcauchy(ccvec)) > summary(rvec) Min. 1st Qu. Median Mean 3rd Qu. Max. 0 0.2762 0.8419 0.6536 0.9343 1 ### This is the vector of values we use in the accept-reject ### algorithm. The number of variates needed to obtain one ### newly simulated `normal' deviate is the number of ### successive (rvec[i],uvec[i]) pairs until the first index ### where rvec[i] >= uvec[i]: the expected waiting time is in ### theory equal to 1/P(rvec[1] >= uvec[1]) = ### 1/(average value of rvec) = cc = 1.52: we check this below > AccRej = function(xv, rv, uv, ctr) { ### ctr is 1 less than index of first random number ### to use. We must also assume that it is less ### than length(uv)-10 . flag = T while(flag) { ctr = ctr+1 flag = (uv[ctr] > rv[ctr]) } c(xv[ctr], ctr) } > AccRej(ccvec, rvec, uvec, 0) [1] 1.0805 2.0000 > AccRej(ccvec, rvec, uvec, 2) [1] 0.3363053 3.0000000 > AccRej(ccvec, rvec, uvec, 3) [1] -1.884756 4.000000 > AccRej(ccvec, rvec, uvec, 4) [1] 0.8313336 5.0000000 > AccRej(ccvec, rvec, uvec, 5) [1] 0.07258546 6.00000000 > AccRej(ccvec, rvec, uvec, 6) [1] 0.009537398 7.000000000 > AccRej(ccvec, rvec, uvec, 7) [1] 0.2831676 8.0000000 ### Now let's do a lot of these at once in a for-loop > cnew = numeric(1000) ind = 0 for(i in 1:1000) { aux = AccRej(ccvec, rvec, uvec, ind) cnew[i] = aux[1] ind = aux[2] } > ind/1000 [1] 1.556 ### compar with cc = 1.520347 > summary(cnew) Min. 1st Qu. Median Mean 3rd Qu. Max. -3.49000 -0.63420 -0.02721 0.01425 0.65610 3.99100 ### One way to check normality is by means of a histogram > hist(cnew, prob=T) ### Not bad, but not conclusive ### The prob=T parameter ensures the plotting of ### scaled relative frequencies so that the area under the ### piecewise rectangular curve is 1 ### Check normality by overplotting a normal density with ### the same mean and standard dev as cnew > lines(seq(-3.49, 3.99, .075), dnorm(seq(-3.49, 3.99, .075), mean(cnew), sqrt(var(cnew))), lty=4) ### This actually looks pretty good ! ### Another way to create a `probability plot' ## is to plot the cumulative empirical d.f. of the (ordered) ## observations against the cumulative normal cdf values. > plot(pnorm(sort(cnew)), (1:1000)/1000, xlab="Normal cdf values", ylab="Empirical cdf values", type="b", main= "Probability Plot for Simulated Normal Values") abline(0,1) ## Again, this looks VERY persuassively normal !!