Statistics 430 SAS Data Project Guidelines

Stat 430 Data Analysis Term Project Guidelines (Due Date: Thursday, December 18, 4pm)

       (I). The data project should be based on a dataset that you select,
probably downloaded from a public web source. I suggest that the dataset have
at least n = 100 observations, a continuous response variable Y, and at least
several other meaningful continuous or categorical explanatory X-columns.
Ideally, since you will be looking for relationships between the X and Y columns,
the data should concern a topic about which you have some general knowledge,
to help you ask and answer meaningful research questions about the data.

       (II). The objective of your data project is to discover and present, using
SAS, the best-fitting regression-type statistical model you can to explain the Y
responses in your dataset in terms of the X explanatory variables. So at the outset,
you should pose questions about the relationships in the data whose answers can be
interpreted and expressed in clear language as well as through a formal model.
A successful project will relate the research questions to a regression-type model;
use techniques developed in Stat 430 to build the best such model you can for the
data and to examine the model's adequacy, or goodness of fit; and finally (perhaps
very briefly) explain what conclusions your model leads to for the data you studied.

       (III). Your data analysis project need not be "finished" in the sense of
reaching firm conclusions about a realistic problem, but you should make every
effort to showcase the tools learned in the course (of all kinds: histograms,
QQ plots, transformations, data subsetting as necessary, residual plots and
prediction intervals, standardized residuals and consideration of outliers,
ANOVA, and automatic model-selection techniques) and to demonstrate that you
have uncovered as much of the regression-model structure of the data as was
possible with a reasonable amount of effort.

       (IV). While it is permissible to depart somewhat from the guidelines in (I)-(II),
if you know you want to deviate much from them, I strongly urge you to discuss your
project with me before investing too much effort in it. This is mostly so that I can
help you avoid kinds of data for which the main assumptions of our regression models
are not tenable: time series, where successive observations are definitely not
independent; survival data, where many observations are "censored" in the sense that
the health outcome of main interest is not observed before follow-up ends; and
categorical response data.

       (V). The guideline for how much material to hand in is much like the
"Homework Guideline" below. Do not hand in data, computations, or pictures that
you do not explicitly refer to in the accompanying text. You must explain the data
problem, the model building, and the solution in words, with reference to pictures
and numerical exhibits. Hand in your SAS code as an Appendix, or email it to me as
a text file; in either case, edit it down to the code that actually produced the
analyses and exhibits you are handing in.

Hand in no more than 20 printed pages, including tables and pictures, in a
reasonably sized font and spacing.

Here are links to a sample final project assignment, data, and guidelines from a data analysis
course at U of Michigan similar to this one. If you want to do this assignment for your project,
that is OK.


The University of Maryland, College Park has a nationally recognized Code of Academic Integrity, administered by the Student Honor Council. This Code sets standards for academic integrity at Maryland for all undergraduate and graduate students. As a student you are responsible for upholding these standards for this course. It is very important for you to be aware of the consequences of cheating, fabrication, facilitation, and plagiarism.


© Eric V Slud, Sept. 2, 2008.