About the Course

Instructor

  • Dr. Amelia McNamara (amelia.mcnamara@stthomas.edu, OSS 407, 651-962-5391).
  • Student hours: Monday 1-2 pm, Wednesday 11 am - noon, and by appointment, in OSS 407.

Description

Applied linear regression models. Simple linear regression; introduction, inferences, diagnostics, remedial measures, simultaneous inference. Matrix approach in linear regression. Multiple regression; inference, remedial measures, extra sums of squares, partial determinations, standardized models, use of indicator and mixed variables, polynomial regression, model selection and validation, diagnostics, remedial measures, multicollinearity and effects, autocorrelation. Single and multi-factor analysis of variance: analysis of factor level means, interactions, inferences, diagnostics and remedial measures. A statistical package must be used as tool. Optional topics may include: logistic regression, design of experiments, and forecasting.

Prerequisites:

One of the following, STAT 201, STAT 220, STAT 333, MATH 303

Course Goals

  • Gain a basic understanding of regression models, including the assumptions they make and when to use them.
  • Learn to apply regression models to real-world data sets using R, and interpret the regression output.

Textbooks

One textbook is required:

  • STAT 2: Building Models for a World of Data, by Cannon, Cobb, Hartlaub, Legler, Lock, Moore, Rossman, and Witmer. W.H. Freeman and Company, 2012. Note that you do not need the online access!

Policies

Attendance

Attendance is recommended, but not required. You are an adult and make your own priorities. If you are not in class, I will assume you understand the material that was covered. In a case where you may need an extended absense and feel it will impact your learning (e.g. an illness, death in the family, conference, etc), please let me know so we can find a way for you to make up the material.

Inclusive classroom

Because data is collected by and about humans, it necessarily encodes aspects of our proclivities and biases. As a result, this course will likely touch upon difficult topics related to race, gender, inequality, class, and oppression. We each come into this class with different perspectives that can be shared to enhance our understanding of these issues. I ask that you enter these conversations with respect, curiosity, and cultural humility. You should be open to alternative perspectives and willing to revise beliefs that are based on misinformation. As a general rule, your ideas and experiences can always be shared during these conversation but please refrain from dismissing the experiences of others. Personal attacks of any kind will not be tolerated.

Please plan to treat me and your classmates with respect. This includes things like arriving at class on time, coming in quietly if you are late, and focusing on the task at hand, as well as using your classmates’ preferred name and pronouns.

Collaboration

Much of this course will operate on a collaborative basis, and you are expected and encouraged to work together with a partner or in small groups to study, complete homework assignments, and prepare for exams. However, every word that you write must be your own. Copying and pasting sentences, paragraphs, or blocks of code from another student or the internet is not acceptable and will receive no credit. No interaction with anyone but the instructor is allowed on any exams or quizzes. All students are bound by the university Rules of Conduct. Cases of dishonesty, plagiarism, etc., will be reported to the dean.

Disability Statement

Academic accommodations will be provided for qualified students with documented disabilities including but not limited to mental health diagnoses, learning disabilities, Attention Deficit Disorder, Autism, chronic medical conditions, visual, mobility, and hearing disabilities. Students are invited to contact the Disability Resources office about accommodations early in the semester. Appointments can be made by calling 651-962-6315 or in person in Murray Herrick, room 110. For further information, you can locate the Disability Resources office on the web at http://www.stthomas.edu/enhancementprog/.

Assignments

  1. Homework [20%]: There will be several problem sets over the course of the semester. Problem sets will involve computational assignments in R with written explanations. You must complete all of your homework assignments in R Markdown. All homework will be submitted electronically by midnight (11:59 pm) of the due date. Late assignments will lose points at the rate of 20% per day.

  2. Technical report [30%]: You will work on a term project in a small group over the course of the semester. This is an opportunity for you to demonstrate your understanding of the material and put it into practice. The main output of your work will be a technical report, which we will work on in stages.

  3. Project presentation [5%]: The second component of the term project is an oral presentation of your work at the end of the semester. This is an opportunity to practice public speaking, and to share what you have done with your classmates.

  4. Participation [5%]: Engagement with your group work (both in class and during the project). You are expected in fully participate. If you do so, you should recieve full credit for this portion of the grade.

  5. Exams [40%]: There will be two closed-book exams in this course. You may bring a calculator and one piece of paper of hand-written notes (double-sided) to the exam. After the exam, you will have the opportunity to raise your exam score by submitting corrections for any questions you got wrong.

Grading

When grading your written work, I am looking for solutions that are technically correct and reasoning that is clearly explained. The explanation and context of an answer are the most important components. Numerically correct answers, alone, are not sufficient on homework or tests. Neatness and organization are valued, with brief, clear answers that explain your thinking. If I cannot read or follow your work, I cannot give you full credit for it.

Computing

The use of the R statistical computing environment with the RStudio interface is thoroughly integrated into the course. You have two options for using RStudio:

  • The cloud version of RStudio on the web, https://rstudio.cloud/project/62623. The advantage of using the cloud version is that all of your work will be stored in the cloud, where it is automatically saved and backed up. This means that you can access your work from any computer with a web browser and an internet connection. The downside is that you need internet access, and the cloud server has some speed and memory limitations.
  • A desktop version of RStudio installed on your machine, https://www.rstudio.com/products/rstudio/download/. Using the desktop version means you are only limited by the speed and memory of your machine, and you can work offline. The downside to this approach is that your work is only stored locally, but you can get around this by using a version control package like git/GitHub, or keeping your work in a Dropbox folder.

Note that you do not have to choose one or the other – you may use both. However, it is important that you understand the distinction so that you can keep track of your work. Both R and RStudio are free and open-source, and should be installed in the computer labs on campus.

Writing

This class has been designated as a Writing in the Disciplines (WID) course in the Writing Across the Curriculum program at UST. As such, a substantial portion of the semester will be devoted to producing a final project, the main outcome of which will be a technical report. You will select a topic, find and synthesize relevant data, perform a regression analysis, and describe the results you have found. Your ability to communicate technical results is critical to your success as a data analyst. Assignments in this class will place an emphasis on the clarity of your writing, to help you develop and refine this skill.

Tentative Schedule

The following is a brief outline of the course. Please refer to the complete day-to-day schedule for more detailed information.

Week Reading Topic
1 Ch. 0 Introduction to the course
2 Ch. 1 Introduction to modeling. Understanding statistical models, and the four-step process (Choose, Fit, Assess, Use).
3 Ch. 1 Simple Linear Regression. Least squares estimates, fitting models, checking conditions, making transformations.
4 Ch. 2, MDSR Inference in simple linear regression, hypothesis tests and confidence intervals. Regression and correlation.
5 Ch. 3 Multiple Linear Regression. Fitting models, checking conditions, comparing models.
6 Ch. 3 Second-order models. Interaction terms and polynomials. Multicolinearity.
7 Ch. 4 Exam 1 Model Selection. Using nested F-tests and stepwise regression.
8 Ch. 4 Randomization & the bootstrap as methods of inference.
9 Ch. 5 ANOVA. Fitting models, checking conditions, interpretation of results.
10 Ch. 5 Multiple Comparisons. Bonferroni corrections, Fisher’s LSD, Tukey’s HSD.
11 Ch. 9 Logistic Regression. Fitting and interpreting models. Checking conditions, formal inference.
12 Ch. 10 Multiple logistic regression.
Nov. 22-26 Thanksgiving Break
13 Data cleaning and wrangling.
14 Exam 2 and Project Work Time
15 Project Work Time and Project Presentations
Dec. 21 All work due