In this course, we will work in two primary paradigms:
The model we choose will depend on what type of response variable we have.
\[ y = f(x) + \epsilon \]
For simple linear regression, the model
\[ Y = f(X) + \epsilon \] has the following qualities:
Parameters (model): \[ Y = \beta_0 + \beta_1\cdot X + \epsilon, \, \epsilon \sim N(0, \sigma_\epsilon) \]
Statistics (estimate): \[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1\cdot x \]
Residuals are observed - expected: \[ y_i - \hat{y}_i \]
Let’s look at the relationship between the car’s weight (in pounds) and the MSRP (in dollars).
How could we describe this relationship?
Given how the data looked in the scatterplot we saw above, it seems reasonable to choose a simple linear regression model. We can then fit it using R. Here’s how we fit a simple linear regression model:
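As a minimal sketch of the call, here is lm() fit to simulated data standing in for the car data. The data frame name cars_sim, the column names weight and msrp, and the simulated values are all hypothetical, so the coefficients printed will not match the ones quoted in these notes.

```r
# Simulated stand-in for the car data; names and values are hypothetical.
set.seed(1)
cars_sim <- data.frame(weight = runif(50, min = 2000, max = 5000))
cars_sim$msrp <- -8000 + 11.5 * cars_sim$weight + rnorm(50, mean = 0, sd = 5000)

# Fit a simple linear regression: response on the left of ~, explanatory on the right.
lm(msrp ~ weight, data = cars_sim)
```

Printing the result without storing it shows only the call and the two fitted coefficients.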
To fit the model, R is minimizing the sum of the squared residuals,
\[ SSE = \sum_{i=1}^n (y_i - \hat{y}_i)^2, \qquad \hat{\sigma}_\epsilon = \sqrt{\frac{SSE}{n-2}} \] Sometimes the SSE is also called the Residual Sum of Squares (RSS).
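To see these two formulas in action, the sketch below computes the SSE and \(\hat{\sigma}_\epsilon\) by hand and compares them to what R reports. The data are simulated and the names x, y, and fit are hypothetical.

```r
# Simulated data; values are hypothetical.
set.seed(1)
x <- runif(50, min = 2000, max = 5000)
y <- -8000 + 11.5 * x + rnorm(50, mean = 0, sd = 5000)
fit <- lm(y ~ x)

# Residuals: observed - expected.
res <- y - fitted(fit)

# Sum of squared residuals and the estimated residual standard deviation.
sse <- sum(res^2)
sigma_hat <- sqrt(sse / (length(y) - 2))

# Compare with R's built-in extractors.
all.equal(sse, deviance(fit))     # for an lm fit, deviance() is the RSS
all.equal(sigma_hat, sigma(fit))  # the residual standard error
```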
One thing I want you to be able to do is write out the equation of the line.
\[ \widehat{\text{MSRP}} = -8383.10 + 11.51\cdot \text{weight} \]
I like to use ‘recipe’ sentences for interpretations. Here are two:
Slope: “For a 1-[unit] increase in [explanatory variable], we would expect to see a [\(\hat{\beta}_1\)]-[unit] [increase/decrease] in [response variable].”
Intercept: “If the value of [explanatory variable] were 0, we would expect [response variable] to be [\(\hat{\beta}_0\)] [units].”
\[ \widehat{\text{MSRP}} = -8383.10 + 11.51\cdot \text{weight} \] How would we interpret the coefficients from this model? Does it make sense to interpret the intercept?
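One way to check an interpretation is to plug values into the fitted equation. This sketch uses the coefficients from the equation above; the helper name predict_msrp and the example weights are hypothetical.

```r
# Coefficients taken from the fitted equation above.
b0 <- -8383.10
b1 <- 11.51

# Hypothetical helper: predicted MSRP (dollars) for a given weight (pounds).
predict_msrp <- function(weight) b0 + b1 * weight

# Slope: the change in predicted MSRP for a 1-pound increase in weight.
predict_msrp(3001) - predict_msrp(3000)

# Intercept: the predicted MSRP for a car that weighs 0 pounds.
predict_msrp(0)
```

The difference between the two predictions is the slope, and the prediction at a weight of 0 is the intercept, which here is a negative price for a car that cannot exist.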
The first time I fit the model, all I wanted were the coefficients.
But in this class, we are going to need to go a lot further. So, we need to store our model object in R and give it a name. I use kind of bad names (m1, m2, etc.); you could choose a better name.
We’re using the assignment operator to store the results of our function into a named object.
I’m using the assignment operator <-, but you can also use =. As with many things, I’ll try to be consistent, but I sometimes switch between the two.
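A minimal sketch of storing the model, again on simulated data (the names cars_sim, weight, and msrp are hypothetical stand-ins for the course data):

```r
# Simulated stand-in for the car data; names and values are hypothetical.
set.seed(1)
cars_sim <- data.frame(weight = runif(50, min = 2000, max = 5000))
cars_sim$msrp <- -8000 + 11.5 * cars_sim$weight + rnorm(50, mean = 0, sd = 5000)

# Store the fitted model in a named object with <- ...
m1 <- lm(msrp ~ weight, data = cars_sim)

# ... or, equivalently at the top level, with =
m2 = lm(msrp ~ weight, data = cars_sim)

# Both objects hold the same fit; coef() pulls out the coefficients.
coef(m1)
all.equal(coef(m1), coef(m2))
```

Once the model has a name, it can be passed to other functions instead of being refit each time.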
Now, we want to move on to assessing and using our model. Typically, this means we want to look at the model output (the fitted coefficients, etc.). If we run summary() on our model object, we can look at the output.
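As a sketch of what that looks like, here is summary() applied to a model fit on simulated data (cars_sim, weight, msrp, and m1 are hypothetical names, and the numbers will differ from the course output):

```r
# Simulated stand-in for the car data; names and values are hypothetical.
set.seed(1)
cars_sim <- data.frame(weight = runif(50, min = 2000, max = 5000))
cars_sim$msrp <- -8000 + 11.5 * cars_sim$weight + rnorm(50, mean = 0, sd = 5000)
m1 <- lm(msrp ~ weight, data = cars_sim)

# Full model output: coefficient table, residual standard error, R-squared, etc.
summary(m1)

# Just the coefficient table; the estimates are in the "Estimate" column.
coef(summary(m1))
```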
Notice that the coefficients are the same; they just show up in a column rather than printed out in a row. Let’s write out the model equation again.