class: center, middle, inverse, title-slide

# Simple Linear Regression
## Conditions
### Prof. Maria Tackett

---

class: middle, center

## [Click here for PDF of slides](05-slr-conditions.pdf)

---

## Topics

--

- List the conditions for simple linear regression

--

- Use plots of the residuals to check the conditions

---

## Movie ratings data

The data set contains the "Tomatometer" score (.term[critics]) and audience score (.term[audience]) for 146 movies rated on rottentomatoes.com.

<img src="05-slr-conditions_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

## The model

`$$\color{red}{\hat{\text{audience}} = 32.316 + 0.519 \times \text{critics}}$$`

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;"> 32.316 </td>
   <td style="text-align:right;"> 2.343 </td>
   <td style="text-align:right;"> 13.795 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> critics </td>
   <td style="text-align:right;"> 0.519 </td>
   <td style="text-align:right;"> 0.035 </td>
   <td style="text-align:right;"> 15.028 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
</tbody>
</table>

<img src="05-slr-conditions_files/figure-html/unnamed-chunk-5-1.png" width="80%" style="display: block; margin: auto;" />

---

class: middle

.eq[
`$$Y|X \sim N(\beta_0 + \beta_1 X, \sigma_\epsilon^2)$$`
]

<img src="05-slr-conditions_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

## Model conditions

--

1. .vocab[Linearity: ]There is a linear relationship between the response and predictor variable.

--

2. .vocab[Constant Variance: ]The variability of the errors is equal for all values of the predictor variable.

--

3. .vocab[Normality: ]The errors follow a normal distribution.

--

4. .vocab[Independence: ]The errors are independent from each other.

---

class: middle, center

.eq[
`$$\Large{\text{residual}_i = e_i = y_i - \hat{y}_i}$$`
]

---

## Residuals vs. fitted values

<img src="05-slr-conditions_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

## Checking linearity

--

<img src="05-slr-conditions_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

--

✅ There is no distinguishable pattern or structure. The residuals are randomly scattered.

---

## ❌ Violation: distinguishable pattern

<img src="05-slr-conditions_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

## Checking constant variance

--

<img src="05-slr-conditions_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

--

✅ The vertical spread of the residuals is relatively constant across the plot.

---

## ❌ Violation: non-constant variance

<img src="05-slr-conditions_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

---

## Normal quantile plot

<img src="05-slr-conditions_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

---

## Checking normality

--

<img src="05-slr-conditions_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />

--

✅ Points fall along a straight diagonal line on the normal quantile plot.

---

## Checking independence

--

- We can often check the independence assumption based on the context of the data and how the observations were collected.

--

- If the data were collected in a particular order, examine a scatterplot of the residuals versus order in which the data were collected.

<br>

--

✅ Based on available information, the error for one movie does not tell us anything about the error for another movie.
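
---

## Sketch: fitted values and residuals

The residual plots used for these checks are built from the fitted values and the residuals `\(e_i = y_i - \hat{y}_i\)`. The following is a minimal sketch of that calculation in plain Python, assuming a least-squares fit. The function names `fit_slr` and `fitted_and_residuals` and the small data set are hypothetical illustrations, not the code or the Rotten Tomatoes data behind these slides.

```python
from statistics import mean


def fit_slr(x, y):
    """Return the least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def fitted_and_residuals(x, y):
    """Return fitted values y_hat_i and residuals e_i = y_i - y_hat_i."""
    b0, b1 = fit_slr(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


# Hypothetical (critics, audience) scores, made up for illustration only
critics = [10, 25, 40, 55, 70, 85, 95]
audience = [35, 48, 50, 62, 70, 74, 83]

fitted, residuals = fitted_and_residuals(critics, audience)

# Each (fitted, residual) pair is one point on a residuals vs. fitted values plot
for f, e in zip(fitted, residuals):
    print(f"fitted = {f:6.2f}   residual = {e:6.2f}")
```

Plotting `residuals` against `fitted` gives the plot used to check linearity and constant variance. Plotting `residuals` against collection order gives the plot suggested for independence.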
---

## In practice

As you check the model conditions, ask whether any observed deviation from them is so great that

--

1️⃣ a different model should be proposed.

--

2️⃣ conclusions drawn from the model should be used with caution.

--

✅ If not, the conditions are sufficiently met and we can proceed with the current model.

---

## Recap

--

- Used plots of the residuals to check conditions for simple linear regression:
  - .vocab[Linearity]
  - .vocab[Constant Variance]
  - .vocab[Normality]
  - .vocab[Independence]
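
---

## Appendix: points on a normal quantile plot

A normal quantile plot pairs each ordered residual with the quantile expected from a standard normal distribution. The sketch below shows one way those pairs could be computed with the Python standard library. The helper name `normal_quantile_pairs`, the plotting positions `(i - 0.5) / n`, and the residual values are assumptions for illustration; statistical software may use slightly different plotting positions.

```python
from statistics import NormalDist


def normal_quantile_pairs(residuals):
    """Return (theoretical quantile, ordered residual) pairs for a normal quantile plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are one common convention (an assumption here)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals, made up for illustration only
example_residuals = [-6.1, -3.4, -2.0, -0.8, 0.3, 1.1, 2.5, 3.9, 4.5]

for q, e in normal_quantile_pairs(example_residuals):
    print(f"theoretical = {q:6.2f}   residual = {e:6.2f}")
```

If the normality condition is reasonable, these pairs should fall close to a straight diagonal line when plotted.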