class: center, middle, inverse, title-slide # Simple Linear Regression ## Introduction ### Prof. Maria Tackett --- class: middle, center ## [Click for PDF of slides](03-slr-intro.pdf) --- ## Topics - Use simple linear regression to describe the relationship between a quantitative predictor and quantitative response variable. -- - Estimate the slope and intercept of the regression line using the least squares method. -- - Interpret the slope and intercept of the regression line. --- ## Movie ratings data The data set contains the "Tomatometer" score (**`critics`**) and audience score (**`audience`**) for 146 movies rated on rottentomatoes.com. <img src="03-slr-intro_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" /> --- ## Movie ratings data We want to fit a line to describe the relationship between the critics score and audience score. <img src="03-slr-intro_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> --- ## Terminology .pull-left[ The .vocab[response, *Y*], is the variable describing the outcome of interest. <br> The .vocab[predictor, *X*], is the variable we use to help understand the variability in the response. ] .pull-right[ <img src="03-slr-intro_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> ] --- ## Regression model A .emph[regression model] is a function that describes the relationship between the response, `\(Y\)`, and the predictor, `\(X\)`. 
.eq[ `$$\begin{aligned} Y &= \color{black}{\textbf{Model}} + \text{Error} \\[8pt] &= \color{black}{\mathbf{f(X)}} + \epsilon \\[8pt] &= \color{black}{\boldsymbol{\mu_{Y|X}}} + \epsilon \end{aligned}$$` ] --- class: middle, center .pull-left[ .small-box[ `$$\begin{aligned} Y &= \color{purple}{\textbf{Model}} + \text{Error} \\[8pt] &= \color{purple}{\mathbf{f(X)}} + \epsilon \\[8pt] &= \color{purple}{\boldsymbol{\mu_{Y|X}}} + \epsilon \end{aligned}$$` ] ] .pull-right[ <img src="03-slr-intro_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> ] --- class: middle .pull-left[ .small-box[ `$$\begin{aligned} Y &= \color{purple}{\textbf{Model}} + \color{blue}{\textbf{Error}} \\[5pt] &= \color{purple}{\mathbf{f(X)}} + \color{blue}{\boldsymbol{\epsilon}} \\[5pt] &= \color{purple}{\boldsymbol{\mu_{Y|X}}} + \color{blue}{\boldsymbol{\epsilon}} \\[5pt] \end{aligned}$$` ] ] .pull-right[ <img src="03-slr-intro_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> ] --- ## Simple linear regression When we have a quantitative response, `\(Y\)`, and a single quantitative predictor, `\(X\)`, we can use a .vocab[simple linear regression] model to describe the relationship between `\(Y\)` and `\(X\)`. .eq[ `$$\begin{aligned} Y &= \mathbf{\beta_0 + \beta_1 X} + \epsilon \end{aligned}$$` ] `$$\boldsymbol{\beta}_1: \text{Slope} \hspace{20mm} \boldsymbol{\beta}_0: \text{Intercept}$$` --- class: middle, center .eq[ `$$\Large{\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X}$$` ] --- class: middle .eq[ `$$\Large{\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X}$$` ] --- class: middle, center How do we choose values for `\(\hat{\beta}_1\)` and `\(\hat{\beta}_0\)`? 
<img src="03-slr-intro_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> --- ## Residuals <img src="03-slr-intro_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> .eq[ `$$\text{residual} = \text{observed} - \text{predicted} = y - \hat{y}$$` ] --- ## Least squares line -- - The residual for the `\(i^{th}\)` observation is `$$e_i = \text{observed} - \text{predicted} = y_i - \hat{y}_i$$` -- - The .vocab[sum of squared residuals] is `$$e^2_1 + e^2_2 + \dots + e^2_n$$` -- - The .vocab[least squares line] is the one that minimizes the sum of squared residuals --- ## Estimating the slope .eq[ `$$\large{\hat{\beta}_1 = r \frac{s_Y}{s_X}}$$` ] -- .pull-slight-left[ .small-box-work[ `$$\begin{aligned} &s_X = 30.169 \\[5pt] &s_Y = 20.024 \\[5pt] &r = 0.781 \end{aligned}$$` ] ] -- .pull-more-right[ .small-box-work[ `$$\begin{aligned}\hat{\beta}_1 &= 0.781 \times \frac{20.024}{30.169} \\[10pt] &= \mathbf{0.518}\end{aligned}$$` ] ] --- ## Estimating the intercept .eq[ `$$\large{\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}}$$` ] -- .pull-slight-left[ .small-box-work[ `$$\begin{aligned}&\bar{x} = 60.850 \\[5pt] &\bar{y} = 63.877 \\[5pt] &\hat{\beta}_1 = 0.518 \end{aligned}$$` ] ] -- .pull-more-right[ .small-box-work[ `$$\begin{aligned}\hat{\beta}_0 &= 63.877 - 0.518 \times 60.850 \\[10pt] &= \mathbf{32.296}\end{aligned}$$` ] ] --- ## Interpreting slope & intercept .eq[ `$$\hat{\text{audience}} = 32.296 + 0.518 \times \text{critics}$$` ] <br> -- .vocab[Slope]: For every one point increase in the critics score, we expect the audience score to increase by 0.518 points, on average. -- .vocab[Intercept]: If the critics score is 0 points, we expect the audience score to be 32.296 points. --- ## Does it make sense to interpret the intercept? -- ✅ **Interpret the intercept if** - the predictor can feasibly take values equal to or near zero. - there are values near zero in the data. 
-- <br> 🛑 Otherwise, don't interpret the intercept! --- ## Recap -- - Used simple linear regression to describe the relationship between a quantitative predictor and quantitative response variable. -- - Used the least squares method to estimate the slope and intercept. -- - Interpreted the slope and intercept. - .vocab[Slope]: For every one unit increase in `\(x\)`, we expect `\(y\)` to change by `\(\hat{\beta}_1\)` units, on average. - .vocab[Intercept]: If `\(x\)` is 0, then we expect `\(y\)` to be `\(\hat{\beta}_0\)` units.
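---

## Sketch: computing the estimates

The slope and intercept formulas from the earlier slides can be checked with a few lines of code. The sketch below is illustrative only: the function names and the five-point toy data set are assumptions made for this example, not part of the course materials. The summary statistics are the rounded values shown on the slides, so the computed intercept may not match the slide's 32.296 exactly.

```python
from statistics import mean, stdev

# Summary statistics as shown on the slides (rounded to three decimals).
s_x = 30.169    # standard deviation of critics (predictor)
s_y = 20.024    # standard deviation of audience (response)
r = 0.781       # correlation between critics and audience
x_bar = 60.850  # mean critics score
y_bar = 63.877  # mean audience score


def slope_from_summary(r, s_x, s_y):
    """Least squares slope: r * s_Y / s_X."""
    return r * s_y / s_x


def intercept_from_summary(x_bar, y_bar, slope):
    """Least squares intercept: y_bar - slope * x_bar."""
    return y_bar - slope * x_bar


def predict(x, intercept, slope):
    """Predicted response for a given predictor value."""
    return intercept + slope * x


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of (observed - predicted)^2 over all observations."""
    return sum((y - predict(x, intercept, slope)) ** 2 for x, y in zip(xs, ys))


def correlation(xs, ys):
    """Sample correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / ((len(xs) - 1) * stdev(xs) * stdev(ys))


# 1. Estimates from the slide's summary statistics.
beta1_hat = slope_from_summary(r, s_x, s_y)
beta0_hat = intercept_from_summary(x_bar, y_bar, beta1_hat)
print(f"slope:     {beta1_hat:.3f}")
print(f"intercept: {beta0_hat:.3f}")

# 2. Toy data (made up for illustration, NOT the movie ratings data):
#    the fitted line should have a smaller sum of squared residuals
#    than nearby lines with a shifted slope or intercept.
toy_x = [10, 35, 50, 75, 90]
toy_y = [40, 45, 62, 70, 81]
toy_slope = slope_from_summary(correlation(toy_x, toy_y), stdev(toy_x), stdev(toy_y))
toy_intercept = intercept_from_summary(mean(toy_x), mean(toy_y), toy_slope)

ssr_fit = sum_squared_residuals(toy_x, toy_y, toy_intercept, toy_slope)
ssr_steeper = sum_squared_residuals(toy_x, toy_y, toy_intercept, toy_slope + 0.05)
ssr_shifted = sum_squared_residuals(toy_x, toy_y, toy_intercept + 2.0, toy_slope)
print(f"SSR at least squares line: {ssr_fit:.2f}")
print(f"SSR with steeper slope:    {ssr_steeper:.2f}")
print(f"SSR with shifted intercept: {ssr_shifted:.2f}")
```

Because the intercept is defined as `\(\bar{Y} - \hat{\beta}_1\bar{X}\)`, the fitted line always passes through the point `\((\bar{x}, \bar{y})\)`: calling `predict(x_bar, beta0_hat, beta1_hat)` returns `y_bar` up to floating-point error.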