class: center, middle, inverse, title-slide

# Simple Linear Regression
## Introduction
### Prof. Maria Tackett

---

class: middle, center

## [Click for PDF of slides](03-slr-intro.pdf)

---

## Topics

- Use simple linear regression to describe the relationship between a quantitative predictor and a quantitative response variable.

--

- Estimate the slope and intercept of the regression line using the least squares method.

--

- Interpret the slope and intercept of the regression line.

---

## Movie ratings data

The data set contains the "Tomatometer" score (**`critics`**) and audience score (**`audience`**) for 146 movies rated on rottentomatoes.com.

<img src="03-slr-intro_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

## Movie ratings data

We want to fit a line to describe the relationship between the critics score and audience score.

<img src="03-slr-intro_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---

## Terminology

.pull-left[
The .vocab[response, *Y*], is the variable describing the outcome of interest.

<br>

The .vocab[predictor, *X*], is the variable we use to help understand the variability in the response.
]

.pull-right[
<img src="03-slr-intro_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />
]

---

## Regression model

A .emph[regression model] is a function that describes the relationship between the response, `\(Y\)`, and the predictor, `\(X\)`.

.eq[
`$$\begin{aligned} Y &= \color{black}{\textbf{Model}} + \text{Error} \\[8pt]
&= \color{black}{\mathbf{f(X)}} + \epsilon \\[8pt]
&= \color{black}{\boldsymbol{\mu_{Y|X}}} + \epsilon \end{aligned}$$`
]

---

class: middle, center

.pull-left[
.small-box[
`$$\begin{aligned} Y &= \color{purple}{\textbf{Model}} + \text{Error} \\[8pt]
&= \color{purple}{\mathbf{f(X)}} + \epsilon \\[8pt]
&= \color{purple}{\boldsymbol{\mu_{Y|X}}} + \epsilon \end{aligned}$$`
]
]

.pull-right[
<img src="03-slr-intro_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />
]

---

class: middle

.pull-left[
.small-box[
`$$\begin{aligned} Y &= \color{purple}{\textbf{Model}} + \color{blue}{\textbf{Error}} \\[5pt]
&= \color{purple}{\mathbf{f(X)}} + \color{blue}{\boldsymbol{\epsilon}} \\[5pt]
&= \color{purple}{\boldsymbol{\mu_{Y|X}}} + \color{blue}{\boldsymbol{\epsilon}} \\[5pt]
 \end{aligned}$$`
]
]

.pull-right[
<img src="03-slr-intro_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />
]

---

## Simple linear regression

When we have a quantitative response, `\(Y\)`, and a single quantitative predictor, `\(X\)`, we can use a .vocab[simple linear regression] model to describe the relationship between `\(Y\)` and `\(X\)`.

.eq[
`$$\begin{aligned} Y &= \mathbf{\beta_0 + \beta_1 X} + \epsilon \end{aligned}$$`
]

`$$\boldsymbol{\beta}_1: \text{Slope} \hspace{20mm} \boldsymbol{\beta}_0: \text{Intercept}$$`
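
To make the model concrete, here is a minimal sketch that generates responses from `\(Y = \beta_0 + \beta_1 X + \epsilon\)`. The function name `simulate_slr`, the parameter values, and the choice of normally distributed errors are assumptions for illustration only; they do not come from the movie ratings data.

```python
import random

def simulate_slr(x_values, beta0, beta1, sigma, seed=0):
    """Generate Y = beta0 + beta1 * X + epsilon.

    Assumption for this sketch: epsilon ~ Normal(0, sigma).
    """
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    return [beta0 + beta1 * x + rng.gauss(0, sigma) for x in x_values]

# Hypothetical predictor values and parameters
x = [10, 30, 50, 70, 90]
y = simulate_slr(x, beta0=30.0, beta1=0.5, sigma=5.0)

for xi, yi in zip(x, y):
    print(xi, round(yi, 1))
```

Setting `sigma=0` removes the error term, so every simulated point falls exactly on the line `\(\beta_0 + \beta_1 X\)`.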

---

class: middle, center

.eq[
`$$\Large{\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X}$$`
]

---

class: middle

.eq[
`$$\Large{\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X}$$`
]

---

class: middle, center

How do we choose values for `\(\hat{\beta}_1\)` and `\(\hat{\beta}_0\)`?

<img src="03-slr-intro_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---

## Residuals

<img src="03-slr-intro_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

.eq[
`$$\text{residual} = \text{observed} - \text{predicted} = y - \hat{y}$$` 
]

---

## Least squares line

--

- The residual for the `\(i^{th}\)` observation is

`$$e_i = \text{observed} - \text{predicted}
= y_i - \hat{y}_i$$`

--

- The .vocab[sum of squared residuals] is

`$$e^2_1 + e^2_2 + \dots + e^2_n$$`

--

- The .vocab[least squares line] is the one that minimizes the sum of squared residuals.
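
The idea can be sketched in a few lines of Python. The data below are hypothetical (not the movie ratings), and the function names are made up for this example. The least squares estimates are computed from the standard closed form `\(\hat{\beta}_1 = S_{xy} / S_{xx}\)`, which is equivalent to `\(r \, s_Y / s_X\)`.

```python
def sum_squared_residuals(x, y, b0, b1):
    """Sum of e_i^2, where e_i = y_i - (b0 + b1 * x_i)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def least_squares_line(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x, y)
ssr_best = sum_squared_residuals(x, y, b0, b1)
ssr_other = sum_squared_residuals(x, y, 2.0, 0.7)  # some other candidate line

print(round(b0, 3), round(b1, 3))
print(round(ssr_best, 3), round(ssr_other, 3))
```

Any other choice of intercept and slope, such as the second candidate line here, gives a larger sum of squared residuals.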

---

## Estimating the slope

.eq[
`$$\large{\hat{\beta}_1 = r \frac{s_Y}{s_X}}$$`
]

--

.pull-slight-left[
.small-box-work[
`$$\begin{aligned} &s_X = 30.169 \\[5pt]
&s_Y =  20.024 \\[5pt]
&r = 0.781 \end{aligned}$$`
]
]

--

.pull-more-right[
.small-box-work[
`$$\begin{aligned}\hat{\beta}_1 &= 0.781 \times \frac{20.024}{30.169} \\[10pt]
&= \mathbf{0.518}\end{aligned}$$`
]
]
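
The same arithmetic as a short Python check. The function name `slope_from_summary` is an assumption for this sketch; the inputs are the summary statistics shown above.

```python
def slope_from_summary(r, s_x, s_y):
    """Least squares slope from the correlation and standard deviations."""
    return r * s_y / s_x

# Summary statistics from the movie ratings data
b1 = slope_from_summary(r=0.781, s_x=30.169, s_y=20.024)
print(round(b1, 3))  # 0.518
```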

---

## Estimating the intercept

.eq[

`$$\large{\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}}$$`
]

--

.pull-slight-left[
.small-box-work[
`$$\begin{aligned}&\bar{x} = 60.850 \\[5pt]
&\bar{y} = 63.877 \\[5pt]
&\hat{\beta}_1 = 0.518 \end{aligned}$$`
]
]

--

.pull-more-right[
.small-box-work[
`$$\begin{aligned}\hat{\beta}_0 &= 63.877 - 0.518 \times 60.850 \\[10pt]
&= \mathbf{32.296}\end{aligned}$$`
]
]
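
The intercept formula can be checked the same way. The function name `intercept_from_summary` is an assumption for this sketch. Because the inputs are rounded summary values, the result can differ in its later digits from the intercept reported on the slide.

```python
def intercept_from_summary(x_bar, y_bar, b1):
    """Least squares intercept: the line passes through (x_bar, y_bar)."""
    return y_bar - b1 * x_bar

# Rounded summary values from the movie ratings data
b0 = intercept_from_summary(x_bar=60.850, y_bar=63.877, b1=0.518)
print(round(b0, 2))  # about 32.36 with these rounded inputs
```

The formula is a restatement of the fact that the least squares line always passes through the point `\((\bar{x}, \bar{y})\)`.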

---

## Interpreting slope & intercept

.eq[
`$$\hat{\text{audience}} = 32.296 + 0.518 \times \text{critics}$$`
]

<br>

--

.vocab[Slope]: For every one-point increase in the critics score, we expect the audience score to increase by 0.518 points, on average.

--

.vocab[Intercept]: If the critics score is 0 points, we expect the audience score to be 32.296 points, on average.
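
Both interpretations can be read off the fitted equation directly. A minimal sketch, where `predict_audience` is a hypothetical helper that uses the coefficients from the equation above:

```python
def predict_audience(critics, b0=32.296, b1=0.518):
    """Predicted audience score from the fitted line on this slide."""
    return b0 + b1 * critics

# Intercept: predicted audience score when the critics score is 0
print(round(predict_audience(0), 3))

# Slope: change in the prediction for a one-point increase in critics score
print(round(predict_audience(51) - predict_audience(50), 3))
```

The difference between predictions one point apart is the same at every critics score, which is what makes the slope a single number.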

---

## Does it make sense to interpret the intercept?

--

✅ **Interpret the intercept if**
  - the predictor can feasibly take values equal to or near zero.
  - there are values near zero in the data.

--

<br> 
🛑 Otherwise, don't interpret the intercept!

---

## Recap

--

- Used simple linear regression to describe the relationship between a quantitative predictor and a quantitative response variable.

--

- Used the least squares method to estimate the slope and intercept.

--

- Interpreted the slope and intercept.
  - .vocab[Slope]: For every one-unit increase in `\(x\)`, we expect `\(y\)` to change by `\(\hat{\beta}_1\)` units, on average.
  - .vocab[Intercept]: If `\(x\)` is 0, then we expect `\(y\)` to be `\(\hat{\beta}_0\)` units.
