Motivating regression

Prof. Maria Tackett


Sales vs. Advertising

  • Suppose you are a data scientist on the marketing team and the company wants to improve the sales of its premier product

  • You want to understand the relationship between advertising budget and total sales

  • The goal is to advise the marketing team about how to set the advertising budget based on their target sales goals


Advertising vs. Sales

glimpse(advertising)
## Rows: 200
## Columns: 4
## $ tv <dbl> 230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2, 8.6, 199…
## $ radio <dbl> 37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8, 19.6, 2.1, 2.6, 5…
## $ newspaper <dbl> 69.2, 45.1, 69.3, 58.5, 58.4, 75.0, 23.5, 11.6, 1.0, 21.2, …
## $ sales <dbl> 22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8, 10.6, 8.…
  • Observations: 200 markets

  • Variables:

    • tv: Spending on TV ads (in thousands of dollars)
    • radio: Spending on radio ads (in thousands of dollars)
    • newspaper: Spending on newspaper ads (in thousands of dollars)
    • sales: Total sales (in millions of dollars)

Terminology

  • sales is the response variable
    • variable whose variation we want to understand / variable we wish to predict
    • also known as outcome or dependent variable


  • tv, radio, newspaper are the predictor variables
    • variables used to account for variation in the outcome
    • also known as explanatory, independent, or input variables

Let's look at the data

Each line represents a model we could use to predict sales using tv, radio, or newspaper

$$\text{sales} = f(\text{tv}, \text{radio}, \text{newspaper}) + \epsilon$$


Model

$$\text{sales} = f(\text{tv}, \text{radio}, \text{newspaper}) + \epsilon$$

  • Goal: Define f

  • How do we define f?

    • Make an assumption about the functional form of f
    • Use the data to fit a model based on that form

How to define f

In general,

  1. Choose the functional form of f, i.e. choose the appropriate model given the data
    • Ex: f is a linear model, $f(X) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$
  2. Use the data to fit (or train) the model, i.e. estimate the model parameters
    • Ex: Find estimates of $\beta_0, \beta_1, \ldots, \beta_p$
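
The two steps above can be sketched in R as follows. This is only an illustration under stated assumptions: the data frame `sim_ads` is simulated (it is not the advertising data shown earlier), and the coefficient values used to generate it are made up. The sketch assumes a linear form for f and estimates the coefficients by least squares with `lm()`.

```r
# Simulated stand-in for the advertising data (hypothetical values,
# NOT the real data set from the slides).
set.seed(1)
n <- 200
sim_ads <- data.frame(
  tv        = runif(n, 0, 300),
  radio     = runif(n, 0, 50),
  newspaper = runif(n, 0, 100)
)

# Generate sales from an assumed linear f plus random error (epsilon).
sim_ads$sales <- 3 + 0.05 * sim_ads$tv + 0.2 * sim_ads$radio +
  0 * sim_ads$newspaper + rnorm(n, sd = 1.5)

# Step 1: choose the functional form (linear in tv, radio, newspaper).
# Step 2: use the data to estimate beta_0, ..., beta_3 by least squares.
fit <- lm(sales ~ tv + radio + newspaper, data = sim_ads)

# Named vector of estimates: (Intercept), tv, radio, newspaper.
beta_hat <- coef(fit)
print(beta_hat)
```

Because the data were generated from a known linear model, the estimates should land close to the values used in the simulation; with real data the true coefficients are unknown.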

Why?

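$$\widehat{\text{sales}} = \hat{\beta}_0 + \hat{\beta}_1 \times \text{tv} + \hat{\beta}_2 \times \text{radio} + \hat{\beta}_3 \times \text{newspaper}$$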

Prediction:

What do we expect sales to be in a market that spends $100,000 on TV ads, $30,000 on radio ads, and $10,000 on newspaper ads?

Inference:

What is the relationship between spending on TV ads and sales after accounting for spending on radio and newspaper ads?
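
Both questions can be sketched in R with a fitted linear model. The data below are simulated for illustration only (the data frame `sim_ads` and its generating coefficients are assumptions, not the course's advertising data), so the numbers it produces say nothing about the real markets. Since spending is recorded in thousands of dollars, $100,000 on TV corresponds to `tv = 100`.

```r
# Simulated stand-in for the advertising data (hypothetical values).
set.seed(2)
n <- 200
sim_ads <- data.frame(
  tv        = runif(n, 0, 300),
  radio     = runif(n, 0, 50),
  newspaper = runif(n, 0, 100)
)
sim_ads$sales <- 3 + 0.05 * sim_ads$tv + 0.2 * sim_ads$radio +
  0 * sim_ads$newspaper + rnorm(n, sd = 1.5)

fit <- lm(sales ~ tv + radio + newspaper, data = sim_ads)

# Prediction: expected sales for one new market.
# Spending is in thousands of dollars, so $100,000 -> 100, and so on.
new_market <- data.frame(tv = 100, radio = 30, newspaper = 10)
pred <- predict(fit, newdata = new_market)
print(pred)

# Inference: 95% confidence interval for the tv coefficient, i.e. the
# expected change in sales per additional $1,000 of TV spending,
# holding radio and newspaper spending constant.
tv_ci <- confint(fit, "tv", level = 0.95)
print(tv_ci)
```

Prediction uses the whole fitted equation to produce a single value, while inference focuses on one coefficient and the uncertainty around it.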


Course Outline

Unit 1: Quantitative Response Variables

  • Simple Linear Regression
  • Multiple Linear Regression


Unit 2: Categorical Response Variable

  • Logistic Regression
  • Multinomial Logistic Regression


Unit 3: Looking Ahead

  • Weighted least squares
  • Dealing with missing data
  • Modeling in practice