class: center, middle, inverse, title-slide # Movtivating regression ### Prof. Maria Tackett --- ## Sales vs. Advertising - Suppose you are a data scientist on the marketing team and the company wants to improve the sales of their premiere product - You want to understand the relationship between .vocab[advertising budget] and .vocab[total sales] - The goal is to advise the marketing team about how to set the advertising budget based on their target sales goals --- ## Advertising vs. Sales ```r glimpse(advertising) ``` ``` ## Rows: 200 ## Columns: 4 ## $ tv <dbl> 230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2, 8.6, 199… ## $ radio <dbl> 37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8, 19.6, 2.1, 2.6, 5… ## $ newspaper <dbl> 69.2, 45.1, 69.3, 58.5, 58.4, 75.0, 23.5, 11.6, 1.0, 21.2, … ## $ sales <dbl> 22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8, 10.6, 8.… ``` -- - .vocab[Observations]: 200 markets -- - .vocab[Variables]: - `tv`: Spending on TV ads (in $thousands) - `radio`: Spending on radio ads (in $thousands) - `newspaper`: Spending on newspaper ads (in $thousands) - `sales`: total sales (in $millions) --- ## Terminology - `sales` is the .vocab[response variable] - variable whose variation we want to understand / variable we wish to predict - also known as *outcome* or *dependent* variable -- <br> - `tv`, `radio`, `newspaper` are the .vocab[predictor variables] - variables used to account for variation in the outcome - also known as *explanatory*, *independent*, or *input* variables --- ## Let's look at the data <img src="02-motivate-regression_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" /> -- Each line represents model we could use to predict `sales` using `tv`, `radio`, or `newspaper` --- ## Let's look at the data <img src="02-motivate-regression_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> .alert[ `$$\text{sales} = f(\text{tv}, \text{radio}, \text{newspaper}) + \epsilon$$` ] --- ## Model .alert[ `$$\text{sales} = f(\text{tv}, \text{radio}, \text{newspaper}) + \epsilon$$` ] - **Goal**: Define `\(f\)` -- - How do we define `\(f\)`? - Make an assumption about the functional form `\(f\)` - Use the data to fit a model based on that form --- ## How to define `\(f\)` In general, 1. Choose the functional form of `\(f\)`, i.e. <font class="vocab">choose the appropriate model given the data</font> - Ex: `\(f\)` is a linear model `$$f(\mathbf{X}) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$$` -- 2. Use the data to fit (or train) the model, i.e. <font class="vocab">estimate the model parameters</font> - Ex: Find estimates of `\(\beta_0, \beta_1, \ldots, \beta_p\)` --- ## Why? .alert[ `$$\hat{\text{sales}} = \hat{\beta}_0 + \hat{\beta}_1 \times \text{tv} + \hat{\beta}_2 \times \text{radio} + \hat{\beta}_3 \times \text{newspaper}$$` ] -- .vocab[Prediction:] What do we expect `sales` to be in a market where there is $100,000 spent on TV ads, $30,000 spent on radio ads, and $10,000 spent on newspaper ads? -- .vocab[Inference:] What is the relationship between spending on TV ads and sales after accounting for spending on radio and newspaper ads? --- ## Course Outline .pull-left[ .vocab[Unit 1: Quantitative Response Variables] - Simple Linear Regression - Multiple Linear Regression <br> .vocab[Unit 3: Looking Ahead] - Weighted least squares - Dealing with missing data - Modeling in practice ] .pull-right[ - .vocab[Unit 2: Categorical Response Variable] - Logistic Regression - Multinomial Logistic Regression ]