class: center, middle, inverse, title-slide

# Multiple Linear Regression
## Types of Predictors
### Prof. Maria Tackett

---

class: middle, center

## [Click here for PDF of slides](10-mlr-predictor-types.pdf)

---

## Topics

- Mean-centering quantitative predictors

- Using indicator variables for categorical predictors

- Using interaction terms

---

## Peer-to-peer lender

Today's data is a sample of 50 loans made through a peer-to-peer lending club. The data is in the `loan50` data frame in the openintro R package.

<img src="10-mlr-predictor-types_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---

## Variables

**Predictors**

- .vocab[`verified_income`]: Whether the borrower's income source and amount have been verified (`Not Verified`, `Source Verified`, `Verified`)
- .vocab[`debt_to_income`]: Debt-to-income ratio, i.e. a borrower's total debt as a percentage of their total income
- .vocab[`annual_income`]: Annual income (in $1000s)

**Response**

- .vocab[`interest_rate`]: Interest rate for the loan

---

## Predictor variables

<img src="10-mlr-predictor-types_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---

## Response vs. Predictors

<img src="10-mlr-predictor-types_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

## Regression Model

|term                           | estimate| std.error| statistic| p.value| conf.low| conf.high|
|:------------------------------|--------:|---------:|---------:|-------:|--------:|---------:|
|(Intercept)                    |   10.726|     1.507|     7.116|   0.000|    7.690|    13.762|
|debt_to_income                 |    0.671|     0.676|     0.993|   0.326|   -0.690|     2.033|
|verified_incomeSource Verified |    2.211|     1.399|     1.581|   0.121|   -0.606|     5.028|
|verified_incomeVerified        |    6.880|     1.801|     3.820|   0.000|    3.253|    10.508|
|annual_income                  |   -0.021|     0.011|    -1.804|   0.078|   -0.043|     0.002|

--

.question[
- Describe the subset of borrowers who are expected to get an interest rate of 10.726% based on our model.
- Is this interpretation meaningful? Why or why not?
]

---

class: middle, center

## Mean-centered variables

---

## Mean-Centered Variables

If we are interested in interpreting the intercept, we can .vocab[mean-center] the quantitative predictors in the model.

We can mean-center a quantitative predictor `\(X_j\)` using the following:

`$$X_{j_{Cent}} = X_{j}- \bar{X}_{j}$$`

--

If we mean-center all quantitative variables, then the intercept is interpreted as the expected value of the response variable when all quantitative variables are at their mean value (and any categorical predictors are at their baseline level).

---

## Loans data: mean-center variables

<img src="10-mlr-predictor-types_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

## Using mean-centered variables in the model

.question[
How do you expect the model to change if we use `debt_inc_cent` and `annual_income_cent` in the model?
]

--

|term                           | estimate| std.error| statistic| p.value| conf.low| conf.high|
|:------------------------------|--------:|---------:|---------:|-------:|--------:|---------:|
|(Intercept)                    |    9.444|     0.977|     9.663|   0.000|    7.476|    11.413|
|debt_inc_cent                  |    0.671|     0.676|     0.993|   0.326|   -0.690|     2.033|
|verified_incomeSource Verified |    2.211|     1.399|     1.581|   0.121|   -0.606|     5.028|
|verified_incomeVerified        |    6.880|     1.801|     3.820|   0.000|    3.253|    10.508|
|annual_income_cent             |   -0.021|     0.011|    -1.804|   0.078|   -0.043|     0.002|

---

## Original vs. mean-centered model

.pull-left[
.small[

|term                           | estimate|
|:------------------------------|--------:|
|(Intercept)                    |   10.726|
|debt_to_income                 |    0.671|
|verified_incomeSource Verified |    2.211|
|verified_incomeVerified        |    6.880|
|annual_income                  |   -0.021|

]
]

.pull-right[
.small[

|term                           | estimate|
|:------------------------------|--------:|
|(Intercept)                    |    9.444|
|debt_inc_cent                  |    0.671|
|verified_incomeSource Verified |    2.211|
|verified_incomeVerified        |    6.880|
|annual_income_cent             |   -0.021|

]
]

---

class: middle, center

## Indicator variables

---

## Indicator variables

- Suppose there is a categorical variable with `\(K\)` categories (levels)

- We can make `\(K\)` indicator variables - one indicator for each category

- An .vocab[indicator variable] takes values 1 or 0
    - 1 if the observation belongs to that category
    - 0 if the observation does not belong to that category

---

## Indicator variable for `verified_income`

.small[
```r
# One 0/1 indicator per level of verified_income (mutate and if_else are from dplyr)
loan50 <- loan50 %>%
  mutate(not_verified = if_else(verified_income == "Not Verified", 1, 0),
         source_verified = if_else(verified_income == "Source Verified", 1, 0),
         verified = if_else(verified_income == "Verified", 1, 0)
  )
```
]

--

.small[
```
## # A tibble: 3 x 4
##   verified_income not_verified source_verified verified
##   <fct>                  <dbl>           <dbl>    <dbl>
## 1 Not Verified               1               0        0
## 2 Verified                   0               0        1
## 3 Source Verified            0               1        0
```
]

---

## Indicators in the model

We will use `\(K-1\)` of the indicator variables in the model.

The .vocab[baseline]
is the category that doesn't have a term in the model.

The coefficients of the indicator variables in the model are interpreted as the expected change in the response compared to the baseline, holding all other variables constant.

---

## Interpreting `verified_income`

.small[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
   <th style="text-align:right;"> conf.low </th>
   <th style="text-align:right;"> conf.high </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;"> 9.444 </td>
   <td style="text-align:right;"> 0.977 </td>
   <td style="text-align:right;"> 9.663 </td>
   <td style="text-align:right;"> 0.000 </td>
   <td style="text-align:right;"> 7.476 </td>
   <td style="text-align:right;"> 11.413 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> debt_inc_cent </td>
   <td style="text-align:right;"> 0.671 </td>
   <td style="text-align:right;"> 0.676 </td>
   <td style="text-align:right;"> 0.993 </td>
   <td style="text-align:right;"> 0.326 </td>
   <td style="text-align:right;"> -0.690 </td>
   <td style="text-align:right;"> 2.033 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #dce5b2 !important;"> verified_incomeSource Verified </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> 2.211 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> 1.399 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> 1.581 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> 0.121 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> -0.606 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> 5.028 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #dce5b2 !important;"> verified_incomeVerified </td>
   <td
style="text-align:right;background-color: #dce5b2 !important;"> 6.880 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> 1.801 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> 3.820 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> 0.000 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> 3.253 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> 10.508 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> annual_income_cent </td>
   <td style="text-align:right;"> -0.021 </td>
   <td style="text-align:right;"> 0.011 </td>
   <td style="text-align:right;"> -1.804 </td>
   <td style="text-align:right;"> 0.078 </td>
   <td style="text-align:right;"> -0.043 </td>
   <td style="text-align:right;"> 0.002 </td>
  </tr>
</tbody>
</table>
]

--

.vocab[The baseline category is "Not Verified".]

---

## Interpreting `verified_income`

A person with source-verified income is expected to take a loan with an interest rate that is 2.211 percentage points higher than the rate on loans to those whose income is not verified, holding all else constant.

--

<br>

A person with verified income is expected to take a loan with an interest rate that is 6.880 percentage points higher than the rate on loans to those whose income is not verified, holding all else constant.

---

class: middle, center

## Interaction terms

---

## Interaction Terms

Sometimes the relationship between a predictor variable and the response depends on the value of another predictor variable.

This is an .vocab[interaction effect].

To account for this, we can include .vocab[interaction terms] in the model.

---

## Interest rate vs. annual income

<img src="10-mlr-predictor-types_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

--

The lines are .vocab[not parallel], indicating there is an .vocab[interaction effect]. The slope of annual income differs based on the income verification status.
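
---

## Sketch: fitting a model with an interaction

The idea that the slope of annual income differs by verification status can be tried out in R. This is a minimal sketch, not the code behind these slides: the data frame `toy` is simulated for illustration (it is not `loan50`), its column names only mirror the ones used here, and `debt_inc_cent` is left out to keep the example short.

```r
# Simulated data for illustration only (not loan50); names mirror the slides.
set.seed(1)
n <- 60
toy <- data.frame(
  verified_income = factor(
    rep(c("Not Verified", "Source Verified", "Verified"), each = n / 3),
    levels = c("Not Verified", "Source Verified", "Verified")
  ),
  annual_income = runif(n, min = 20, max = 200)  # in $1000s
)
toy$annual_income_cent <- toy$annual_income - mean(toy$annual_income)

# Build a response whose income slope differs by verification group.
true_slope <- c(-0.01, -0.02, -0.04)[as.integer(toy$verified_income)]
toy$interest_rate <- 10 + true_slope * toy$annual_income_cent + rnorm(n, sd = 1)

# `a * b` in a formula expands to a + b + a:b (main effects plus interaction).
fit <- lm(interest_rate ~ annual_income_cent * verified_income, data = toy)
b <- coef(fit)

# Income slope within each group: baseline slope plus that group's interaction term.
slopes <- c(
  "Not Verified"    = unname(b["annual_income_cent"]),
  "Source Verified" = unname(b["annual_income_cent"] +
                               b["annual_income_cent:verified_incomeSource Verified"]),
  "Verified"        = unname(b["annual_income_cent"] +
                               b["annual_income_cent:verified_incomeVerified"])
)
print(round(slopes, 3))
```

Because `Not Verified` is the first factor level, R treats it as the baseline, so the coefficient on `annual_income_cent` alone is the slope for that group and each interaction coefficient is an adjustment to it.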

---

## Interaction term in model

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;"> 9.484 </td>
   <td style="text-align:right;"> 0.989 </td>
   <td style="text-align:right;"> 9.586 </td>
   <td style="text-align:right;"> 0.000 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> debt_inc_cent </td>
   <td style="text-align:right;"> 0.691 </td>
   <td style="text-align:right;"> 0.685 </td>
   <td style="text-align:right;"> 1.009 </td>
   <td style="text-align:right;"> 0.319 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> annual_income_cent </td>
   <td style="text-align:right;"> -0.007 </td>
   <td style="text-align:right;"> 0.020 </td>
   <td style="text-align:right;"> -0.341 </td>
   <td style="text-align:right;"> 0.735 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> verified_incomeSource Verified </td>
   <td style="text-align:right;"> 2.157 </td>
   <td style="text-align:right;"> 1.418 </td>
   <td style="text-align:right;"> 1.522 </td>
   <td style="text-align:right;"> 0.135 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> verified_incomeVerified </td>
   <td style="text-align:right;"> 7.181 </td>
   <td style="text-align:right;"> 1.870 </td>
   <td style="text-align:right;"> 3.840 </td>
   <td style="text-align:right;"> 0.000 </td>
  </tr>
  <tr>
   <td style="text-align:left;background-color: #dce5b2 !important;"> annual_income_cent:verified_incomeSource Verified </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> -0.016 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> 0.026 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> -0.643 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> 0.523 </td>
  </tr>
  <tr>
   <td
style="text-align:left;background-color: #dce5b2 !important;"> annual_income_cent:verified_incomeVerified </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> -0.032 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> 0.033 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> -0.979 </td>
   <td style="text-align:right;background-color: #dce5b2 !important;"> 0.333 </td>
  </tr>
</tbody>
</table>

---

## Interpreting interaction terms

.vocab[What the interaction means:] The slope of annual income on the interest rate is 0.016 lower (more negative) when the income is source verified than when it is not verified, holding all else constant.

--

<br>

.vocab[Interpreting `annual_income` for source verified:] If the income is source verified, we expect the interest rate to decrease by 0.023 percentage points (-0.007 + (-0.016) = -0.023) for each additional thousand dollars in annual income, holding all else constant.

---

## Recap

- Mean-centering quantitative predictors

- Using indicator variables for categorical predictors

- Using interaction terms
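
---

## Appendix: checking the mean-centered intercept

The earlier comparison of the original and mean-centered models (same slopes, different intercept) can be checked numerically. This is a minimal sketch under stated assumptions: the data frame `toy` is simulated for illustration (it is not `loan50`), the column names only mirror the ones in these slides, and `verified_income` is left out, so the intercept here has no baseline category to account for.

```r
# Simulated data for illustration only (not loan50); names mirror the slides.
set.seed(2)
n <- 50
toy <- data.frame(
  debt_to_income = runif(n, min = 0, max = 3),
  annual_income  = runif(n, min = 20, max = 200)  # in $1000s
)
toy$interest_rate <- 11 + 0.7 * toy$debt_to_income -
  0.02 * toy$annual_income + rnorm(n, sd = 1)

# Mean-center each quantitative predictor: X_cent = X - mean(X).
toy$debt_inc_cent      <- toy$debt_to_income - mean(toy$debt_to_income)
toy$annual_income_cent <- toy$annual_income - mean(toy$annual_income)

fit_raw  <- lm(interest_rate ~ debt_to_income + annual_income, data = toy)
fit_cent <- lm(interest_rate ~ debt_inc_cent + annual_income_cent, data = toy)

# Slopes are unchanged by centering; only the intercept moves.
print(round(coef(fit_raw), 3))
print(round(coef(fit_cent), 3))

# The centered intercept equals the raw model's prediction at the predictor means.
at_means <- data.frame(
  debt_to_income = mean(toy$debt_to_income),
  annual_income  = mean(toy$annual_income)
)
pred_at_means <- unname(predict(fit_raw, newdata = at_means))
print(c(centered_intercept = unname(coef(fit_cent)[1]),
        raw_at_means       = pred_at_means))
```

Centering shifts each predictor by a constant, so the fitted line is the same; the intercept is simply read off at the predictor means instead of at zero.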