ANOVA for Multiple Linear Regression
Nested F Test
R2 vs. Adj. R2
AIC & BIC
What affects the amount customers tip at a restaurant?
Response:

- `Tip`: amount of the tip

Predictors:

- `Party`: number of people in the party
- `Meal`: time of day (Lunch, Dinner, Late Night)
- `Age`: age category of person paying the bill (Yadult, Middle, SenCit)

term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 0.838 | 0.397 | 2.112 | 0.036 | 0.055 | 1.622 |
Party | 1.837 | 0.124 | 14.758 | 0.000 | 1.591 | 2.083 |
AgeSenCit | 0.379 | 0.410 | 0.925 | 0.356 | -0.430 | 1.189 |
AgeYadult | -1.009 | 0.408 | -2.475 | 0.014 | -1.813 | -0.204 |
Is this the best model to explain variation in Tips?
Using the ANOVA table, we can test whether any variable in the model is a significant predictor of the response. We conduct this test using the following hypotheses:
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$$
$$H_a: \text{at least one } \beta_j \text{ is not equal to } 0$$
The test statistic is the F statistic in the ANOVA table.
We calculate the p-value using an F distribution with $p$ and $(n - p - 1)$ degrees of freedom.
term | df | sumsq | meansq | statistic | p.value |
---|---|---|---|---|---|
Party | 1 | 1188.636 | 1188.636 | 285.712 | 0.000 |
Age | 2 | 38.028 | 19.014 | 4.570 | 0.012 |
Residuals | 165 | 686.444 | 4.160 | | |
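This table can be produced with `anova()` and tidied with broom; a minimal sketch, assuming the `tips` data frame used in this example:

```r
library(broom)  # tidy() turns the anova output into the table above

model <- lm(Tip ~ Party + Age, data = tips)
tidy(anova(model))  # columns: term, df, sumsq, meansq, statistic, p.value
```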
Model df: 3
Model SS: 1188.636 + 38.028 = 1226.664
Model MS: 1226.664 / 3 = 408.888
F Stat: 408.888 / 4.160 = 98.290
P-value: P(F > 98.290) ≈ 0
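As a check, this p-value can be computed directly in R from the F distribution with 3 and 165 degrees of freedom:

```r
# Upper-tail probability beyond the observed F statistic
pf(98.290, df1 = 3, df2 = 165, lower.tail = FALSE)
#> essentially 0
```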
The data provide sufficient evidence to conclude that at least one coefficient is non-zero, i.e. at least one predictor in the model is significant.
Sometimes we want to test whether a subset of coefficients are all equal to 0.
This is often the case when we want to test whether a categorical predictor with multiple levels, and therefore multiple coefficients in the model, is a significant predictor of the response.
To do so, we will use the Nested (Partial) F Test.
$$\text{Full: } y = \beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q + \beta_{q+1} x_{q+1} + \cdots + \beta_p x_p$$
$$\text{Reduced: } y = \beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q$$
$$H_0: \beta_{q+1} = \beta_{q+2} = \cdots = \beta_p = 0$$
$$H_a: \text{at least one } \beta_j \text{ is not equal to } 0$$
$$F = \frac{(SSE_{reduced} - SSE_{full}) / \text{\# predictors tested}}{SSE_{full} / (n - p_{full} - 1)}$$
Is `Meal` a significant predictor of tips?

term | estimate |
---|---|
(Intercept) | 1.254 |
Party | 1.808 |
AgeSenCit | 0.390 |
AgeYadult | -0.505 |
MealLate Night | -1.632 |
MealLunch | -0.612 |
$$H_0: \beta_{\text{late night}} = \beta_{\text{lunch}} = 0$$
$$H_a: \text{at least one } \beta_j \text{ is not equal to } 0$$

```r
reduced <- lm(Tip ~ Party + Age, data = tips)
full <- lm(Tip ~ Party + Age + Meal, data = tips)

# Nested F test in R
anova(reduced, full)
```
Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
---|---|---|---|---|---|
165 | 686.444 | | | | |
163 | 622.979 | 2 | 63.465 | 8.303 | 0 |
F Stat: $\dfrac{(686.444 - 622.979)/2}{622.979/(169 - 5 - 1)} = 8.303$
P-value: P(F > 8.303) = 0.0003
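The same check works here, using the F distribution with 2 and 163 degrees of freedom:

```r
# Upper-tail probability beyond the observed F statistic
pf(8.303, df1 = 2, df2 = 163, lower.tail = FALSE)
#> approximately 0.0003
```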
The data provide sufficient evidence to conclude that at least one coefficient associated with `Meal` is not zero. Therefore, `Meal` is a significant predictor of `Tip`.
The model including `Meal`:
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 1.254 | 0.394 | 3.182 | 0.002 | 0.476 | 2.032 |
Party | 1.808 | 0.121 | 14.909 | 0.000 | 1.568 | 2.047 |
AgeSenCit | 0.390 | 0.394 | 0.990 | 0.324 | -0.388 | 1.168 |
AgeYadult | -0.505 | 0.412 | -1.227 | 0.222 | -1.319 | 0.308 |
MealLate Night | -1.632 | 0.407 | -4.013 | 0.000 | -2.435 | -0.829 |
MealLunch | -0.612 | 0.402 | -1.523 | 0.130 | -1.405 | 0.181 |
Does the effect of `Party` differ based on the `Meal` time?
term | estimate |
---|---|
(Intercept) | 1.276 |
Party | 1.795 |
AgeSenCit | 0.401 |
AgeYadult | -0.470 |
MealLate Night | -1.845 |
MealLunch | -0.461 |
Party:MealLate Night | 0.111 |
Party:MealLunch | -0.050 |
Let's use a Nested F test to determine if the `Party*Meal` interaction is statistically significant.

```r
reduced <- lm(Tip ~ Party + Age + Meal, data = tips)
full <- lm(Tip ~ Party + Age + Meal + Meal * Party, data = tips)
```
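As before, the nested F test compares the two models with `anova()`, which produces the table below:

```r
anova(reduced, full)
```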
Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
---|---|---|---|---|---|
163 | 622.979 | | | | |
161 | 621.965 | 2 | 1.014 | 0.131 | 0.877 |
We conclude that the effect of `Party` does not differ based on the `Meal` time. Therefore, we will use the original model that only included the main effects.
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 1.254 | 0.394 | 3.182 | 0.002 |
Party | 1.808 | 0.121 | 14.909 | 0.000 |
AgeSenCit | 0.390 | 0.394 | 0.990 | 0.324 |
AgeYadult | -0.505 | 0.412 | -1.227 | 0.222 |
MealLate Night | -1.632 | 0.407 | -4.013 | 0.000 |
MealLunch | -0.612 | 0.402 | -1.523 | 0.130 |
- Recall: $R^2$ is the proportion of the variation in the response variable explained by the regression model.
- $R^2$ will always increase as we add more variables to the model.
- If we only use $R^2$ to choose a best fit model, we will be prone to choosing the model with the most predictor variables.
- Adjusted $R^2$: a measure that includes a penalty for unnecessary predictor variables.
- Similar to $R^2$, it is a measure of the amount of variation in the response that is explained by the regression model.
- It differs from $R^2$ by using mean squares rather than sums of squares, thereby adjusting for the number of predictor variables.
$$R^2 = \frac{SS_{Model}}{SS_{Total}} = 1 - \frac{SS_{Error}}{SS_{Total}}$$

$$Adj.\ R^2 = 1 - \frac{SS_{Error}/(n - p - 1)}{SS_{Total}/(n - 1)}$$
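Both quantities can be computed by hand from a fitted model; a minimal sketch, assuming the `tips` data and the main-effects model from earlier:

```r
mod <- lm(Tip ~ Party + Age + Meal, data = tips)

sse <- sum(residuals(mod)^2)               # SS_Error
sst <- sum((tips$Tip - mean(tips$Tip))^2)  # SS_Total
n   <- nrow(tips)
p   <- length(coef(mod)) - 1               # number of model terms

r2     <- 1 - sse / sst                              # R^2
adj_r2 <- 1 - (sse / (n - p - 1)) / (sst / (n - 1))  # Adj. R^2
```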
- $Adj.\ R^2$ can be used as a quick assessment to compare the fit of multiple models; however, it should not be the only assessment!
- Use $R^2$ when describing the relationship between the response and predictor variables.
Let's compare two models:
```r
library(tidyverse)  # for %>% and select()
library(broom)      # for glance()

model1 <- lm(Tip ~ Party + Age + Meal, data = tips)
glance(model1) %>% select(r.squared, adj.r.squared)
```

```
## # A tibble: 1 x 2
##   r.squared adj.r.squared
##       <dbl>         <dbl>
## 1     0.674         0.664
```

```r
model2 <- lm(Tip ~ Party + Age + Meal + Day, data = tips)
glance(model2) %>% select(r.squared, adj.r.squared)
```

```
## # A tibble: 1 x 2
##   r.squared adj.r.squared
##       <dbl>         <dbl>
## 1     0.683         0.662
```
Akaike's Information Criterion (AIC):
$$AIC = n\log(SS_{Error}) - n\log(n) + 2(p+1)$$

Schwarz's Bayesian Information Criterion (BIC):
$$BIC = n\log(SS_{Error}) - n\log(n) + \log(n)\times(p+1)$$
See the supplemental note on AIC & BIC for derivations.
$$AIC = n\log(SS_{Error}) - n\log(n) + 2(p+1)$$
$$BIC = n\log(SS_{Error}) - n\log(n) + \log(n)\times(p+1)$$

- First term: decreases as $p$ increases, since adding predictors reduces $SS_{Error}$
- Second term: fixed for a given sample size $n$
- Third term: increases as $p$ increases
- Choose the model with the smaller value of AIC or BIC.
- If $n \geq 8$, the penalty for BIC is larger than that of AIC, so BIC tends to favor more parsimonious models (i.e., models with fewer terms).
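The formulas above can be computed directly from a fitted model. Note that this version differs from R's built-in `AIC()` and `BIC()` (which are based on the Gaussian log-likelihood) by an additive constant, so the two versions rank models the same way even though the numbers differ. A minimal sketch, assuming the `tips` data:

```r
mod <- lm(Tip ~ Party + Age + Meal, data = tips)

n   <- nrow(tips)
p   <- length(coef(mod)) - 1   # number of model terms
sse <- sum(residuals(mod)^2)   # SS_Error

aic <- n * log(sse) - n * log(n) + 2 * (p + 1)
bic <- n * log(sse) - n * log(n) + log(n) * (p + 1)
```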
```r
model1 <- lm(Tip ~ Party + Age + Meal, data = tips)
glance(model1) %>% select(AIC, BIC)
```

```
## # A tibble: 1 x 2
##     AIC   BIC
##   <dbl> <dbl>
## 1  714.  736.
```

```r
model2 <- lm(Tip ~ Party + Age + Meal + Day, data = tips)
glance(model2) %>% select(AIC, BIC)
```

```
## # A tibble: 1 x 2
##     AIC   BIC
##   <dbl> <dbl>
## 1  720.  757.
```
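Both AIC (714 vs. 720) and BIC (736 vs. 757) are smaller for `model1`, so both criteria favor the model without `Day`.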