
Model comparison

Prof. Maria Tackett

1

Topics

  • ANOVA for Multiple Linear Regression

  • Nested F Test

  • R² vs. Adj. R²

  • AIC & BIC

3

Restaurant tips

What affects the amount customers tip at a restaurant?

  • Response:

    • Tip: amount of the tip
  • Predictors:

    • Party: number of people in the party
    • Meal: time of day (Lunch, Dinner, Late Night)
    • Age: age category of person paying the bill (Yadult, Middle, SenCit)
4

Response Variable

5

Predictor Variables

6

Response vs. Predictors

7

Restaurant tips: model

term estimate std.error statistic p.value conf.low conf.high
(Intercept) 0.838 0.397 2.112 0.036 0.055 1.622
Party 1.837 0.124 14.758 0.000 1.591 2.083
AgeSenCit 0.379 0.410 0.925 0.356 -0.430 1.189
AgeYadult -1.009 0.408 -2.475 0.014 -1.813 -0.204


Is this the best model to explain variation in Tips?

8

ANOVA test for MLR

Using the ANOVA table, we can test whether any variable in the model is a significant predictor of the response. We conduct this test using the following hypotheses:

H0: β1 = β2 = ⋯ = βp = 0
Ha: at least one βj is not equal to 0

  • The statistic for this test is the F test statistic in the ANOVA table

  • We calculate the p-value using an F distribution with p and (n − p − 1) degrees of freedom

9


Tips: ANOVA Test

term df sumsq meansq statistic p.value
Party 1 1188.636 1188.636 285.712 0.000
Age 2 38.028 19.014 4.570 0.012
Residuals 165 686.444 4.160

Model df: 3

Model SS: 1188.636 + 38.028 = 1226.664

Model MS: 1226.664/ 3 = 408.888

F Stat: 408.888 / 4.160 = 98.290

P-value: P(F > 98.290) ≈ 0

  • calculated using an F distribution with 3 and 165 degrees of freedom
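The arithmetic above can be reproduced with a short script (a sketch, not part of the original slides; all values are taken from the ANOVA table on this slide):

```python
# Overall F statistic for the Tip ~ Party + Age model,
# built from the ANOVA table values on this slide
party_ss, age_ss = 1188.636, 38.028   # sums of squares for Party and Age
resid_ms = 4.160                      # residual mean square
model_df = 1 + 2                      # df for Party (1) plus df for Age (2)

model_ss = party_ss + age_ss          # 1226.664
model_ms = model_ss / model_df        # 408.888
f_stat = model_ms / resid_ms          # approximately 98.29

print(round(model_ms, 3), round(f_stat, 2))
```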
10

Tips: ANOVA Test

term df sumsq meansq statistic p.value
Party 1 1188.636 1188.636 285.712 0.000
Age 2 38.028 19.014 4.570 0.012
Residuals 165 686.444 4.160

The data provide sufficient evidence to conclude that at least one coefficient is non-zero, i.e. at least one predictor in the model is significant.

11

Testing subset of coefficients

  • Sometimes we want to test whether a subset of coefficients are all equal to 0

  • This is often the case when we want to test

    • whether a categorical variable with k levels is a significant predictor of the response
    • whether the interaction between a categorical and quantitative variable is significant
  • To do so, we will use the Nested (Partial) F Test

12


Nested (Partial) F Test

  • Suppose we have a full and reduced model:

Full: y = β0 + β1x1 + ⋯ + βqxq + βq+1xq+1 + ⋯ + βpxp
Reduced: y = β0 + β1x1 + ⋯ + βqxq

  • We want to test whether any of the variables xq+1,xq+2,,xp are significant predictors. To do so, we will test the hypothesis:

H0: βq+1 = βq+2 = ⋯ = βp = 0
Ha: at least one βj is not equal to 0

13

Nested F Test

  • The test statistic for this test is

F = [(SSE_reduced − SSE_full) / # predictors tested] / [SSE_full / (n − p_full − 1)]


  • Calculate the p-value using the F distribution with df1 = # predictors tested and df2 = (n − p_full − 1)
14

Is Meal a significant predictor of tips?

term estimate
(Intercept) 1.254
Party 1.808
AgeSenCit 0.390
AgeYadult -0.505
MealLate Night -1.632
MealLunch -0.612
15


Tips: Nested F test

H0: βLate Night = βLunch = 0
Ha: at least one βj is not equal to 0

reduced <- lm(Tip ~ Party + Age, data = tips)
full <- lm(Tip ~ Party + Age + Meal, data = tips)
# Nested F test in R
anova(reduced, full)
16


Tips: Nested F test

Res.Df RSS Df Sum of Sq F Pr(>F)
165 686.444
163 622.979 2 63.465 8.303 0

F Stat: [(686.444 − 622.979) / 2] / [622.979 / (169 − 5 − 1)] = 8.303

P-value: P(F > 8.303) = 0.0003

  • calculated using an F distribution with 2 and 163 degrees of freedom

The data provide sufficient evidence to conclude that at least one coefficient associated with Meal is not zero. Therefore, Meal is a significant predictor of Tips.
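As a check on the arithmetic, the nested F statistic can be recomputed from the anova() output above (a sketch, not part of the original slides; the SSE values and degrees of freedom come from the slide):

```python
# Nested F statistic comparing the reduced (Party + Age) and
# full (Party + Age + Meal) models, using values from the anova() output
sse_reduced, sse_full = 686.444, 622.979
n, p_full = 169, 5    # 169 observations; full model has 5 predictor terms
k = 2                 # number of coefficients tested (the two Meal terms)

f_stat = ((sse_reduced - sse_full) / k) / (sse_full / (n - p_full - 1))
print(round(f_stat, 3))   # approximately 8.303
```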

17

Model with Meal

term estimate std.error statistic p.value conf.low conf.high
(Intercept) 1.254 0.394 3.182 0.002 0.476 2.032
Party 1.808 0.121 14.909 0.000 1.568 2.047
AgeSenCit 0.390 0.394 0.990 0.324 -0.388 1.168
AgeYadult -0.505 0.412 -1.227 0.222 -1.319 0.308
MealLate Night -1.632 0.407 -4.013 0.000 -2.435 -0.829
MealLunch -0.612 0.402 -1.523 0.130 -1.405 0.181
18

Including interactions

Does the effect of Party differ based on the Meal time?

term estimate
(Intercept) 1.276
Party 1.795
AgeSenCit 0.401
AgeYadult -0.470
MealLate Night -1.845
MealLunch -0.461
Party:MealLate Night 0.111
Party:MealLunch -0.050
19


Nested F test for interactions

Let's use a Nested F test to determine if Party*Meal is statistically significant.

reduced <- lm(Tip ~ Party + Age + Meal, data = tips)
full <- lm(Tip ~ Party + Age + Meal + Meal * Party, data = tips)
Res.Df RSS Df Sum of Sq F Pr(>F)
163 622.979
161 621.965 2 1.014 0.131 0.877
20

Final model for now

We conclude that the effect of Party does not differ based on Meal. Therefore, we will use the original model that only included main effects.

term estimate std.error statistic p.value
(Intercept) 1.254 0.394 3.182 0.002
Party 1.808 0.121 14.909 0.000
AgeSenCit 0.390 0.394 0.990 0.324
AgeYadult -0.505 0.412 -1.227 0.222
MealLate Night -1.632 0.407 -4.013 0.000
MealLunch -0.612 0.402 -1.523 0.130
21

Model comparison

22


R²

Recall: R² is the proportion of the variation in the response variable explained by the regression model

R² will always increase as we add more variables to the model

  • If we add enough variables, we can always achieve R² = 100%

If we only use R² to choose a best-fit model, we will be prone to choose the model with the most predictor variables

23


Adjusted R²

Adjusted R²: measure that includes a penalty for unnecessary predictor variables

Similar to R², it is a measure of the amount of variation in the response that is explained by the regression model

Differs from R² by using mean squares rather than sums of squares, thereby adjusting for the number of predictor variables

24


R² and Adjusted R²

R² = SS_Model / SS_Total = 1 − SS_Error / SS_Total

Adj. R² = 1 − [SS_Error / (n − p − 1)] / [SS_Total / (n − 1)]
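Using the sums of squares from the earlier ANOVA table for the Tip ~ Party + Age model, both formulas can be evaluated directly (a sketch, not part of the original slides; n = 169 and p = 3 are taken from the slides):

```python
# R-squared and adjusted R-squared from sums of squares
ss_model, ss_error = 1226.664, 686.444   # from the ANOVA table
ss_total = ss_model + ss_error
n, p = 169, 3                            # observations, predictor terms

r_sq = 1 - ss_error / ss_total
adj_r_sq = 1 - (ss_error / (n - p - 1)) / (ss_total / (n - 1))
# adj_r_sq is never larger than r_sq, since the penalty only shrinks it
print(round(r_sq, 3), round(adj_r_sq, 3))
```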

25


Using R² and Adj. R²

Adj. R² can be used as a quick assessment to compare the fit of multiple models; however, it should not be the only assessment!

Use R² when describing the relationship between the response and predictor variables

26

Tips: Comparing models

Let's compare two models:

model1 <- lm(Tip ~ Party + Age + Meal, data = tips)
glance(model1) %>% select(r.squared, adj.r.squared)
## # A tibble: 1 x 2
## r.squared adj.r.squared
## <dbl> <dbl>
## 1 0.674 0.664
model2 <- lm(Tip ~ Party + Age + Meal + Day, data = tips)
glance(model2) %>% select(r.squared, adj.r.squared)
## # A tibble: 1 x 2
## r.squared adj.r.squared
## <dbl> <dbl>
## 1 0.683 0.662
27

AIC & BIC

Akaike's Information Criterion (AIC): AIC = n log(SS_Error) − n log(n) + 2(p + 1)

Schwarz's Bayesian Information Criterion (BIC): BIC = n log(SS_Error) − n log(n) + log(n) × (p + 1)



See the supplemental note on AIC & BIC for derivations.
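These formulas can be evaluated directly (a sketch, not part of the original slides; n, p, and SS_Error for the full tips model are taken from the nested F test output). Note that this textbook version drops additive constants present in R's AIC()/BIC(), so the absolute values differ from glance()'s, but model rankings agree because the constants are identical for models fit to the same data:

```python
import math

# AIC and BIC via the slide's formulas for the Tip ~ Party + Age + Meal model
n, p, sse = 169, 5, 622.979

aic = n * math.log(sse) - n * math.log(n) + 2 * (p + 1)
bic = n * math.log(sse) - n * math.log(n) + math.log(n) * (p + 1)

# BIC's per-parameter penalty is log(n) = log(169) > 2, so BIC penalizes
# additional terms more heavily than AIC whenever n >= 8
print(round(aic, 1), round(bic, 1))
```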

28


AIC & BIC

AIC = n log(SS_Error) − n log(n) + 2(p + 1)
BIC = n log(SS_Error) − n log(n) + log(n) × (p + 1)


First term: n log(SS_Error) decreases as p increases, since adding predictors cannot increase SS_Error

29

AIC & BIC

AIC = n log(SS_Error) − n log(n) + 2(p + 1)
BIC = n log(SS_Error) − n log(n) + log(n) × (p + 1)


Second term: −n log(n) is fixed for a given sample size n

30

AIC & BIC

AIC = n log(SS_Error) − n log(n) + 2(p + 1)
BIC = n log(SS_Error) − n log(n) + log(n) × (p + 1)


Third term: the penalty increases as p increases

31

Using AIC & BIC

AIC = n log(SS_Error) − n log(n) + 2(p + 1)
BIC = n log(SS_Error) − n log(n) + log(n) × (p + 1)



  • Choose model with the smaller value of AIC or BIC

  • If n ≥ 8, the penalty for BIC is larger than that of AIC, so BIC tends to favor more parsimonious models (i.e., models with fewer terms)

32

Tips: AIC & BIC

model1 <- lm(Tip ~ Party + Age + Meal, data = tips)
glance(model1) %>% select(AIC, BIC)
## # A tibble: 1 x 2
## AIC BIC
## <dbl> <dbl>
## 1 714. 736.
model2 <- lm(Tip ~ Party + Age + Meal + Day, data = tips)
glance(model2) %>% select(AIC, BIC)
## # A tibble: 1 x 2
## AIC BIC
## <dbl> <dbl>
## 1 720. 757.
33

Recap

  • ANOVA for Multiple Linear Regression

  • Nested F Test

  • R² vs. Adj. R²

  • AIC & BIC

34