Hang out with the TAs from STA 210! This is a casual conversation and a fun opportunity to meet the members of the STA 210 teaching team. The only rule is these can’t turn into office hours!
Tea with a TA counts as a statistics experience.
What affects the amount customers tip at a restaurant? To answer this question, we will use data collected in 2011 by a student at St. Olaf who worked at a local restaurant.1
The variables in the data set are
Tip
: amount of the tipParty
: number of people in the partyMeal
: time of day (Lunch, Dinner, Late Night)Age
: age category of person paying the bill (Yadult, Middle, SenCit)tips <- read_csv("data/tip-data.csv") %>%
filter(!is.na(Party))
We fit a model using the Party
, Meal
and Age
to predict the Tips
at this restaurant.
tips_model <- lm(Tip ~ Party + Meal + Age, data = tips)
tidy(tips_model) %>%
kable(digits = 3)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 1.254 | 0.394 | 3.182 | 0.002 |
Party | 1.808 | 0.121 | 14.909 | 0.000 |
MealLate Night | -1.632 | 0.407 | -4.013 | 0.000 |
MealLunch | -0.612 | 0.402 | -1.523 | 0.130 |
AgeSenCit | 0.390 | 0.394 | 0.990 | 0.324 |
AgeYadult | -0.505 | 0.412 | -1.227 | 0.222 |
We can use the augment
function to get the model diagnostics:
tips_aug <- augment(tips_model) %>%
mutate(obs_num = row_number()) #add row number to help with graphing
leverage_threshold
.#calculate threshold
eval = F
from the code chunk header to create a plot of leverage vs. the observation number.Look at the values of the predictors for the observations that have high leverage. Why do you think these observations have high leverage?
ggplot(data = tips_aug, aes(x = obs_num, y = .hat)) +
geom_point(alpha = 0.7) +
geom_hline(yintercept = leverage_threshold, color = "red") +
labs(x = "Observation Number", y = "Leverage") +
geom_text(aes(label=ifelse(.hat > leverage_threshold,
as.character(obs_num), "")), nudge_x = 4)
.std.resid
) vs. the predicted values. Include markers on the plot to identify points with standardized residuals beyond \(\pm 2\) and points with standardized residuals beyond \(\pm 3\).# scatterplot of std. residuals vs. predicted
.cooksd
. Include markers on the plot to identify points with Cook’s D > 0.5 and Cook’s D > 1.## scatterplot of Cook's D vs. observation number
vif
function from the rms package to calculate the variance inflation factors for the variables in the model. Use the codeinstall.packages("rms")
if you need to install the rms package.
# load rms package and calculate VIF
Dahlquist, Samantha, and Jin Dong. 2011. “The Effects of Credit Cards on Tipping.” Project for Statistics 212-Statistics for the Sciences, St. Olaf College.↩︎