Lab 05: Stockpiling toilet paper

Due Wed, Sep 30 at 11:59 PM

Model building is an iterative process that often requires fitting, assessing, and tweaking multiple models. The goal of today’s lab is to use an iterative model building process along with various criteria to compare models to select a model that “best” fits the data.

Getting Started

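# Configure git for this project; replace the placeholder strings with your own GitHub username and email
library(usethis)
use_git_config(user.name = "github username", user.email = "your email")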

Password caching

If you would like your git password cached for a week for this project, type the following in the Terminal in RStudio:

git config --global credential.helper 'cache --timeout 604800'

You will need to enter your GitHub username and password one more time after caching the password. After that you won’t need to enter your credentials for 604800 seconds = 7 days.

Packages

We will use the following packages in today’s lab. Feel free to add any other packages as needed.

library(tidyverse)
library(broom)
library(knitr)

Data

The data for today’s lab come from the paper “Influence of perceived threat of Covid-19 and HEXACO personality traits on toilet paper stockpiling” by Garbe, Rau and Toppe. The objective of their analysis was to examine the relationship between personality traits, perceived threat of Covid-19, and toilet paper stockpiling.

They conducted an online survey March 23 - 29, 2020 and used the results to fit multiple linear regression models to draw conclusions about their research questions. From their survey, they collected data on adults across 35 countries. Given the small number of responses from people outside of the United States, Canada and Europe, only responses from people in these three locations were included in the regression analysis.

They used the standardized version of each numerical variable in their model. Given a variable \(x\), the standardized \(x\) is calculated as \(\frac{x - \bar{x}}{s_x}\). In other words, we standardize a variable by shifting it by its mean and rescaling it by its standard deviation.
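The standardization formula above can be sketched in R as follows. This is a minimal illustration on a made-up vector `x` (hypothetical values, not taken from covid-tp.csv); it is not the authors' code.

```r
# Toy data: hypothetical values used only to illustrate the formula
x <- c(1, 2, 2, 3, 5, 4, 4)

# Standardize by hand: subtract the mean, then divide by the sample standard deviation
x_z <- (x - mean(x)) / sd(x)

# scale() applies the same centering and rescaling;
# as.numeric() drops the matrix attributes it returns
x_scaled <- as.numeric(scale(x))

mean(x_z)  # approximately 0, up to floating-point error
sd(x_z)    # 1
```

After standardizing, a one-unit change in `x_z` corresponds to a change of one standard deviation in the original `x`. The same expression can be used inside `mutate()` to add a standardized column to a data frame.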

The dataset is in the file covid-tp.csv.

The variables we’ll use in this analysis are

Exercises

In this lab, we will use an iterative model building process to fit a model that can be used to predict Top_Inhouse, the number of toilet paper rolls a respondent reported having in their home at the time they took the survey.

  1. The authors started by fitting a baseline model with the variables politics, gender, age, household, duration, mobility, public_transport, europe, and date_diff_z. Briefly explain why the authors started by fitting this baseline model before including the variables measuring perceived threat from Covid-19 or personality traits.

  2. The authors used the standardized version of the numeric predictor variables in the model. Briefly explain why the authors used the standardized values for the numeric predictors rather than the raw variables. Hint: Consider how using the standardized variables can make it easier to draw conclusions from the models.

  3. Fit the baseline model that includes the variables politics, gender, age, household, duration, mobility, public_transport, europe, and date_diff_z.

    • Describe the subset of people included in the intercept.
    • Interpret the coefficient of household in the context of the data. Note: The standard deviation for household is 1.57 \(\approx 2\).
  4. Does the perceived threat of Covid-19 help explain the variation in the toilet paper consumption after adjusting for the baseline variables? Show the code and output used to derive your answer.

  5. Does the effect of Covid-19 differ based on gender? Let’s use a nested F test to answer this question.

    • State the null and alternative hypotheses in words in the context of the data.
    • Show the code and output from the test.
    • State the conclusion in the context of the data.

Use the best fit model based on the results of the Nested F test for the next exercise.
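As a reminder of the mechanics, a nested F test compares a reduced model to a full model that contains all of the reduced model's terms plus additional ones. The sketch below uses R's built-in `mtcars` data as a stand-in (an assumption for illustration only; the variables are unrelated to the survey data), with an interaction term playing the role of the added term.

```r
# Built-in mtcars data stands in for the lab data (illustration only)

# Reduced model: main effects only
reduced <- lm(mpg ~ wt + am, data = mtcars)

# Full model: the same main effects plus the wt:am interaction
full <- lm(mpg ~ wt * am, data = mtcars)

# anova() on two nested lm fits carries out the nested F test;
# the second row reports the F statistic and p-value for the added term(s)
f_test <- anova(reduced, full)
f_test

# Extract the p-value for the added term
p_value <- f_test$`Pr(>F)`[2]
p_value
```

The null hypothesis is that the coefficients of the added terms are all zero, so a small p-value favors the full model.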

  6. Calculate the model fit statistics - \(Adj. R^2\), \(AIC\), and \(BIC\) - for the model selected in the previous exercise.

  7. Now let’s consider the personality traits conscientiousness and openness to experiences, the two traits identified by the authors. Fit a model that includes all of the variables in the model chosen in Exercise 5 along with conscientiousness and openness to experiences.

In addition to the model output, display the model fit statistics - \(Adj. R^2\), \(AIC\), and \(BIC\).
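One way to collect the three fit statistics is sketched below using base R functions on the built-in `mtcars` data (a stand-in chosen for illustration, not the lab data). The model and the name `fit_stats` are hypothetical.

```r
# Illustrative model on built-in data (not the lab data)
m <- lm(mpg ~ wt + hp, data = mtcars)

# Adjusted R-squared comes from the model summary;
# AIC() and BIC() are computed from the fitted model object
fit_stats <- c(
  adj_r2 = summary(m)$adj.r.squared,
  aic    = AIC(m),
  bic    = BIC(m)
)
fit_stats
```

Alternatively, `glance()` from the broom package returns a one-row tibble that includes `adj.r.squared`, `AIC`, and `BIC` columns for an `lm` fit. Higher adjusted \(R^2\) and lower \(AIC\) or \(BIC\) indicate a better fit under the respective criterion.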

  8. Now let’s compare the model chosen in Exercise 5 to the model fit in Exercise 7.

    • Based on Adjusted \(R^2\), which model better fits the data? Briefly explain.
    • Based on \(AIC\), which model better fits the data? Briefly explain.
    • Based on \(BIC\), which model better fits the data? Briefly explain.
  9. Briefly explain why we didn’t use \(R^2\) to choose the model of best fit in the previous question.

  10. Select a final model and briefly explain why you selected that model. Use the model to describe the factors that are associated with an increase in the number of toilet paper rolls.

Submission

Upload the team’s PDF to Gradescope. Associate the “Overall” graded section with the first page of your PDF, and mark the pages where the answer to each exercise appears. If any answer spans multiple pages, mark all of those pages.

There should only be one submission per team on Gradescope. Be sure to include every team member’s name in the Gradescope submission.


Click here to access the paper, copy of the survey, original data, and other supplemental materials.