Clone the hw-02- repo and start a new project in RStudio. For more detailed instructions about getting started, see the Lab 01 instructions.
Type the following lines of code in the RStudio console, filling in your GitHub username and the email address associated with your GitHub account.
library(usethis)
use_git_config(user.name = "your github username",
               user.email = "your email")
Here are some tips as you complete HW 02:
We will use the following packages in this assignment:
library(tidyverse)
library(broom)
library(knitr)
# add other packages as needed
We will go back to the data set from HW 01 that contains the nutrition information for food items at Starbucks.
Starbucks often shows the number of calories on in-store displays but does not show any other nutrition information. Therefore, we want to use the number of calories to understand variability in the amount of carbohydrates (in grams) in Starbucks food items.
Use the code below to load the dataset. It is originally from the openintro R package.
starbucks <- read_csv("data/starbucks.csv")
We fit a linear model to describe the relationship between the amount of carbohydrates (`carb`) and the number of calories (`calories`) in Starbucks food items. Use the code below to fit and display the model.
carb_model <- lm(carb ~ calories, data = starbucks)
tidy(carb_model) %>%
  kable(digits = 3)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 8.944 | 4.746 | 1.884 | 0.063 |
calories | 0.106 | 0.013 | 7.923 | 0.000 |
According to the Starbucks menu, pumpkin bread has 410 calories.
a. How many grams of carbohydrates do you predict to be in pumpkin bread? Show how you calculated this answer.
b. Show code and output to obtain a 90% confidence interval for the mean grams of carbohydrates in Starbucks food items with 410 calories.
c. Show code and output to obtain a 90% prediction interval for the grams of carbohydrates in food items with 410 calories.
d. You decide to purchase a piece of pumpkin bread, and you want to predict the amount of carbs in this piece. Should you use the interval from part b or part c? Briefly explain your choice.
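The two interval types in parts b and c can both be requested from base R's `predict()`. The sketch below is only an illustration on R's built-in `cars` data (stopping distance vs. speed), not the Starbucks data; the names `dist_model`, `new_car`, `conf_int`, and `pred_int`, and the value `speed = 15`, are assumptions made for the example.

```r
# Illustrative sketch on the built-in `cars` data, not the Starbucks data.
dist_model <- lm(dist ~ speed, data = cars)

# New observation at which to predict (speed = 15 is an arbitrary choice).
new_car <- data.frame(speed = 15)

# 90% confidence interval for the MEAN response at speed = 15.
conf_int <- predict(dist_model, newdata = new_car,
                    interval = "confidence", level = 0.90)

# 90% prediction interval for an INDIVIDUAL response at speed = 15.
pred_int <- predict(dist_model, newdata = new_car,
                    interval = "prediction", level = 0.90)

conf_int
pred_int
```

Both calls return the same fitted value, but the prediction interval is wider because it also accounts for the variability of an individual observation around the mean.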
Let’s take a look at the model conditions. First, create a scatterplot of the residuals vs. predicted values. Show the code used to create the plot. Be sure your plot has informative axis labels and title. Hint: You can use the `augment` function to calculate the residuals and predicted values for each observation.
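As a rough illustration of the hint, `augment()` from broom adds `.fitted` and `.resid` columns that can be passed straight to ggplot2. This sketch again uses the built-in `cars` data rather than the Starbucks data, so `dist_model`, `dist_aug`, `resid_plot`, and the axis labels are assumptions for the example only.

```r
library(broom)    # augment()
library(ggplot2)  # plotting

# Illustrative model on the built-in `cars` data (not the Starbucks data).
dist_model <- lm(dist ~ speed, data = cars)

# augment() returns the model data plus columns such as .fitted and .resid.
dist_aug <- augment(dist_model)

# Residuals vs. predicted values, with a reference line at zero.
resid_plot <- ggplot(dist_aug, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(x = "Predicted stopping distance (ft)",
       y = "Residual (ft)",
       title = "Residuals vs. predicted values")

resid_plot
```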
Make a histogram and Normal quantile plot of the residuals. Show the code used to create each plot. Be sure your plots have informative axis labels and titles.
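One possible way to draw both plots is with `geom_histogram()` and `stat_qq()` / `stat_qq_line()` from ggplot2. The sketch below is an illustration on the built-in `cars` data, not the Starbucks data; the object names and the choice of `bins = 10` are assumptions.

```r
library(broom)    # augment()
library(ggplot2)  # plotting

# Illustrative model on the built-in `cars` data (not the Starbucks data).
dist_model <- lm(dist ~ speed, data = cars)
dist_aug <- augment(dist_model)

# Histogram of the residuals (bins = 10 is an arbitrary choice).
resid_hist <- ggplot(dist_aug, aes(x = .resid)) +
  geom_histogram(bins = 10, color = "white") +
  labs(x = "Residual (ft)",
       y = "Count",
       title = "Distribution of residuals")

# Normal quantile plot of the residuals with a reference line.
resid_qq <- ggplot(dist_aug, aes(sample = .resid)) +
  stat_qq() +
  stat_qq_line() +
  labs(x = "Theoretical quantiles",
       y = "Observed residuals (ft)",
       title = "Normal quantile plot of residuals")

resid_hist
resid_qq
```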
Are the model conditions satisfied? Briefly explain your reasoning for each condition.
What percent of the variation in carbohydrates is explained by the calories in Starbucks food items? Show how you calculated this value from the ANOVA table.
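The calculation from an ANOVA table divides the model sum of squares by the total sum of squares. A minimal sketch on the built-in `cars` data (not the Starbucks data) is shown below; the names `anova_tbl`, `ss_model`, `ss_resid`, and `r_squared` are assumptions for the example.

```r
# Illustrative model on the built-in `cars` data (not the Starbucks data).
dist_model <- lm(dist ~ speed, data = cars)

# ANOVA table: one row for the predictor and one for the residuals.
anova_tbl <- anova(dist_model)

# Pull the sums of squares by row and column name.
ss_model <- anova_tbl["speed", "Sum Sq"]
ss_resid <- anova_tbl["Residuals", "Sum Sq"]

# R-squared = model SS / total SS, where total SS = model SS + residual SS.
r_squared <- ss_model / (ss_model + ss_resid)

r_squared
summary(dist_model)$r.squared  # should agree with the value above
```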
Write your responses to the following questions:
What is one question you still have about simple linear regression or modeling in general?
What is one thing you learned about simple linear regression?
You must write a substantive response to receive credit. Responses equivalent to “I don’t have any questions” are not considered substantive and will not receive credit.
Knit, commit, and push your final changes to GitHub, then submit your finalized PDF on Gradescope. See the Lab 01 instructions for more details on submitting your work on Gradescope.
Component | Points |
---|---|
Part 1: Conceptual questions | 34 |
Part 2: Wrapping up SLR | 6 |
Document submitted as PDF with clear question headers | 3 |
Name and date updated in YAML | 2 |
Narrative written in complete sentences | 3 |
At least 3 informative commit messages | 2 |
Total | 50 |