Getting started

Each of your assignments will begin with the steps outlined in this section. For more detailed instructions about getting started, see the Lab 01 instructions

Go to the sta210-fa20 organization on GitHub. Click on the repo with the prefix hw-01-slr. It contains the starter documents you need to complete the lab.
Click on the green Code button, select Use HTTPS. Click on the clipboard icon to copy the repo URL.
Go to https://vm-manage.oit.duke.edu/containers and login with your Duke NetId and Password.
Click to log into the Docker container STA 210 - Regression Analysis. You should now see the RStudio environment.
Go to File ➡️ New Project ➡️ Version Control ➡️ Git.
Copy and paste the URL of your assignment repo into the dialog box Repository URL.
Click Create Project, and the files from your GitHub repo will be displayed the Files pane in RStudio.
Type the following lines of code in the console in RStudio filling in your name and email address.

library(usethis)
use_git_config(user.name = "GitHub username", 
               user.email="your email")

Tips

Here are some tips as you complete HW 01:

Periodically knit your document and commit changes (the more often the better, for example, once per each new task).
Be sure to include all relevant code and resulting output.
Push all your changes back to your GitHub repo. The Git pane in RStudio should be empty after you push. You can also check your assignment repo on github.com to make sure it has updated.
Once you have completed your work, you will submit your assignment in Gradescope. You are welcome to resubmit your work until the assignment deadline. We will grade the most recent version of the .pdf file in Gradescope.

Packages

We will use the following packages in this assignment:

library(tidyverse)
library(broom)
library(knitr)

Questions

Part 1: Conceptual questions

Our data set contains the nutrition information for food items sold at Starbucks. Starbucks will often display the number of calories in store displays but will not show any other nutrition information. Therefore, we want to use the number of calories to understand variability in the amount of carbohydrates (in grams) in Starbucks food items.

Use the code below to load the dataset. It is originally from the openintro R package.

starbucks <- read_csv("data/starbucks.csv")

We will use the variables calories and carb for this analysis. Which variable is the response?
Create a scatterplot displaying the relationship between the two variables. Include appropriate a title and appropriate axis labels.
Describe the relationship between number of calories and amount of carbohydrates in Starbucks food items.
Use the lm function to fit a line to describe the relationship between the number of calories and amount of carbohydrates.
- Show the code and model output.
- Write the model equation.
Interpret the slope in the context of the data.
Does it make sense to interpret the intercept? If so, write the interpretation in the context of the data. Otherwise, briefly explain why not.
Calculate the 95% confidence interval for the slope, . Show the steps of the calculation, e.g. don’t just print the confidence interval in the lm output.
Interpret the confidence interval from the previous question in the context of the data.
Based on this analysis, is the number of calories a useful predictor of the amount of carbohydrates in food items at Starbucks? Briefly explain why or why not.

Part 2: Analysis in Practice

Read the 2016 FiveThirtyEight article “New Yorkers Will Pay $56 A Month To Trim A Minute Off Their Commute” by Carl Bialik.

Rank each of the following principles from Elements and Principles for Characterizing Variation between Data Analyses for the FiveThirtyEight article. Use a scale of 1 - 5, where 1 means the principle has not been applied and 5 means the principle has been fully applied. Include a 1 - 2 sentence summary for each principle.

data matching
exhaustive
skeptical
second order
transparency
reproducible

Submission

Knit, commit, and push your final changes to GitHub.
Go to http://www.gradescope.com and click Log in in the top right corner.
Click School Credentials ➡️ Duke NetID and log in using your NetID credentials.
Click on your STA 210 course.
Click on the assignment, and you’ll be prompted to submit it.
Mark the pages associated with each exercise. Each page of the homework should be associated with at least one question (i.e., should be “checked”).
Select the first page of your .pdf submission to be associated with the “Overall” section.

Grading

Total	50
Part 1: Conceptual Problems	28
Part 2: Data Analysis	12
Document submitted as PDF with clear question headers	3
Name and date updated in YAML	2
Narrative written in complete sentences	3
At least 3 informative commit messages	2

Exercise in part 2 is adapted frm question by Dr. Lucy McGowan.

HW 01: Simple linear regression

due Wed, Sep 2 at 11:59p