Introduction

The goal of today’s lab is fit and interpret multiple linear regression models. Before we get to that, however, we will look at something that may happen as you collaborate with your lab team (or any other team!) in GitHub.

Merge Conflicts (uh oh)

You may have seen this already through the course of your collaboration in Lab 03. When two collaborators make changes to a file and push the file to their repository, git merges these two files.

If these two files have conflicting content on the same line, git will produce a merge conflict. Merge conflicts need to be resolved manually, as they require a human intervention:

To resolve the merge conflict, decide if you want to keep only your text, the text on GitHub, or incorporate changes from both texts. Delete the conflict markers <<<<<<<, =======, >>>>>>> and make the changes you want in the final merge.

Assign numbers 1, 2, 3, and 4 to each of your team members (if only 3 team members, just number 1 through 3). Go through the following steps in detail, which simulate a merge conflict. Completing this exercise will be part of the lab grade.

Resolving a merge conflict

Step 1: Everyone clone the lab-04-candy assignment repo in RStudio and open file merge-conflict.Rmd. Don’t forget to configure git if you haven’t already done so:

library(usethis)
use_git_config(user.name="your github username", user.email="your email")

Member 4 should look at the group’s repo on GitHub.com to ensure that the other members’ files are pushed to GitHub after every step.

Step 2: Member 1 Change the team name to your team name. Knit, commit, and push.

Step 3: Member 2 Change the team name to something different (i.e., not your team name). Knit, commit, and push.

You should get an error.

Pull and review the document with the merge conflict. Read the error to your teammates. You can also show them the error by sharing your screen. A merge conflict occurred because you edited the same part of the document as Member 1. Resolve the conflict with whichever name you want to keep, then knit, commit and push again.

Step 4: Member 3 Write some narrative in the space provided. You should get an error.

This time, no merge conflicts should occur, since you edited a different part of the document from Members 1 and 2. Read the error to your teammates. You can also show them the error by sharing your screen.

Click to pull. Then, knit, commit, and push.

The data

The data from this lab comes from the the article FiveThirtyEight The Ultimate Halloween Candy Power Ranking by Walt Hickey. To collect data, Hickey and collaborators at FiveThirtyEight set up an experiment people could vote on a series of randomly generated candy matchups (e.g. Reeses vs. Skittles). Click here to check out some of the matchups.

The goal of this analysis is to fit a linear regression model to help determine what makes the best candy. The data set contains the characteristics and win percentage from 85 candies in the experiment. The data set includes the following variables:

Variable	Description
`chocolate`	Does it contain chocolate?
`fruity`	Is it fruit flavored?
`caramel`	Is there caramel in the candy?
`peanutalmondy`	Does it contain peanuts, peanut butter or almonds?
`nougat`	Does it contain nougat?
`crispedricewafer`	Does it contain crisped rice, wafers, or a cookie component?
`hard`	Is it a hard candy?
`bar`	Is it a candy bar?
`pluribus`	Is it one of many candies in a bag or box?
`sugarpercent`	The percentile of sugar it falls under within the data set.
`pricepercent`	The unit price percentile compared to the rest of the set.
`winpercent`	The overall win percentage according to 269,000 matchups.

Use the code below to load the data directly from the FiveThirtyEight’s github repo.

candy <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/candy-power-ranking/candy-data.csv")

Exercises

Let’s see how sugar and whether the candy is chocolate impact a candy’s win percentage. Create plots to visualize the relationship between each of these predictors and the response variable. Arrange the plots side-by-side using functions from patchwork or a similar package.

Hint: Recode chocolate to make it factor, so R correctly treats it as a categorical variable.

Describe the relationship between the response and each predictor variable based on the plots from the previous exercise.
Fit a main effects model (i.e. no interactions) using sugarpercent and chocolate to predict winpercent. Display the code and output for your model.
Based on the model from the previous exercise

Interpret the coefficient of sugarpercent in the context of the data.
What is the intercept in the model for candy that is not chocolate?
What is the intercept in the model for candy that is chocolate?

Let’s see if there’s a potential interaction between the sugar percentile and chocolate. Plot the relationship between winpercent and sugarpercent based on the chocolate, including lines to help more clearly see the relationships between the variables.

Based on this plot, do you think there’s an interaction effect between sugarpercent and chocolate? Briefly explain why or why not.

Fit the model including sugarpercent, chocolate, and their interaction. Display the code and model output.
Using the model above,

What is the intercept? The intercept is the expected win percentage for what type of candy?
What does the coefficient for the interaction between suagrpercent and chocolate tell us?
Describe the effect of sugarpercent on winpercent for candy that is chocolate.

Does the data provide sufficient evidence that the effect of sugarpercent on winpercent differs based on whether the candy is chocolate? Briefly explain why or why not.
Select a variable of your choice. (If you choose a categorical variable, be sure to convert it to a factor variable first.) What variable did you choose?

Plot the relationship between winpercent and your new variable.
Fit a regression model using suarpercent, chocolate, your variable, and an interaction between the variable and either sugarpercet or chocolate.
Use the main effect and interaction to describe how your variable affects the win percentage.

Based on the model from the previous exercise, how might you describe the “best” candy - the candy that wins as much as possible.

Submission

Upload the team’s PDF to Gradescope. Be sure to include every team member’s name in the Gradescope submission Associate the “Overall” graded section with the first page of your PDF, and mark where each answer is to the exercises. If any answer spans multiple pages, then mark all pages.

There should only be one submission per team on Gradescope. Be sure to include every team member’s name in the Gradescope submission.

Lab 04: Candy!