Lab 02: Simple linear regression

Due Wed, Sep 2 at 11:59 p.m.

The primary goal of today’s lab is to practice the core data wrangling, data visualization, and modeling functions we will use for regression analysis. It is also an opportunity to keep practicing with RStudio and GitHub before you start collaborating with others.

Topics covered in this lab

Getting Started

Each of your assignments will begin with the steps outlined in this section. For more detailed instructions about getting started, see the Lab 01 instructions.

Clone the repo & start new RStudio project

Configure git

One more thing before you get started. We need to configure your git so that RStudio can communicate with GitHub. This requires two pieces of information: your GitHub username and the email address associated with your GitHub account.

Type the following lines of code in the console in RStudio, filling in your GitHub username and the email address associated with your GitHub account.

library(usethis)
# Replace the placeholders with your own GitHub username and email address
use_git_config(user.name = "GitHub username",
               user.email = "your email")

Now you’re ready to start! Open the R Markdown file (the one with extension .Rmd) to begin the assignment.

YAML

The top portion of your R Markdown file (between the two lines of three dashes) is called YAML. It stands for “YAML Ain’t Markup Language”. It is a human-friendly data serialization standard for all programming languages. All you need to know is that this area is called the YAML (we will refer to it as such) and that it contains meta information about your document, such as the title, author, and date.

Before we introduce the data, let’s update the name and date in the YAML.

Open the R Markdown (Rmd) file in your project, input your name for the author name and today’s date for the date, and knit the document.

Committing and pushing changes:

Make sure you push all the files from the Git pane to your assignment repo on GitHub. The Git pane should be empty after you push. If it’s not, click the box next to the remaining files, write an informative commit message, and push.

Packages

We will use the following packages in today’s lab.

library(tidyverse)
library(knitr)
library(broom)
library(MASS) # package containing the cats dataset

Data: Body and heart weight of cats

In today’s lab, we will analyze the cats dataset in the MASS package. This dataset contains the sex, body weight, and heart weight for 144 domestic cats.

When a veterinarian (vet for short) prescribes heart medicine for a cat, the dosage is partially determined by the weight of the cat’s heart. It would be almost impossible for a vet to weigh a cat’s heart in the moment, but it is possible (though still difficult!) to get a cat’s body weight.

We want to fit a regression model to understand the relationship between a cat’s heart weight and body weight. We could use the model to get an estimate of the cat’s heart weight.

Once you’ve loaded the MASS package, you can load the data using the code below:

data(cats)

The cats dataset contains the following variables:

Sex: sex of the cat, with levels M and F
Bwt: body weight in kg
Hwt: heart weight in g
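To confirm that the data loaded as described, a quick check along the following lines may help. This is only a sketch using base R functions; the lab does not require it.

```r
library(MASS)  # package containing the cats dataset

data(cats)

# Dimensions: number of rows (cats) and number of columns (variables)
dim(cats)

# Variable names and types: Sex is a factor, Bwt and Hwt are numeric
str(cats)

# First few rows, to see what individual observations look like
head(cats)
```

If the number of rows or the variable names differ from the description above, the data were probably not loaded from the MASS package.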

Exercises

Exploratory Data Analysis

  1. What is the response variable? What is the predictor variable?

  2. Plot the distribution of Hwt and calculate the appropriate summary statistics. Describe the distribution including the shape, center, spread, and any outliers.

  3. Plot the distribution of Bwt and calculate the appropriate summary statistics. Describe the distribution including the shape, center, spread, and any outliers.

  4. Create a scatterplot to display the relationship between Hwt and Bwt. Use the scatterplot to describe the relationship between the two variables. Be sure the scatterplot includes informative axis labels and title.
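As a reminder of the general pattern for these exercises, here is a minimal sketch on a different built-in dataset, mtcars, rather than the lab data. The variable choices (mpg and wt), the bin width, and the object names are assumptions made only for illustration; you will need to adapt them to the cats data.

```r
library(ggplot2)
library(dplyr)

# Histogram of a single quantitative variable
mpg_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2) +
  labs(x = "Miles per gallon", y = "Count",
       title = "Distribution of fuel efficiency")
mpg_hist

# Summary statistics for center and spread
mpg_summary <- mtcars %>%
  summarise(mean = mean(mpg), median = median(mpg),
            sd = sd(mpg), iqr = IQR(mpg),
            min = min(mpg), max = max(mpg))
mpg_summary

# Scatterplot of two quantitative variables with informative labels
mpg_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon",
       title = "Fuel efficiency vs. weight")
mpg_scatter
```

Whether the mean and standard deviation or the median and IQR are the more appropriate summaries depends on the shape of the distribution you see in the histogram.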

✅ ⬆️ Now is a good time to knit, commit and push your changes to GitHub with a short, informative commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Simple Linear Regression

  1. Use the lm function to fit a simple linear regression model for Bwt and Hwt. Complete the code below to assign your model a name, and use the tidy and kable functions to neatly display the model output.
_____ <- lm(_________)
tidy(_____) %>% # output model in a tidy format
  kable(digits = 3) # neatly format the output
  2. Interpret the slope in the context of the problem.

  3. Does it make sense to interpret the intercept? If so, interpret the intercept. Otherwise, briefly explain why not.
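For reference, the lm, tidy, and kable pattern used in this section looks like the following when filled in for a different built-in dataset (mtcars, not the lab data). The model name mpg_fit and the choice of variables are assumptions for illustration only.

```r
library(broom)
library(knitr)

# lm takes a formula of the form response ~ predictor, plus the data frame
mpg_fit <- lm(mpg ~ wt, data = mtcars)

# tidy returns one row per coefficient: the intercept and the slope for wt
mpg_fit_tidy <- tidy(mpg_fit)

# kable formats the tidy output as a table, rounded to 3 digits
kable(mpg_fit_tidy, digits = 3)
```

The estimate column holds the fitted intercept and slope. The slope is read as the expected change in the response for a one-unit increase in the predictor, and the intercept as the expected response when the predictor equals 0.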

✅ ⬆️ Now is another good time to knit, commit and push your changes to GitHub with a short, informative commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Inference

  1. We would like to test the following hypotheses:

\[H_0: \beta_1 = 0 \text{ vs } H_a: \beta_1 \neq 0\]
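The tidy output already contains the pieces needed for a test of this form: the statistic column is the estimate divided by its standard error, and the p.value column is the two-sided p-value from a t distribution with n - 2 degrees of freedom. The sketch below checks this by hand on the built-in mtcars dataset (not the lab data); the object names are assumptions for illustration.

```r
library(broom)

mpg_fit <- lm(mpg ~ wt, data = mtcars)
mpg_fit_tidy <- tidy(mpg_fit)

# Row of the tidy output that corresponds to the slope
slope_row <- mpg_fit_tidy[mpg_fit_tidy$term == "wt", ]

# Test statistic: estimated slope divided by its standard error
t_stat <- slope_row$estimate / slope_row$std.error

# Two-sided p-value from a t distribution with n - 2 degrees of freedom
p_val <- 2 * pt(abs(t_stat), df = df.residual(mpg_fit), lower.tail = FALSE)

# Both values should agree with the columns reported by tidy
c(by_hand = t_stat, from_tidy = slope_row$statistic)
c(by_hand = p_val, from_tidy = slope_row$p.value)
```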

  2. Now let’s compare the relationship between heart weight and body weight for male and female cats. Use the filter function to create one data frame for female cats and a separate data frame for male cats.

  3. Fit a model of Hwt vs. Bwt for female cats. Display the 95% confidence interval for the slope by including conf.int = TRUE in the tidy function.

    Interpret the interval in the context of the data.

  4. Fit a model of Hwt vs. Bwt for male cats. Display the 95% confidence interval for the slope by including conf.int = TRUE in the tidy function.

  5. Do the data provide sufficient evidence that the slope is significantly different for male and female cats? Briefly explain. Hint: Compare the confidence intervals.
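The filter and conf.int = TRUE steps can be sketched as follows, again on the built-in mtcars dataset rather than the lab data. The grouping variable (am, coded 0 for automatic and 1 for manual) and all object names are assumptions for illustration.

```r
library(dplyr)
library(broom)

# One data frame per group
automatic_cars <- mtcars %>% filter(am == 0)
manual_cars <- mtcars %>% filter(am == 1)

# Fit the same model to one group
manual_fit <- lm(mpg ~ wt, data = manual_cars)

# conf.int = TRUE adds conf.low and conf.high (95% by default)
manual_ci <- tidy(manual_fit, conf.int = TRUE)

# dplyr::select is written out in full because MASS also exports a
# function named select, which masks dplyr's when MASS is loaded later
manual_ci %>%
  dplyr::select(term, estimate, conf.low, conf.high)
```

Repeating the last three steps for the other group gives a second interval to set beside the first.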

✅ ⬆️ Knit, commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Submission

Before you wrap up the assignment, make sure all documents are updated on your GitHub repo. We will be checking these to make sure you have been practicing how to commit and push changes.

Once your work is finalized in your GitHub repo, you will submit it to Gradescope. To submit your assignment: