Class Announcements

Code styling + workflow

Questions?

Clone a repo + start a new project

See Lab 01 for instructions on cloning a repo and starting a new project in RStudio.

Once you have the new project, run the code below (filling in your GitHub username and email address) to configure git.

library(usethis)
use_git_config(user.name = "your github username", user.email = "your email")

Price of Textbooks

library(tidyverse)
library(broom)
library(knitr)

In this AE, we will look at the price of textbooks and how it varies based on the number of pages. The data contains the price and number of pages for a random sample of 30 college textbooks from the Cal Poly-San Luis Obispo bookstore in Fall 2006.

textbooks <- read_csv("data/textbooks.csv")

We will use the following variables:

- Pages: Number of pages in the textbook
- Price: Price of the textbook in US dollars

Linear model

textbook_model <- lm(Price ~ Pages, data = textbooks)
tidy(textbook_model) %>%
  kable(digits = 3)
| term        | estimate | std.error | statistic | p.value |
|:------------|---------:|----------:|----------:|--------:|
| (Intercept) |   -3.422 |    10.464 |    -0.327 |   0.746 |
| Pages       |    0.147 |     0.019 |     7.653 |   0.000 |

\[\hat{Price} = -3.422 + 0.147 \times Pages\]
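To see how the fitted equation is used, here is a small worked sketch that plugs a hypothetical 500-page textbook into it. It uses the rounded coefficients from the tidy() output above; the 500-page value is an assumption for illustration and is not in the data. With the fitted model object, predict(textbook_model, newdata = tibble(Pages = 500)) would give essentially the same prediction, computed from the unrounded coefficients.

```r
# Rounded coefficients from the tidy() output above
b0 <- -3.422   # intercept
b1 <- 0.147    # slope: change in predicted price (US dollars) per additional page

pages <- 500   # hypothetical textbook length, not taken from the data

# Predicted price in US dollars from the fitted equation
price_hat <- b0 + b1 * pages
price_hat
```

The model predicts a price of about 70.08 US dollars for a 500-page textbook; each additional page is associated with an expected increase of about 0.147 dollars in price.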

Analysis of Variance (ANOVA)

We can calculate the ANOVA table in R using the following code:

anova(textbook_model) %>%
  kable(digits = 3)
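|           | Df |   Sum Sq |   Mean Sq | F value | Pr(>F) |
|:----------|---:|---------:|----------:|--------:|-------:|
| Pages     |  1 | 51877.03 | 51877.030 |  58.573 |      0 |
| Residuals | 28 | 24799.19 |   885.685 |      NA |     NA |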

Use the ANOVA table to answer the following questions.

  1. Calculate the total sum of squares (\(SS_{total}\)).

  2. Calculate the total degrees of freedom.

  3. What is \(\hat{\sigma}_\epsilon\), the regression standard error?

  4. Calculate \(R^2\). Interpret this value.

Note: You can get model summaries in R using the glance() function from the broom package. Use the code below to get \(\hat{\sigma}_\epsilon\) and \(R^2\), and check your responses to exercises 3 and 4.

# Remove eval = F from the code chunk header
glance(textbook_model)$sigma
glance(textbook_model)$r.squared
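As another way to check your hand calculations, here is a minimal sketch that reproduces \(SS_{total}\), the total degrees of freedom, \(\hat{\sigma}_\epsilon\), and \(R^2\) directly from the numbers printed in the ANOVA table. The variable names are our own, and because the inputs are copied from the rounded output, the results may differ from glance() in the last decimal places. It relies on the least-squares identity \(SS_{total} = SS_{model} + SS_{residual}\).

```r
# Values copied from the anova() output above (rounded as displayed)
ss_model <- 51877.03   # Sum Sq for Pages
ss_resid <- 24799.19   # Sum Sq for Residuals
df_model <- 1
df_resid <- 28

# Total sum of squares and total degrees of freedom
ss_total <- ss_model + ss_resid
df_total <- df_model + df_resid   # equals n - 1 for n = 30 textbooks

# Regression standard error: square root of the residual mean square
sigma_hat <- sqrt(ss_resid / df_resid)

# R^2: proportion of the variability in Price explained by Pages
r_squared <- ss_model / ss_total

c(ss_total = ss_total, df_total = df_total,
  sigma_hat = sigma_hat, r_squared = r_squared)
```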

  5. State the null and alternative hypotheses we can test using the ANOVA table.

  6. What is the test statistic? How is it calculated?

  7. What distribution was used to calculate the p-value?

  8. State the conclusion from the test in the context of the data.
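The sketch below shows how the F value and Pr(>F) columns of the ANOVA table can be recomputed from the Sum Sq and Df columns. The values are copied from the rounded anova() output, so the results match the table only up to rounding; the hypotheses and the conclusion in context are left for you to state.

```r
# Values copied from the anova() output above
ms_model <- 51877.03 / 1      # Mean Sq for Pages = Sum Sq / Df
ms_resid <- 24799.19 / 28     # Mean Sq for Residuals = Sum Sq / Df

# F statistic: ratio of the model mean square to the residual mean square
f_stat <- ms_model / ms_resid

# p-value: upper-tail probability of an F distribution with 1 and 28 df
p_value <- pf(f_stat, df1 = 1, df2 = 28, lower.tail = FALSE)

f_stat
p_value
```

In simple linear regression this F statistic equals the square of the t statistic for Pages in the tidy() output (7.653² ≈ 58.57, up to rounding), so the ANOVA F-test and the t-test for the slope lead to the same conclusion.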

Knit your Rmd file to view the updated output. Commit your changes with an informative commit message, and push the updated files to GitHub.


The data used in this exercise is from Stat2: Building Models for a World of Data.