AE 06: Price of Textbooks

Class Announcements

Lab 02 + HW 01 due today at 11:59p
Team labs start tomorrow
Quiz 01: Sep 10 - 11
- Week 01 - 03, Lab 01 - 02, HW 01, AE 01 - 06
- Open book, open note
- You will have 2 hours to complete the quiz once you open it
- May include computing + conceptual + analysis in practice
Lab 01 grading + solutions (will be available on Sakai)
Regrade requests
- Open on Gradescope 24 hours after the assignment is returned; requests are due within a week
- Consult the solutions + attend office hours
- Only submit if something was incorrectly graded.
- We reserve the right to regrade the entire assignment, which could result in a lower grade

Code styling + workflow

Avoid long lines of code.
- We should be able to see all of your code.
Knit, commit, and push regularly.
- Think about it like clicking to save regularly as you type a report.
Name code chunks to help navigate the Rmd.

Questions?

Clone a repo + start a new project

See Lab 01 for instructions on cloning a repo and starting a new project in RStudio.

Once you have the new project, run the code below (filling in your GitHub username and email address) to configure git.

library(usethis)
use_git_config(user.name = "your github username", user.email = "your email")

Price of Textbooks

library(tidyverse)
library(broom)
library(knitr)

In this AE, we will look at the price of textbooks and how it varies based on the number of pages. The data contains the price and number of pages for a random sample of 30 college textbooks from the Cal Poly-San Luis Obispo bookstore in Fall 2006.

textbooks <- read_csv("data/textbooks.csv")

We will use the following variables:
- Pages: Number of pages in the textbook
- Price: Price of the textbook in US dollars

Linear model

textbook_model <- lm(Price ~ Pages, data = textbooks)
tidy(textbook_model) %>%
  kable(digits = 3)

term	estimate	std.error	statistic	p.value
(Intercept)	-3.422	10.464	-0.327	0.746
Pages	0.147	0.019	7.653	0.000
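
As a rough illustration of how the fitted line is used, the sketch below plugs the rounded coefficients from the table above into the regression equation. The 500-page book is a hypothetical example chosen for illustration, not a row from the data, and because the coefficients are rounded the result is only approximate.

```r
# Rounded coefficients copied from the tidy() output above
intercept <- -3.422
slope <- 0.147  # estimated change in price (US dollars) per additional page

# Hypothetical textbook length, chosen only for illustration
pages <- 500

# Predicted price from the fitted line: intercept + slope * pages
predicted_price <- intercept + slope * pages
predicted_price
```

With the model object in hand, predict(textbook_model, newdata = tibble(Pages = 500)) should give the same prediction without the rounding.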

Analysis of Variance (ANOVA)

We can calculate the ANOVA table in R using the following code:

anova(textbook_model) %>%
  kable(digits = 3)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
Pages	1	51877.03	51877.030	58.573	0
Residuals	28	24799.19	885.685	NA	NA

Use the ANOVA table to answer the following:

1. Calculate the total sum of squares ($SS_{Total}$).
2. Calculate the total degrees of freedom.
3. What is $\hat{\sigma}$, the regression standard error?
4. Calculate $R^2$. Interpret this value.
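
For reference, the quantities in these exercises are tied to the ANOVA table by the standard simple linear regression identities below. The subscripts Model and Error are notation assumed here: Model refers to the Pages row, Error to the Residuals row, and $n = 30$ is the number of textbooks in the sample.

```latex
\begin{aligned}
SS_{Total} &= SS_{Model} + SS_{Error} \\
df_{Total} &= df_{Model} + df_{Error} = n - 1 \\
\hat{\sigma} &= \sqrt{\frac{SS_{Error}}{n - 2}} \\
R^2 &= \frac{SS_{Model}}{SS_{Total}}
\end{aligned}
```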

Note: You can get model summaries in R using the glance function. Use the code below to get $\hat{\sigma}$ and $R^2$, then check your responses to exercises 3 and 4.

# Remove eval = F from the code chunk header
glance(textbook_model)$sigma
glance(textbook_model)$r.squared
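
The values returned by glance() can also be reproduced by hand from the printed ANOVA table. The sketch below is one way to do that; the variable names are made up for illustration, and the inputs are the rounded numbers shown in the table, so the results may differ slightly from the glance() output in the last digits.

```r
# Values copied from the ANOVA table above
ss_model <- 51877.03  # Sum Sq, Pages row
ss_error <- 24799.19  # Sum Sq, Residuals row
df_model <- 1         # Df, Pages row
df_error <- 28        # Df, Residuals row

# Totals are the sums of the two rows
ss_total <- ss_model + ss_error
df_total <- df_model + df_error

# Regression standard error: square root of the residual mean square
sigma_hat <- sqrt(ss_error / df_error)

# R-squared: share of the total variation in Price explained by Pages
r_squared <- ss_model / ss_total

c(ss_total = ss_total, df_total = df_total,
  sigma_hat = sigma_hat, r_squared = r_squared)
```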

5. State the null and alternative hypotheses we can test using the ANOVA table.
6. What is the test statistic? How is it calculated?
7. What distribution was used to calculate the p-value?
8. State the conclusion from the test in the context of the data.
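
As a check on the last column of the ANOVA table, the sketch below recomputes the F statistic as the ratio of the two mean squares and finds the upper-tail probability from an F distribution with 1 and 28 degrees of freedom, using base R's pf(). The variable names are illustrative, and the inputs are the rounded values printed in the table.

```r
# Mean squares copied from the ANOVA table above
ms_model <- 51877.030  # Mean Sq, Pages row
ms_error <- 885.685    # Mean Sq, Residuals row

# F statistic: model mean square divided by residual mean square
f_stat <- ms_model / ms_error

# p-value: area to the right of f_stat under the F(1, 28) distribution
p_value <- pf(f_stat, df1 = 1, df2 = 28, lower.tail = FALSE)

c(f_stat = f_stat, p_value = p_value)
```

In simple linear regression the F statistic equals the square of the slope's t statistic, which is why it lands close to 7.653 squared from the model output.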

Knit your Rmd file to view the updated output. Commit your changes with an informative commit message, and push the updated files to GitHub.

The data used in this exercise is from Stat2: Building Models for a World of Data.