AE 10: Price of diamonds

Announcements

HW 03 due Sep 23 at 11:59p (available after class)
Zoom waiting room for all class meetings going forward
Upcoming event: ICPSR Data Fair: Data in Real Life
- Sep 21 - 25
- Free to attend but you need to register
- Can count towards a Stats Experience (more on these Monday!)

Questions from video?

Price of Diamonds

library(tidyverse)
library(broom)
library(patchwork)
library(knitr)

Today’s data set contains the price and characteristics of 271 diamonds randomly selected from AwesomeGems.com in July 2005.¹ The variables in the data set are:

Carat: Size of the diamond (in carats)
Color: Coded as D (most white/bright) through J
Clarity: Coded as IF (internally flawless), VVS1, VVS2, VS1, VS2, SI1, SI2, or SI3 (slightly clouded)
Depth: Depth (as a percentage of diameter)
PricePerCt: Price per carat
TotalPrice: Price for the diamond (in dollars)

We will use the characteristics to understand variability in the price of diamonds.

diamonds <- read_csv("data/diamonds.csv")

Part 1: Categorical predictors (12 min)

Model with single categorical predictor

Let’s fit a model using Clarity to predict the price.

model1 <- lm(TotalPrice ~ Clarity, data = diamonds)
tidy(model1) %>%
  select(term, estimate) %>%
  kable(digits = 3)

term	estimate
(Intercept)	3707.731
ClaritySI1	578.386
ClaritySI2	1608.849
ClaritySI3	-135.631
ClarityVS1	1147.464
ClarityVS2	851.568
ClarityVVS1	-456.664
ClarityVVS2	-464.124

1. What is the baseline level?
2. What is the interpretation of ClaritySI1?
3. What is the expected price of a diamond with ClarityVVS2?
4. What is the difference in the expected price between a diamond with ClaritySI3 and a diamond with ClarityVVS1?
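
As a reminder of how coefficients for a categorical predictor are read, here is a minimal sketch on made-up data. The `toy` data frame, the `grade` levels, and the prices are all hypothetical and are not the diamonds data. It shows that the intercept is the mean of the baseline level and that each other coefficient is a difference from that baseline mean.

```r
# Hypothetical toy data: three groups with made-up prices (not the diamonds data)
toy <- data.frame(
  grade = c("A", "A", "B", "B", "C", "C"),
  price = c(10, 12, 20, 22, 5, 7)
)

# lm() treats the character column as a factor; the alphabetically
# first level ("A") becomes the baseline
fit <- lm(price ~ grade, data = toy)
coefs <- coef(fit)

# Group means computed directly, for comparison
means <- tapply(toy$price, toy$grade, mean)

# Intercept = mean of the baseline level
coefs[["(Intercept)"]]
means[["A"]]

# Coefficient for B = mean of B minus mean of the baseline
coefs[["gradeB"]]
means[["B"]] - means[["A"]]

# Expected price for C = intercept + coefficient for C
coefs[["(Intercept)"]] + coefs[["gradeC"]]
means[["C"]]
```

The same reading applies to the Clarity model above: add a level's coefficient to the intercept to get that level's expected price.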

Change baseline

We can change the baseline category using the fct_relevel function in the forcats R package. We will make SI3 the baseline category.

diamonds <- diamonds %>%
  mutate(Clarity = fct_relevel(Clarity, c("SI3", "IF", "VVS1", "VVS2", "VS1", "VS2", "SI1", "SI2"))
  )
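
To see what fct_relevel does to the level order, here is a small sketch on a made-up vector. The `clarity` vector below is hypothetical and is not the diamonds data. Naming a single level is enough to move it to the front; the remaining levels keep their existing order.

```r
library(forcats)

# Hypothetical toy vector of clarity codes (not the diamonds data)
clarity <- factor(c("VS1", "IF", "SI3", "VS1", "SI3"))

# Default factor levels are sorted alphabetically, so "IF" is first
levels(clarity)

# Move "SI3" to the front; the first level is the one lm() uses as the baseline
releveled <- fct_relevel(clarity, "SI3")
levels(releveled)
```

Listing every level, as in the code above, also fixes the order of the remaining levels, which controls the order of the rows in the model output.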

Let’s refit the model:

model1_relevel <- lm(TotalPrice ~ Clarity, data = diamonds)
tidy(model1_relevel) %>%
  select(term, estimate) %>%
  kable(digits = 3)

term	estimate
(Intercept)	3572.100
ClarityIF	135.631
ClarityVVS1	-321.033
ClarityVVS2	-328.493
ClarityVS1	1283.095
ClarityVS2	987.199
ClaritySI1	714.017
ClaritySI2	1744.480

5. Interpret the coefficient for ClarityVVS1.
6. How does the coefficient for ClarityVVS1 compare to your response to Exercise 4 above? Is this what you expected?
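
One way to check that the two fits describe the same model is to rebuild a few releveled estimates from the original ones. In the sketch below, the notation is an assumption introduced here: hats mark estimates from model1 (baseline IF) and tildes mark estimates from model1_relevel (baseline SI3). The numbers come from the two tables above.

```latex
\begin{aligned}
\tilde\beta_0 &= \hat\beta_0 + \hat\beta_{\text{SI3}} = 3707.731 + (-135.631) = 3572.100 \\
\tilde\beta_{\text{IF}} &= -\hat\beta_{\text{SI3}} = 135.631 \\
\tilde\beta_{\text{SI1}} &= \hat\beta_{\text{SI1}} - \hat\beta_{\text{SI3}} = 578.386 - (-135.631) = 714.017
\end{aligned}
```

Changing the baseline changes what each coefficient is compared against, but the expected price for each clarity level stays the same.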

Part 2: Interaction terms (10 min)

Now let’s fit a model using Clarity, Carat, and the interaction between the two variables.

model2 <- lm(TotalPrice ~ Clarity + Carat + Clarity * Carat, data = diamonds)
tidy(model2) %>%
  kable(digits = 3)

term	estimate	std.error	statistic	p.value
(Intercept)	-2268.000	1428.898	-1.587	0.114
ClarityIF	1181.771	1517.647	0.779	0.437
ClarityVVS1	499.216	1488.711	0.335	0.738
ClarityVVS2	-113.350	1500.621	-0.076	0.940
ClarityVS1	-904.187	1455.494	-0.621	0.535
ClarityVS2	-1426.577	1469.590	-0.971	0.333
ClaritySI1	-1392.974	1516.065	-0.919	0.359
ClaritySI2	-14.773	1695.575	-0.009	0.993
Carat	5562.000	1250.823	4.447	0.000
ClarityIF:Carat	2208.757	1457.253	1.516	0.131
ClarityVVS1:Carat	2293.206	1384.919	1.656	0.099
ClarityVVS2:Carat	3226.995	1422.740	2.268	0.024
ClarityVS1:Carat	4091.650	1290.517	3.171	0.002
ClarityVS2:Carat	4153.341	1309.744	3.171	0.002
ClaritySI1:Carat	3743.727	1375.395	2.722	0.007
ClaritySI2:Carat	1198.990	1474.424	0.813	0.417
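
It may help to see the general shape of the model behind this table. In the sketch below, the symbols are notation assumed here rather than taken from the output: j runs over the seven non-baseline clarity levels, and the indicator equals 1 when a diamond has clarity j and 0 otherwise.

```latex
\widehat{\text{TotalPrice}} = \hat\beta_0
  + \sum_{j} \hat\beta_j \, \mathbb{1}(\text{Clarity} = j)
  + \hat\beta_{C} \, \text{Carat}
  + \sum_{j} \hat\gamma_j \, \mathbb{1}(\text{Clarity} = j) \times \text{Carat}
```

For a diamond with clarity level j, the indicators pick out one term from each sum, so the intercept becomes \hat\beta_0 + \hat\beta_j and the slope for Carat becomes \hat\beta_C + \hat\gamma_j.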

7. Write the model equation for a diamond with ClaritySI3.
8. The coefficient of Carat is the relationship between carat and price for diamonds in what category of Clarity? (This is called a “main effect”.)
9. Interpret the coefficient of ClarityVVS1:Carat.
10. Write the model equation for a diamond with ClarityVVS1.
11. Describe the effect of carat on the price of a diamond with ClarityVVS1.
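
The idea that an interaction gives each level its own slope can be sketched on made-up data. The `toy` data frame, its `group` levels, and its values are hypothetical and are not the diamonds data. The sketch also shows that `group * x` already includes both main effects, so the longer formula used above fits the same model.

```r
# Hypothetical toy data: two groups with different slopes (not the diamonds data)
toy <- data.frame(
  group = rep(c("A", "B"), each = 4),
  x     = rep(c(1, 2, 3, 4), times = 2),
  y     = c(2.1, 3.9, 6.2, 7.8,     # group A: slope near 2
            1.0, 6.1, 10.9, 16.0)   # group B: slope near 5
)

# `group * x` expands to group + x + group:x
fit      <- lm(y ~ group * x, data = toy)
fit_long <- lm(y ~ group + x + group * x, data = toy)
b <- coef(fit)

# Slope for the baseline group A is the coefficient of x alone
slope_A <- b[["x"]]
# Slope for group B adds the interaction coefficient
slope_B <- b[["x"]] + b[["groupB:x"]]

# Both match simple regressions fit to each group separately
sep_A <- coef(lm(y ~ x, data = subset(toy, group == "A")))[["x"]]
sep_B <- coef(lm(y ~ x, data = subset(toy, group == "B")))[["x"]]
c(slope_A = slope_A, sep_A = sep_A, slope_B = slope_B, sep_B = sep_B)
```

In the diamonds model, Carat plays the role of `x` and each ClarityXXX:Carat term plays the role of `groupB:x`.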

Part 3: Mean-center variables (10 min)

Mean-center Carat and refit the model from Part 2 using the mean-centered variable for carat.

# write code to mean-center Carat

# write code to refit the model

12. Describe the effect of carat on the total price for a diamond with ClarityIF.
13. Interpret the intercept in the context of the data.
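
What mean-centering changes, and what it leaves alone, can be sketched on made-up data. The `toy` data frame and the `x_cent` column name are hypothetical and are not the diamonds data. Centering subtracts the overall mean from the predictor, so a value of 0 on the centered variable corresponds to an average value of the original one.

```r
# Hypothetical toy data (not the diamonds data)
toy <- data.frame(
  group = rep(c("A", "B"), each = 4),
  x     = rep(c(1, 2, 3, 4), times = 2),
  y     = c(2.1, 3.9, 6.2, 7.8, 1.0, 6.1, 10.9, 16.0)
)

# Mean-center the quantitative predictor: subtract its overall mean
toy$x_cent <- toy$x - mean(toy$x)

fit_raw  <- lm(y ~ group * x,      data = toy)
fit_cent <- lm(y ~ group * x_cent, data = toy)

# The slope is unchanged by centering
coef(fit_raw)[["x"]]
coef(fit_cent)[["x_cent"]]

# The centered intercept is the expected y for the baseline group
# at the mean of x, rather than at x = 0
at_mean <- predict(fit_raw, newdata = data.frame(group = "A", x = mean(toy$x)))
coef(fit_cent)[["(Intercept)"]]
unname(at_mean)
```

With the diamonds data, the same step could be written in the style used above, with mutate() creating the centered Carat column before refitting.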

¹ Data set adapted from the Diamonds data set in the Stat2Data R package.