Announcements

Questions from video?

Price of Diamonds

library(tidyverse)
library(broom)
library(patchwork)
library(knitr)

Today’s data set contains the price and characteristics for 271 diamonds randomly selected from AwesomeGems.com in July 2005.¹ The variables used in this activity include TotalPrice, Clarity, and Carat.

We will use the characteristics to understand variability in the price of diamonds.

diamonds <- read_csv("data/diamonds.csv")

Part 1: Categorical predictors (12 min)

Model with single categorical predictor

Let’s fit a model using Clarity to predict the price.

model1 <- lm(TotalPrice ~ Clarity, data = diamonds)
tidy(model1) %>%
  select(term, estimate) %>%
  kable(digits = 3)

| term        | estimate |
|:------------|---------:|
| (Intercept) | 3707.731 |
| ClaritySI1  |  578.386 |
| ClaritySI2  | 1608.849 |
| ClaritySI3  | -135.631 |
| ClarityVS1  | 1147.464 |
| ClarityVS2  |  851.568 |
| ClarityVVS1 | -456.664 |
| ClarityVVS2 | -464.124 |

  1. What is the baseline level?

  2. What is the interpretation of ClaritySI1?

  3. What is the expected price of a diamond with ClarityVVS2?

  4. What is the difference in the expected price between a diamond with ClaritySI3 and a diamond with ClarityVVS1?
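
As a rough illustration of how the coefficients combine, the sketch below works with the estimates printed in the model1 output rather than the fitted object, so it runs without the data file. It uses two levels that the exercises do not ask about (VS1, SI2 and VS2); the vector name `coefs` and the result names are made up for this example.

```r
# Estimates copied from the model1 output above.
# The baseline level is the one without a row of its own.
coefs <- c(
  "(Intercept)" = 3707.731,
  "ClaritySI1"  = 578.386,
  "ClaritySI2"  = 1608.849,
  "ClaritySI3"  = -135.631,
  "ClarityVS1"  = 1147.464,
  "ClarityVS2"  = 851.568,
  "ClarityVVS1" = -456.664,
  "ClarityVVS2" = -464.124
)

# Expected price for a non-baseline level:
# intercept + that level's coefficient.
price_vs1 <- coefs[["(Intercept)"]] + coefs[["ClarityVS1"]]

# Difference between two non-baseline levels:
# the intercept cancels, leaving the difference of the two coefficients.
diff_si2_vs2 <- coefs[["ClaritySI2"]] - coefs[["ClarityVS2"]]

price_vs1     # 4855.195
diff_si2_vs2  # 757.281
```

The same arithmetic applies to any pair of levels in the table.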

Change baseline

We can change the baseline category using the fct_relevel function in the forcats R package. We will make SI3 the baseline category.

diamonds <- diamonds %>%
  mutate(Clarity = fct_relevel(Clarity, c("SI3", "IF", "VVS1", "VVS2", "VS1", "VS2", "SI1", "SI2"))
  )

Let’s refit the model:

model1_relevel <- lm(TotalPrice ~ Clarity, data = diamonds)
tidy(model1_relevel) %>%
  select(term, estimate) %>%
  kable(digits = 3)

| term        | estimate |
|:------------|---------:|
| (Intercept) | 3572.100 |
| ClarityIF   |  135.631 |
| ClarityVVS1 | -321.033 |
| ClarityVVS2 | -328.493 |
| ClarityVS1  | 1283.095 |
| ClarityVS2  |  987.199 |
| ClaritySI1  |  714.017 |
| ClaritySI2  | 1744.480 |

  1. Interpret the coefficient for ClarityVVS1.

  2. How does the coefficient for ClarityVVS1 compare to your response to Exercise 4 above? Is this what you expected?
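
To see what changing the baseline does to the coefficients, here is a minimal sketch on a made-up six-row data set (not the diamonds data). It uses base R's `relevel()` as a stand-in for `fct_relevel()`; moving a single level to the front with either function should give the same factor ordering. All object and column names here are hypothetical.

```r
# Toy data: three clarity levels, two diamonds each (values are invented).
toy <- data.frame(
  clarity = factor(c("IF", "IF", "SI3", "SI3", "VVS1", "VVS1")),
  price   = c(100, 120, 80, 90, 150, 170)
)

# Default baseline is the first level in alphabetical order.
fit_default <- lm(price ~ clarity, data = toy)

# Make SI3 the baseline; with forcats this would be fct_relevel(clarity, "SI3").
toy$clarity_si3 <- relevel(toy$clarity, ref = "SI3")
fit_si3 <- lm(price ~ clarity_si3, data = toy)

# Under the default baseline, the VVS1-vs-SI3 difference is a
# difference of two coefficients ...
diff_default <- unname(coef(fit_default)["clarityVVS1"] -
                       coef(fit_default)["claritySI3"])

# ... and with SI3 as baseline it is a single coefficient.
diff_si3 <- unname(coef(fit_si3)["clarity_si3VVS1"])

all.equal(diff_default, diff_si3)
```

Releveling changes which comparisons the coefficients report directly, but the fitted group means are the same under both parameterizations.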

Part 2: Interaction terms (10 min)

Now let’s fit a model using Clarity, Carat, and the interaction between the two variables.

model2 <- lm(TotalPrice ~ Clarity + Carat + Clarity * Carat, data = diamonds)
tidy(model2) %>%
  kable(digits = 3)

| term              |  estimate | std.error | statistic | p.value |
|:------------------|----------:|----------:|----------:|--------:|
| (Intercept)       | -2268.000 |  1428.898 |    -1.587 |   0.114 |
| ClarityIF         |  1181.771 |  1517.647 |     0.779 |   0.437 |
| ClarityVVS1       |   499.216 |  1488.711 |     0.335 |   0.738 |
| ClarityVVS2       |  -113.350 |  1500.621 |    -0.076 |   0.940 |
| ClarityVS1        |  -904.187 |  1455.494 |    -0.621 |   0.535 |
| ClarityVS2        | -1426.577 |  1469.590 |    -0.971 |   0.333 |
| ClaritySI1        | -1392.974 |  1516.065 |    -0.919 |   0.359 |
| ClaritySI2        |   -14.773 |  1695.575 |    -0.009 |   0.993 |
| Carat             |  5562.000 |  1250.823 |     4.447 |   0.000 |
| ClarityIF:Carat   |  2208.757 |  1457.253 |     1.516 |   0.131 |
| ClarityVVS1:Carat |  2293.206 |  1384.919 |     1.656 |   0.099 |
| ClarityVVS2:Carat |  3226.995 |  1422.740 |     2.268 |   0.024 |
| ClarityVS1:Carat  |  4091.650 |  1290.517 |     3.171 |   0.002 |
| ClarityVS2:Carat  |  4153.341 |  1309.744 |     3.171 |   0.002 |
| ClaritySI1:Carat  |  3743.727 |  1375.395 |     2.722 |   0.007 |
| ClaritySI2:Carat  |  1198.990 |  1474.424 |     0.813 |   0.417 |

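
A side note on the formula used for model2: in R's formula syntax, `a * b` already expands to `a + b + a:b`, so writing the main effects explicitly alongside `Clarity * Carat` is redundant but harmless. The sketch below checks this on a small simulated data set; the variables `grade`, `size`, and `price` are invented for the illustration.

```r
# Simulated toy data: one categorical and one numeric predictor.
set.seed(1)
toy <- data.frame(
  grade = factor(rep(c("a", "b", "c"), each = 4)),
  size  = rep(c(0.5, 1.0, 1.5, 2.0), times = 3)
)
toy$price <- 1000 + 2000 * toy$size + rnorm(nrow(toy), sd = 100)

# Main effects written out plus the `*` term ...
fit_long  <- lm(price ~ grade + size + grade * size, data = toy)

# ... versus the `*` shorthand on its own.
fit_short <- lm(price ~ grade * size, data = toy)

# Both formulas produce the same terms and the same estimates.
all.equal(coef(fit_long), coef(fit_short))
names(coef(fit_short))
```
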
  1. Write the model equation for a diamond with ClaritySI3.

  2. The coefficient of Carat is the relationship between carat and price for diamonds in what category of Clarity? (This is called a “main effect”.)

  3. Interpret the coefficient of ClarityVVS1:Carat.

  4. Write the model equation for a diamond with ClarityVVS1.

  5. Describe the effect of carat on the price of a diamond with ClarityVVS1.
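
The sketch below shows one way to assemble a level-specific equation from the model2 output, using the printed estimates directly. It works through ClarityVS2, a level the exercises above do not cover, and the variable names are made up for this example.

```r
# Estimates copied from the model2 output above.
intercept_base  <- -2268.000   # (Intercept): baseline-level intercept
slope_base      <- 5562.000    # Carat: baseline-level slope
shift_vs2       <- -1426.577   # ClarityVS2: change in intercept for VS2
slope_shift_vs2 <- 4153.341    # ClarityVS2:Carat: change in slope for VS2

# For a VS2 diamond the indicator for VS2 equals 1 and all others equal 0,
# so the equation reduces to its own intercept and slope.
intercept_vs2 <- intercept_base + shift_vs2
slope_vs2     <- slope_base + slope_shift_vs2

# Expected price of a 1-carat VS2 diamond under this model.
price_vs2_1ct <- intercept_vs2 + slope_vs2 * 1

intercept_vs2   # -3694.577
slope_vs2       # 9715.341
price_vs2_1ct   # 6020.764
```

In words: the interaction coefficient adjusts the slope of Carat for that clarity level, while the clarity coefficient adjusts the intercept.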

Part 3: Mean-center variables (10 min)

Mean-center Carat and refit the model from Part 2 using the mean-centered variable for carat.

# write code to mean-center Carat
# write code to refit the model

  1. Describe the effect of carat on the total price for a diamond with ClarityIF.

  2. Interpret the intercept in the context of the data.
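
For reference, here is a minimal sketch of what mean-centering does, using a made-up four-row data set and a single numeric predictor instead of the diamonds data. The column names `carat`, `carat_cent`, and `price` are assumptions for this illustration; it is not the solution to the exercise, which also involves Clarity and the interaction.

```r
# Toy data (values are invented).
toy <- data.frame(
  carat = c(0.5, 1.0, 1.5, 2.0),
  price = c(900, 3100, 5200, 6800)
)

# Mean-center: subtract the sample mean so that 0 corresponds to the average carat.
# With dplyr this could be written as mutate(carat_cent = carat - mean(carat)).
toy$carat_cent <- toy$carat - mean(toy$carat)

fit_raw  <- lm(price ~ carat, data = toy)
fit_cent <- lm(price ~ carat_cent, data = toy)

# The slope is unchanged by centering.
coef(fit_raw)[["carat"]]
coef(fit_cent)[["carat_cent"]]

# The intercept is now the expected price at the mean carat,
# which in simple linear regression equals the mean price.
coef(fit_cent)[["(Intercept)"]]
mean(toy$price)
```

Centering shifts where the intercept is evaluated without changing how price varies with carat, which is why the intercept becomes easier to interpret.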


  1. Data set adapted from the Diamonds data set in the Stat2Data R package.