Duke Actuarial Society (DAS) is a student-run pre-professional organization that aims to empower Duke students interested in pursuing careers in actuarial science and risk management. DAS builds an actuarial community at Duke through student-led forums, an annual speaker series, skill-building workshops, networking events and job-shadow days with outstanding industry professionals, financial risk management case competitions, and exam preparation. Grounded in education, diversity, and mentorship, DAS strives to guide the personal and professional development of Duke students and prepare them for the diverse career opportunities at the crossroads of statistics, economics, mathematics, and computer science.
At this time, we encourage you to apply to Duke Actuarial Society: https://forms.gle/wEa2ZMcj618jupbq5. Applications close September 5th at 11:59 p.m. EST.
We also encourage you to attend our speaker series and meet some Duke alumni, ranging from investment bankers to actuaries, who are excited to meet you, offer insight into their professions, and discuss the industry as it is today. Please see the attached flyer. If you have any questions or concerns, please contact bhrij.patel@duke.edu.
Background. Stephanie works as a data scientist at a small startup company that sells widgets over the Internet through an online store on the company’s web site. One day, the CEO comes by Stephanie’s desk and asks her how many customers have typically shown up at the store’s website each day for the past month. The CEO waits by Stephanie’s desk for the answer.
Analysis. Stephanie launches her statistical analysis software and, typing directly into the software’s console, immediately pulls records from the company’s database for the past month. She then groups the records by day and tabulates the number of customers. From this daily tabulation she then calculates the mean and the median count. She then quickly produces a time series plot of the daily count of visitors to the website over the past month.
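A minimal sketch of what this workflow might look like in R with the tidyverse, assuming a hypothetical visits data frame with one row per site visit and columns timestamp and customer_id (all of these names are invented here for illustration; the case study does not specify them):

library(tidyverse)

# `visits`: hypothetical table of site visits pulled from the company
# database; one row per visit, with columns `timestamp` and `customer_id`.

# Count distinct customers per day
daily_counts <- visits %>%
  mutate(day = as.Date(timestamp)) %>%
  group_by(day) %>%
  summarize(n_customers = n_distinct(customer_id))

# Mean and median daily customer count for the past month
daily_counts %>%
  summarize(mean_count = mean(n_customers),
            median_count = median(n_customers))

# Time series plot of daily visitor counts
ggplot(daily_counts, aes(x = day, y = n_customers)) +
  geom_line()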
Results. Stephanie verbally reports the daily mean and median counts to the CEO standing over her shoulder. While showing the results, she briefly describes how the company’s database system collects information about visitors, along with its various strengths and weaknesses. She also notes that in the past month the website experienced some unexpected downtime, when it was inaccessible for a few hours.
See Lab 01 for instructions on cloning a repo and starting a new project in RStudio.
Once you have the new project, run the code below (filling in your GitHub username and email address) to configure git.
library(usethis)
use_git_config(user.name = "your GitHub username", user.email = "your email")
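use_git_config() writes these values to your global Git configuration (the same effect as running git config --global in a terminal), so you only need to do this once per machine.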
library(tidyverse)  # data wrangling and plotting
library(broom)      # tidy summaries of model objects
library(patchwork)  # composing multiple ggplots
In this AE, we will look at the price of textbooks and how it varies with the number of pages. The data contain the price and number of pages for a random sample of 30 college textbooks from the Cal Poly-San Luis Obispo bookstore in Fall 2006.
textbooks <- read_csv("data/textbooks.csv")
We will use the following variables:

- Pages: number of pages in the textbook
- Price: price of the textbook in US dollars
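This step isn't required by the exercise, but a quick glimpse() shows the number of rows and each variable's type before we start plotting:

glimpse(textbooks)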
p1 <- ggplot(data = textbooks, aes(x = Price)) +
  geom_histogram() +
  labs(title = "Price of Textbooks",
       subtitle = "in 2006")

p2 <- ggplot(data = textbooks, aes(x = Pages)) +
  geom_histogram() +
  labs(title = "Pages in Textbooks",
       subtitle = "in 2006")

p3 <- ggplot(data = textbooks, aes(x = Pages, y = Price)) +
  geom_point() +
  labs(y = "Price in US Dollars",
       title = "Price vs. Pages in Textbooks",
       subtitle = "in 2006")

(p1 + p2) / p3
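In patchwork, + places plots side by side and / stacks them, so (p1 + p2) / p3 displays the two histograms in a top row with the scatterplot below.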
textbook_model <- lm(Price ~ Pages, data = textbooks)
tidy(textbook_model)
## # A tibble: 2 x 5
##   term        estimate std.error statistic      p.value
##   <chr>          <dbl>     <dbl>     <dbl>        <dbl>
## 1 (Intercept)   -3.42     10.5      -0.327 0.746
## 2 Pages          0.147     0.0193    7.65  0.0000000245
\[\widehat{\text{Price}} = -3.422 + 0.147 \times \text{Pages}\]
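For example, the predicted price of a 500-page textbook is \(-3.422 + 0.147 \times 500 \approx 70.08\) dollars; carrying the slope's unrounded digits gives the fitted value of about $70.24 that appears in the prediction output below.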
We use the residuals to check the conditions for simple linear regression (SLR): linearity, constant variance, normality of the residuals, and independence. We can calculate the residuals and fitted (predicted) values using the augment() function from broom, which returns the original data along with columns such as .fitted and .resid.
textbook_aug <- augment(textbook_model)

resid_fitted <- ggplot(data = textbook_aug, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, color = "red") +
  labs(x = "Predicted values",
       y = "Residual",
       title = "Residuals vs. Predicted")

resid_hist <- ggplot(data = textbook_aug, aes(x = .resid)) +
  geom_histogram() +
  labs(x = "Residuals", title = "Dist. of Residuals")

resid_qq <- ggplot(data = textbook_aug, aes(sample = .resid)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "Normal QQ-plot of residuals")

resid_fitted / (resid_hist + resid_qq)
Are the conditions satisfied? Briefly explain.
Below are two prediction tasks:

- Estimate the average price of textbooks with 500 pages.
- Predict the price of a single textbook with 500 pages.

Which interval will we use to complete each task?
x0 <- tibble(Pages = 500)
Interval A
textbook_model %>%
  predict(x0, interval = "confidence", level = 0.95)
##        fit      lwr      upr
## 1 70.24192 59.02439 81.45944
Interval B
textbook_model %>%
  predict(x0, interval = "prediction", level = 0.95)
##        fit      lwr      upr
## 1 70.24192 8.256892 132.2269
Knit your Rmd file to view the updated output. Commit your changes with an informative commit message, and push the updated files to GitHub.
The data used in this exercise is from Stat2: Building Models for a World of Data.