Sales vs. Advertising

Data and packages

We start by loading the packages we’ll use.

library(readr)
library(tidyverse)
library(skimr)
library(broom)

advertising <- read_csv("data/advertising.csv")

We will analyze the advertising and sales data for 200 markets. The variables we’ll use are

tv: total spending on TV advertising (in $thousands)
radio: total spending on radio advertising (in $thousands)
newspaper: total spending on newspaper advertising (in $thousands)
sales: total sales (in $millions)

Analysis

We’ll begin the analysis by getting a quick view of the data:

glimpse(advertising)

## Rows: 200
## Columns: 4
## $ tv        <dbl> 230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2, 8.6, 199…
## $ radio     <dbl> 37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8, 19.6, 2.1, 2.6, 5…
## $ newspaper <dbl> 69.2, 45.1, 69.3, 58.5, 58.4, 75.0, 23.5, 11.6, 1.0, 21.2, …
## $ sales     <dbl> 22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8, 10.6, 8.…

Next, we can calculate summary statistics for each of the variables in the data set.

# skim() is from the skimr package
advertising %>% 
  skim()

Data summary
Name	Piped data
Number of rows	200
Number of columns	4
_______________________
Column type frequency:
numeric	4
________________________
Group variables	None

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
tv	1	147.04	85.85	0.7	74.38	149.75	218.82	296.4	▇▆▆▇▆
radio	1	23.26	14.85	0.0	9.97	22.90	36.52	49.6	▇▆▆▆▆
newspaper	1	30.55	21.78	0.3	12.75	25.75	45.10	114.0	▇▆▃▁▁
sales	1	14.02	5.22	1.6	10.38	12.90	17.40	27.0	▁▇▇▅▂

What type of advertising typically has the smallest spending?
What type of advertising has the largest variation in spending?
Describe the shape of the distribution of sales.
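
The center and spread asked about above can also be computed directly. As a minimal sketch, the block below rebuilds only the first nine rows visible in the glimpse() output (the name ad_head is ours, not an object from the original analysis) and computes each column's mean and standard deviation with base R. Because it uses nine rows rather than all 200 markets, its numbers will not match the skim() summary.

```r
# First nine rows copied from the glimpse() output above;
# `ad_head` is an illustrative name, not part of the original analysis.
ad_head <- data.frame(
  tv        = c(230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2, 8.6),
  radio     = c(37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8, 19.6, 2.1),
  newspaper = c(69.2, 45.1, 69.3, 58.5, 58.4, 75.0, 23.5, 11.6, 1.0),
  sales     = c(22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8)
)

# Center (mean) and spread (standard deviation) of each variable
col_means <- sapply(ad_head, mean)
col_sds   <- sapply(ad_head, sd)

# One row per variable, rounded for easier reading
round(data.frame(mean = col_means, sd = col_sds), 2)
```

With the full data loaded, swapping ad_head for advertising should reproduce the mean and sd columns reported by skim().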

We are most interested in understanding how advertising spending affects sales. One way to quantify the relationship between the variables is by calculating the correlation matrix.

advertising %>% 
  cor()

##                   tv      radio  newspaper     sales
## tv        1.00000000 0.05480866 0.05664787 0.7822244
## radio     0.05480866 1.00000000 0.35410375 0.5762226
## newspaper 0.05664787 0.35410375 1.00000000 0.2282990
## sales     0.78222442 0.57622257 0.22829903 1.0000000

What is the correlation between radio and sales? Interpret this value.
What type of advertising has the strongest linear relationship with sales?
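
Rather than reading a value off the printed matrix, a single correlation can be pulled out by name or computed from two columns. The sketch below shows both routes on the first nine rows visible in the glimpse() output (ad_head is an illustrative name, not an object from the original analysis), so its values will differ from the full 200-market matrix above.

```r
# First nine rows copied from the glimpse() output above;
# `ad_head` is an illustrative name, not part of the original analysis.
ad_head <- data.frame(
  tv        = c(230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2, 8.6),
  radio     = c(37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8, 19.6, 2.1),
  newspaper = c(69.2, 45.1, 69.3, 58.5, 58.4, 75.0, 23.5, 11.6, 1.0),
  sales     = c(22.1, 10.4, 9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8)
)

# Correlation matrix, computed the same way as in the analysis
cor_mat <- cor(ad_head)

# Two equivalent ways to get one pairwise correlation
r_from_matrix <- cor_mat["radio", "sales"]
r_direct      <- cor(ad_head$radio, ad_head$sales)

# Correlation of each advertising type with sales, largest magnitude first
sales_cors <- cor_mat[c("tv", "radio", "newspaper"), "sales"]
sort(abs(sales_cors), decreasing = TRUE)
```

Sorting by absolute value matters because a strong negative correlation is just as strong a linear relationship as a positive one of the same size.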

Below are visualizations of sales versus each explanatory variable.

ggplot(data = advertising, mapping = aes(x = tv, y = sales)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Sales vs. TV Advertising",
       x = "TV Advertising (in $thousands)",
       y = "Sales (in $millions)")

ggplot(data = advertising, mapping = aes(x = radio, y = sales)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Sales vs. Radio Advertising",
       x = "Radio Advertising (in $thousands)",
       y = "Sales (in $millions)")

## Fill in the code to create a scatterplot of sales vs. newspaper ads.

Since tv appears to have the strongest linear relationship with sales, let’s fit a simple linear regression model using these two variables.

ad_model <- lm(sales ~ tv, data = advertising)
ad_model

## 
## Call:
## lm(formula = sales ~ tv, data = advertising)
## 
## Coefficients:
## (Intercept)           tv  
##     7.03259      0.04754
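
As a rough illustration of how the fitted coefficients are used, the sketch below plugs the rounded estimates printed above into the fitted line for a hypothetical market that spends $100 thousand on TV advertising. The spending value of 100 is an assumption chosen for illustration, and the variable names are ours.

```r
# Rounded coefficient estimates copied from the printed model output
intercept <- 7.03259
slope     <- 0.04754

# Hypothetical TV spending, in $thousands (an assumed value for illustration)
tv_spend <- 100

# Predicted sales, in $millions, from the fitted line
predicted_sales <- intercept + slope * tv_spend
predicted_sales
```

This gives a predicted value of about 11.79, in $millions. With the fitted object available, predict(ad_model, newdata = data.frame(tv = 100)) is the more usual route and avoids rounding the coefficients by hand.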

Write the model equation.
Interpret the intercept in the context of the problem.
Interpret the slope in the context of the problem.

We’ll talk about slope and intercept next week!


08.19.20
