Lab 03: Airbnbs in Nashville, TN

due Wed, Sep 9 at 11:59p

Meet your team!

See STA 210 Teams to see your team assignment. This will be your team for labs and the final project.

Before you get started on the lab assignment, we will take a few minutes to help you develop a plan for working as a team.

✅ Come up with a team name. I encourage you to be creative! Your TA will get your team name by the end of lab.

✅ Identify something everyone on the team has in common that’s not necessarily in common with everyone else in the class.

✅ Fill out the team agreement. This will help you figure out a plan for working together during labs and outside of lab times. You can find the team agreement in the GitHub repo team-agreement-[github_team_name].

Lab 03

Clone assignment repo + start new project

A repository has already been created for you and your teammates. Everyone in your team has access to the same repo.

Workflow: Using git and GitHub as a team

Assign each person on your team a number 1 through 4. For teams of three, Member 1 can take on the role of Member 4.

The following exercises must be done in order. Only one person should type in the .Rmd file and push updates at a time. When it is not your turn to type, you should still share ideas and contribute to the team’s discussion.

Update YAML

Team Member 1: Change the author to your team name and include each team member’s name in the author field of the YAML in the following format. Team Name: Member 1, Member 2, Member 3, Member 4.

Packages

We’ll use the following packages in this lab.

library(tidyverse)
library(knitr)
library(broom)
library(patchwork)

The data

For today’s lab, we will look at data on Airbnb listings in Nashville, TN. The data was obtained from http://insideairbnb.com/; it was originally scraped from airbnb.com. Visualizations of some of this data are available at http://insideairbnb.com/nashville/.

It is more important than ever that public spaces such as hotels and Airbnbs are exceptionally clean, so we will explore the cleanliness ratings for these Airbnb listings. More specifically we are interested in understanding the relationship between the cleanliness rating and the price of these Airbnb rentals in Nashville.

airbnb <- read_csv("data/nashville-airbnb.csv")

We will use the following variables in this lab:

Exercises

Team Member 1: Type the team’s responses to exercises 1 - 3.

Exploratory data analysis

  1. Since we’re focusing on the cleanliness of Airbnbs, we want to only include the rentals that charge a cleaning fee (hopefully they’re cleaner than ones with no cleaning fee!). We’d also like to consider only those rentals that are for an entire home / apartment. Create a new data frame called clean_airbnb that filters for

    • Airbnb rentals that have a cleaning fee. Hint: Look at the way cleaning_fee is stored in the data set before filtering.
    • Airbnb rentals that are for an entire home / apartment

How many observations are in the new data frame?

You will use the filtered data frame for the next few exercises.
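
As a rough illustration of filtering on two conditions at once, here is a minimal sketch on a made-up data frame, not the lab data. The column name room_type, its labels, the choice to record a missing cleaning fee as NA, and all values are assumptions for illustration; check how the real columns are named and stored in airbnb before adapting anything.

```r
library(dplyr)

# Toy listings (hypothetical values); a missing cleaning fee is recorded as NA here
toy_airbnb <- tibble(
  id = 1:5,
  cleaning_fee = c("$50.00", NA, "$75.00", "$30.00", NA),
  room_type = c("Entire home/apt", "Entire home/apt", "Private room",
                "Entire home/apt", "Shared room")
)

# Keep rows that have a cleaning fee AND are an entire home / apartment
toy_clean <- toy_airbnb %>%
  filter(!is.na(cleaning_fee), room_type == "Entire home/apt")

# Number of observations left after filtering
nrow(toy_clean)
```

Separating conditions with a comma inside filter() keeps only the rows where every condition is true.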

Type ?typeof in the console to learn more about the typeof function.

  2. We are interested in using the average cleanliness rating to understand variation in the price. How is price stored in the data set? Use the code below to see the data type of price using the typeof function.
typeof(clean_airbnb$price)

Why is the price stored as this data type?
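
To see how typeof responds to different kinds of values, here is a small sketch on two made-up vectors (hypothetical values, not taken from the lab data). It only shows what the function reports; it is up to you to connect this to how price looks in clean_airbnb.

```r
# Toy vectors (hypothetical values), not the lab data
price_as_number <- c(85, 120.5, 40)
price_as_text <- c("$85.00", "$120.50", "$40.00")

typeof(price_as_number)  # "double"
typeof(price_as_text)    # "character"
```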

See the tidyr reference page for more information about the extract function.

  3. Let’s update price so we can use it for regression analysis. To do so, we will first use the extract function in the tidyr package to create a column containing only the numeric part of each price. Then we can update price so it is a numeric data type. Fill in the code below to extract the values of the price and store them as a numeric variable.
clean_airbnb <- clean_airbnb %>% 
  extract(price, "price") %>% #extract price values
  mutate(price = ______) #change to numeric data type
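
For comparison, here is a sketch of a slightly different way to use extract, shown on a made-up column rather than on price. The data frame toy_fees, the column fee, and the regular expression are assumptions for illustration. Instead of converting in a separate mutate step, this version passes an explicit regex and sets convert = TRUE so that tidyr converts the extracted text itself.

```r
library(dplyr)
library(tidyr)

# Toy data frame (hypothetical column and values), not the lab data
toy_fees <- tibble(fee = c("$85.00", "$120.50", "$40.00"))

toy_fees <- toy_fees %>%
  # the capture group keeps digits and the decimal point, dropping the "$"
  extract(fee, "fee", regex = "([0-9.]+)", convert = TRUE)

# Storage type of the column after extraction and conversion
typeof(toy_fees$fee)
```

This toy pattern does not handle thousands separators such as "$1,200.00"; values like that would need the commas removed first.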

✅ ⬆️ Team Member 1: Knit, commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercises 1 - 3.

Team Member 2: It’s your turn! Type the team’s response to exercises 4 - 6.

  4. Now, let’s examine the updated variable price. Make a plot to visualize the distribution of price. Based on the plot, how would you describe the distribution of price?

  5. We want to analyze Airbnbs that are most representative of those available in Nashville, so create a new data frame called clean_airbnb_300 that only includes rentals that are $300 or less per night. Make a plot to visualize the distribution of price in the updated data frame.

You will use clean_airbnb_300 for the remainder of the lab.
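
The general pattern of filtering on a numeric cutoff and then plotting the result can be sketched as follows. The data frame toy_prices, its values, and the bin width are made up for illustration; a histogram is only one reasonable choice of plot for a distribution.

```r
library(dplyr)
library(ggplot2)

# Toy nightly prices in dollars (hypothetical values), not the lab data
toy_prices <- tibble(price = c(60, 95, 110, 150, 220, 300, 450, 900))

# Keep rentals that cost $300 or less per night
toy_prices_300 <- toy_prices %>%
  filter(price <= 300)

# Histogram of the filtered prices
price_hist <- ggplot(toy_prices_300, aes(x = price)) +
  geom_histogram(binwidth = 50) +
  labs(x = "Price per night (USD)", y = "Number of listings")

price_hist
```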

Find out more about density plots on the geom_density reference page. See the patchwork reference page for more information.

  6. Make a histogram and a density plot to visualize the distribution of the predictor variable review_scores_cleanliness. Use the patchwork package to display the plots side-by-side.
    • What are 1 - 2 ways a density plot is similar to a histogram?
    • What are 1 - 2 ways a density plot differs from a histogram?
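
Here is a minimal sketch of combining two plots with patchwork, using simulated ratings rather than the lab data. The data frame toy_ratings, the column rating, and the sampling probabilities are assumptions for illustration only.

```r
library(ggplot2)
library(patchwork)

# Simulated ratings (hypothetical), not the lab data
set.seed(210)
toy_ratings <- data.frame(
  rating = sample(6:10, size = 200, replace = TRUE,
                  prob = c(0.05, 0.05, 0.1, 0.3, 0.5))
)

p_hist <- ggplot(toy_ratings, aes(x = rating)) +
  geom_histogram(binwidth = 1) +
  labs(title = "Histogram", y = "Count")

p_dens <- ggplot(toy_ratings, aes(x = rating)) +
  geom_density() +
  labs(title = "Density plot", y = "Density")

# With patchwork loaded, + places two ggplot objects side by side
side_by_side <- p_hist + p_dens
side_by_side
```

Saving each plot to its own object first keeps the code lines short and makes the final layout step easy to read.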

✅ ⬆️ Team Member 2: Knit, commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercises 4 - 6.

Team Member 3: It’s your turn! Type the team’s response to exercises 7 - 9.

Simple linear regression

  7. Fit a simple linear regression model to describe the relationship between the price and average cleanliness rating for Airbnbs in Nashville. Show the code and model output.

    • Would you stay at an Airbnb that is in the subset of rentals represented by the intercept? Briefly explain why or why not given the meaning of the intercept.
  8. Let’s use ANOVA to see if there is a linear relationship between the price and cleanliness rating for Airbnbs in Nashville. Make the ANOVA table for this model. Show the code and output.

  9. Calculate \(R^2\) and interpret it. Is this value surprising? Briefly explain why or why not.
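
The mechanics of these three steps can be sketched on simulated data, not the lab data. The data frame toy and the variables x and y are hypothetical, and the numbers they produce say nothing about Airbnbs. The sketch fits a model with lm, prints the coefficients with tidy from broom, builds the ANOVA table with anova, and computes \(R^2\) from the sums of squares as \(SS_{Model} / SS_{Total}\).

```r
library(broom)

# Simulated data (hypothetical), not the lab data
set.seed(210)
toy <- data.frame(x = runif(50, min = 0, max = 10))
toy$y <- 20 + 3 * toy$x + rnorm(50, sd = 5)

toy_model <- lm(y ~ x, data = toy)
tidy(toy_model)            # coefficient table: intercept and slope

toy_anova <- anova(toy_model)
toy_anova                  # rows: x (model) and Residuals

# R^2 = SS_model / SS_total, where SS_total = SS_model + SS_residual
ss_model <- toy_anova$`Sum Sq`[1]
ss_total <- sum(toy_anova$`Sum Sq`)
r_squared <- ss_model / ss_total
r_squared
```

The value computed this way should agree with the r.squared column returned by glance(toy_model).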

✅ ⬆️ Team Member 3: Knit, commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercises 7 - 9.

Team Member 4: It’s your turn! Type the team’s response to exercises 10 - 11.

  10. Explain what the F-statistic means in the context of the data.

  11. Based on the ANOVA test, is there a linear relationship between the average cleanliness rating and price of Airbnbs in Nashville? Briefly explain.
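
If it helps to see where the F-statistic comes from, the sketch below rebuilds it by hand from an ANOVA table for simulated data, not the lab data. The data frame toy and the variables x and y are hypothetical. The F-statistic is the ratio of the model mean square to the residual mean square, and the p-value is the upper-tail area of the corresponding F distribution.

```r
# Simulated data (hypothetical), not the lab data
set.seed(210)
toy <- data.frame(x = runif(50, min = 0, max = 10))
toy$y <- 20 + 3 * toy$x + rnorm(50, sd = 5)

toy_anova <- anova(lm(y ~ x, data = toy))

# Mean squares are sums of squares divided by their degrees of freedom
ms_model <- toy_anova$`Sum Sq`[1] / toy_anova$Df[1]
ms_resid <- toy_anova$`Sum Sq`[2] / toy_anova$Df[2]

# F-statistic and its upper-tail p-value
f_stat <- ms_model / ms_resid
p_value <- pf(f_stat, toy_anova$Df[1], toy_anova$Df[2], lower.tail = FALSE)

c(f_stat = f_stat, p_value = p_value)
```

Both values should match the F value and Pr(>F) columns that anova prints for the model row.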

✅ ⬆️ Team Member 4: Knit, commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the team’s completed lab!

Wrapping up

Go back through your write-up to make sure you followed the coding style guidelines we discussed in class (e.g. no long lines of code).

Team Member 2: Make any edits as needed. Then knit, commit, and push the updated documents to GitHub if you made any changes.

All other team members can click to pull the finalized document.

Submission

Team Member 3: Upload the team’s PDF to Gradescope. Be sure to include every team member’s name in the Gradescope submission. Associate the “Overall” graded section with the first page of your PDF, and mark where each answer is to the exercises. If any answer spans multiple pages, then mark all pages.

There should only be one submission per team on Gradescope.