Check STA 210 Teams to find your team assignment. This will be your team for labs and the final project.
Before you get started on the lab assignment, we will take a few minutes to help you develop a plan for working as a team.
✅ Come up with a team name. I encourage you to be creative! Your TA will get your team name by the end of lab.
✅ Identify something everyone on the team has in common that’s not necessarily in common with everyone else in the class.
✅ Fill out the team agreement. This will help you figure out a plan for working together during labs and outside of lab times. You can find the team agreement in the GitHub repo team-agreement-[github_team_name].
A repository has already been created for you and your teammates. Everyone in your team has access to the same repo.
Go to the course organization on GitHub.
In addition to your private individual repositories, you should now see a repo named lab-03-airbnb-[github_team_name]. Go to that repository.
Each person on the team should clone the repository and open a new project in RStudio. Do not make any changes to the .Rmd file until the instructions tell you to do so.
Assign each person on your team a number 1 through 4. For teams of three, Member 1 can take on the role of Member 4.
The following exercises must be done in order. Only one person should type in the .Rmd file and push updates at a time. When it is not your turn to type, you should still share ideas and contribute to the team’s discussion.
Team Member 1: Change the author to your team name and include each team member’s name in the author field of the YAML in the following format: Team Name: Member 1, Member 2, Member 3, Member 4.
We’ll use the following packages in this lab.
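The package list did not survive in this copy of the lab. Judging only from the functions the lab mentions (the %>% pipe, extract() from tidyr, geom_density() from ggplot2, and patchwork), a setup chunk might look roughly like the sketch below. This list is an assumption; your lab template may load different or additional packages.

```r
# Assumed setup: packages inferred from the functions this lab mentions,
# not the template's official package list
library(dplyr)     # filter(), mutate(), and the %>% pipe
library(tidyr)     # extract()
library(ggplot2)   # plotting, including geom_density()
library(patchwork) # placing ggplot objects side-by-side
```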
For today’s lab, we will look at data on Airbnb listings in Nashville, TN. The data was obtained from http://insideairbnb.com/; it was originally scraped from airbnb.com. Visualizations of some of this data are available at http://insideairbnb.com/nashville/.
It is more important than ever that public spaces such as hotels and Airbnbs are exceptionally clean, so we will explore the cleanliness ratings for these Airbnb listings. More specifically, we are interested in understanding the relationship between the cleanliness rating and the price of these Airbnb rentals in Nashville.
We will use the following variables in this lab:
price: Cost per night (in U.S. dollars)
cleaning_fee: Cleaning fee (in U.S. dollars)
room_type: Type of rental (e.g., entire home / apartment or private room)
review_scores_cleanliness: Average cleanliness score (0 - 10)
Team Member 1: Type the team’s responses to exercises 1 - 3.
Since we’re focusing on the cleanliness of Airbnbs, we want to include only the rentals that charge a cleaning fee (hopefully they’re cleaner than ones with no cleaning fee!). We’d also like to consider only those rentals that are for an entire home / apartment. Create a new data frame called clean_airbnb that filters for these rentals. Hint: Check how cleaning_fee is stored in the data set before filtering. How many observations are in the new data frame?
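As a rough illustration of this filtering step, the sketch below applies dplyr's filter() to a small made-up data frame. The room-type label "Entire home/apt" and the numeric fee values are assumptions for illustration only. In the real data, cleaning_fee may be stored differently (for example as text such as "$50.00", or with missing values), which is why the exercise asks you to check first: a numeric comparison like cleaning_fee > 0 would not behave as intended on a character column.

```r
library(dplyr)

# Hypothetical toy listings; column names mirror the lab, but the values
# and the "Entire home/apt" label are assumptions, not the real data
toy_listings <- tibble(
  room_type = c(
    "Entire home/apt", "Private room",
    "Entire home/apt", "Entire home/apt"
  ),
  cleaning_fee = c(50, 20, NA, 0)
)

# Step 1: check how cleaning_fee is stored before writing the filter
typeof(toy_listings$cleaning_fee)

# Step 2: keep entire homes/apartments that charge a positive cleaning fee;
# filter() drops rows where a condition evaluates to NA
toy_clean <- toy_listings %>%
  filter(room_type == "Entire home/apt", cleaning_fee > 0)

# Number of observations remaining (1 for this toy data)
nrow(toy_clean)
```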
You will use the filtered data frame for the next few exercises.
Type ?typeof in the console to learn more about the typeof function.
How is price stored in the data set? Use the typeof function to see the data type of price. Why is price stored as this data type?
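For reference, the made-up values below show what typeof() returns for a few common storage types, including a dollar-formatted string like the ones a price column might hold. These are illustrative inputs, not values from the Airbnb data.

```r
# typeof() reports how R stores a value internally
typeof(150)        # "double"
typeof(150L)       # "integer"
typeof("$150.00")  # "character"
typeof(TRUE)       # "logical"
```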
See the tidyr reference page for more information about the extract function.
We need to convert price to a numeric variable so we can use it for regression analysis. To do so, we will first use the extract function in the tidyr package to create a column containing the numeric values of price. Then we can update price so it is a numeric data type. Fill in the code below to extract the values of price and store them as a numeric variable.
clean_airbnb <- clean_airbnb %>%
  extract(price, "price") %>% # extract price values
  mutate(price = ______) # change to numeric data type
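Before filling in the blank, it may help to see what extract() does with its default settings. The sketch below runs it on made-up price strings. With the default regex, "([[:alnum:]]+)", extract() keeps only the first run of letters or digits, so a price written with a thousands separator is cut off at the comma. If any real prices look like "$1,200.00", they would become 1, and the later filter to $300 or less would keep them. The gsub() line is one alternative shown for comparison, not the approach the lab prescribes.

```r
library(tidyr)

# Hypothetical price strings in a dollar format (made-up values)
toy_prices <- tibble::tibble(price = c("$85.00", "$150.00", "$1,200.00"))

# Default regex keeps the first alphanumeric run of each string
extracted <- extract(toy_prices, price, "price")
extracted$price
# returns "85" "150" "1"  (the comma cuts "$1,200.00" short)

# Alternative for comparison: strip "$" and "," before converting
cleaned <- as.numeric(gsub("[$,]", "", toy_prices$price))
cleaned
# returns 85 150 1200
```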
✅ ⬆️ Team Member 1: Knit, commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercises 1 - 3.
Team Member 2: It’s your turn! Type the team’s response to exercises 4 - 6.
Now, let’s examine the updated variable price. Make a plot to visualize the distribution of price. Based on the plot, how would you describe the distribution of price?
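One common choice for this plot is a histogram. The sketch below uses simulated, right-skewed values (not the Airbnb data) purely to show the ggplot2 calls; the binwidth of 25 is an arbitrary choice worth adjusting for the real data. When describing the distribution, look at its shape, center, spread, and any outliers.

```r
library(ggplot2)

# Simulated, right-skewed "prices" for illustration only
set.seed(210)
toy <- data.frame(price = round(rlnorm(500, meanlog = 4.8, sdlog = 0.6)))

# Histogram of the simulated prices
price_hist <- ggplot(toy, aes(x = price)) +
  geom_histogram(binwidth = 25) +
  labs(
    x = "Price per night (USD)",
    y = "Count",
    title = "Distribution of simulated nightly prices"
  )

price_hist
```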
We want to analyze Airbnbs that are most representative of those available in Nashville, so create a new data frame called clean_airbnb_300 that only includes rentals that are $300 or less per night. Make a plot to visualize the distribution of price in the updated data frame.
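"$300 or less" corresponds to a less-than-or-equal comparison, so a listing priced at exactly $300 stays in. The made-up prices below straddle the cutoff to show the boundary behavior; they are not from the data set.

```r
library(dplyr)

# Made-up prices that straddle the $300 cutoff
toy <- tibble(price = c(120, 299.99, 300, 300.01, 1200))

# Keep rentals that are $300 or less per night
toy_300 <- toy %>%
  filter(price <= 300)

toy_300$price
# returns 120.00 299.99 300.00
```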
You will use clean_airbnb_300 for the remainder of the lab.
Find out more about density plots on the geom_density reference page. See the patchwork reference page for more information.
Make density plots of price and review_scores_cleanliness. Use the patchwork package to display the plots side-by-side.
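Assuming the two distributions to compare are price and review_scores_cleanliness, the sketch below builds one density plot for each and combines them with patchwork, which overloads operators such as | and + to arrange ggplot objects. The data is simulated for illustration only; the cleanliness probabilities are arbitrary.

```r
library(ggplot2)
library(patchwork)

# Simulated data for illustration only (not clean_airbnb_300)
set.seed(210)
toy <- data.frame(
  price = round(runif(200, min = 40, max = 300)),
  review_scores_cleanliness = sample(
    6:10, 200, replace = TRUE,
    prob = c(0.02, 0.03, 0.10, 0.25, 0.60)
  )
)

p_price <- ggplot(toy, aes(x = price)) +
  geom_density() +
  labs(x = "Price per night (USD)", title = "Price")

p_clean <- ggplot(toy, aes(x = review_scores_cleanliness)) +
  geom_density() +
  labs(x = "Average cleanliness score", title = "Cleanliness")

# `|` places the two plots side-by-side
combined <- p_price | p_clean
combined
```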
✅ ⬆️ Team Member 2: Knit, commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercises 4 - 6.
Team Member 3: It’s your turn! Type the team’s response to exercises 7 - 9.
Fit a simple linear regression model to describe the relationship between the price and average cleanliness rating for Airbnbs in Nashville. Show the code and model output.
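The sketch below shows the general shape of a simple linear regression fit with lm(), using R's built-in mtcars data as a stand-in (fuel efficiency mpg as the response, car weight wt as the predictor). By analogy, the lab's model would presumably use price as the response and review_scores_cleanliness as the predictor, fit on clean_airbnb_300.

```r
# Stand-in example on the built-in mtcars data (not the Airbnb data)
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficient estimates, standard errors, t-tests, and R^2
summary(fit)

# Intercept is about 37.29 and the slope for wt is about -5.34
coef(fit)
```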
Let’s use ANOVA to see if there is a linear relationship between the price and cleanliness rating for Airbnbs in Nashville. Make the ANOVA table for this model. Show the code and output.
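For a fitted lm object, anova() returns the ANOVA table with degrees of freedom, sums of squares, mean squares, the F-statistic, and its p-value. The sketch below uses the same mtcars stand-in model rather than the Airbnb data.

```r
# Same stand-in model on the built-in mtcars data
fit <- lm(mpg ~ wt, data = mtcars)

# ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
anova_tab <- anova(fit)
anova_tab
# F value is about 91.4 on 1 and 30 degrees of freedom
```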
Calculate \(R^2\) and interpret it. Is this value surprising? Briefly explain why or why not.
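\(R^2\) can be read directly off the ANOVA table as \(R^2 = SS_{Model} / SS_{Total}\), where \(SS_{Total} = SS_{Model} + SS_{Residual}\). The sketch below computes it this way on the mtcars stand-in model and checks it against the value summary() reports.

```r
fit <- lm(mpg ~ wt, data = mtcars)
anova_tab <- anova(fit)

# R^2 = SS_model / SS_total, with SS_total = SS_model + SS_residual
ss_model <- anova_tab$`Sum Sq`[1]
ss_total <- sum(anova_tab$`Sum Sq`)
r_squared <- ss_model / ss_total
r_squared
# about 0.753

# Same value as reported by summary()
summary(fit)$r.squared
```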
✅ ⬆️ Team Member 3: Knit, commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercises 7 - 9.
Team Member 4: It’s your turn! Type the team’s response to exercises 10 - 11.
Explain what the F-statistic means in the context of the data.
Based on the ANOVA test, is there a linear relationship between the average cleanliness rating and price of Airbnbs in Nashville? Briefly explain.
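In simple linear regression, the F-statistic is the ratio of the model mean square to the residual mean square, and it tests the null hypothesis that the slope is 0. A large F means the model explains much more variability per degree of freedom than is left in the residuals. The sketch below recomputes F and its p-value by hand on the mtcars stand-in model to show where the ANOVA table's numbers come from.

```r
fit <- lm(mpg ~ wt, data = mtcars)
anova_tab <- anova(fit)

# Mean square = Sum Sq / Df for the model row and the residual row
ms_model <- anova_tab$`Sum Sq`[1] / anova_tab$Df[1]
ms_resid <- anova_tab$`Sum Sq`[2] / anova_tab$Df[2]

# F = MS_model / MS_residual
f_stat <- ms_model / ms_resid

# p-value: upper-tail probability of the F distribution with (1, 30) df
p_value <- pf(f_stat, df1 = anova_tab$Df[1], df2 = anova_tab$Df[2],
              lower.tail = FALSE)

c(F = f_stat, p = p_value)
```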
✅ ⬆️ Team Member 4: Knit, commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the team’s completed lab!
Go back through your write-up to make sure you followed the coding style guidelines we discussed in class (e.g., no long lines of code).
Team Member 2: Make any edits as needed. Then knit, commit, and push the updated documents to GitHub if you made any changes.
All other team members can click to pull the finalized document.
Team Member 3: Upload the team’s PDF to Gradescope. Be sure to include every team member’s name in the Gradescope submission. Associate the “Overall” graded section with the first page of your PDF, and mark the page where the answer to each exercise appears. If an answer spans multiple pages, mark all of those pages.
There should only be one submission per team on Gradescope.