AE 25: Missing data

Announcements

Assignments

Written report due Tue, Nov 17
Video presentation due Fri, Nov 20
- Presentation slides
Final repo due Fri, Nov 20
- Includes codebook (a list of variables and their definitions) in the data folder
Watch and comment on presentations Nov 20 - Nov 22

Exercises

Dealing with Missing Data

We will use the nhanes2 data set from the mice R package. It is a small subset of the NHANES data (25 observations of the variables age, bmi, hyp, and chl) that is commonly used to demonstrate imputation methods.

library(tidyverse)
nhanes2 <- mice::nhanes2

bmi

Let’s take a look at the variable bmi (body mass index).
- How many observations have missing values for bmi?
- Visualize the distribution of bmi.
- What is the standard deviation of bmi for the observations that have values for bmi?
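One possible way to approach the three questions above is sketched here. It assumes the tidyverse and mice packages are installed. The object names (n_missing_bmi, bmi_hist, sd_bmi_observed) and the histogram bin width are illustrative choices, not part of the exercise.

```r
library(tidyverse)
nhanes2 <- mice::nhanes2

# Count the observations whose bmi is missing (NA)
n_missing_bmi <- nhanes2 %>%
  summarise(n_missing = sum(is.na(bmi))) %>%
  pull(n_missing)
n_missing_bmi

# Histogram of the observed bmi values; rows with NA are dropped explicitly.
# binwidth = 2 is an arbitrary choice for such a small data set.
bmi_hist <- nhanes2 %>%
  filter(!is.na(bmi)) %>%
  ggplot(aes(x = bmi)) +
  geom_histogram(binwidth = 2) +
  labs(x = "Body mass index", y = "Count", title = "Observed values of bmi")
bmi_hist

# Standard deviation using only the observations that have a bmi value
sd_bmi_observed <- sd(nhanes2$bmi, na.rm = TRUE)
sd_bmi_observed
```

Without na.rm = TRUE, sd() returns NA as soon as a single value is missing, so the argument is what restricts the calculation to the complete observations.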
Impute the missing values of bmi using mean imputation.
Visualize the distribution of bmi with the imputed values and calculate the standard deviation. How did the distribution of bmi change when we filled in missing values using mean imputation?
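A minimal sketch of mean imputation, assuming the tidyverse and mice packages are installed. The new column names (bmi_missing, bmi_imputed) and the bin width are hypothetical choices made for this illustration.

```r
library(tidyverse)
nhanes2 <- mice::nhanes2

# Replace each missing bmi with the mean of the observed bmi values.
# bmi_missing records which rows were filled in, so they can be
# distinguished in the plot.
nhanes2_imputed <- nhanes2 %>%
  mutate(
    bmi_missing = is.na(bmi),
    bmi_imputed = if_else(is.na(bmi), mean(bmi, na.rm = TRUE), bmi)
  )

# Compare the standard deviation before and after imputation
sd_comparison <- nhanes2_imputed %>%
  summarise(
    sd_observed = sd(bmi, na.rm = TRUE),
    sd_imputed  = sd(bmi_imputed)
  )
sd_comparison

# Histogram of bmi after imputation, colored by whether a value was imputed
ggplot(nhanes2_imputed, aes(x = bmi_imputed, fill = bmi_missing)) +
  geom_histogram(binwidth = 2) +
  labs(x = "Body mass index", y = "Count", fill = "Imputed?")
```

Every imputed value sits exactly at the mean. The filled-in rows therefore add nothing to the sum of squared deviations while increasing the sample size, which is why the standard deviation after mean imputation is smaller than the one computed from the observed values alone.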
What are some potential limitations of using mean imputation to fill in missing values?

hyp

Let’s consider the variable hyp (hypertension). How many observations have missing values for hyp?
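Because hyp is categorical, a frequency table is one convenient way to see the missing values next to the observed categories. This is a sketch that assumes the tidyverse and mice packages are installed; hyp_counts is an illustrative name.

```r
library(tidyverse)
nhanes2 <- mice::nhanes2

# count() gives one row per level of hyp, plus a row for NA
hyp_counts <- nhanes2 %>%
  count(hyp)
hyp_counts
```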
What are two strategies you can use to impute values for hyp?
What are the advantages and potential limitations of the strategies you proposed?
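As a starting point for the discussion, the sketch below shows two strategies that are sometimes used for a categorical variable: filling in the most frequent observed category (mode imputation), and drawing each missing value at random from the observed values. These are only two of several possibilities and are not necessarily the ones the exercise has in mind. The code assumes the tidyverse and mice packages are installed. The column names and the seed are arbitrary choices for this illustration.

```r
library(tidyverse)
nhanes2 <- mice::nhanes2

# Observed (non-missing) values of hyp
observed_hyp <- nhanes2$hyp[!is.na(nhanes2$hyp)]

# Most frequent observed category (ties go to the first level)
hyp_mode <- names(which.max(table(observed_hyp)))

set.seed(2020)  # arbitrary seed so the random draws are reproducible

nhanes2_hyp <- nhanes2 %>%
  mutate(
    # Strategy 1: replace every NA with the most frequent category
    hyp_mode_imputed = replace(hyp, is.na(hyp), hyp_mode),
    # Strategy 2: replace each NA with a random draw (with replacement)
    # from the observed values, so categories appear in roughly their
    # observed proportions
    hyp_random_imputed = replace(
      hyp, is.na(hyp),
      sample(observed_hyp, size = sum(is.na(hyp)), replace = TRUE)
    )
  )

# Compare the category counts under each strategy
nhanes2_hyp %>% count(hyp_mode_imputed)
nhanes2_hyp %>% count(hyp_random_imputed)
```

Mode imputation inflates the share of the most common category, while random draws keep the observed proportions on average but change from run to run. Neither approach uses information from the other variables, such as age.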

2020-11-09