data folder
We will use the nhanes2 data set from the mice R package. This is a small subset of the NHANES data specifically used to demonstrate imputation methods.
library(tidyverse)
nhanes2 <- mice::nhanes2
Let’s take a look at the variable bmi (body mass index).
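As a starting point, here is a minimal sketch of one way to inspect bmi: count the missing values and summarize the observed ones. It uses base R only, and the name bmi_observed is introduced here purely for illustration.

```r
# Load the data as in the setup above (requires the mice package)
nhanes2 <- mice::nhanes2

# Number of observations with a missing bmi
n_missing_bmi <- sum(is.na(nhanes2$bmi))
n_missing_bmi

# Keep only the observed (non-missing) bmi values
bmi_observed <- nhanes2$bmi[!is.na(nhanes2$bmi)]

# Mean and standard deviation of the observed values
mean(bmi_observed)
sd(bmi_observed)
```

The same summaries could also be computed on the full column with na.rm = TRUE, for example sd(nhanes2$bmi, na.rm = TRUE).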
[...] bmi?
[...] bmi.
[...] bmi for the observations that have values for bmi?
Impute the missing values of bmi using mean imputation.
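One possible way to carry out mean imputation is sketched below. The names bmi_mean, nhanes2_mean_imp, and bmi_imputed are hypothetical and chosen only for this example; the original bmi column is left untouched so the two versions can be compared later.

```r
# Load the data as in the setup above (requires the mice package)
nhanes2 <- mice::nhanes2

# Mean of the observed bmi values (na.rm = TRUE drops the NAs)
bmi_mean <- mean(nhanes2$bmi, na.rm = TRUE)

# Copy the data and add a column where each missing bmi is replaced by the mean
nhanes2_mean_imp <- nhanes2
nhanes2_mean_imp$bmi_imputed <- ifelse(is.na(nhanes2$bmi), bmi_mean, nhanes2$bmi)

# Check that no missing values remain in the imputed column
sum(is.na(nhanes2_mean_imp$bmi_imputed))
```

Storing the result in a new column rather than overwriting bmi is a design choice for this sketch: it keeps a record of which values were originally observed.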
Visualize the distribution of bmi with the imputed values and calculate the standard deviation. How did the distribution of bmi change when we filled in missing values using mean imputation?
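To make the comparison concrete, the sketch below computes the standard deviation before and after mean imputation and draws the two histograms side by side. It uses base R graphics for brevity (a ggplot2 version would work equally well), and the object names are assumptions made for this example.

```r
# Load the data and rebuild the observed and mean-imputed versions of bmi
nhanes2 <- mice::nhanes2
bmi_observed <- nhanes2$bmi[!is.na(nhanes2$bmi)]
bmi_imputed <- ifelse(is.na(nhanes2$bmi), mean(bmi_observed), nhanes2$bmi)

# Standard deviation before and after imputation
sd_observed <- sd(bmi_observed)
sd_imputed <- sd(bmi_imputed)
c(observed = sd_observed, imputed = sd_imputed)

# Side-by-side histograms drawn on a common set of breaks
breaks <- seq(floor(min(bmi_observed)), ceiling(max(bmi_observed)), by = 1)
old_par <- par(mfrow = c(1, 2))
hist(bmi_observed, breaks = breaks, main = "Observed bmi", xlab = "bmi")
hist(bmi_imputed, breaks = breaks, main = "Mean-imputed bmi", xlab = "bmi")
par(old_par)
```

Because every imputed value sits exactly at the observed mean, the sum of squared deviations stays the same while the number of observations grows, so the standard deviation shrinks and the histogram gains a spike at the mean.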
What are some potential limitations of using mean imputation to fill in missing values?
Let’s consider the variable hyp (hypertension). How many observations have missing values for hyp?
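A short sketch of how the missing values of hyp might be counted is shown below. The name n_missing_hyp is introduced only for this example.

```r
# Load the data as in the setup above (requires the mice package)
nhanes2 <- mice::nhanes2

# Number of observations with a missing hyp
n_missing_hyp <- sum(is.na(nhanes2$hyp))
n_missing_hyp

# Counts of each level, with NA shown as its own category
hyp_table <- table(nhanes2$hyp, useNA = "ifany")
hyp_table
```

Since hyp is a factor, table() with useNA = "ifany" is a convenient way to see the observed categories and the missing count together.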
What are two strategies you can use to impute values for hyp?
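For illustration only, here are two candidate strategies sketched in code; they are not the only reasonable answers. The first fills in the most frequent observed category. The second is model-based and relies on mice::mice(), whose default method for a two-level factor is logistic regression ("logreg"). The object names, the choice of m = 1, and the seed are assumptions made for this example.

```r
# Load the data as in the setup above (requires the mice package)
nhanes2 <- mice::nhanes2

# Strategy 1: impute the most frequent observed category (the mode)
hyp_counts <- table(nhanes2$hyp)                      # NAs are excluded by default
hyp_mode <- names(hyp_counts)[which.max(hyp_counts)]
hyp_mode_imp <- nhanes2$hyp
hyp_mode_imp[is.na(hyp_mode_imp)] <- hyp_mode

# Strategy 2: model-based imputation with mice
# A single imputed data set (m = 1) is requested here to keep the sketch short
imp <- mice::mice(nhanes2, m = 1, seed = 123, printFlag = FALSE)
imp$method                                            # method used for each variable
completed <- mice::complete(imp, 1)

# Compare the category counts under the two strategies
table(hyp_mode_imp)
table(completed$hyp)
```

Note that mice() imputes every incomplete variable in the data set, not just hyp, and that in practice several imputed data sets (m > 1) are usually generated so that the uncertainty in the imputed values can be reflected in later analyses.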
What are the advantages and potential limitations of the strategies you proposed?