Getting started

Clone the hw-05- repo and start a new project in RStudio. For more detailed instructions about getting started, see the Lab 01 instructions.

Type the following lines of code in the console in RStudio filling in your Github username and the email address associated with your Github account.

library(usethis)
use_git_config(user.name = "your github username", 
               user.email="your email")

Packages

We will use the following packages in this assignment:

library(tidyverse)
library(broom)
library(knitr)
#add other packages as needed

Part 1: Conceptual questions

In 2009, reporter Gina Kolata reported the story “Picture Emerging on Genetic Risks of Ivf” in the New York Times. The calculated results from the study are shown below.

In November, the Centers for Disease Control and Prevention published a paper reporting that babies conceived with IVF, or with a technique in which sperm are injected directly into eggs, have a slightly increased risk of several birth defects, including a hole between the two chambers of the heart, a cleft lip or palate, an improperly developed esophagus and a malformed rectum. The study involved 9,584 babies with birth defects and 4,792 babies without. Among the mothers of babies without birth defects, 1.1% had used IVF or related methods, compared with 2.4% of mothers of babies with birth defects.

  1. Use the calculated results to construct a table displaying the relationship between the whether or not the mother used IVF and whether or not the baby had a birth defect.

  2. Calculate the odds that a baby in this study was born with a birth defect.

  3. The conclusion from the report was the following:

The findings are considered preliminary, and researchers say they believe IVF does not carry excessive risks. There is a 3% chance that any given baby will have a birth defect.

Does the data support this conclusion? Include any relevant calculations to support your answer.

Part 2: Analysis in practice

The data for Medical School Admissions was collected from undergraduates from a small liberal arts school over several years. We are interested in student attributes that are associated with higher acceptance rates.

The data set MedGPA.csv contains the following variables:

  • Accept = accepted (A) into medical school or denied (D)
  • Acceptance = accepted (1) into medical school or denied (0)
  • Sex = male (M) or female (F)
  • BCPM = GPA in natural sciences and mathematics
  • GPA = overall GPA
  • VR = verbal reasoning subscale score of the MCAT
  • PS = physical sciences subscale score of the MCAT
  • WS = writing samples subscale score of the MCAT
  • BS = biological sciences subscale score of the MCAT
  • MCAT = MCAT total score
  • Apps = number of schools applied to

Use the data to answer the following questions. Be sure your answer includes and interpretation of the model coefficients and associated hypothesis tests or confidence intervals used to support your response.

  1. Fit a model using the MCAT total score and overall GPA to predict the odds of being accepted to medical school. Use your model to compare the relative effects of improving your MCAT score versus improving your GPA on your odds of being accepted to medical school.

  2. After accounting for MCAT and GPA, is the number of applications related to odds of getting into medical school?

  3. Is there any evidence that the effect of MCAT score or GPA differs for males and females?

Grading

Total 50
Part 1: Conceptual questions 15
Part 2: Analysis in Practice 30
Document submitted as PDF with clear headers 2
Name and date updated in YAML 1
At least 3 informative commit messages 2



The exercises from this assignment were adapted from Beyond Multiple Linear Regression: Applied Generalized Linear Models and Multilevel Models in R by Paul Roback and Julie Legler