Announcements

Tea with a TA!

Hang out with the TAs from STA 210! This is a casual conversation and a fun opportunity to meet the members of the STA 210 teaching team. The only rule is these can’t turn into office hours!

Tea with a TA counts as a statistics experience.

This is Statistics Fall Data Challenge

The Get out the Vote! Fall Data Challenge is run by the American Statistical Association (ASA). Submissions are due November 11.

Big data and public policy event on Oct 26

Final project - Draft due Oct 28

  • Write the draft in the written-report.Rmd file in the project repo.
  • Draft should include
    • exploratory data analysis
    • initial model selection (main effects + interactions)
    • initial interpretations / conclusions from model

Quiz 02

Response to Leukemia treatment

Today’s data is from a study in which 51 untreated adult patients with acute myeloblastic leukemia were given a course of treatment and then assessed on their response to it.¹

The goal of today’s analysis is to use pre-treatment measurements to predict how likely it is that a patient will respond to the treatment.

We will use the following variables:

  1. Fit a model using Age, Index and Temp to predict the odds a patient will respond to the treatment.

  2. Use the augment function to obtain the predicted probability each patient in the data set will respond to the treatment.

  3. Construct a confusion matrix using 0.5 as the decision-making threshold.

  4. Use your confusion matrix from the previous exercise to calculate the following:
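
One way to work through exercises 1 through 4 in R is sketched below. It is a starting point under stated assumptions, not a reference solution: it assumes the Stat2Data and broom packages are installed, that the data set is `Leukemia` from Stat2Data, and that the response column is named `Resp` (1 = responded, 0 = did not). The object names (`resp_model`, `resp_aug`, `conf_mat`) are made up for illustration. Sensitivity, specificity, and the misclassification rate are computed only as typical quantities read off a confusion matrix.

```r
# Sketch only: assumes Stat2Data and broom are installed, and that the
# response column in Stat2Data's Leukemia data is named Resp (1 = responded).
library(Stat2Data)
library(broom)

data(Leukemia)

# Exercise 1: logistic regression for the odds of responding to treatment
resp_model <- glm(Resp ~ Age + Index + Temp, data = Leukemia, family = binomial)

# Exercise 2: predicted probability for each patient
# (type.predict = "response" returns probabilities instead of log-odds)
resp_aug <- augment(resp_model, type.predict = "response")

# Exercise 3: confusion matrix with 0.5 as the decision-making threshold.
# Fixing the factor levels keeps the table 2 x 2 even if one class is never predicted.
resp_aug$pred_class <- factor(ifelse(resp_aug$.fitted > 0.5, 1, 0), levels = c(0, 1))
resp_aug$actual <- factor(resp_aug$Resp, levels = c(0, 1))
conf_mat <- table(actual = resp_aug$actual, predicted = resp_aug$pred_class)
print(conf_mat)

# Exercise 4: example quantities derived from the confusion matrix
tp <- conf_mat["1", "1"]  # responders predicted to respond
tn <- conf_mat["0", "0"]  # non-responders predicted not to respond
fp <- conf_mat["0", "1"]  # non-responders predicted to respond
fn <- conf_mat["1", "0"]  # responders predicted not to respond

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
misclassification <- (fp + fn) / sum(conf_mat)

print(c(sensitivity = sensitivity,
        specificity = specificity,
        misclassification = misclassification))
```

Changing the 0.5 in `ifelse()` and rerunning shows how sensitivity and specificity trade off as the threshold moves.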

  1. Create the ROC curve for this model.

  2. What is the area under the curve? Is this model a good fit for the data?

  3. Suppose a doctor wants to use your model to determine if she should recommend this particular treatment for her patients with leukemia. Based on the data from the ROC curve, what decision-making threshold do you recommend the doctor use to select patients for this treatment? What factors did you consider to determine the threshold?
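
The ROC exercises above can be sketched in base R as follows. This block builds the curve by hand so that each step is visible. The same assumptions apply as before: the Stat2Data package is installed and the response column is named `Resp`. The names `roc`, `auc`, and `youden` are hypothetical. Youden's J (sensitivity + specificity - 1) is used only as one possible summary for comparing thresholds. It weights false positives and false negatives equally, which may not match the doctor's priorities.

```r
# Sketch only: assumes Stat2Data is installed and the response column is Resp.
library(Stat2Data)

data(Leukemia)

resp_model <- glm(Resp ~ Age + Index + Temp, data = Leukemia, family = binomial)
pred_prob <- predict(resp_model, type = "response")
actual <- Leukemia$Resp

# Candidate thresholds: every distinct predicted probability, plus Inf and -Inf
# so the curve starts at (0, 0) and ends at (1, 1).
thresholds <- c(Inf, sort(unique(pred_prob), decreasing = TRUE), -Inf)

# A patient is classified as a responder when pred_prob >= threshold.
roc <- data.frame(
  threshold = thresholds,
  sensitivity = sapply(thresholds, function(t) mean(pred_prob[actual == 1] >= t)),
  specificity = sapply(thresholds, function(t) mean(pred_prob[actual == 0] < t))
)
roc$fpr <- 1 - roc$specificity

# Exercise 1: ROC curve (sensitivity against 1 - specificity)
plot(roc$fpr, roc$sensitivity, type = "l",
     xlab = "1 - specificity", ylab = "Sensitivity", main = "ROC curve")
abline(0, 1, lty = 2)  # reference line for a model no better than chance

# Exercise 2: area under the curve by the trapezoid rule
auc <- sum(diff(roc$fpr) *
           (head(roc$sensitivity, -1) + tail(roc$sensitivity, -1)) / 2)
print(auc)

# Exercise 3: one possible criterion for comparing thresholds (Youden's J)
roc$youden <- roc$sensitivity + roc$specificity - 1
print(roc[which.max(roc$youden), ])
```

The `roc` data frame lists the sensitivity and specificity at every candidate threshold, so the doctor's tolerance for false positives versus false negatives can be checked against specific rows rather than a single summary. The yardstick package's `roc_curve()` and `roc_auc()` offer a tidyverse alternative to the manual calculation.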


  1. The data set is from the Stat2Data R package.