Announcements

Tea with a TA!

Hang out with the TAs from STA 210! This is a casual conversation and a fun opportunity to meet the members of the STA 210 teaching team. The only rule is these can’t turn into office hours!

Tea with a TA counts as a statistics experience.

This is Statistics Fall Data Challenge

See the challenge website for details on the Get out the Vote! Fall Data Challenge by the American Statistical Association (ASA). Submissions are due November 11.

Big data and public policy event on Oct 26

Final project - Draft due Oct 28

  • Write the draft in the written-report.Rmd file in the project repo.
  • Draft should include
    • exploratory data analysis
    • initial model selection (main effects + interactions)
    • initial interpretations / conclusions from model

Quiz 03

  • Available from Thu, Oct 22 at 12:00a through Fri, Oct 23 at 11:59p
  • You have 2 hours to complete it once you start
  • Covers material from weeks 07 - 09
  • No office hours Oct 22 - Oct 23. Piazza will be inactive while the quiz is out.
  • No lab this week.

Response to Leukemia treatment

Today’s data come from a study in which 51 untreated adult patients with acute myeloblastic leukemia were given a course of treatment and then assessed on their response to it.1

The goal of today’s analysis is to use pre-treatment measurements to predict how likely it is that a patient will respond to the treatment.

We will use the following variables:

We want to use Age, Index, and Temp to predict the odds that a patient will respond to the treatment. The fitted logistic regression model is below.

term         estimate  std.error  statistic  p.value
(Intercept)    87.388     35.458      2.465    0.014
Age            -0.059      0.026     -2.287    0.022
Index           0.385      0.122      3.168    0.002
Temp           -0.089      0.036     -2.467    0.014
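To make the coefficient table concrete, here is a minimal sketch that turns the estimates into predicted log-odds, odds, and a probability for one patient. The patient values are hypothetical (not a row from the data), and the sketch assumes Temp is recorded in tenths of a degree Fahrenheit, so 990 means 99.0; check the Stat2Data documentation before relying on that.

```r
# Coefficient estimates copied from the model output above (rounded to 3 decimals).
coefs <- c(intercept = 87.388, age = -0.059, index = 0.385, temp = -0.089)

# Hypothetical patient, for illustration only.
# Assumption: Temp is in tenths of a degree Fahrenheit (990 = 99.0 F).
patient <- c(age = 50, index = 10, temp = 990)

# Linear predictor: predicted log-odds of responding to the treatment.
predictors <- c("age", "index", "temp")
log_odds <- coefs[["intercept"]] + sum(coefs[predictors] * patient[predictors])

# Convert the log-odds to odds and to a predicted probability.
odds <- exp(log_odds)
pred_prob <- plogis(log_odds)  # same as odds / (1 + odds)

# Multiplicative change in the odds for a one-year increase in Age,
# holding Index and Temp constant.
age_odds_ratio <- exp(coefs[["age"]])

print(c(log_odds = log_odds, odds = odds, pred_prob = pred_prob))
```

For this hypothetical patient the predicted log-odds are 0.178, which corresponds to a probability of about 0.54. Because the estimates are rounded and Temp values are large, the result is only approximate; in practice, predict from the fitted model object rather than from the printed table.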

Ultimately, we want to use the model to classify patients into two groups: those who are likely to respond to the treatment and those who are not. To do so, we’ll construct an ROC curve to help us determine a decision-making threshold.

## # A tibble: 5 x 4
##   .threshold specificity sensitivity pred_prob
##        <dbl>       <dbl>       <dbl>     <dbl>
## 1    -Inf         0                1   0      
## 2      -6.49      0                1   0.00151
## 3      -5.35      0.0370           1   0.00474
## 4      -5.33      0.0741           1   0.00484
## 5      -5.21      0.111            1   0.00545
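In the output above, the .threshold column appears to be on the log-odds scale, with pred_prob as the same cutoff on the probability scale. The short check below applies the inverse logit to the printed thresholds; it is a sketch based on the rounded values shown, so agreement is only up to rounding.

```r
# .threshold values copied (as rounded) from the ROC output above.
threshold <- c(-Inf, -6.49, -5.35, -5.33, -5.21)

# Inverse logit: exp(x) / (1 + exp(x)) maps log-odds to probabilities.
pred_prob <- plogis(threshold)

# Compare with the pred_prob column in the output.
print(round(pred_prob, 5))
```

The results match the pred_prob column to about two significant digits, which is what we would expect from thresholds rounded to two decimals. A cutoff of -Inf maps to probability 0, so every patient is classified as a responder, giving sensitivity 1 and specificity 0.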

  1. Suppose a doctor wants to use your model to determine if she should recommend this particular treatment for her patients with leukemia. Based on the data from the ROC curve, what decision-making threshold do you recommend the doctor use to select patients for this treatment? What factors did you consider to determine the threshold?

  2. Check the model conditions - linearity, randomness, independence. Recall you may need to install the Stat2Data package.
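When weighing candidate thresholds for question 1, it can help to see how sensitivity and specificity trade off as the cutoff moves. The sketch below uses a small made-up set of outcomes and predicted probabilities (not the leukemia data), and the helper name classify_at is hypothetical.

```r
# Made-up outcomes (1 = responded, 0 = did not) and predicted probabilities.
resp      <- c(1, 1, 1, 0, 1, 0, 0, 0)
pred_prob <- c(0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10)

# Hypothetical helper: classify as "likely to respond" when the predicted
# probability is at or above the cutoff, then summarize the errors.
classify_at <- function(cutoff) {
  pred <- as.integer(pred_prob >= cutoff)
  c(cutoff = cutoff,
    # Share of actual responders correctly classified as responders.
    sensitivity = sum(pred == 1 & resp == 1) / sum(resp == 1),
    # Share of actual non-responders correctly classified as non-responders.
    specificity = sum(pred == 0 & resp == 0) / sum(resp == 0))
}

# One row per candidate cutoff.
results <- t(sapply(c(0.25, 0.50, 0.75), classify_at))
print(results)
```

In this toy example, raising the cutoff from 0.25 to 0.75 lowers sensitivity from 1 to 0.5 while raising specificity from 0.5 to 1. Which end of that trade-off matters more depends on the relative cost of treating a patient who will not respond versus withholding treatment from one who would.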


  1. The data set is from the Stat2Data R package.