Hang out with the TAs from STA 210! This is a casual conversation and a fun opportunity to meet the members of the STA 210 teaching team. The only rule is these can’t turn into office hours!
Tea with a TA counts as a statistics experience.
Click here for details on the Get out the Vote! Fall Data Challenge by the American Statistical Association (ASA). Submissions are due November 11.
written-report.Rmd file in the project repo.

Today’s data is from a study in which 51 untreated adult patients with acute myeloblastic leukemia were given a course of treatment and then assessed for their response to it.¹
The goal of today’s analysis is to use pre-treatment measurements to predict how likely it is that a patient will respond to the treatment.
We will use the following variables:
- Age: Age at diagnosis (in years)
- Smear: Differential percentage of blasts
- Infil: Percentage of absolute marrow leukemia infiltrate
- Index: Percentage labeling index of the bone marrow leukemia cells
- Blasts: Absolute number of blasts, in thousands
- Temp: Highest temperature of the patient prior to treatment, in degrees Fahrenheit
- Resp: 1 = responded to treatment or 0 = failed to respond

We want to use Age, Index, and Temp to predict the odds a patient will respond to the treatment. The model is below.
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 87.388 | 35.458 | 2.465 | 0.014 |
Age | -0.059 | 0.026 | -2.287 | 0.022 |
Index | 0.385 | 0.122 | 3.168 | 0.002 |
Temp | -0.089 | 0.036 | -2.467 | 0.014 |
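
Written out as an equation, the estimates in the table correspond to the fitted model below. The symbol $\hat{\pi}$ is an assumed notation (not from the original page) for the predicted probability that a patient responds, that is, Resp = 1.

```latex
\log\left(\frac{\hat{\pi}}{1 - \hat{\pi}}\right)
  = 87.388 - 0.059\,\text{Age} + 0.385\,\text{Index} - 0.089\,\text{Temp}
```

As one example of reading a coefficient: holding Index and Temp constant, each additional year of age is expected to multiply the odds of responding by $e^{-0.059} \approx 0.943$.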
Ultimately, we want to use the model to classify patients into two groups: those who are likely to respond to the treatment and those who are not. To do so, we’ll use an ROC curve to help us determine a decision-making threshold.
## # A tibble: 5 x 4
## .threshold specificity sensitivity pred_prob
## <dbl> <dbl> <dbl> <dbl>
## 1 -Inf 0 1 0
## 2 -6.49 0 1 0.00151
## 3 -5.35 0.0370 1 0.00474
## 4 -5.33 0.0741 1 0.00484
## 5 -5.21 0.111 1 0.00545
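
The .threshold column in this output appears to be on the log-odds scale, with pred_prob being the same cutoff expressed as a probability. A sketch of that conversion, using $t$ as an assumed symbol for the threshold and $\hat{\pi}$ for the corresponding probability:

```latex
\hat{\pi} = \frac{e^{t}}{1 + e^{t}},
\qquad
t = -6.49 \;\Rightarrow\; \hat{\pi} = \frac{e^{-6.49}}{1 + e^{-6.49}} \approx 0.0015
```

This agrees with the second row of the output, up to rounding of the displayed threshold.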
Suppose a doctor wants to use your model to determine if she should recommend this particular treatment for her patients with leukemia. Based on the data from the ROC curve, what decision-making threshold do you recommend the doctor use to select patients for this treatment? What factors did you consider to determine the threshold?
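
For reference when weighing candidate thresholds, the two quantities reported in the ROC output are defined below. The sketch assumes that "positive" means the patient responded (Resp = 1), and TP, FN, TN, and FP are the usual counts of true positives, false negatives, true negatives, and false positives at a given threshold.

```latex
\text{sensitivity} = \frac{TP}{TP + FN},
\qquad
\text{specificity} = \frac{TN}{TN + FP}
```

Lowering the threshold classifies more patients as likely responders, which tends to raise sensitivity and lower specificity. Raising it does the opposite.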
Check the model conditions: linearity, randomness, and independence. Recall you may need to install the Stat2Data package.
¹ The data set is from the Stat2Data R package.