Hang out with the TAs from STA 210! This is a casual conversation and a fun opportunity to meet the members of the STA 210 teaching team. The only rule is these can’t turn into office hours!
Tea with a TA counts as a statistics experience.
Click here for details on the Get out the Vote! Fall Data Challenge by the American Statistical Association (ASA). Submissions are due November 11.
written-report.Rmd file in the project repo.
Today’s data is from a study in which 51 untreated adult patients with acute myeloblastic leukemia were given a course of treatment and then assessed for their response to the treatment.1
The goal of today’s analysis is to use pre-treatment variables to predict how likely it is that a patient will respond to the treatment.
We will use the following variables:
Age: Age at diagnosis (in years)
Smear: Differential percentage of blasts
Infil: Percentage of absolute marrow leukemia infiltrate
Index: Percentage labeling index of the bone marrow leukemia cells
Blasts: Absolute number of blasts, in thousands
Temp: Highest temperature of the patient prior to treatment, in degrees Fahrenheit
Resp: 1 = responded to treatment or 0 = failed to respond
Fit a model using Age, Index, and Temp to predict the odds a patient will respond to the treatment.
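One possible way to set up this model is sketched below. It assumes the Stat2Data package is installed and that the data are available there as the Leukemia data frame; the object name resp_fit is only illustrative. The sketch uses base R's glm, which may differ from the modeling workflow used in class.

```r
# Assumption: the Stat2Data package is installed and provides the Leukemia data.
library(Stat2Data)
data(Leukemia)

# Logistic regression: the log-odds of response modeled as a linear
# function of Age, Index, and Temp.
resp_fit <- glm(Resp ~ Age + Index + Temp, data = Leukemia, family = binomial)

# Coefficients are on the log-odds scale.
summary(resp_fit)$coefficients

# Exponentiating gives multiplicative changes in the odds of response.
exp(coef(resp_fit))
```

Because the coefficients are reported on the log-odds scale, exponentiating them is what lets you describe each predictor's effect in terms of odds.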
Use the augment function to obtain the predicted probability each patient in the data set will respond to the treatment.
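A minimal sketch of this step, assuming the model was fit with glm and that the broom and Stat2Data packages are installed (resp_fit and resp_aug are illustrative names):

```r
# Assumption: Stat2Data and broom are installed.
library(Stat2Data)
library(broom)
data(Leukemia)

resp_fit <- glm(Resp ~ Age + Index + Temp, data = Leukemia, family = binomial)

# type.predict = "response" puts predicted probabilities (not log-odds)
# in the .fitted column.
resp_aug <- augment(resp_fit, type.predict = "response")

head(resp_aug[, c("Resp", "Age", "Index", "Temp", ".fitted")])
```

Without type.predict = "response", the .fitted column holds predicted log-odds, which cannot be compared directly to a probability threshold.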
Construct a confusion matrix using 0.5 as the decision-making threshold.
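The confusion matrix can be sketched with base R's table function, as below. This assumes the Stat2Data package is installed; pred_prob, pred_class, and conf_matrix are illustrative names, and predict(type = "response") is used here as a stand-in for the .fitted probabilities from augment.

```r
# Assumption: the Stat2Data package is installed and provides the Leukemia data.
library(Stat2Data)
data(Leukemia)

resp_fit <- glm(Resp ~ Age + Index + Temp, data = Leukemia, family = binomial)

# Predicted probability of response for each patient.
pred_prob <- predict(resp_fit, type = "response")

# Classify as 1 (respond) when the predicted probability exceeds 0.5.
pred_class <- ifelse(pred_prob > 0.5, 1, 0)

# Rows are the observed outcomes, columns are the predicted classes.
# Fixing the factor levels keeps the table 2 x 2 even if a class is never predicted.
conf_matrix <- table(
  actual = factor(Leukemia$Resp, levels = c(0, 1)),
  predicted = factor(pred_class, levels = c(0, 1))
)
conf_matrix
```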
Use your confusion matrix from the previous exercise to calculate the following:
Create the ROC curve for this model.
What is the area under the curve? Is this model a good fit for the data?
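One way to draw the ROC curve and compute the area under it is sketched here with the yardstick package. The choice of yardstick is an assumption, as is having Stat2Data and ggplot2 installed; roc_data, roc_points, and auc are illustrative names.

```r
# Assumption: Stat2Data, yardstick, and ggplot2 are installed.
library(Stat2Data)
library(yardstick)
library(ggplot2)
data(Leukemia)

resp_fit <- glm(Resp ~ Age + Index + Temp, data = Leukemia, family = binomial)

# yardstick expects the truth column to be a factor.
roc_data <- data.frame(
  Resp = factor(Leukemia$Resp, levels = c(0, 1)),
  pred_prob = predict(resp_fit, type = "response")
)

# event_level = "second" treats Resp = 1 (responded) as the event of interest.
roc_points <- roc_curve(roc_data, truth = Resp, pred_prob, event_level = "second")
autoplot(roc_points)

# Area under the ROC curve.
auc <- roc_auc(roc_data, truth = Resp, pred_prob, event_level = "second")
auc
```

An area near 0.5 suggests the model discriminates no better than chance, while values closer to 1 indicate better separation of responders from non-responders.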
Suppose a doctor wants to use your model to determine if she should recommend this particular treatment for her patients with leukemia. Based on the data from the ROC curve, what decision-making threshold do you recommend the doctor use to select patients for this treatment? What factors did you consider to determine the threshold?
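To compare candidate thresholds, it may help to tabulate sensitivity and specificity at several cutoffs. The sketch below does this in base R; the helper rates_at and the grid of thresholds are hypothetical choices for illustration, and the Stat2Data package is assumed to be installed.

```r
# Assumption: the Stat2Data package is installed and provides the Leukemia data.
library(Stat2Data)
data(Leukemia)

resp_fit <- glm(Resp ~ Age + Index + Temp, data = Leukemia, family = binomial)
pred_prob <- predict(resp_fit, type = "response")

# Hypothetical helper: sensitivity and specificity at a single threshold.
rates_at <- function(threshold) {
  predicted_yes <- pred_prob > threshold
  c(
    threshold = threshold,
    # Share of actual responders who are predicted to respond.
    sensitivity = mean(predicted_yes[Leukemia$Resp == 1]),
    # Share of actual non-responders who are predicted not to respond.
    specificity = mean(!predicted_yes[Leukemia$Resp == 0])
  )
}

# Illustrative grid of thresholds; each row of the result is one threshold.
thresholds <- seq(0.2, 0.8, by = 0.1)
threshold_table <- t(sapply(thresholds, rates_at))
threshold_table
```

Lowering the threshold selects more patients for treatment (higher sensitivity, lower specificity), so the recommendation depends on the relative cost of treating a patient who will not respond versus withholding treatment from one who would.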
The data set is from the Stat2Data R package.