Survey

Announcements

Assignments

  • Stats Exp #2 due Nov 8 at 11:59p
    • No lecture Wednesday
  • Quiz 04 Thurs, Nov 5 - Fri, Nov 6
    • No lab Thursday

Tea with a TA!

Hang out with the TAs from STA 210! This is a casual conversation and a fun opportunity to meet the members of the STA 210 teaching team. The only rule is these can’t turn into office hours!

Tea with a TA counts as a statistics experience.

Nov 3 is Election Day!

If you’re eligible, VOTE! Find out more information: https://vote.duke.edu/

Other events

Electronic Undergraduate Statistics Research Conference (eUSR) Nov 6, 11:30a - 4:40p

Quiz 03

Q 4.2 thrown out (3 free points, +1 bonus if correct)

After fitting a logistic regression, you compute the raw residual, \(y_i - \hat{\pi}_i\), for each observation. 20% of the raw residuals are positive, and 80% are negative. Because there are far more raw residuals below zero than above zero, this logistic regression does not fit the data well.
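A quick check helps when evaluating this statement. Because \(0 < \hat{\pi}_i < 1\), the raw residual \(y_i - \hat{\pi}_i\) is positive exactly when \(y_i = 1\) and negative exactly when \(y_i = 0\), so the split between positive and negative residuals only reflects how many observations have \(y_i = 1\). The sketch below is in Python rather than the R used in class, and the values of y and pi_hat are made up purely for illustration.

```python
# Hypothetical outcomes and fitted probabilities (made up for illustration).
y = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
pi_hat = [0.85, 0.10, 0.05, 0.20, 0.15, 0.70, 0.30, 0.10, 0.25, 0.05]

# Raw residual: observed outcome minus fitted probability.
raw_resid = [yi - pi for yi, pi in zip(y, pi_hat)]

# Share of positive residuals versus share of observations with y = 1.
share_positive = sum(r > 0 for r in raw_resid) / len(raw_resid)
share_ones = sum(y) / len(y)

print(f"share of positive raw residuals:  {share_positive:.2f}")
print(f"share of observations with y = 1: {share_ones:.2f}")
```

The two shares agree no matter how close the fitted probabilities are to the observed outcomes, so the sign split on its own is not evidence about model fit.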

Quiz 04

Thu, Nov 5 - Sun, Nov 8, not timed.

Basic premise: You will be given a case study scenario and a few question prompts. You will apply what you’ve learned throughout the semester to the given scenario.

You will submit your answer in narrative form.

More details will be emailed Thursday morning.

Questions about election

Click here for slides.

Exercises

Part 1: Multinomial logistic regression

Today’s data come from an experiment by the Educational Testing Service to test the effectiveness of the children’s program Sesame Street. Sesame Street is an educational program designed to teach young children basic educational skills such as counting and the alphabet.

As part of the experiment, children were assigned to one of two groups: those who were encouraged to watch the program and those who were not.

The show is only effective if children watch it, so we want to understand what effect the encouragement had on how frequently children watched the program.

Response:

  • viewcat:
    • 1: rarely watched show
    • 2: once or twice a week
    • 3: three to five times a week
    • 4: watched show on average more than five times a week

Predictors:

  • age: child’s age in months
  • prenumb: score on numbers pretest (0 to 54)
  • prelet: score on letters pretest (0 to 58)
  • viewenc: 1: encouraged to watch, 0: not encouraged
  • site:
    • 1: three to five year old from urban area
    • 2: four year old from suburban area
    • 3: from rural area with high socioeconomic status
    • 4: from rural area with low socioeconomic status
    • 5: from Spanish speaking home
  1. Last time we fit a multinomial logistic model using viewenc, prenumbCent (the centered version of prenumb), and site to predict how frequently a child viewed Sesame Street (viewcat).
y.level  term         estimate  std.error  statistic  p.value
2        (Intercept)    -0.204      0.484     -0.421    0.674
2        site2          -0.069      0.774     -0.088    0.929
2        site3          -1.069      0.640     -1.670    0.095
2        site4          -1.902      0.640     -2.971    0.003
2        site5          -1.773      0.830     -2.136    0.033
2        prenumbCent     0.023      0.024      0.967    0.334
2        viewenc1        2.652      0.493      5.378    0.000
3        (Intercept)     0.050      0.467      0.108    0.914
3        site2           0.222      0.739      0.300    0.764
3        site3          -0.880      0.629     -1.399    0.162
3        site4          -2.465      0.681     -3.621    0.000
3        site5          -3.674      1.235     -2.974    0.003
3        prenumbCent     0.051      0.024      2.184    0.029
3        viewenc1        2.467      0.494      4.997    0.000
4        (Intercept)    -0.273      0.499     -0.547    0.584
4        site2           0.919      0.741      1.241    0.215
4        site3          -0.645      0.663     -0.973    0.330
4        site4          -2.417      0.753     -3.211    0.001
4        site5          -1.644      0.869     -1.893    0.058
4        prenumbCent     0.067      0.023      2.831    0.005
4        viewenc1        2.291      0.501      4.575    0.000

Let’s get the predicted view category using the augment function.
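In a multinomial logistic model with viewcat = 1 as the baseline, each block of coefficients gives the log-odds of that category versus category 1, and the predicted category is the one with the largest predicted probability. As a rough sketch of that calculation (in Python rather than the R used in class), the code below plugs the rounded estimates from the table into the formula for one hypothetical child: site 1, prenumbCent = 0, encouraged to watch. The helper name predict_probs and the example child are assumptions for illustration only.

```python
import math

# Rounded estimates from the table above; baseline levels are
# viewcat = 1, site = 1, and viewenc = 0.
COEFS = {
    2: {"(Intercept)": -0.204, "site2": -0.069, "site3": -1.069, "site4": -1.902,
        "site5": -1.773, "prenumbCent": 0.023, "viewenc1": 2.652},
    3: {"(Intercept)": 0.050, "site2": 0.222, "site3": -0.880, "site4": -2.465,
        "site5": -3.674, "prenumbCent": 0.051, "viewenc1": 2.467},
    4: {"(Intercept)": -0.273, "site2": 0.919, "site3": -0.645, "site4": -2.417,
        "site5": -1.644, "prenumbCent": 0.067, "viewenc1": 2.291},
}

def predict_probs(site, prenumb_cent, viewenc):
    """Return {category: probability} for one child (hypothetical helper)."""
    log_odds = {1: 0.0}  # the baseline category has log-odds 0 against itself
    for level, b in COEFS.items():
        eta = b["(Intercept)"] + b["prenumbCent"] * prenumb_cent
        if site != 1:
            eta += b[f"site{site}"]
        if viewenc == 1:
            eta += b["viewenc1"]
        log_odds[level] = eta
    denom = sum(math.exp(v) for v in log_odds.values())
    return {k: math.exp(v) / denom for k, v in log_odds.items()}

# Hypothetical child: site 1, average numbers pretest score, encouraged.
probs = predict_probs(site=1, prenumb_cent=0.0, viewenc=1)
predicted = max(probs, key=probs.get)  # category with the largest probability

for k in sorted(probs):
    print(f"P(viewcat = {k}) = {probs[k]:.3f}")
print("predicted viewcat:", predicted)
```

For this hypothetical child the largest probability falls in category 3, with category 2 close behind, which shows how a single predicted category can hide a near tie.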

Make a table to view the actual vs. predicted view categories. How well did the model perform?
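One way to read such a table: rows are the actual categories, columns are the predicted categories, and the counts on the diagonal are the children the model classified correctly. The sketch below is in Python rather than R and uses short made-up vectors of actual and predicted categories, not the Sesame Street data, just to show the bookkeeping.

```python
from collections import Counter

# Hypothetical actual and predicted view categories (made up, not the course data).
actual = [1, 1, 2, 2, 2, 3, 3, 4, 4, 4]
predicted = [1, 2, 2, 2, 3, 3, 2, 4, 3, 4]

# Count each (actual, predicted) pair.
counts = Counter(zip(actual, predicted))
levels = [1, 2, 3, 4]

# Rows are actual categories, columns are predicted categories.
print("actual  " + "".join(f"pred_{p:<3}" for p in levels))
for a in levels:
    print(f"{a:<8}" + "".join(f"{counts[(a, p)]:<8}" for p in levels))

# Proportion on the diagonal: the share of children classified correctly.
accuracy = sum(counts[(k, k)] for k in levels) / len(actual)
print("proportion correctly classified:", accuracy)
```

Looking at the off-diagonal cells is usually as informative as the overall proportion, since it shows which neighboring categories the model tends to confuse.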

Part 2: Log-linear model

The data come from the 2015 Family Income and Expenditure Survey conducted by the Philippine Statistics Authority.

The variables in the data are

  • age: the age of the head of household
  • total: the number of people in the household other than the head
  • location: where the house is located (Central Luzon, Davao Region, Ilocos Region, Metro Manila, or Visayas)
  • numLT5: the number in the household under 5 years of age
  • roof: the type of roof in the household (either Predominantly Light/Salvaged Material, or Predominantly Strong Material, where stronger material can sometimes be used as a proxy for greater wealth)

We fit the following model:

term          estimate  std.error  statistic  p.value
(Intercept)      1.436      0.017     82.339        0
ageCent         -0.004      0.001     -3.584        0
I(ageCent^2)    -0.001      0.000    -10.938        0
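Assuming this is a Poisson regression with a log link for total (the usual log-linear setup; the output above does not name the response), the fitted mean is exp(1.436 - 0.004 ageCent - 0.001 ageCent^2). Because ageCent enters through two terms, the effect of one more year depends on the starting age. The sketch below, in Python rather than R, evaluates the fitted mean from the rounded estimates; the helper names are hypothetical and the results are only as precise as the rounding allows.

```python
import math

# Rounded estimates from the table above.
B0, B_AGE, B_AGE2 = 1.436, -0.004, -0.001

def expected_total(age_cent):
    """Fitted mean of total under an assumed log link (hypothetical helper)."""
    return math.exp(B0 + B_AGE * age_cent + B_AGE2 * age_cent ** 2)

def rate_ratio(age_cent):
    """Multiplicative change in the fitted mean for a one-year increase in age."""
    return expected_total(age_cent + 1) / expected_total(age_cent)

for a in (-20, -10, 0, 10, 20):
    print(f"ageCent = {a:>3}: expected total = {expected_total(a):.2f}, "
          f"one-year rate ratio = {rate_ratio(a):.3f}")
```

The rate ratio changes with ageCent, which is why the squared term describes the curvature of the relationship and cannot be read as a single "per year" effect.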
  1. Interpret the coefficient of ageCent^2 in the context of the data.

  2. Conduct a test to assess whether location is a useful predictor of the number of people in the household after accounting for age of the head of the household.
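One common approach for the second question is a drop-in-deviance (likelihood ratio) test that compares the model with the age terms only to a model that also includes location. Location has five levels, so adding it as a main effect adds four coefficients, and the drop in residual deviance is compared to a chi-square distribution with 4 degrees of freedom. The sketch below (Python, standard library only) uses the closed-form upper tail of that distribution. The function name is hypothetical, and no deviances from the actual models appear because this page does not report them.

```python
import math

def drop_in_deviance_pvalue_df4(dev_reduced, dev_full):
    """P-value for a drop-in-deviance test with 4 added parameters.

    For a chi-square variable X with 4 degrees of freedom,
    P(X > g) = exp(-g / 2) * (1 + g / 2).
    """
    g = dev_reduced - dev_full  # test statistic: drop in residual deviance
    if g < 0:
        raise ValueError("the larger model cannot have a larger deviance")
    return math.exp(-g / 2) * (1 + g / 2)

# Sanity check against the familiar 5% critical value for 4 df (about 9.488),
# entered here as a deviance drop from 9.488 to 0.
p_at_critical_value = drop_in_deviance_pvalue_df4(9.488, 0.0)
print(round(p_at_critical_value, 3))
```

A small p-value would suggest that location helps explain household size after accounting for the age of the head of the household.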


The dataset for Part 2 is from Chapter 4 of Beyond Multiple Linear Regression.