AE 23: Multinomial logistic regression + Log-linear models

Survey

Announcements

Assignments

Stats Exp #2 due Nov 8 at 11:59p
- No lecture Wednesday
Quiz 04 Thurs, Nov 5 - Fri, Nov 6
- No lab Thursday

Tea with a TA!

Hang out with the TAs from STA 210! This is a casual conversation and a fun opportunity to meet the members of the STA 210 teaching team. The only rule is these can’t turn into office hours!

Tea with a TA counts as a statistics experience.

Cody Coombs, Thu, Nov 5, 4p - 5p
- Click here to sign up. Zoom details will be emailed before the event.

Nov 3 is Election Day!

If you’re eligible, VOTE! Find out more information: https://vote.duke.edu/

Other events

Electronic Undergraduate Statistics Research Conference (eUSR) Nov 6, 11:30a - 4:40p

Click here to register

Quiz 03

Q 4.2 thrown out (3 free points, +1 bonus if correct)

After fitting a logistic regression, you compute the raw residual, $y_i - \hat{\pi}_i$, for each observation. 20% of the raw residuals are positive, and 80% are negative. Because there are far more raw residuals below zero than above zero, this logistic regression does not fit the data well.

Quiz 04

Thu, Nov 5 - Sun, Nov 8, not timed.

Basic premise: You will be given a case study scenario and a few question prompts. You will apply what you’ve learned throughout the semester to the given scenario.

You will submit your answer in narrative form.

More details will be emailed Thursday morning.

Questions about election

Click here for slides.

Exercises

Part 1: Multinomial logistic regression

Today’s data comes from an experiment by the Educational Testing Service to test the effectiveness of the children’s program Sesame Street. Sesame Street is an educational program designed to teach young children basic educational skills such as counting and the alphabet.

As part of the experiment, children were assigned to one of two groups: those who were encouraged to watch the program and those who were not.

The show is only effective if children watch it, so we want to understand what effect the encouragement had on how frequently children watched the program.

Response:

viewcat
1: rarely watched show
2: once or twice a week
3: three to five times a week
4: watched show on average more than five times a week

Predictors:

age: child’s age in months
prenumb: score on numbers pretest (0 to 54)
prelet: score on letters pretest (0 to 58)
viewenc: 1: encouraged to watch, 0: not encouraged
site:
- 1: three to five year old from urban area
- 2: four year old from suburban area
- 3: from rural area with high socioeconomic status
- 4: from rural area with low socioeconomic status
- 5: from Spanish speaking home

Last time we fit a multinomial logistic model using viewenc, prenumbCent, and site to predict how frequently a child viewed Sesame Street (viewcat).

y.level	term	estimate	std.error	statistic	p.value
2	(Intercept)	-0.204	0.484	-0.421	0.674
2	site2	-0.069	0.774	-0.088	0.929
2	site3	-1.069	0.640	-1.670	0.095
2	site4	-1.902	0.640	-2.971	0.003
2	site5	-1.773	0.830	-2.136	0.033
2	prenumbCent	0.023	0.024	0.967	0.334
2	viewenc1	2.652	0.493	5.378	0.000
3	(Intercept)	0.050	0.467	0.108	0.914
3	site2	0.222	0.739	0.300	0.764
3	site3	-0.880	0.629	-1.399	0.162
3	site4	-2.465	0.681	-3.621	0.000
3	site5	-3.674	1.235	-2.974	0.003
3	prenumbCent	0.051	0.024	2.184	0.029
3	viewenc1	2.467	0.494	4.997	0.000
4	(Intercept)	-0.273	0.499	-0.547	0.584
4	site2	0.919	0.741	1.241	0.215
4	site3	-0.645	0.663	-0.973	0.330
4	site4	-2.417	0.753	-3.211	0.001
4	site5	-1.644	0.869	-1.893	0.058
4	prenumbCent	0.067	0.023	2.831	0.005
4	viewenc1	2.291	0.501	4.575	0.000
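
For reference, the output can be read as three baseline-category logit equations, one per y.level. This is a sketch that assumes viewcat = 1 is the baseline level (it is the only level without its own rows in the output) and writes $\hat{\pi}_k$ for the estimated probability of category $k$, notation that is not part of the output itself:

```latex
\log\left(\frac{\hat{\pi}_k}{\hat{\pi}_1}\right)
  = \hat{\beta}_{0k}
  + \hat{\beta}_{1k}\,\text{site2}
  + \hat{\beta}_{2k}\,\text{site3}
  + \hat{\beta}_{3k}\,\text{site4}
  + \hat{\beta}_{4k}\,\text{site5}
  + \hat{\beta}_{5k}\,\text{prenumbCent}
  + \hat{\beta}_{6k}\,\text{viewenc1},
  \qquad k = 2, 3, 4
```

Under that reading, the viewenc1 estimate of 2.652 for level 2 means that encouraged children are estimated to have about exp(2.652) ≈ 14.2 times the odds of watching once or twice a week versus rarely watching, holding site and prenumbCent constant.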

Let’s get the predicted view category using the augment function.
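
To make the idea of a predicted category concrete, the sketch below turns the estimates from the output into predicted probabilities and picks the most likely level. It is written in Python purely to show the arithmetic and is not the augment call itself. It assumes that viewcat = 1 is the baseline level and that the predicted category is the level with the largest predicted probability. The names COEFS, predicted_probs, predicted_category, and the example child are hypothetical.

```python
import math

# Estimates copied from the model output above, keyed by viewcat level.
# Assumption: level 1 is the baseline, so its linear predictor is fixed at 0.
COEFS = {
    2: {"(Intercept)": -0.204, "site2": -0.069, "site3": -1.069, "site4": -1.902,
        "site5": -1.773, "prenumbCent": 0.023, "viewenc1": 2.652},
    3: {"(Intercept)": 0.050, "site2": 0.222, "site3": -0.880, "site4": -2.465,
        "site5": -3.674, "prenumbCent": 0.051, "viewenc1": 2.467},
    4: {"(Intercept)": -0.273, "site2": 0.919, "site3": -0.645, "site4": -2.417,
        "site5": -1.644, "prenumbCent": 0.067, "viewenc1": 2.291},
}


def predicted_probs(x):
    """Return {level: probability}. x maps term names to values; omitted terms count as 0."""
    eta = {1: 0.0}
    for level, betas in COEFS.items():
        eta[level] = betas["(Intercept)"] + sum(betas[term] * value for term, value in x.items())
    denom = sum(math.exp(e) for e in eta.values())
    return {level: math.exp(e) / denom for level, e in eta.items()}


def predicted_category(x):
    """Return the level with the largest predicted probability."""
    probs = predicted_probs(x)
    return max(probs, key=probs.get)


# Hypothetical child: site 1 (all site indicators 0), prenumbCent = 0, encouraged to watch.
child = {"viewenc1": 1, "prenumbCent": 0}
print({level: round(p, 3) for level, p in predicted_probs(child).items()})
print("predicted viewcat:", predicted_category(child))
```

Because the estimates are rounded to three decimals, probabilities computed this way can differ slightly from those produced by the fitted model object.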

Make a table to view the actual vs. predicted view categories. How well did the model perform?
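
One possible way to build and summarize such a table is sketched below in Python. The label vectors are made up for illustration and are not the Sesame Street data, and the helper names confusion_table and accuracy are hypothetical. Rows are actual categories, columns are predicted categories, and the share of observations on the diagonal is one simple measure of performance.

```python
from collections import Counter


def confusion_table(actual, predicted, levels):
    """Rows index the actual level, columns the predicted level."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in levels] for a in levels]


def accuracy(table):
    """Proportion of observations on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in table)
    correct = sum(table[i][i] for i in range(len(table)))
    return correct / total


# Made-up labels for illustration only, NOT the Sesame Street data.
actual = [1, 1, 2, 3, 3, 4, 4, 2]
predicted = [1, 3, 3, 3, 3, 4, 3, 2]
levels = [1, 2, 3, 4]

table = confusion_table(actual, predicted, levels)
for level, row in zip(levels, table):
    print("actual", level, row)
print("accuracy:", accuracy(table))
```

The off-diagonal cells are worth reading too: they show which categories the model tends to confuse with one another, which overall accuracy alone hides.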

Part 2: Log-linear model

The data come from the 2015 Family Income and Expenditure Survey conducted by the Philippine Statistics Authority.

The variables in the data are

age: the age of the head of household
total: the number of people in the household other than the head
location: where the house is located (Central Luzon, Davao Region, Ilocos Region, Metro Manila, or Visayas)
numLT5: the number in the household under 5 years of age
roof: the type of roof in the household (either Predominantly Light/Salvaged Material, or Predominantly Strong Material, where stronger material can sometimes be used as a proxy for greater wealth)

We fit the following model:

term	estimate	std.error	statistic
(Intercept)	1.436	0.017	82.339
ageCent	-0.004	0.001	-3.584
I(ageCent^2)	-0.001	0.000	-10.938
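
Written out, and assuming this is a Poisson log-linear model with total as the response and $\hat{\lambda}$ as the estimated mean (the output states neither the distribution nor the symbol), the fitted equation and the implied change for a one-year increase in age from ageCent = $a$ are:

```latex
\log(\hat{\lambda}) = 1.436 - 0.004\,\text{ageCent} - 0.001\,\text{ageCent}^2

\frac{\hat{\lambda}(a + 1)}{\hat{\lambda}(a)}
  = \exp\{-0.004 - 0.001\,(2a + 1)\}
```

Because of the squared term, the multiplicative change in the estimated mean depends on the starting value $a$, so the ageCent coefficient cannot be interpreted on its own. The estimates shown are rounded to three decimals.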

Interpret the coefficient of ageCent^2 in the context of the data.
Conduct a test to assess whether location is a useful predictor of the number of people in the household after accounting for age of the head of the household.
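
One common way to carry out this kind of test is a drop-in-deviance test comparing the model with age only to the model that adds location. The Python sketch below shows the arithmetic under stated assumptions: the two deviances are placeholders, not values from the fitted models, and the function names are hypothetical. Since location has five levels, adding it contributes four indicator terms, so the statistic is compared to a chi-square distribution with 4 degrees of freedom. In R, a comparison like this is typically done with anova(reduced, full, test = "Chisq").

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!,
    which holds only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))


def drop_in_deviance(dev_reduced, dev_full, df_diff):
    """Return (test statistic, p-value) for nested models."""
    stat = dev_reduced - dev_full
    return stat, chi2_upper_tail_even_df(stat, df_diff)


# Placeholder deviances for illustration only, NOT from the fitted models.
stat, p_value = drop_in_deviance(dev_reduced=100.0, dev_full=90.0, df_diff=4)
print("statistic:", stat)
print("p-value:", round(p_value, 4))
```

A small p-value would suggest that location adds useful information after accounting for the age of the head of the household. The real deviances come from the two fitted models.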

The dataset for Part 2 is from Chapter 4 of Beyond Multiple Linear Regression.