Discrete Choice Models
Fair's Affair data
A survey of women only was conducted in 1974 by Redbook asking about extramarital affairs.
1 2 3 4 5 6 | <span class = "kn" > from < / span> <span class = "nn" >__future__< / span> <span class = "kn" > import < / span> <span class = "n" >print_function< / span> <span class = "kn" > import < / span> <span class = "nn" >numpy< / span> <span class = "kn" >as< / span> <span class = "nn" >np< / span> <span class = "kn" > from < / span> <span class = "nn" >scipy< / span> <span class = "kn" > import < / span> <span class = "n" >stats< / span> <span class = "kn" > import < / span> <span class = "nn" >matplotlib.pyplot< / span> <span class = "kn" >as< / span> <span class = "nn" >plt< / span> <span class = "kn" > import < / span> <span class = "nn" >statsmodels.api< / span> <span class = "kn" >as< / span> <span class = "nn" >sm< / span> <span class = "kn" > from < / span> <span class = "nn" >statsmodels.formula.api< / span> <span class = "kn" > import < / span> <span class = "n" >logit< / span><span class = "p" >,< / span> <span class = "n" >probit< / span><span class = "p" >,< / span> <span class = "n" >poisson< / span><span class = "p" >,< / span> <span class = "n" >ols< / span> |
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >datasets< / span><span class = "o" >.< / span><span class = "n" >fair< / span><span class = "o" >.< / span><span class = "n" >SOURCE< / span><span class = "p" >)< / span> |
1 | <span class = "k" > print < / span><span class = "p" >(< / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >datasets< / span><span class = "o" >.< / span><span class = "n" >fair< / span><span class = "o" >.< / span><span class = "n" >NOTE< / span><span class = "p" >)< / span> |
1 | <span class = "n" >dta< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >datasets< / span><span class = "o" >.< / span><span class = "n" >fair< / span><span class = "o" >.< / span><span class = "n" >load_pandas< / span><span class = "p" >()< / span><span class = "o" >.< / span><span class = "n" >data< / span> |
1 2 | <span class = "n" >dta< / span><span class = "p" >[< / span><span class = "s" > 'affair' < / span><span class = "p" >]< / span> <span class = "o" > = < / span> <span class = "p" >(< / span><span class = "n" >dta< / span><span class = "p" >[< / span><span class = "s" > 'affairs' < / span><span class = "p" >]< / span> <span class = "o" >>< / span> <span class = "mi" > 0 < / span><span class = "p" >)< / span><span class = "o" >.< / span><span class = "n" >astype< / span><span class = "p" >(< / span><span class = "nb" > float < / span><span class = "p" >)< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >dta< / span><span class = "o" >.< / span><span class = "n" >head< / span><span class = "p" >(< / span><span class = "mi" > 10 < / span><span class = "p" >))< / span> |
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >dta< / span><span class = "o" >.< / span><span class = "n" >describe< / span><span class = "p" >())< / span> |
1 2 3 | <span class = "n" >affair_mod< / span> <span class = "o" > = < / span> <span class = "n" >logit< / span><span class = "p" >(< / span><span class = "s" > "affair ~ occupation + educ + occupation_husb" < / span> <span class = "s" > "+ rate_marriage + age + yrs_married + children" < / span> <span class = "s" > " + religious" < / span><span class = "p" >,< / span> <span class = "n" >dta< / span><span class = "p" >)< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >()< / span> |
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >affair_mod< / span><span class = "o" >.< / span><span class = "n" >summary< / span><span class = "p" >())< / span> |
How well are we predicting?
1 | <span class = "n" >affair_mod< / span><span class = "o" >.< / span><span class = "n" >pred_table< / span><span class = "p" >()< / span> |
The coefficients of the discrete choice model do not tell us much. What we're after is marginal effects.
1 2 | <span class = "n" >mfx< / span> <span class = "o" > = < / span> <span class = "n" >affair_mod< / span><span class = "o" >.< / span><span class = "n" >get_margeff< / span><span class = "p" >()< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >mfx< / span><span class = "o" >.< / span><span class = "n" >summary< / span><span class = "p" >())< / span> |
1 2 | <span class = "n" >respondent1000< / span> <span class = "o" > = < / span> <span class = "n" >dta< / span><span class = "o" >.< / span><span class = "n" >ix< / span><span class = "p" >[< / span><span class = "mi" > 1000 < / span><span class = "p" >]< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >respondent1000< / span><span class = "p" >)< / span> |
1 2 3 4 5 6 | <span class = "n" >resp< / span> <span class = "o" > = < / span> <span class = "nb" > dict < / span><span class = "p" >(< / span><span class = "nb" > zip < / span><span class = "p" >(< / span><span class = "nb" > range < / span><span class = "p" >(< / span><span class = "mi" > 1 < / span><span class = "p" >,< / span><span class = "mi" > 9 < / span><span class = "p" >),< / span> <span class = "n" >respondent1000< / span><span class = "p" >[[< / span><span class = "s" > "occupation" < / span><span class = "p" >,< / span> <span class = "s" > "educ" < / span><span class = "p" >,< / span> <span class = "s" > "occupation_husb" < / span><span class = "p" >,< / span> <span class = "s" > "rate_marriage" < / span><span class = "p" >,< / span> <span class = "s" > "age" < / span><span class = "p" >,< / span> <span class = "s" > "yrs_married" < / span><span class = "p" >,< / span> <span class = "s" > "children" < / span><span class = "p" >,< / span> <span class = "s" > "religious" < / span><span class = "p" >]]< / span><span class = "o" >.< / span><span class = "n" >tolist< / span><span class = "p" >()))< / span> <span class = "n" >resp< / span><span class = "o" >.< / span><span class = "n" >update< / span><span class = "p" >({< / span><span class = "mi" > 0 < / span> <span class = "p" >:< / span> <span class = "mi" > 1 < / span><span class = "p" >})< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >resp< / span><span class = "p" >)< / span> |
1 2 | <span class = "n" >mfx< / span> <span class = "o" > = < / span> <span class = "n" >affair_mod< / span><span class = "o" >.< / span><span class = "n" >get_margeff< / span><span class = "p" >(< / span><span class = "n" >atexog< / span><span class = "o" > = < / span><span class = "n" >resp< / span><span class = "p" >)< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >mfx< / span><span class = "o" >.< / span><span class = "n" >summary< / span><span class = "p" >())< / span> |
1 | <span class = "n" >affair_mod< / span><span class = "o" >.< / span><span class = "n" >predict< / span><span class = "p" >(< / span><span class = "n" >respondent1000< / span><span class = "p" >)< / span> |
1 | <span class = "n" >affair_mod< / span><span class = "o" >.< / span><span class = "n" >fittedvalues< / span><span class = "p" >[< / span><span class = "mi" > 1000 < / span><span class = "p" >]< / span> |
1 | <span class = "n" >affair_mod< / span><span class = "o" >.< / span><span class = "n" >model< / span><span class = "o" >.< / span><span class = "n" >cdf< / span><span class = "p" >(< / span><span class = "n" >affair_mod< / span><span class = "o" >.< / span><span class = "n" >fittedvalues< / span><span class = "p" >[< / span><span class = "mi" > 1000 < / span><span class = "p" >])< / span> |
The "correct" model here is likely the Tobit model. We have an work in progress branch "tobit-model" on github, if anyone is interested in censored regression models.
Exercise: Logit vs Probit
1 2 3 4 5 6 | <span class = "n" >fig< / span> <span class = "o" > = < / span> <span class = "n" >plt< / span><span class = "o" >.< / span><span class = "n" >figure< / span><span class = "p" >(< / span><span class = "n" >figsize< / span><span class = "o" > = < / span><span class = "p" >(< / span><span class = "mi" > 12 < / span><span class = "p" >,< / span><span class = "mi" > 8 < / span><span class = "p" >))< / span> <span class = "n" >ax< / span> <span class = "o" > = < / span> <span class = "n" >fig< / span><span class = "o" >.< / span><span class = "n" >add_subplot< / span><span class = "p" >(< / span><span class = "mi" > 111 < / span><span class = "p" >)< / span> <span class = "n" >support< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >linspace< / span><span class = "p" >(< / span><span class = "o" > - < / span><span class = "mi" > 6 < / span><span class = "p" >,< / span> <span class = "mi" > 6 < / span><span class = "p" >,< / span> <span class = "mi" > 1000 < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >plot< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >stats< / span><span class = "o" >.< / span><span class = "n" >logistic< / span><span class = "o" >.< / span><span class = "n" >cdf< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >),< / span> <span class = "s" > 'r-' < / span><span class = "p" >,< / span> <span class = "n" >label< / span><span class = "o" > = < / span><span class = "s" > 'Logistic' < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >plot< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >stats< / span><span class = "o" >.< / span><span class = "n" >norm< / span><span class = "o" >.< / span><span class = "n" >cdf< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >),< / span> <span class = "n" >label< / span><span class = "o" > = < / span><span class = "s" > 'Probit' < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >legend< / span><span class = "p" >();< / span> |
1 2 3 4 5 6 | <span class = "n" >fig< / span> <span class = "o" > = < / span> <span class = "n" >plt< / span><span class = "o" >.< / span><span class = "n" >figure< / span><span class = "p" >(< / span><span class = "n" >figsize< / span><span class = "o" > = < / span><span class = "p" >(< / span><span class = "mi" > 12 < / span><span class = "p" >,< / span><span class = "mi" > 8 < / span><span class = "p" >))< / span> <span class = "n" >ax< / span> <span class = "o" > = < / span> <span class = "n" >fig< / span><span class = "o" >.< / span><span class = "n" >add_subplot< / span><span class = "p" >(< / span><span class = "mi" > 111 < / span><span class = "p" >)< / span> <span class = "n" >support< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >linspace< / span><span class = "p" >(< / span><span class = "o" > - < / span><span class = "mi" > 6 < / span><span class = "p" >,< / span> <span class = "mi" > 6 < / span><span class = "p" >,< / span> <span class = "mi" > 1000 < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >plot< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >stats< / span><span class = "o" >.< / span><span class = "n" >logistic< / span><span class = "o" >.< / span><span class = "n" >pdf< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >),< / span> <span class = "s" > 'r-' < / span><span class = "p" >,< / span> <span class = "n" >label< / span><span class = "o" > = < / span><span class = "s" > 'Logistic' < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >plot< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >stats< / span><span class = "o" >.< / span><span class = "n" >norm< / span><span class = "o" >.< / span><span class = "n" >pdf< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >),< / span> <span class = "n" >label< / span><span class = "o" > = < / span><span class = "s" > 'Probit' < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >legend< / span><span class = "p" >();< / span> |
Compare the estimates of the Logit Fair model above to a Probit model. Does the prediction table look better? Much difference in marginal effects?
Genarlized Linear Model Example
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >datasets< / span><span class = "o" >.< / span><span class = "n" >star98< / span><span class = "o" >.< / span><span class = "n" >SOURCE< / span><span class = "p" >)< / span> |
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >datasets< / span><span class = "o" >.< / span><span class = "n" >star98< / span><span class = "o" >.< / span><span class = "n" >DESCRLONG< / span><span class = "p" >)< / span> |
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >datasets< / span><span class = "o" >.< / span><span class = "n" >star98< / span><span class = "o" >.< / span><span class = "n" >NOTE< / span><span class = "p" >)< / span> |
1 2 | <span class = "n" >dta< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >datasets< / span><span class = "o" >.< / span><span class = "n" >star98< / span><span class = "o" >.< / span><span class = "n" >load_pandas< / span><span class = "p" >()< / span><span class = "o" >.< / span><span class = "n" >data< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >dta< / span><span class = "o" >.< / span><span class = "n" >columns< / span><span class = "p" >)< / span> |
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >dta< / span><span class = "p" >[[< / span><span class = "s" > 'NABOVE' < / span><span class = "p" >,< / span> <span class = "s" > 'NBELOW' < / span><span class = "p" >,< / span> <span class = "s" > 'LOWINC' < / span><span class = "p" >,< / span> <span class = "s" > 'PERASIAN' < / span><span class = "p" >,< / span> <span class = "s" > 'PERBLACK' < / span><span class = "p" >,< / span> <span class = "s" > 'PERHISP' < / span><span class = "p" >,< / span> <span class = "s" > 'PERMINTE' < / span><span class = "p" >]]< / span><span class = "o" >.< / span><span class = "n" >head< / span><span class = "p" >(< / span><span class = "mi" > 10 < / span><span class = "p" >))< / span> |
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >dta< / span><span class = "p" >[[< / span><span class = "s" > 'AVYRSEXP' < / span><span class = "p" >,< / span> <span class = "s" > 'AVSALK' < / span><span class = "p" >,< / span> <span class = "s" > 'PERSPENK' < / span><span class = "p" >,< / span> <span class = "s" > 'PTRATIO' < / span><span class = "p" >,< / span> <span class = "s" > 'PCTAF' < / span><span class = "p" >,< / span> <span class = "s" > 'PCTCHRT' < / span><span class = "p" >,< / span> <span class = "s" > 'PCTYRRND' < / span><span class = "p" >]]< / span><span class = "o" >.< / span><span class = "n" >head< / span><span class = "p" >(< / span><span class = "mi" > 10 < / span><span class = "p" >))< / span> |
1 2 | <span class = "n" >formula< / span> <span class = "o" > = < / span> <span class = "s" > 'NABOVE + NBELOW ~ LOWINC + PERASIAN + PERBLACK + PERHISP + PCTCHRT ' < / span> <span class = "n" >formula< / span> <span class = "o" > + = < / span> <span class = "s" > '+ PCTYRRND + PERMINTE*AVYRSEXP*AVSALK + PERSPENK*PTRATIO*PCTAF' < / span> |
Aside: Binomial distribution
Toss a six-sided die 5 times, what's the probability of exactly 2 fours?
1 | <span class = "n" >stats< / span><span class = "o" >.< / span><span class = "n" >binom< / span><span class = "p" >(< / span><span class = "mi" > 5 < / span><span class = "p" >,< / span> <span class = "mf" > 1. < / span><span class = "o" > / < / span><span class = "mi" > 6 < / span><span class = "p" >)< / span><span class = "o" >.< / span><span class = "n" >pmf< / span><span class = "p" >(< / span><span class = "mi" > 2 < / span><span class = "p" >)< / span> |
1 2 | <span class = "kn" > from < / span> <span class = "nn" >scipy.misc< / span> <span class = "kn" > import < / span> <span class = "n" >comb< / span> <span class = "n" >comb< / span><span class = "p" >(< / span><span class = "mi" > 5 < / span><span class = "p" >,< / span><span class = "mi" > 2 < / span><span class = "p" >)< / span> <span class = "o" > * < / span> <span class = "p" >(< / span><span class = "mi" > 1 < / span><span class = "o" > / < / span><span class = "mf" > 6. < / span><span class = "p" >)< / span><span class = "o" > * * < / span><span class = "mi" > 2 < / span> <span class = "o" > * < / span> <span class = "p" >(< / span><span class = "mi" > 5 < / span><span class = "o" > / < / span><span class = "mf" > 6. < / span><span class = "p" >)< / span><span class = "o" > * * < / span><span class = "mi" > 3 < / span> |
1 2 | <span class = "kn" > from < / span> <span class = "nn" >statsmodels.formula.api< / span> <span class = "kn" > import < / span> <span class = "n" >glm< / span> <span class = "n" >glm_mod< / span> <span class = "o" > = < / span> <span class = "n" >glm< / span><span class = "p" >(< / span><span class = "n" >formula< / span><span class = "p" >,< / span> <span class = "n" >dta< / span><span class = "p" >,< / span> <span class = "n" >family< / span><span class = "o" > = < / span><span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >families< / span><span class = "o" >.< / span><span class = "n" >Binomial< / span><span class = "p" >())< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >()< / span> |
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >glm_mod< / span><span class = "o" >.< / span><span class = "n" >summary< / span><span class = "p" >())< / span> |
The number of trials
1 | <span class = "n" >glm_mod< / span><span class = "o" >.< / span><span class = "n" >model< / span><span class = "o" >.< / span><span class = "n" >data< / span><span class = "o" >.< / span><span class = "n" >orig_endog< / span><span class = "o" >.< / span><span class = "n" > sum < / span><span class = "p" >(< / span><span class = "mi" > 1 < / span><span class = "p" >)< / span> |
1 | <span class = "n" >glm_mod< / span><span class = "o" >.< / span><span class = "n" >fittedvalues< / span> <span class = "o" > * < / span> <span class = "n" >glm_mod< / span><span class = "o" >.< / span><span class = "n" >model< / span><span class = "o" >.< / span><span class = "n" >data< / span><span class = "o" >.< / span><span class = "n" >orig_endog< / span><span class = "o" >.< / span><span class = "n" > sum < / span><span class = "p" >(< / span><span class = "mi" > 1 < / span><span class = "p" >)< / span> |
First differences: We hold all explanatory variables constant at their means and manipulate the percentage of low income households to assess its impact on the response variables:
1 | <span class = "n" >exog< / span> <span class = "o" > = < / span> <span class = "n" >glm_mod< / span><span class = "o" >.< / span><span class = "n" >model< / span><span class = "o" >.< / span><span class = "n" >data< / span><span class = "o" >.< / span><span class = "n" >orig_exog< / span> <span class = "c" > # get the dataframe</span> |
1 2 | <span class = "n" >means25< / span> <span class = "o" > = < / span> <span class = "n" >exog< / span><span class = "o" >.< / span><span class = "n" >mean< / span><span class = "p" >()< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >means25< / span><span class = "p" >)< / span> |
1 2 | <span class = "n" >means25< / span><span class = "p" >[< / span><span class = "s" > 'LOWINC' < / span><span class = "p" >]< / span> <span class = "o" > = < / span> <span class = "n" >exog< / span><span class = "p" >[< / span><span class = "s" > 'LOWINC' < / span><span class = "p" >]< / span><span class = "o" >.< / span><span class = "n" >quantile< / span><span class = "p" >(< / span><span class = "o" >.< / span><span class = "mi" > 25 < / span><span class = "p" >)< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >means25< / span><span class = "p" >)< / span> |
1 2 3 | <span class = "n" >means75< / span> <span class = "o" > = < / span> <span class = "n" >exog< / span><span class = "o" >.< / span><span class = "n" >mean< / span><span class = "p" >()< / span> <span class = "n" >means75< / span><span class = "p" >[< / span><span class = "s" > 'LOWINC' < / span><span class = "p" >]< / span> <span class = "o" > = < / span> <span class = "n" >exog< / span><span class = "p" >[< / span><span class = "s" > 'LOWINC' < / span><span class = "p" >]< / span><span class = "o" >.< / span><span class = "n" >quantile< / span><span class = "p" >(< / span><span class = "o" >.< / span><span class = "mi" > 75 < / span><span class = "p" >)< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >means75< / span><span class = "p" >)< / span> |
1 2 3 | <span class = "n" >resp25< / span> <span class = "o" > = < / span> <span class = "n" >glm_mod< / span><span class = "o" >.< / span><span class = "n" >predict< / span><span class = "p" >(< / span><span class = "n" >means25< / span><span class = "p" >)< / span> <span class = "n" >resp75< / span> <span class = "o" > = < / span> <span class = "n" >glm_mod< / span><span class = "o" >.< / span><span class = "n" >predict< / span><span class = "p" >(< / span><span class = "n" >means75< / span><span class = "p" >)< / span> <span class = "n" >diff< / span> <span class = "o" > = < / span> <span class = "n" >resp75< / span> <span class = "o" > - < / span> <span class = "n" >resp25< / span> |
The interquartile first difference for the percentage of low income households in a school district is:
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "s" > "</span><span class=" si ">%2.4f%%</span><span class=" s ">" < / span> <span class = "o" > % < / span> <span class = "p" >(< / span><span class = "n" >diff< / span><span class = "p" >[< / span><span class = "mi" > 0 < / span><span class = "p" >]< / span><span class = "o" > * < / span><span class = "mi" > 100 < / span><span class = "p" >))< / span> |
1 2 3 | <span class = "n" >nobs< / span> <span class = "o" > = < / span> <span class = "n" >glm_mod< / span><span class = "o" >.< / span><span class = "n" >nobs< / span> <span class = "n" >y< / span> <span class = "o" > = < / span> <span class = "n" >glm_mod< / span><span class = "o" >.< / span><span class = "n" >model< / span><span class = "o" >.< / span><span class = "n" >endog< / span> <span class = "n" >yhat< / span> <span class = "o" > = < / span> <span class = "n" >glm_mod< / span><span class = "o" >.< / span><span class = "n" >mu< / span> |
1 2 3 4 5 6 | <span class = "kn" > from < / span> <span class = "nn" >statsmodels.graphics.api< / span> <span class = "kn" > import < / span> <span class = "n" >abline_plot< / span> <span class = "n" >fig< / span> <span class = "o" > = < / span> <span class = "n" >plt< / span><span class = "o" >.< / span><span class = "n" >figure< / span><span class = "p" >(< / span><span class = "n" >figsize< / span><span class = "o" > = < / span><span class = "p" >(< / span><span class = "mi" > 12 < / span><span class = "p" >,< / span><span class = "mi" > 8 < / span><span class = "p" >))< / span> <span class = "n" >ax< / span> <span class = "o" > = < / span> <span class = "n" >fig< / span><span class = "o" >.< / span><span class = "n" >add_subplot< / span><span class = "p" >(< / span><span class = "mi" > 111 < / span><span class = "p" >,< / span> <span class = "n" >ylabel< / span><span class = "o" > = < / span><span class = "s" > 'Observed Values' < / span><span class = "p" >,< / span> <span class = "n" >xlabel< / span><span class = "o" > = < / span><span class = "s" > 'Fitted Values' < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >scatter< / span><span class = "p" >(< / span><span class = "n" >yhat< / span><span class = "p" >,< / span> <span class = "n" >y< / span><span class = "p" >)< / span> <span class = "n" >y_vs_yhat< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >OLS< / span><span class = "p" >(< / span><span class = "n" >y< / span><span class = "p" >,< / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >add_constant< / span><span class = "p" >(< / span><span class = "n" >yhat< / span><span class = "p" >,< / span> <span class = "n" >prepend< / span><span class = "o" > = < / span><span class = "bp" > True < / span><span class = "p" >))< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >()< / span> <span class = "n" >fig< / span> <span class = "o" > = < / span> <span class = "n" >abline_plot< / span><span class = "p" >(< / span><span class = "n" >model_results< / span><span class = "o" > = < / span><span class = "n" >y_vs_yhat< / span><span class = "p" >,< / span> <span class = "n" >ax< / span><span class = "o" > = < / span><span class = "n" >ax< / span><span class = "p" >)< / span> |
Plot fitted values vs Pearson residuals
Pearson residuals are defined to be
(y−μ)√(var(μ))
where var is typically determined by the family. E.g., binomial variance is $np(1 - p)$
1 2 3 4 5 6 | <span class = "n" >fig< / span> <span class = "o" > = < / span> <span class = "n" >plt< / span><span class = "o" >.< / span><span class = "n" >figure< / span><span class = "p" >(< / span><span class = "n" >figsize< / span><span class = "o" > = < / span><span class = "p" >(< / span><span class = "mi" > 12 < / span><span class = "p" >,< / span><span class = "mi" > 8 < / span><span class = "p" >))< / span> <span class = "n" >ax< / span> <span class = "o" > = < / span> <span class = "n" >fig< / span><span class = "o" >.< / span><span class = "n" >add_subplot< / span><span class = "p" >(< / span><span class = "mi" > 111 < / span><span class = "p" >,< / span> <span class = "n" >title< / span><span class = "o" > = < / span><span class = "s" > 'Residual Dependence Plot' < / span><span class = "p" >,< / span> <span class = "n" >xlabel< / span><span class = "o" > = < / span><span class = "s" > 'Fitted Values' < / span><span class = "p" >,< / span> <span class = "n" >ylabel< / span><span class = "o" > = < / span><span class = "s" > 'Pearson Residuals' < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >scatter< / span><span class = "p" >(< / span><span class = "n" >yhat< / span><span class = "p" >,< / span> <span class = "n" >stats< / span><span class = "o" >.< / span><span class = "n" >zscore< / span><span class = "p" >(< / span><span class = "n" >glm_mod< / span><span class = "o" >.< / span><span class = "n" >resid_pearson< / span><span class = "p" >))< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >axis< / span><span class = "p" >(< / span><span class = "s" > 'tight' < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >plot< / span><span class = "p" >([< / span><span class = "mf" > 0.0 < / span><span class = "p" >,< / span> <span class = "mf" > 1.0 < / span><span class = "p" >],[< / span><span class = "mf" > 0.0 < / span><span class = "p" >,< / span> <span class = "mf" > 0.0 < / span><span class = "p" >],< / span> <span class = "s" > 'k-' < / span><span class = "p" >);< / span> |
Histogram of standardized deviance residuals with Kernel Density Estimate overlayed
The definition of the deviance residuals depends on the family. For the Binomial distribution this is
rdev=sign(Y−μ)∗√2n(YlogYμ+(1−Y)log(1−Y)(1−μ)
They can be used to detect ill-fitting covariates
1 2 3 4 | <span class = "n" >resid< / span> <span class = "o" > = < / span> <span class = "n" >glm_mod< / span><span class = "o" >.< / span><span class = "n" >resid_deviance< / span> <span class = "n" >resid_std< / span> <span class = "o" > = < / span> <span class = "n" >stats< / span><span class = "o" >.< / span><span class = "n" >zscore< / span><span class = "p" >(< / span><span class = "n" >resid< / span><span class = "p" >)< / span> <span class = "n" >kde_resid< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >nonparametric< / span><span class = "o" >.< / span><span class = "n" >KDEUnivariate< / span><span class = "p" >(< / span><span class = "n" >resid_std< / span><span class = "p" >)< / span> <span class = "n" >kde_resid< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >()< / span> |
1 2 3 4 | <span class = "n" >fig< / span> <span class = "o" > = < / span> <span class = "n" >plt< / span><span class = "o" >.< / span><span class = "n" >figure< / span><span class = "p" >(< / span><span class = "n" >figsize< / span><span class = "o" > = < / span><span class = "p" >(< / span><span class = "mi" > 12 < / span><span class = "p" >,< / span><span class = "mi" > 8 < / span><span class = "p" >))< / span> <span class = "n" >ax< / span> <span class = "o" > = < / span> <span class = "n" >fig< / span><span class = "o" >.< / span><span class = "n" >add_subplot< / span><span class = "p" >(< / span><span class = "mi" > 111 < / span><span class = "p" >,< / span> <span class = "n" >title< / span><span class = "o" > = < / span><span class = "s" > "Standardized Deviance Residuals" < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >hist< / span><span class = "p" >(< / span><span class = "n" >resid_std< / span><span class = "p" >,< / span> <span class = "n" >bins< / span><span class = "o" > = < / span><span class = "mi" > 25 < / span><span class = "p" >,< / span> <span class = "n" >normed< / span><span class = "o" > = < / span><span class = "bp" > True < / span><span class = "p" >);< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >plot< / span><span class = "p" >(< / span><span class = "n" >kde_resid< / span><span class = "o" >.< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >kde_resid< / span><span class = "o" >.< / span><span class = "n" >density< / span><span class = "p" >,< / span> <span class = "s" > 'r' < / span><span class = "p" >);< / span> |
QQ-plot of deviance residuals
1 2 3 | <span class = "n" >fig< / span> <span class = "o" > = < / span> <span class = "n" >plt< / span><span class = "o" >.< / span><span class = "n" >figure< / span><span class = "p" >(< / span><span class = "n" >figsize< / span><span class = "o" > = < / span><span class = "p" >(< / span><span class = "mi" > 12 < / span><span class = "p" >,< / span><span class = "mi" > 8 < / span><span class = "p" >))< / span> <span class = "n" >ax< / span> <span class = "o" > = < / span> <span class = "n" >fig< / span><span class = "o" >.< / span><span class = "n" >add_subplot< / span><span class = "p" >(< / span><span class = "mi" > 111 < / span><span class = "p" >)< / span> <span class = "n" >fig< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >graphics< / span><span class = "o" >.< / span><span class = "n" >qqplot< / span><span class = "p" >(< / span><span class = "n" >resid< / span><span class = "p" >,< / span> <span class = "n" >line< / span><span class = "o" > = < / span><span class = "s" > 'r' < / span><span class = "p" >,< / span> <span class = "n" >ax< / span><span class = "o" > = < / span><span class = "n" >ax< / span><span class = "p" >)< / span> |
Please login to continue.