Interactions and ANOVA
Note: This script is based heavily on Jonathan Taylor's class notes http://www.stanford.edu/class/stats191/interactions.html
Download and format data:
    import numpy as np
    np.set_printoptions(precision=4, suppress=True)

    import statsmodels.api as sm
    import pandas as pd
    pd.set_option("display.width", 100)
    import matplotlib.pyplot as plt
    from statsmodels.formula.api import ols
    from statsmodels.graphics.api import interaction_plot, abline_plot
    from statsmodels.stats.anova import anova_lm

    try:
        # use the local copy if it was downloaded before
        salary_table = pd.read_csv('salary.table')
    except OSError:
        # recent pandas can read a URL directly, so urlopen is not needed
        url = 'http://stats191.stanford.edu/data/salary.table'
        salary_table = pd.read_table(url)
        # index=False keeps the row index out of the cached file
        salary_table.to_csv('salary.table', index=False)

    E = salary_table.E
    M = salary_table.M
    X = salary_table.X
    S = salary_table.S
Take a look at the data:
    plt.figure(figsize=(6, 6))
    symbols = ['D', '^']
    colors = ['r', 'g', 'blue']
    factor_groups = salary_table.groupby(['E', 'M'])
    for values, group in factor_groups:
        # i is the education level (1-3), j is the management flag (0/1)
        i, j = values
        plt.scatter(group['X'], group['S'], marker=symbols[j], color=colors[i - 1],
                    s=144)
    plt.xlabel('Experience')
    plt.ylabel('Salary')
Fit a linear model:
    formula = 'S ~ C(E) + C(M) + X'
    lm = ols(formula, salary_table).fit()
    print(lm.summary())
Have a look at the created design matrix:
    lm.model.exog[:5]

Or, since we initially passed in a DataFrame, the design matrix is also available as a DataFrame in:

    lm.model.data.orig_exog[:5]

We keep a reference to the original, untouched data in:

    lm.model.data.frame[:5]
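To make the layout of that design matrix concrete, here is a minimal NumPy sketch of treatment (dummy) coding, which is the default coding for `C(...)` terms in a formula. The toy arrays and the helper `treatment_code` are hypothetical and stand in for the salary data; this is an illustration of the idea, not the code the formula machinery actually runs.

```python
import numpy as np

def treatment_code(levels):
    """Dummy-code a categorical array, dropping the first (reference) level."""
    levels = np.asarray(levels)
    categories = np.unique(levels)  # sorted distinct levels
    # One 0/1 indicator column per non-reference level.
    columns = [(levels == c).astype(float) for c in categories[1:]]
    return categories, np.column_stack(columns)

# Hypothetical toy data, not the salary table.
E = np.array([1, 3, 3, 2, 1])             # education level
M = np.array([1, 0, 1, 0, 0])             # management indicator
X = np.array([1.0, 1.0, 2.0, 2.0, 3.0])   # years of experience

_, E_dummies = treatment_code(E)
_, M_dummies = treatment_code(M)

# Columns: intercept, E == 2, E == 3, M == 1, X
design = np.column_stack([np.ones(len(X)), E_dummies, M_dummies, X])
print(design)
```

Each row has a 1 in the intercept column, at most one 1 among the education indicators (level 1 is the reference and gets all zeros), the management indicator, and the raw experience value.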
Influence statistics
    infl = lm.get_influence()
    print(infl.summary_table())

or get a DataFrame:

    df_infl = infl.summary_frame()

    df_infl[:5]
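As a rough guide to what two of these influence measures mean, the sketch below computes leverage (the diagonal of the hat matrix) and internally studentized residuals with plain NumPy, using the standard textbook formulas. The data are synthetic with one planted outlier, and all variable names are assumptions made for illustration; it is not a copy of how statsmodels computes its tables.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x = np.linspace(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=n)
y[7] += 5.0  # plant one outlier (synthetic data)

A = np.column_stack([np.ones(n), x])  # design matrix: intercept + slope
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

# Leverage: diagonal of the hat matrix H = A (A'A)^{-1} A' = A pinv(A)
leverage = np.einsum('ij,ji->i', A, np.linalg.pinv(A))

# Internally studentized residuals: e_i / (sigma_hat * sqrt(1 - h_ii))
p = A.shape[1]
sigma2 = resid @ resid / (n - p)
studentized = resid / np.sqrt(sigma2 * (1.0 - leverage))

worst = int(np.argmax(np.abs(studentized)))
print(worst, studentized[worst])
```

The observation with the largest absolute studentized residual is the planted one, which is the same kind of check applied to the salary model further down.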
Now plot the residuals within the groups separately:

    resid = lm.resid
    plt.figure(figsize=(6, 6))
    for values, group in factor_groups:
        i, j = values
        group_num = i * 2 + j - 1  # for plotting purposes
        x = [group_num] * len(group)
        plt.scatter(x, resid[group.index], marker=symbols[j], color=colors[i - 1],
                    s=144, edgecolors='black')
    plt.xlabel('Group')
    plt.ylabel('Residuals')
Now we will test some interactions using anova or f_test
    interX_lm = ols("S ~ C(E) * X + C(M)", salary_table).fit()
    print(interX_lm.summary())
Do an ANOVA check
    from statsmodels.stats.api import anova_lm

    table1 = anova_lm(lm, interX_lm)
    print(table1)

    interM_lm = ols("S ~ X + C(E)*C(M)", data=salary_table).fit()
    print(interM_lm.summary())

    table2 = anova_lm(lm, interM_lm)
    print(table2)
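When `anova_lm` is given two nested models, the comparison it reports is an F test on the drop in the residual sum of squares. The NumPy sketch below works through that arithmetic on synthetic data with a built-in interaction. The helpers `rss_and_df` and `nested_f` and the data are hypothetical, introduced only to show the formula; turning the statistic into a p-value would additionally need an F distribution (for example from SciPy), which is left out here.

```python
import numpy as np

def rss_and_df(design, y):
    """Residual sum of squares and residual degrees of freedom of an OLS fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid), len(y) - np.linalg.matrix_rank(design)

def nested_f(restricted, full, y):
    """F = ((RSS0 - RSS1) / (df0 - df1)) / (RSS1 / df1) for nested designs."""
    rss0, df0 = rss_and_df(restricted, y)
    rss1, df1 = rss_and_df(full, y)
    f_stat = ((rss0 - rss1) / (df0 - df1)) / (rss1 / df1)
    return f_stat, df0 - df1, df1

# Synthetic data: the slope in x differs between the two groups.
rng = np.random.default_rng(1)
n = 40
x = rng.uniform(0, 10, size=n)
g = np.arange(n) % 2  # group indicator, 0 or 1
y = 1.0 + 2.0 * x + 3.0 * g + 1.5 * g * x + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
restricted = np.column_stack([ones, x, g])       # no interaction
full = np.column_stack([ones, x, g, g * x])      # adds the g:x interaction

f_stat, df_num, df_den = nested_f(restricted, full, y)
print(f_stat, df_num, df_den)
```

A large F statistic relative to its degrees of freedom says the extra interaction column explains more residual variation than chance alone would.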
The design matrix as a DataFrame
    interM_lm.model.data.orig_exog[:5]
The design matrix as an ndarray
    interM_lm.model.exog
    interM_lm.model.exog_names

    infl = interM_lm.get_influence()
    resid = infl.resid_studentized_internal
    plt.figure(figsize=(6, 6))
    for values, group in factor_groups:
        i, j = values
        idx = group.index
        plt.scatter(X[idx], resid[idx], marker=symbols[j], color=colors[i - 1],
                    s=144, edgecolors='black')
    plt.xlabel('X')
    plt.ylabel('standardized resids')
Looks like one observation is an outlier.
    drop_idx = abs(resid).argmax()
    print(drop_idx)  # zero-based index
    idx = salary_table.index.drop(drop_idx)

    lm32 = ols('S ~ C(E) + X + C(M)', data=salary_table, subset=idx).fit()

    print(lm32.summary())
    print('\n')

    interX_lm32 = ols('S ~ C(E) * X + C(M)', data=salary_table, subset=idx).fit()

    print(interX_lm32.summary())
    print('\n')

    table3 = anova_lm(lm32, interX_lm32)
    print(table3)
    print('\n')

    interM_lm32 = ols('S ~ X + C(E) * C(M)', data=salary_table, subset=idx).fit()

    table4 = anova_lm(lm32, interM_lm32)
    print(table4)
    print('\n')
Replot the residuals
    resid = interM_lm32.get_influence().summary_frame()['standard_resid']
    # the dropped observation has no residual; reindex so it shows up as NaN
    resid = resid.reindex(X.index)

    plt.figure(figsize=(6, 6))
    for values, group in factor_groups:
        i, j = values
        idx = group.index
        plt.scatter(X[idx], resid[idx], marker=symbols[j], color=colors[i - 1],
                    s=144, edgecolors='black')
    plt.xlabel('X[~[32]]')
    plt.ylabel('standardized resids')
Plot the fitted values
    lm_final = ols('S ~ X + C(E)*C(M)', data=salary_table.drop([drop_idx])).fit()
    mf = lm_final.model.data.orig_exog
    lstyle = ['-', '--']

    plt.figure(figsize=(6, 6))
    for values, group in factor_groups:
        i, j = values
        idx = group.index
        plt.scatter(X[idx], S[idx], marker=symbols[j], color=colors[i - 1],
                    s=144, edgecolors='black')
        # drop NA because there is no idx 32 in the final model
        fv = lm_final.fittedvalues.reindex(idx).dropna()
        x = mf.X.reindex(idx).dropna()
        plt.plot(x, fv, ls=lstyle[j], color=colors[i - 1])
    plt.xlabel('Experience')
    plt.ylabel('Salary')
From our first look at the data, the difference between Master's and PhD in the management group is different than in the non-management group. This is an interaction between the two qualitative variables management, M, and education, E. We can visualize this by first removing the effect of experience, then plotting the means within each of the 6 groups using interaction_plot.
    U = S - X * interX_lm32.params['X']

    plt.figure(figsize=(6, 6))
    interaction_plot(E, M, U, colors=['red', 'blue'], markers=['^', 'D'],
                     markersize=10, ax=plt.gca())
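The quantity being plotted is just a set of six cell means of the experience-adjusted salary. The sketch below reproduces that bookkeeping in NumPy on a small made-up data set; the arrays, the `base` salaries, and the fixed `slope` (standing in for the fitted coefficient of `X`) are all assumptions chosen for illustration, not values from the salary table.

```python
import numpy as np

# Hypothetical toy data: two observations in each of the six (E, M) cells.
E = np.array([1, 1, 2, 2, 3, 3, 1, 2, 3, 1, 2, 3])
M = np.array([0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1])
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])

# Assumed cell-level base salaries and a common experience slope.
base = {(1, 0): 10000.0, (1, 1): 14000.0,
        (2, 0): 12000.0, (2, 1): 19000.0,
        (3, 0): 13000.0, (3, 1): 18000.0}
slope = 500.0
S = np.array([base[(int(e), int(m))] for e, m in zip(E, M)]) + slope * X

# Step 1: remove the effect of experience.
U = S - slope * X

# Step 2: average the adjusted salary within each (E, M) cell.
cell_means = {}
for e in np.unique(E):
    for m in np.unique(M):
        mask = (E == e) & (M == m)
        cell_means[(int(e), int(m))] = float(U[mask].mean())

# The management gap at each education level; unequal gaps mean an interaction.
gaps = {e: cell_means[(e, 1)] - cell_means[(e, 0)] for e in (1, 2, 3)}
print(gaps)
```

If the two lines in an interaction plot were parallel, these gaps would be equal across education levels; here they differ, which is the pattern an `E` by `M` interaction term is meant to capture.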
Minority Employment Data
```python
try:
    minority_table = pd.read_table('minority.table')
except (IOError, OSError):  # don't have data already
    url = 'http://stats191.stanford.edu/data/minority.table'
    minority_table = pd.read_table(url)

# Group by a single column name so that each key is a scalar (0 or 1)
# that can index the colors and markers lists below
factor_group = minority_table.groupby('ETHN')

fig, ax = plt.subplots(figsize=(6, 6))
colors = ['purple', 'green']
markers = ['o', 'v']
for factor, group in factor_group:
    ax.scatter(group['TEST'], group['JPERF'], color=colors[factor],
               marker=markers[factor], s=12**2)
ax.set_xlabel('TEST')
ax.set_ylabel('JPERF')
```
```python
# Common intercept and slope for both groups
min_lm = ols('JPERF ~ TEST', data=minority_table).fit()
print(min_lm.summary())
```
```python
fig, ax = plt.subplots(figsize=(6, 6))
for factor, group in factor_group:
    ax.scatter(group['TEST'], group['JPERF'], color=colors[factor],
               marker=markers[factor], s=12**2)

ax.set_xlabel('TEST')
ax.set_ylabel('JPERF')
# Overlay the single fitted regression line
fig = abline_plot(model_results=min_lm, ax=ax)
```
```python
# Common intercept, group-specific slope
min_lm2 = ols('JPERF ~ TEST + TEST:ETHN',
              data=minority_table).fit()

print(min_lm2.summary())
```
```python
fig, ax = plt.subplots(figsize=(6, 6))
for factor, group in factor_group:
    ax.scatter(group['TEST'], group['JPERF'], color=colors[factor],
               marker=markers[factor], s=12**2)

# Line for ETHN == 0
fig = abline_plot(intercept=min_lm2.params['Intercept'],
                  slope=min_lm2.params['TEST'], ax=ax, color='purple')
# Line for ETHN == 1: same intercept, slope shifted by the interaction term
fig = abline_plot(intercept=min_lm2.params['Intercept'],
                  slope=min_lm2.params['TEST'] + min_lm2.params['TEST:ETHN'],
                  ax=ax, color='green')
```
```python
# Group-specific intercept, common slope
min_lm3 = ols('JPERF ~ TEST + ETHN', data=minority_table).fit()
print(min_lm3.summary())
```
```python
fig, ax = plt.subplots(figsize=(6, 6))
for factor, group in factor_group:
    ax.scatter(group['TEST'], group['JPERF'], color=colors[factor],
               marker=markers[factor], s=12**2)

# Line for ETHN == 0
fig = abline_plot(intercept=min_lm3.params['Intercept'],
                  slope=min_lm3.params['TEST'], ax=ax, color='purple')
# Line for ETHN == 1: intercept shifted by the ETHN term, same slope
fig = abline_plot(intercept=min_lm3.params['Intercept'] + min_lm3.params['ETHN'],
                  slope=min_lm3.params['TEST'], ax=ax, color='green')
```
```python
# Group-specific intercept and slope
min_lm4 = ols('JPERF ~ TEST * ETHN', data=minority_table).fit()
print(min_lm4.summary())
```
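A model with both a main effect and an interaction for a 0/1 group indicator gives each group its own line. The sketch below, on synthetic data rather than the minority employment data, illustrates this with plain least squares: the two lines recovered from the interaction coefficients coincide with separate per-group fits. The helper simple_fit and all data values are assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-in data (NOT the minority employment data)
rng = np.random.default_rng(0)
n = 40
test = rng.uniform(0, 3, size=n)
ethn = np.repeat([0, 1], n // 2)
jperf = (2.0 + 1.0 * test - 1.5 * ethn + 1.2 * test * ethn
         + rng.normal(0, 0.5, n))

# Design matrix with columns: intercept, TEST, ETHN, TEST*ETHN
X = np.column_stack([np.ones(n), test, ethn, test * ethn])
b0, b_test, b_ethn, b_inter = np.linalg.lstsq(X, jperf, rcond=None)[0]

# Group-specific (intercept, slope) implied by the interaction model
line0 = (b0, b_test)
line1 = (b0 + b_ethn, b_test + b_inter)

def simple_fit(x, y):
    """Least-squares (intercept, slope) of y on x."""
    A = np.column_stack([np.ones_like(x), x])
    return tuple(np.linalg.lstsq(A, y, rcond=None)[0])

# Separate regressions within each group
sep0 = simple_fit(test[ethn == 0], jperf[ethn == 0])
sep1 = simple_fit(test[ethn == 1], jperf[ethn == 1])

print(line0, sep0)
print(line1, sep1)
```

This is why the plot for the full model adds the ETHN coefficient to the intercept and the TEST:ETHN coefficient to the slope for the second group.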
```python
fig, ax = plt.subplots(figsize=(8, 6))
for factor, group in factor_group:
    ax.scatter(group['TEST'], group['JPERF'], color=colors[factor],
               marker=markers[factor], s=12**2)

# Line for ETHN == 0
fig = abline_plot(intercept=min_lm4.params['Intercept'],
                  slope=min_lm4.params['TEST'], ax=ax, color='purple')
# Line for ETHN == 1: both intercept and slope are shifted
fig = abline_plot(intercept=min_lm4.params['Intercept'] + min_lm4.params['ETHN'],
                  slope=min_lm4.params['TEST'] + min_lm4.params['TEST:ETHN'],
                  ax=ax, color='green')
```
```python
# is there any effect of ETHN on slope or intercept?
table5 = anova_lm(min_lm, min_lm4)
print(table5)
```
```python
# is there any effect of ETHN on intercept
table6 = anova_lm(min_lm, min_lm3)
print(table6)
```
```python
# is there any effect of ETHN on slope
table7 = anova_lm(min_lm, min_lm2)
print(table7)
```
```python
# is it just the slope or both?
table8 = anova_lm(min_lm2, min_lm4)
print(table8)
```
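Each comparison above is a test between a smaller model and a larger model that contains it. As a rough sketch of the textbook F statistic for nested linear models, the following computes it by hand with NumPy on synthetic data. The helper names rss_and_df and nested_f are invented for this illustration, and the data are not the minority employment data.

```python
import numpy as np

def rss_and_df(X, y):
    """Residual sum of squares and residual degrees of freedom of an OLS fit.

    Assumes X has full column rank.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid), X.shape[0] - X.shape[1]

def nested_f(X_small, X_big, y):
    """F statistic for testing the smaller model against the larger one."""
    rss_small, df_small = rss_and_df(X_small, y)
    rss_big, df_big = rss_and_df(X_big, y)
    df_diff = df_small - df_big
    f_stat = ((rss_small - rss_big) / df_diff) / (rss_big / df_big)
    return f_stat, df_diff, df_big

# Synthetic stand-in data
rng = np.random.default_rng(1)
n = 30
test = rng.uniform(0, 3, size=n)
ethn = np.repeat([0, 1], n // 2)
jperf = (1.0 + 2.0 * test + 1.0 * ethn + 0.5 * test * ethn
         + rng.normal(0, 0.5, n))

ones = np.ones(n)
X_small = np.column_stack([ones, test])                   # common line
X_big = np.column_stack([ones, test, ethn, test * ethn])  # separate lines
f_stat, df_diff, df_resid = nested_f(X_small, X_big, jperf)
print(f_stat, df_diff, df_resid)
```

The numerator measures how much the residual sum of squares drops per added parameter, and the denominator is the error variance estimate from the larger model.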
One-way ANOVA
```python
try:
    rehab_table = pd.read_csv('rehab.table')
except (IOError, OSError):
    url = 'http://stats191.stanford.edu/data/rehab.csv'
    rehab_table = pd.read_csv(url)
    # Cache locally; index=False avoids an extra unnamed column on re-read
    rehab_table.to_csv('rehab.table', index=False)

fig, ax = plt.subplots(figsize=(8, 6))
# Box plot of recovery time for each fitness level
rehab_table.boxplot(column='Time', by='Fitness', ax=ax, grid=False)
```
```python
rehab_lm = ols('Time ~ C(Fitness)', data=rehab_table).fit()
table9 = anova_lm(rehab_lm)
print(table9)

# Design matrix built from the formula
print(rehab_lm.model.data.orig_exog)
```
```python
print(rehab_lm.summary())
```
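For intuition about what a one-way ANOVA table summarizes, the sketch below computes the between-group and within-group sums of squares and the resulting F statistic by hand. The group labels and values are hypothetical toy numbers chosen for easy arithmetic, not the rehab data.

```python
import numpy as np

# Hypothetical toy groups (NOT the rehab data)
groups = {
    'a': np.array([1., 2., 3.]),
    'b': np.array([4., 5., 6.]),
    'c': np.array([7., 8., 9.]),
}

all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()
k = len(groups)           # number of groups
n_total = all_values.size

# Between-group sum of squares: group size times squared deviation
# of the group mean from the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2
                 for g in groups.values())
# Within-group sum of squares: squared deviations from each group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
print(ss_between, ss_within, f_stat)
```

With these toy numbers the between-group sum of squares is 54 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, so the F statistic is 27.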
Two-way ANOVA
```python
# The file is whitespace-delimited, so use a regex separator in both branches
try:
    kidney_table = pd.read_table('./kidney.table', sep=r'\s+')
except (IOError, OSError):
    url = 'http://stats191.stanford.edu/data/kidney.table'
    kidney_table = pd.read_table(url, sep=r'\s+')
```
Explore the dataset
```python
# Number of observations in each Weight x Duration cell
kidney_table.groupby(['Weight', 'Duration']).size()
```
Balanced panel
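A design is balanced when every combination of factor levels has the same number of observations, which is what the cell counts show. A minimal sketch of that check, using only the standard library and hypothetical toy labels (the helper is_balanced is invented here):

```python
from collections import Counter

def is_balanced(factor_a, factor_b):
    """True if every combination of levels occurs equally often."""
    counts = Counter(zip(factor_a, factor_b))
    n_cells = len(set(factor_a)) * len(set(factor_b))
    # Every cell must be present, and all cell counts must be equal
    return len(counts) == n_cells and len(set(counts.values())) == 1

# Hypothetical toy labels: 3 weight levels x 2 duration levels, 2 per cell
weight = [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3]
duration = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
print(is_balanced(weight, duration))

# Dropping one observation breaks the balance
print(is_balanced(weight[:-1], duration[:-1]))
```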
```python
kt = kidney_table
plt.figure(figsize=(8, 6))
# Mean of log(Days + 1) in each Weight x Duration cell
fig = interaction_plot(kt['Weight'], kt['Duration'], np.log(kt['Days'] + 1),
                       colors=['red', 'blue'], markers=['D', '^'], ms=10,
                       ax=plt.gca())
```
Names available in the calling namespace (such as np) are also available when the formula is evaluated, so functions like np.log can be used directly inside a formula.
```python
kidney_lm = ols('np.log(Days+1) ~ C(Duration) * C(Weight)', data=kt).fit()

table10 = anova_lm(kidney_lm)

# Additive model vs. model with interaction
print(anova_lm(ols('np.log(Days+1) ~ C(Duration) + C(Weight)',
                   data=kt).fit(), kidney_lm))
# Effect of Weight given Duration
print(anova_lm(ols('np.log(Days+1) ~ C(Duration)', data=kt).fit(),
               ols('np.log(Days+1) ~ C(Duration) + C(Weight, Sum)',
                   data=kt).fit()))
# Effect of Duration given Weight
print(anova_lm(ols('np.log(Days+1) ~ C(Weight)', data=kt).fit(),
               ols('np.log(Days+1) ~ C(Duration) + C(Weight, Sum)',
                   data=kt).fit()))
```
Sum of squares
Illustrates the use of different types of sums of squares (I, II, III) and how the Sum contrast can be used to produce the same output across all three.
Types I and II are equivalent under a balanced design.
Don't use Type III with non-orthogonal contrast - ie., Treatment
    sum_lm = ols('np.log(Days+1) ~ C(Duration, Sum) * C(Weight, Sum)',
                 data=kt).fit()

    print(anova_lm(sum_lm))
    print(anova_lm(sum_lm, typ=2))
    print(anova_lm(sum_lm, typ=3))
    nosum_lm = ols('np.log(Days+1) ~ C(Duration, Treatment) * C(Weight, Treatment)',
                   data=kt).fit()
    print(anova_lm(nosum_lm))
    print(anova_lm(nosum_lm, typ=2))
    print(anova_lm(nosum_lm, typ=3))
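The statements about balanced designs and contrast coding can be checked on a small synthetic example. The sketch below is not the kidney data: the frame `demo`, the factors `A` and `B`, and the response `y` are hypothetical names made up for illustration, and the data are random draws on a balanced 2 x 3 layout. It assumes `anova_lm` is imported from `statsmodels.stats.anova`.

```python
import itertools

import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical balanced 2 x 3 design with 5 replicates per cell.
rng = np.random.RandomState(0)
cells = list(itertools.product(["short", "long"], ["low", "mid", "high"]))
demo = pd.DataFrame(cells * 5, columns=["A", "B"])
# Synthetic response: noise plus a shift for one level of A.
demo["y"] = rng.normal(size=len(demo)) + (demo["A"] == "long") * 1.0

# Sum (deviation) coding for both factors.
sum_fit = ols("y ~ C(A, Sum) * C(B, Sum)", data=demo).fit()
t1 = anova_lm(sum_fit, typ=1)
t2 = anova_lm(sum_fit, typ=2)
t3 = anova_lm(sum_fit, typ=3)

# Compare the sums of squares term by term (Type III also lists an
# Intercept row, so select the shared terms explicitly).
terms = ["C(A, Sum)", "C(B, Sum)", "C(A, Sum):C(B, Sum)"]
print("Type I == Type II: ",
      np.allclose(t1.loc[terms, "sum_sq"], t2.loc[terms, "sum_sq"]))
print("Type II == Type III:",
      np.allclose(t2.loc[terms, "sum_sq"], t3.loc[terms, "sum_sq"]))

# Treatment coding: Type III main-effect rows now describe effects at the
# reference level of the other factor, so they generally differ from Type II.
trt_fit = ols("y ~ C(A, Treatment) * C(B, Treatment)", data=demo).fit()
print(anova_lm(trt_fit, typ=2))
print(anova_lm(trt_fit, typ=3))
```

On a balanced layout with Sum coding the three tables should agree on the factor and interaction rows; comparing the last two printed tables shows how the Type III main-effect rows shift once Treatment coding is used.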