Regression diagnostics
This example file shows how to use a few of the statsmodels regression diagnostic tests in a real-life context. You can learn about more tests and find more information about them on the Regression Diagnostics page.
Note that most of the tests described here only return a tuple of numbers, without any annotation. A full description of outputs is always included in the docstring and in the online statsmodels documentation. For presentation purposes, we use the lzip(name, test) construct to pretty-print short descriptions in the examples below.
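As a minimal sketch of that labelling pattern, the snippet below pairs a list of descriptions with a tuple of numbers. The labels and the values in test are made-up placeholders, not output from any statsmodels function. It uses the built-in list(zip(...)), which is what lzip from statsmodels.compat amounts to.

```python
# Hypothetical labels and placeholder values, for illustration only.
name = ['statistic', 'p-value']
test = (1.5, 0.25)

# list(zip(...)) pairs each label with the number in the same position.
labelled = list(zip(name, test))
for label, value in labelled:
    print('{}: {}'.format(label, value))
```

If the list of names is shorter than the returned tuple, the extra values are silently dropped, so it is worth checking the docstring for the exact number of outputs.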
Estimate a regression model
    from __future__ import print_function
    from statsmodels.compat import lzip
    import statsmodels
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    import statsmodels.stats.api as sms
    import matplotlib.pyplot as plt

    # Load data
    url = 'http://vincentarelbundock.github.io/Rdatasets/csv/HistData/Guerry.csv'
    dat = pd.read_csv(url)

    # Fit regression model (using the natural log of one of the regressors)
    results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()

    # Inspect the results
    print(results.summary())
Normality of the residuals
Jarque-Bera test:
    name = ['Jarque-Bera', 'Chi^2 two-tail prob.', 'Skew', 'Kurtosis']
    test = sms.jarque_bera(results.resid)
    lzip(name, test)
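To make the four returned numbers less opaque, here is a rough NumPy sketch of the textbook Jarque-Bera computation, JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the sample skewness and K the sample kurtosis. The function name jarque_bera_sketch and the synthetic sample are assumptions made for illustration; this is not the statsmodels implementation, and it is run on simulated data rather than the Guerry residuals.

```python
import numpy as np

def jarque_bera_sketch(x):
    """Return (JB statistic, p-value, skewness, kurtosis) for a 1-D sample."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d ** 2)                 # biased central moments
    skew = np.mean(d ** 3) / m2 ** 1.5
    kurt = np.mean(d ** 4) / m2 ** 2     # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # The chi-squared survival function with 2 degrees of freedom is exp(-x/2).
    pvalue = np.exp(-jb / 2.0)
    return jb, pvalue, skew, kurt

rng = np.random.default_rng(0)
sample = rng.normal(size=500)            # synthetic data, not the Guerry residuals
jb, pvalue, skew, kurt = jarque_bera_sketch(sample)
print(jb, pvalue, skew, kurt)
```

A large statistic (and hence a small p-value) indicates that the skewness or kurtosis departs from what a normal distribution would produce.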
Omni test:
    name = ['Chi^2', 'Two-tail probability']
    test = sms.omni_normtest(results.resid)
    lzip(name, test)
Influence tests
Once created, an object of class OLSInfluence holds attributes and methods that allow users to assess the influence of each observation. For example, we can compute and extract the first few rows of DFbetas by:
    from statsmodels.stats.outliers_influence import OLSInfluence
    test_class = OLSInfluence(results)
    test_class.dfbetas[:5, :]
Explore other options by typing dir(test_class).
Useful information on leverage can also be plotted:
    from statsmodels.graphics.regressionplots import plot_leverage_resid2
    fig, ax = plt.subplots(figsize=(8, 6))
    fig = plot_leverage_resid2(results, ax=ax)
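For readers unsure what leverage measures, the following sketch computes it directly with NumPy as the diagonal of the hat matrix H = X (X'X)^{-1} X'. The design matrix here is synthetic and the variable names are illustrative assumptions; the example does not use the fitted Guerry model or any statsmodels call.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
# Synthetic design matrix: an intercept column plus two made-up regressors.
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])

# Leverage of observation i is the i-th diagonal entry of the hat matrix
# H = X (X'X)^{-1} X'. With the thin QR factorization X = QR, H = Q Q',
# so the diagonal is the row-wise sum of squares of Q.
Q, _ = np.linalg.qr(X)
leverage = np.sum(Q ** 2, axis=1)

# The leverages lie between 0 and 1 and sum to the number of columns of X.
print(leverage.sum())
```

Observations whose leverage is well above the average (number of columns divided by n) sit far from the bulk of the regressor values and deserve a closer look.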
Other plotting options can be found on the Graphics page.
Multicollinearity
Condition number:
    np.linalg.cond(results.model.exog)
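To give a feel for how the condition number responds to collinearity, this sketch compares two synthetic design matrices, one with unrelated columns and one in which a column is almost a copy of another. All data and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
noise = rng.normal(size=n)

# Two synthetic design matrices with an intercept column.
X_independent = np.column_stack([np.ones(n), x1, noise])
X_collinear = np.column_stack([np.ones(n), x1, x1 + 1e-6 * noise])

cond_independent = np.linalg.cond(X_independent)
cond_collinear = np.linalg.cond(X_collinear)

# np.linalg.cond defaults to the 2-norm: largest / smallest singular value.
s = np.linalg.svd(X_collinear, compute_uv=False)
print(cond_independent, cond_collinear, s[0] / s[-1])
```

The nearly duplicated column drives the smallest singular value toward zero, so the condition number grows by several orders of magnitude.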
Heteroskedasticity tests
Breusch-Pagan test:
    name = ['Lagrange multiplier statistic', 'p-value', 'f-value', 'f p-value']
    # het_breuschpagan is the current spelling; older releases used het_breushpagan
    test = sms.het_breuschpagan(results.resid, results.model.exog)
    lzip(name, test)
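The idea behind the Lagrange multiplier statistic can be sketched in a few lines of NumPy: regress the squared residuals on the explanatory variables and multiply the resulting R^2 by the number of observations. This is the textbook n * R^2 form, offered as an assumption-laden illustration rather than a reproduction of the library's code. The helper breusch_pagan_lm and the simulated "residuals" are hypothetical.

```python
import numpy as np

def breusch_pagan_lm(resid, exog):
    """n * R^2 from regressing squared residuals on exog (which includes a constant)."""
    resid = np.asarray(resid, dtype=float)
    exog = np.asarray(exog, dtype=float)
    y = resid ** 2
    beta, *_ = np.linalg.lstsq(exog, y, rcond=None)
    fitted = exog @ beta
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return resid.size * r_squared

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 5.0, size=n)
exog = np.column_stack([np.ones(n), x])
resid_homo = rng.normal(size=n)          # constant variance
resid_hetero = x * rng.normal(size=n)    # variance grows with x

lm_homo = breusch_pagan_lm(resid_homo, exog)
lm_hetero = breusch_pagan_lm(resid_hetero, exog)
print(lm_homo, lm_hetero)
```

When the error variance depends on a regressor, the squared residuals become predictable from it, which raises R^2 and therefore the statistic.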
Goldfeld-Quandt test:
    name = ['F statistic', 'p-value']
    test = sms.het_goldfeldquandt(results.resid, results.model.exog)
    lzip(name, test)
Linearity
Harvey-Collier multiplier test for the null hypothesis that the linear specification is correct:
    name = ['t value', 'p value']
    test = sms.linear_harvey_collier(results)
    lzip(name, test)