M-Estimators for Robust Linear Modeling
In [1]:
1 2 3 4 5 6 7 | <span class = "kn" > from < / span> <span class = "nn" >__future__< / span> <span class = "kn" > import < / span> <span class = "n" >print_function< / span> <span class = "kn" > from < / span> <span class = "nn" >statsmodels.compat< / span> <span class = "kn" > import < / span> <span class = "n" >lmap< / span> <span class = "kn" > import < / span> <span class = "nn" >numpy< / span> <span class = "kn" >as< / span> <span class = "nn" >np< / span> <span class = "kn" > from < / span> <span class = "nn" >scipy< / span> <span class = "kn" > import < / span> <span class = "n" >stats< / span> <span class = "kn" > import < / span> <span class = "nn" >matplotlib.pyplot< / span> <span class = "kn" >as< / span> <span class = "nn" >plt< / span> <span class = "kn" > import < / span> <span class = "nn" >statsmodels.api< / span> <span class = "kn" >as< / span> <span class = "nn" >sm< / span> |
- An M-estimator minimizes the function
Q(ei,ρ)=∑i ρ(eis)
where $\rho$ is a symmetric function of the residuals
- The effect of $\rho$ is to reduce the influence of outliers
- $s$ is an estimate of scale.
- The robust estimates $\hat{\beta}$ are computed by the iteratively re-weighted least squares algorithm
- We have several choices available for the weighting functions to be used
In [2]:
1 | <span class = "n" >norms< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >robust< / span><span class = "o" >.< / span><span class = "n" >norms< / span> |
In [3]:
1 2 3 4 5 6 7 8 | <span class = "k" > def < / span> <span class = "nf" >plot_weights< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >weights_func< / span><span class = "p" >,< / span> <span class = "n" >xlabels< / span><span class = "p" >,< / span> <span class = "n" >xticks< / span><span class = "p" >):< / span> <span class = "n" >fig< / span> <span class = "o" > = < / span> <span class = "n" >plt< / span><span class = "o" >.< / span><span class = "n" >figure< / span><span class = "p" >(< / span><span class = "n" >figsize< / span><span class = "o" > = < / span><span class = "p" >(< / span><span class = "mi" > 12 < / span><span class = "p" >,< / span><span class = "mi" > 8 < / span><span class = "p" >))< / span> <span class = "n" >ax< / span> <span class = "o" > = < / span> <span class = "n" >fig< / span><span class = "o" >.< / span><span class = "n" >add_subplot< / span><span class = "p" >(< / span><span class = "mi" > 111 < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >plot< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >weights_func< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >))< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >set_xticks< / span><span class = "p" >(< / span><span class = "n" >xticks< / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >set_xticklabels< / span><span class = "p" >(< / span><span class = "n" >xlabels< / span><span class = "p" >,< / span> <span class = "n" >fontsize< / span><span class = "o" > = < / span><span class = "mi" > 16 < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >set_ylim< / span><span class = "p" >(< / span><span class = "o" > - .< / span><span class = "mi" > 1 < / span><span class = "p" >,< / span> <span class = "mf" > 1.1 < / span><span class = "p" >)< / span> <span class = "k" > return < / span> <span class = "n" >ax< / span> |
Andrew's Wave
In [4]:
1 | <span class = "n" > help < / span><span class = "p" >(< / span><span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >AndrewWave< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >)< / span> |
In [5]:
1 2 3 4 | <span class = "n" >a< / span> <span class = "o" > = < / span> <span class = "mf" > 1.339 < / span> <span class = "n" >support< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >linspace< / span><span class = "p" >(< / span><span class = "o" > - < / span><span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >pi< / span><span class = "o" > * < / span><span class = "n" >a< / span><span class = "p" >,< / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >pi< / span><span class = "o" > * < / span><span class = "n" >a< / span><span class = "p" >,< / span> <span class = "mi" > 100 < / span><span class = "p" >)< / span> <span class = "n" >andrew< / span> <span class = "o" > = < / span> <span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >AndrewWave< / span><span class = "p" >(< / span><span class = "n" >a< / span><span class = "o" > = < / span><span class = "n" >a< / span><span class = "p" >)< / span> <span class = "n" >plot_weights< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >andrew< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >,< / span> <span class = "p" >[< / span><span class = "s" > '$-\pi*a$' < / span><span class = "p" >,< / span> <span class = "s" > '0' < / span><span class = "p" >,< / span> <span class = "s" > '$\pi*a$' < / span><span class = "p" >],< / span> <span class = "p" >[< / span><span class = "o" > - < / span><span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >pi< / span><span class = "o" > * < / span><span class = "n" >a< / span><span class = "p" >,< / span> <span class = "mi" > 0 < / span><span class = "p" >,< / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >pi< / span><span class = "o" > * < / span><span class = "n" >a< / span><span class = "p" >]);< / span> |
Hampel's 17A
In [6]:
1 | <span class = "n" > help < / span><span class = "p" >(< / span><span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >Hampel< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >)< / span> |
In [7]:
1 2 3 4 | <span class = "n" >c< / span> <span class = "o" > = < / span> <span class = "mi" > 8 < / span> <span class = "n" >support< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >linspace< / span><span class = "p" >(< / span><span class = "o" > - < / span><span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >c< / span><span class = "p" >,< / span> <span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >c< / span><span class = "p" >,< / span> <span class = "mi" > 1000 < / span><span class = "p" >)< / span> <span class = "n" >hampel< / span> <span class = "o" > = < / span> <span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >Hampel< / span><span class = "p" >(< / span><span class = "n" >a< / span><span class = "o" > = < / span><span class = "mf" > 2. < / span><span class = "p" >,< / span> <span class = "n" >b< / span><span class = "o" > = < / span><span class = "mf" > 4. < / span><span class = "p" >,< / span> <span class = "n" >c< / span><span class = "o" > = < / span><span class = "n" >c< / span><span class = "p" >)< / span> <span class = "n" >plot_weights< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >hampel< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >,< / span> <span class = "p" >[< / span><span class = "s" > '3*c' < / span><span class = "p" >,< / span> <span class = "s" > '0' < / span><span class = "p" >,< / span> <span class = "s" > '3*c' < / span><span class = "p" >],< / span> <span class = "p" >[< / span><span class = "o" > - < / span><span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >c< / span><span class = "p" >,< / span> <span class = "mi" > 0 < / span><span class = "p" >,< / span> <span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >c< / span><span class = "p" >]);< / span> |
Huber's t
In [8]:
1 | <span class = "n" > help < / span><span class = "p" >(< / span><span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >HuberT< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >)< / span> |
In [9]:
1 2 3 4 | <span class = "n" >t< / span> <span class = "o" > = < / span> <span class = "mf" > 1.345 < / span> <span class = "n" >support< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >linspace< / span><span class = "p" >(< / span><span class = "o" > - < / span><span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >t< / span><span class = "p" >,< / span> <span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >t< / span><span class = "p" >,< / span> <span class = "mi" > 1000 < / span><span class = "p" >)< / span> <span class = "n" >huber< / span> <span class = "o" > = < / span> <span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >HuberT< / span><span class = "p" >(< / span><span class = "n" >t< / span><span class = "o" > = < / span><span class = "n" >t< / span><span class = "p" >)< / span> <span class = "n" >plot_weights< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >huber< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >,< / span> <span class = "p" >[< / span><span class = "s" > '-3*t' < / span><span class = "p" >,< / span> <span class = "s" > '0' < / span><span class = "p" >,< / span> <span class = "s" > '3*t' < / span><span class = "p" >],< / span> <span class = "p" >[< / span><span class = "o" > - < / span><span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >t< / span><span class = "p" >,< / span> <span class = "mi" > 0 < / span><span class = "p" >,< / span> <span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >t< / span><span class = "p" >]);< / span> |
Least Squares
In [10]:
1 | <span class = "n" > help < / span><span class = "p" >(< / span><span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >LeastSquares< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >)< / span> |
In [11]:
1 2 3 | <span class = "n" >support< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >linspace< / span><span class = "p" >(< / span><span class = "o" > - < / span><span class = "mi" > 3 < / span><span class = "p" >,< / span> <span class = "mi" > 3 < / span><span class = "p" >,< / span> <span class = "mi" > 1000 < / span><span class = "p" >)< / span> <span class = "n" >lst_sq< / span> <span class = "o" > = < / span> <span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >LeastSquares< / span><span class = "p" >()< / span> <span class = "n" >plot_weights< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >lst_sq< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >,< / span> <span class = "p" >[< / span><span class = "s" > '-3' < / span><span class = "p" >,< / span> <span class = "s" > '0' < / span><span class = "p" >,< / span> <span class = "s" > '3' < / span><span class = "p" >],< / span> <span class = "p" >[< / span><span class = "o" > - < / span><span class = "mi" > 3 < / span><span class = "p" >,< / span> <span class = "mi" > 0 < / span><span class = "p" >,< / span> <span class = "mi" > 3 < / span><span class = "p" >]);< / span> |
Ramsay's Ea
In [12]:
1 | <span class = "n" > help < / span><span class = "p" >(< / span><span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >RamsayE< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >)< / span> |
In [13]:
1 2 3 4 | <span class = "n" >a< / span> <span class = "o" > = < / span> <span class = "o" >.< / span><span class = "mi" > 3 < / span> <span class = "n" >support< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >linspace< / span><span class = "p" >(< / span><span class = "o" > - < / span><span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >a< / span><span class = "p" >,< / span> <span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >a< / span><span class = "p" >,< / span> <span class = "mi" > 1000 < / span><span class = "p" >)< / span> <span class = "n" >ramsay< / span> <span class = "o" > = < / span> <span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >RamsayE< / span><span class = "p" >(< / span><span class = "n" >a< / span><span class = "o" > = < / span><span class = "n" >a< / span><span class = "p" >)< / span> <span class = "n" >plot_weights< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >ramsay< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >,< / span> <span class = "p" >[< / span><span class = "s" > '-3*a' < / span><span class = "p" >,< / span> <span class = "s" > '0' < / span><span class = "p" >,< / span> <span class = "s" > '3*a' < / span><span class = "p" >],< / span> <span class = "p" >[< / span><span class = "o" > - < / span><span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >a< / span><span class = "p" >,< / span> <span class = "mi" > 0 < / span><span class = "p" >,< / span> <span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >a< / span><span class = "p" >]);< / span> |
Trimmed Mean
In [14]:
1 | <span class = "n" > help < / span><span class = "p" >(< / span><span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >TrimmedMean< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >)< / span> |
In [15]:
1 2 3 4 | <span class = "n" >c< / span> <span class = "o" > = < / span> <span class = "mi" > 2 < / span> <span class = "n" >support< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >linspace< / span><span class = "p" >(< / span><span class = "o" > - < / span><span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >c< / span><span class = "p" >,< / span> <span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >c< / span><span class = "p" >,< / span> <span class = "mi" > 1000 < / span><span class = "p" >)< / span> <span class = "n" >trimmed< / span> <span class = "o" > = < / span> <span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >TrimmedMean< / span><span class = "p" >(< / span><span class = "n" >c< / span><span class = "o" > = < / span><span class = "n" >c< / span><span class = "p" >)< / span> <span class = "n" >plot_weights< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >trimmed< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >,< / span> <span class = "p" >[< / span><span class = "s" > '-3*c' < / span><span class = "p" >,< / span> <span class = "s" > '0' < / span><span class = "p" >,< / span> <span class = "s" > '3*c' < / span><span class = "p" >],< / span> <span class = "p" >[< / span><span class = "o" > - < / span><span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >c< / span><span class = "p" >,< / span> <span class = "mi" > 0 < / span><span class = "p" >,< / span> <span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >c< / span><span class = "p" >]);< / span> |
Tukey's Biweight
In [16]:
1 | <span class = "n" > help < / span><span class = "p" >(< / span><span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >TukeyBiweight< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >)< / span> |
In [17]:
1 2 3 4 | <span class = "n" >c< / span> <span class = "o" > = < / span> <span class = "mf" > 4.685 < / span> <span class = "n" >support< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >linspace< / span><span class = "p" >(< / span><span class = "o" > - < / span><span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >c< / span><span class = "p" >,< / span> <span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >c< / span><span class = "p" >,< / span> <span class = "mi" > 1000 < / span><span class = "p" >)< / span> <span class = "n" >tukey< / span> <span class = "o" > = < / span> <span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >TukeyBiweight< / span><span class = "p" >(< / span><span class = "n" >c< / span><span class = "o" > = < / span><span class = "n" >c< / span><span class = "p" >)< / span> <span class = "n" >plot_weights< / span><span class = "p" >(< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >tukey< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >,< / span> <span class = "p" >[< / span><span class = "s" > '-3*c' < / span><span class = "p" >,< / span> <span class = "s" > '0' < / span><span class = "p" >,< / span> <span class = "s" > '3*c' < / span><span class = "p" >],< / span> <span class = "p" >[< / span><span class = "o" > - < / span><span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >c< / span><span class = "p" >,< / span> <span class = "mi" > 0 < / span><span class = "p" >,< / span> <span class = "mi" > 3 < / span><span class = "o" > * < / span><span class = "n" >c< / span><span class = "p" >]);< / span> |
Scale Estimators
- Robust estimates of the location
In [18]:
1 | <span class = "n" >x< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >array< / span><span class = "p" >([< / span><span class = "mi" > 1 < / span><span class = "p" >,< / span> <span class = "mi" > 2 < / span><span class = "p" >,< / span> <span class = "mi" > 3 < / span><span class = "p" >,< / span> <span class = "mi" > 4 < / span><span class = "p" >,< / span> <span class = "mi" > 500 < / span><span class = "p" >])< / span> |
- The mean is not a robust estimator of location
In [19]:
1 | <span class = "n" >x< / span><span class = "o" >.< / span><span class = "n" >mean< / span><span class = "p" >()< / span> |
Out[19]:
- The median, on the other hand, is a robust estimator with a breakdown point of 50%
In [20]:
1 | <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >median< / span><span class = "p" >(< / span><span class = "n" >x< / span><span class = "p" >)< / span> |
Out[20]:
- Analagously for the scale
- The standard deviation is not robust
In [21]:
1 | <span class = "n" >x< / span><span class = "o" >.< / span><span class = "n" >std< / span><span class = "p" >()< / span> |
Out[21]:
Median Absolute Deviation
mediani|Xi−medianj(Xj)|)
Standardized Median Absolute Deviation is a consistent estimator for $\hat{\sigma}$
ˆσ=K⋅MAD
where $K$ depends on the distribution. For the normal distribution for example,
K=Φ−1(.75)
In [22]:
1 | <span class = "n" >stats< / span><span class = "o" >.< / span><span class = "n" >norm< / span><span class = "o" >.< / span><span class = "n" >ppf< / span><span class = "p" >(< / span><span class = "o" >.< / span><span class = "mi" > 75 < / span><span class = "p" >)< / span> |
Out[22]:
In [23]:
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >x< / span><span class = "p" >)< / span> |
In [24]:
1 | <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >robust< / span><span class = "o" >.< / span><span class = "n" >scale< / span><span class = "o" >.< / span><span class = "n" >stand_mad< / span><span class = "p" >(< / span><span class = "n" >x< / span><span class = "p" >)< / span> |
Out[24]:
In [25]:
1 | <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >array< / span><span class = "p" >([< / span><span class = "mi" > 1 < / span><span class = "p" >,< / span><span class = "mi" > 2 < / span><span class = "p" >,< / span><span class = "mi" > 3 < / span><span class = "p" >,< / span><span class = "mi" > 4 < / span><span class = "p" >,< / span><span class = "mf" > 5. < / span><span class = "p" >])< / span><span class = "o" >.< / span><span class = "n" >std< / span><span class = "p" >()< / span> |
Out[25]:
- The default for Robust Linear Models is MAD
- another popular choice is Huber's proposal 2
In [26]:
1 2 | <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >random< / span><span class = "o" >.< / span><span class = "n" >seed< / span><span class = "p" >(< / span><span class = "mi" > 12345 < / span><span class = "p" >)< / span> <span class = "n" >fat_tails< / span> <span class = "o" > = < / span> <span class = "n" >stats< / span><span class = "o" >.< / span><span class = "n" >t< / span><span class = "p" >(< / span><span class = "mi" > 6 < / span><span class = "p" >)< / span><span class = "o" >.< / span><span class = "n" >rvs< / span><span class = "p" >(< / span><span class = "mi" > 40 < / span><span class = "p" >)< / span> |
In [27]:
1 2 3 4 5 | <span class = "n" >kde< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >nonparametric< / span><span class = "o" >.< / span><span class = "n" >KDE< / span><span class = "p" >(< / span><span class = "n" >fat_tails< / span><span class = "p" >)< / span> <span class = "n" >kde< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >()< / span> <span class = "n" >fig< / span> <span class = "o" > = < / span> <span class = "n" >plt< / span><span class = "o" >.< / span><span class = "n" >figure< / span><span class = "p" >(< / span><span class = "n" >figsize< / span><span class = "o" > = < / span><span class = "p" >(< / span><span class = "mi" > 12 < / span><span class = "p" >,< / span><span class = "mi" > 8 < / span><span class = "p" >))< / span> <span class = "n" >ax< / span> <span class = "o" > = < / span> <span class = "n" >fig< / span><span class = "o" >.< / span><span class = "n" >add_subplot< / span><span class = "p" >(< / span><span class = "mi" > 111 < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >plot< / span><span class = "p" >(< / span><span class = "n" >kde< / span><span class = "o" >.< / span><span class = "n" >support< / span><span class = "p" >,< / span> <span class = "n" >kde< / span><span class = "o" >.< / span><span class = "n" >density< / span><span class = "p" >);< / span> |
In [28]:
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >fat_tails< / span><span class = "o" >.< / span><span class = "n" >mean< / span><span class = "p" >(),< / span> <span class = "n" >fat_tails< / span><span class = "o" >.< / span><span class = "n" >std< / span><span class = "p" >())< / span> |
In [29]:
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >stats< / span><span class = "o" >.< / span><span class = "n" >norm< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >(< / span><span class = "n" >fat_tails< / span><span class = "p" >))< / span> |
In [30]:
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >stats< / span><span class = "o" >.< / span><span class = "n" >t< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >(< / span><span class = "n" >fat_tails< / span><span class = "p" >,< / span> <span class = "n" >f0< / span><span class = "o" > = < / span><span class = "mi" > 6 < / span><span class = "p" >))< / span> |
In [31]:
1 2 3 | <span class = "n" >huber< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >robust< / span><span class = "o" >.< / span><span class = "n" >scale< / span><span class = "o" >.< / span><span class = "n" >Huber< / span><span class = "p" >()< / span> <span class = "n" >loc< / span><span class = "p" >,< / span> <span class = "n" >scale< / span> <span class = "o" > = < / span> <span class = "n" >huber< / span><span class = "p" >(< / span><span class = "n" >fat_tails< / span><span class = "p" >)< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >loc< / span><span class = "p" >,< / span> <span class = "n" >scale< / span><span class = "p" >)< / span> |
In [32]:
1 | <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >robust< / span><span class = "o" >.< / span><span class = "n" >stand_mad< / span><span class = "p" >(< / span><span class = "n" >fat_tails< / span><span class = "p" >)< / span> |
Out[32]:
In [33]:
1 | <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >robust< / span><span class = "o" >.< / span><span class = "n" >stand_mad< / span><span class = "p" >(< / span><span class = "n" >fat_tails< / span><span class = "p" >,< / span> <span class = "n" >c< / span><span class = "o" > = < / span><span class = "n" >stats< / span><span class = "o" >.< / span><span class = "n" >t< / span><span class = "p" >(< / span><span class = "mi" > 6 < / span><span class = "p" >)< / span><span class = "o" >.< / span><span class = "n" >ppf< / span><span class = "p" >(< / span><span class = "o" >.< / span><span class = "mi" > 75 < / span><span class = "p" >))< / span> |
Out[33]:
In [34]:
1 | <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >robust< / span><span class = "o" >.< / span><span class = "n" >scale< / span><span class = "o" >.< / span><span class = "n" >mad< / span><span class = "p" >(< / span><span class = "n" >fat_tails< / span><span class = "p" >)< / span> |
Out[34]:
Duncan's Occupational Prestige data - M-estimation for outliers
In [35]:
1 2 | <span class = "kn" > from < / span> <span class = "nn" >statsmodels.graphics.api< / span> <span class = "kn" > import < / span> <span class = "n" >abline_plot< / span> <span class = "kn" > from < / span> <span class = "nn" >statsmodels.formula.api< / span> <span class = "kn" > import < / span> <span class = "n" >ols< / span><span class = "p" >,< / span> <span class = "n" >rlm< / span> |
In [36]:
1 | <span class = "n" >prestige< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >datasets< / span><span class = "o" >.< / span><span class = "n" >get_rdataset< / span><span class = "p" >(< / span><span class = "s" > "Duncan" < / span><span class = "p" >,< / span> <span class = "s" > "car" < / span><span class = "p" >,< / span> <span class = "n" >cache< / span><span class = "o" > = < / span><span class = "bp" > True < / span><span class = "p" >)< / span><span class = "o" >.< / span><span class = "n" >data< / span> |
In [37]:
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >prestige< / span><span class = "o" >.< / span><span class = "n" >head< / span><span class = "p" >(< / span><span class = "mi" > 10 < / span><span class = "p" >))< / span> |
In [38]:
1 2 3 4 5 6 7 8 | <span class = "n" >fig< / span> <span class = "o" > = < / span> <span class = "n" >plt< / span><span class = "o" >.< / span><span class = "n" >figure< / span><span class = "p" >(< / span><span class = "n" >figsize< / span><span class = "o" > = < / span><span class = "p" >(< / span><span class = "mi" > 12 < / span><span class = "p" >,< / span><span class = "mi" > 12 < / span><span class = "p" >))< / span> <span class = "n" >ax1< / span> <span class = "o" > = < / span> <span class = "n" >fig< / span><span class = "o" >.< / span><span class = "n" >add_subplot< / span><span class = "p" >(< / span><span class = "mi" > 211 < / span><span class = "p" >,< / span> <span class = "n" >xlabel< / span><span class = "o" > = < / span><span class = "s" > 'Income' < / span><span class = "p" >,< / span> <span class = "n" >ylabel< / span><span class = "o" > = < / span><span class = "s" > 'Prestige' < / span><span class = "p" >)< / span> <span class = "n" >ax1< / span><span class = "o" >.< / span><span class = "n" >scatter< / span><span class = "p" >(< / span><span class = "n" >prestige< / span><span class = "o" >.< / span><span class = "n" >income< / span><span class = "p" >,< / span> <span class = "n" >prestige< / span><span class = "o" >.< / span><span class = "n" >prestige< / span><span class = "p" >)< / span> <span class = "n" >xy_outlier< / span> <span class = "o" > = < / span> <span class = "n" >prestige< / span><span class = "o" >.< / span><span class = "n" >ix< / span><span class = "p" >[< / span><span class = "s" > 'minister' < / span><span class = "p" >][[< / span><span class = "s" > 'income' < / span><span class = "p" >,< / span><span class = "s" > 'prestige' < / span><span class = "p" >]]< / span> <span class = "n" >ax1< / span><span class = "o" >.< / span><span class = "n" >annotate< / span><span class = "p" >(< / span><span class = "s" > 'Minister' < / span><span class = "p" >,< / span> <span class = "n" >xy_outlier< / span><span class = "p" >,< / span> <span class = "n" >xy_outlier< / span><span class = "o" > + < / span><span class = "mi" > 1 < / span><span class = "p" >,< / span> <span class = "n" >fontsize< / span><span class = "o" > = < / span><span class = "mi" > 16 < / span><span class = "p" >)< / span> <span class = "n" >ax2< / span> <span class = "o" > = < / span> <span class = "n" >fig< / span><span class = "o" >.< / span><span class = "n" >add_subplot< / span><span class = "p" >(< / span><span class = "mi" > 212 < / span><span class = "p" >,< / span> <span class = "n" >xlabel< / span><span class = "o" > = < / span><span class = "s" > 'Education' < / span><span class = "p" >,< / span> <span class = "n" >ylabel< / span><span class = "o" > = < / span><span class = "s" > 'Prestige' < / span><span class = "p" >)< / span> <span class = "n" >ax2< / span><span class = "o" >.< / span><span class = "n" >scatter< / span><span class = "p" >(< / span><span class = "n" >prestige< / span><span class = "o" >.< / span><span class = "n" >education< / span><span class = "p" >,< / span> <span class = "n" >prestige< / span><span class = "o" >.< / span><span class = "n" >prestige< / span><span class = "p" >);< / span> |
In [39]:
1 2 | <span class = "n" >ols_model< / span> <span class = "o" > = < / span> <span class = "n" >ols< / span><span class = "p" >(< / span><span class = "s" > 'prestige ~ income + education' < / span><span class = "p" >,< / span> <span class = "n" >prestige< / span><span class = "p" >)< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >()< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >ols_model< / span><span class = "o" >.< / span><span class = "n" >summary< / span><span class = "p" >())< / span> |
In [40]:
1 2 3 | <span class = "n" >infl< / span> <span class = "o" > = < / span> <span class = "n" >ols_model< / span><span class = "o" >.< / span><span class = "n" >get_influence< / span><span class = "p" >()< / span> <span class = "n" >student< / span> <span class = "o" > = < / span> <span class = "n" >infl< / span><span class = "o" >.< / span><span class = "n" >summary_frame< / span><span class = "p" >()[< / span><span class = "s" > 'student_resid' < / span><span class = "p" >]< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >student< / span><span class = "p" >)< / span> |
In [41]:
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >student< / span><span class = "o" >.< / span><span class = "n" >ix< / span><span class = "p" >[< / span><span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" > abs < / span><span class = "p" >(< / span><span class = "n" >student< / span><span class = "p" >)< / span> <span class = "o" >>< / span> <span class = "mi" > 2 < / span><span class = "p" >])< / span> |
In [42]:
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >infl< / span><span class = "o" >.< / span><span class = "n" >summary_frame< / span><span class = "p" >()< / span><span class = "o" >.< / span><span class = "n" >ix< / span><span class = "p" >[< / span><span class = "s" > 'minister' < / span><span class = "p" >])< / span> |
In [43]:
1 2 3 | <span class = "n" >sidak< / span> <span class = "o" > = < / span> <span class = "n" >ols_model< / span><span class = "o" >.< / span><span class = "n" >outlier_test< / span><span class = "p" >(< / span><span class = "s" > 'sidak' < / span><span class = "p" >)< / span> <span class = "n" >sidak< / span><span class = "o" >.< / span><span class = "n" >sort< / span><span class = "p" >(< / span><span class = "s" > 'unadj_p' < / span><span class = "p" >,< / span> <span class = "n" >inplace< / span><span class = "o" > = < / span><span class = "bp" > True < / span><span class = "p" >)< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >sidak< / span><span class = "p" >)< / span> |
In [44]:
1 2 3 | <span class = "n" >fdr< / span> <span class = "o" > = < / span> <span class = "n" >ols_model< / span><span class = "o" >.< / span><span class = "n" >outlier_test< / span><span class = "p" >(< / span><span class = "s" > 'fdr_bh' < / span><span class = "p" >)< / span> <span class = "n" >fdr< / span><span class = "o" >.< / span><span class = "n" >sort< / span><span class = "p" >(< / span><span class = "s" > 'unadj_p' < / span><span class = "p" >,< / span> <span class = "n" >inplace< / span><span class = "o" > = < / span><span class = "bp" > True < / span><span class = "p" >)< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >fdr< / span><span class = "p" >)< / span> |
In [45]:
1 2 | <span class = "n" >rlm_model< / span> <span class = "o" > = < / span> <span class = "n" >rlm< / span><span class = "p" >(< / span><span class = "s" > 'prestige ~ income + education' < / span><span class = "p" >,< / span> <span class = "n" >prestige< / span><span class = "p" >)< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >()< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >rlm_model< / span><span class = "o" >.< / span><span class = "n" >summary< / span><span class = "p" >())< / span> |
In [46]:
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >rlm_model< / span><span class = "o" >.< / span><span class = "n" >weights< / span><span class = "p" >)< / span> |
Hertzprung Russell data for Star Cluster CYG 0B1 - Leverage Points
- Data is on the luminosity and temperature of 47 stars in the direction of Cygnus.
In [47]:
1 | <span class = "n" >dta< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >datasets< / span><span class = "o" >.< / span><span class = "n" >get_rdataset< / span><span class = "p" >(< / span><span class = "s" > "starsCYG" < / span><span class = "p" >,< / span> <span class = "s" > "robustbase" < / span><span class = "p" >,< / span> <span class = "n" >cache< / span><span class = "o" > = < / span><span class = "bp" > True < / span><span class = "p" >)< / span><span class = "o" >.< / span><span class = "n" >data< / span> |
In [48]:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | <span class = "kn" > from < / span> <span class = "nn" >matplotlib.patches< / span> <span class = "kn" > import < / span> <span class = "n" >Ellipse< / span> <span class = "n" >fig< / span> <span class = "o" > = < / span> <span class = "n" >plt< / span><span class = "o" >.< / span><span class = "n" >figure< / span><span class = "p" >(< / span><span class = "n" >figsize< / span><span class = "o" > = < / span><span class = "p" >(< / span><span class = "mi" > 12 < / span><span class = "p" >,< / span><span class = "mi" > 8 < / span><span class = "p" >))< / span> <span class = "n" >ax< / span> <span class = "o" > = < / span> <span class = "n" >fig< / span><span class = "o" >.< / span><span class = "n" >add_subplot< / span><span class = "p" >(< / span><span class = "mi" > 111 < / span><span class = "p" >,< / span> <span class = "n" >xlabel< / span><span class = "o" > = < / span><span class = "s" > 'log(Temp)' < / span><span class = "p" >,< / span> <span class = "n" >ylabel< / span><span class = "o" > = < / span><span class = "s" > 'log(Light)' < / span><span class = "p" >,< / span> <span class = "n" >title< / span><span class = "o" > = < / span><span class = "s" > 'Hertzsprung-Russell Diagram of Star Cluster CYG OB1' < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >scatter< / span><span class = "p" >(< / span><span class = "o" > * < / span><span class = "n" >dta< / span><span class = "o" >.< / span><span class = "n" >values< / span><span class = "o" >.< / span><span class = "n" >T< / span><span class = "p" >)< / span> <span class = "c" > # highlight outliers</span> <span class = "n" >e< / span> <span class = "o" > = < / span> <span class = "n" >Ellipse< / span><span class = "p" >((< / span><span class = "mf" > 3.5 < / span><span class = "p" >,< / span> <span class = "mi" > 6 < / span><span class = "p" >),< / span> <span class = "o" >.< / span><span class = "mi" > 2 < / span><span class = "p" >,< / span> <span class = "mi" > 1 < / span><span class = "p" >,< / span> <span class = "n" >alpha< / span><span class = "o" > = .< / span><span class = "mi" > 25 < / span><span class = "p" >,< / span> <span class = "n" >color< / span><span class = "o" > = < / span><span class = "s" > 'r' < / span><span class = "p" >)< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >add_patch< / span><span class = "p" >(< / span><span class = "n" >e< / span><span class = "p" >);< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >annotate< / span><span class = "p" >(< / span><span class = "s" > 'Red giants' < / span><span class = "p" >,< / span> <span class = "n" >xy< / span><span class = "o" > = < / span><span class = "p" >(< / span><span class = "mf" > 3.6 < / span><span class = "p" >,< / span> <span class = "mi" > 6 < / span><span class = "p" >),< / span> <span class = "n" >xytext< / span><span class = "o" > = < / span><span class = "p" >(< / span><span class = "mf" > 3.8 < / span><span class = "p" >,< / span> <span class = "mi" > 6 < / span><span class = "p" >),< / span> <span class = "n" >arrowprops< / span><span class = "o" > = < / span><span class = "nb" > dict < / span><span class = "p" >(< / span><span class = "n" >facecolor< / span><span class = "o" > = < / span><span class = "s" > 'black' < / span><span class = "p" >,< / span> <span class = "n" >shrink< / span><span class = "o" > = < / span><span class = "mf" > 0.05 < / span><span class = "p" >,< / span> <span class = "n" >width< / span><span class = "o" > = < / span><span class = "mi" > 2 < / span><span class = "p" >),< / span> <span class = "n" >horizontalalignment< / span><span class = "o" > = < / span><span class = "s" > 'left' < / span><span class = "p" >,< / span> <span class = "n" >verticalalignment< / span><span class = "o" > = < / span><span class = "s" > 'bottom' < / span><span class = "p" >,< / span> <span class = "n" >clip_on< / span><span class = "o" > = < / span><span class = "bp" > True < / span><span class = "p" >,< / span> <span class = "c" > # clip to the axes bounding box</span> <span class = "n" >fontsize< / span><span class = "o" > = < / span><span class = "mi" > 16 < / span><span class = "p" >,< / span> <span class = "p" >)< / span> <span class = "c" > # annotate these with their index</span> <span class = "k" > for < / span> <span class = "n" >i< / span><span class = "p" >,< / span><span class = "n" >row< / span> <span class = "ow" > in < / span> <span class = "n" >dta< / span><span class = "o" >.< / span><span class = "n" >ix< / span><span class = "p" >[< / span><span class = "n" >dta< / span><span class = "p" >[< / span><span class = "s" > 'log.Te' < / span><span class = "p" >]< / span> <span class = "o" ><< / span> <span class = "mf" > 3.8 < / span><span class = "p" >]< / span><span class = "o" >.< / span><span class = "n" >iterrows< / span><span class = "p" >():< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >annotate< / span><span class = "p" >(< / span><span class = "n" >i< / span><span class = "p" >,< / span> <span class = "n" >row< / span><span class = "p" >,< / span> <span class = "n" >row< / span> <span class = "o" > + < / span> <span class = "o" >.< / span><span class = "mo" > 01 < / span><span class = "p" >,< / span> <span class = "n" >fontsize< / span><span class = "o" > = < / span><span class = "mi" > 14 < / span><span class = "p" >)< / span> <span class = "n" >xlim< / span><span class = "p" >,< / span> <span class = "n" >ylim< / span> <span class = "o" > = < / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >get_xlim< / span><span class = "p" >(),< / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >get_ylim< / span><span class = "p" >()< / span> |
In [49]:
1 2 | <span class = "kn" > from < / span> <span class = "nn" >IPython.display< / span> <span class = "kn" > import < / span> <span class = "n" >Image< / span> <span class = "n" >Image< / span><span class = "p" >(< / span><span class = "n" >filename< / span><span class = "o" > = < / span><span class = "s" > 'star_diagram.png' < / span><span class = "p" >)< / span> |
Out[49]:
In [50]:
1 2 3 4 | <span class = "n" >y< / span> <span class = "o" > = < / span> <span class = "n" >dta< / span><span class = "p" >[< / span><span class = "s" > 'log.light' < / span><span class = "p" >]< / span> <span class = "n" >X< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >add_constant< / span><span class = "p" >(< / span><span class = "n" >dta< / span><span class = "p" >[< / span><span class = "s" > 'log.Te' < / span><span class = "p" >],< / span> <span class = "n" >prepend< / span><span class = "o" > = < / span><span class = "bp" > True < / span><span class = "p" >)< / span> <span class = "n" >ols_model< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >OLS< / span><span class = "p" >(< / span><span class = "n" >y< / span><span class = "p" >,< / span> <span class = "n" >X< / span><span class = "p" >)< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >()< / span> <span class = "n" >abline_plot< / span><span class = "p" >(< / span><span class = "n" >model_results< / span><span class = "o" > = < / span><span class = "n" >ols_model< / span><span class = "p" >,< / span> <span class = "n" >ax< / span><span class = "o" > = < / span><span class = "n" >ax< / span><span class = "p" >)< / span> |
Out[50]:
In [51]:
1 2 | <span class = "n" >rlm_mod< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >RLM< / span><span class = "p" >(< / span><span class = "n" >y< / span><span class = "p" >,< / span> <span class = "n" >X< / span><span class = "p" >,< / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >robust< / span><span class = "o" >.< / span><span class = "n" >norms< / span><span class = "o" >.< / span><span class = "n" >TrimmedMean< / span><span class = "p" >(< / span><span class = "o" >.< / span><span class = "mi" > 5 < / span><span class = "p" >))< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >()< / span> <span class = "n" >abline_plot< / span><span class = "p" >(< / span><span class = "n" >model_results< / span><span class = "o" > = < / span><span class = "n" >rlm_mod< / span><span class = "p" >,< / span> <span class = "n" >ax< / span><span class = "o" > = < / span><span class = "n" >ax< / span><span class = "p" >,< / span> <span class = "n" >color< / span><span class = "o" > = < / span><span class = "s" > 'red' < / span><span class = "p" >)< / span> |
Out[51]:
- Why? Because M-estimators are not robust to leverage points.
In [52]:
1 | <span class = "n" >infl< / span> <span class = "o" > = < / span> <span class = "n" >ols_model< / span><span class = "o" >.< / span><span class = "n" >get_influence< / span><span class = "p" >()< / span> |
In [53]:
1 2 3 | <span class = "n" >h_bar< / span> <span class = "o" > = < / span> <span class = "mi" > 2 < / span><span class = "o" > * < / span><span class = "p" >(< / span><span class = "n" >ols_model< / span><span class = "o" >.< / span><span class = "n" >df_model< / span> <span class = "o" > + < / span> <span class = "mi" > 1 < / span> <span class = "p" >)< / span><span class = "o" > / < / span><span class = "n" >ols_model< / span><span class = "o" >.< / span><span class = "n" >nobs< / span> <span class = "n" >hat_diag< / span> <span class = "o" > = < / span> <span class = "n" >infl< / span><span class = "o" >.< / span><span class = "n" >summary_frame< / span><span class = "p" >()[< / span><span class = "s" > 'hat_diag' < / span><span class = "p" >]< / span> <span class = "n" >hat_diag< / span><span class = "o" >.< / span><span class = "n" >ix< / span><span class = "p" >[< / span><span class = "n" >hat_diag< / span> <span class = "o" >>< / span> <span class = "n" >h_bar< / span><span class = "p" >]< / span> |
Out[53]:
In [54]:
1 2 3 | <span class = "n" >sidak2< / span> <span class = "o" > = < / span> <span class = "n" >ols_model< / span><span class = "o" >.< / span><span class = "n" >outlier_test< / span><span class = "p" >(< / span><span class = "s" > 'sidak' < / span><span class = "p" >)< / span> <span class = "n" >sidak2< / span><span class = "o" >.< / span><span class = "n" >sort< / span><span class = "p" >(< / span><span class = "s" > 'unadj_p' < / span><span class = "p" >,< / span> <span class = "n" >inplace< / span><span class = "o" > = < / span><span class = "bp" > True < / span><span class = "p" >)< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >sidak2< / span><span class = "p" >)< / span> |
In [55]:
1 2 3 | <span class = "n" >fdr2< / span> <span class = "o" > = < / span> <span class = "n" >ols_model< / span><span class = "o" >.< / span><span class = "n" >outlier_test< / span><span class = "p" >(< / span><span class = "s" > 'fdr_bh' < / span><span class = "p" >)< / span> <span class = "n" >fdr2< / span><span class = "o" >.< / span><span class = "n" >sort< / span><span class = "p" >(< / span><span class = "s" > 'unadj_p' < / span><span class = "p" >,< / span> <span class = "n" >inplace< / span><span class = "o" > = < / span><span class = "bp" > True < / span><span class = "p" >)< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >fdr2< / span><span class = "p" >)< / span> |
- Let's delete that line
In [56]:
1 | <span class = "k" > del < / span> <span class = "n" >ax< / span><span class = "o" >.< / span><span class = "n" >lines< / span><span class = "p" >[< / span><span class = "o" > - < / span><span class = "mi" > 1 < / span><span class = "p" >]< / span> |
In [57]:
1 2 3 4 | <span class = "n" >weights< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >ones< / span><span class = "p" >(< / span><span class = "nb" > len < / span><span class = "p" >(< / span><span class = "n" >X< / span><span class = "p" >))< / span> <span class = "n" >weights< / span><span class = "p" >[< / span><span class = "n" >X< / span><span class = "p" >[< / span><span class = "n" >X< / span><span class = "p" >[< / span><span class = "s" > 'log.Te' < / span><span class = "p" >]< / span> <span class = "o" ><< / span> <span class = "mf" > 3.8 < / span><span class = "p" >]< / span><span class = "o" >.< / span><span class = "n" >index< / span><span class = "o" >.< / span><span class = "n" >values< / span> <span class = "o" > - < / span> <span class = "mi" > 1 < / span><span class = "p" >]< / span> <span class = "o" > = < / span> <span class = "mi" > 0 < / span> <span class = "n" >wls_model< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >WLS< / span><span class = "p" >(< / span><span class = "n" >y< / span><span class = "p" >,< / span> <span class = "n" >X< / span><span class = "p" >,< / span> <span class = "n" >weights< / span><span class = "o" > = < / span><span class = "n" >weights< / span><span class = "p" >)< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >()< / span> <span class = "n" >abline_plot< / span><span class = "p" >(< / span><span class = "n" >model_results< / span><span class = "o" > = < / span><span class = "n" >wls_model< / span><span class = "p" >,< / span> <span class = "n" >ax< / span><span class = "o" > = < / span><span class = "n" >ax< / span><span class = "p" >,< / span> <span class = "n" >color< / span><span class = "o" > = < / span><span class = "s" > 'green' < / span><span class = "p" >)< / span> |
- MM estimators are good for this type of problem, unfortunately, we don't yet have these yet.
- It's being worked on, but it gives a good excuse to look at the R cell magics in the notebook.
In [58]:
1 2 | <span class = "n" >yy< / span> <span class = "o" > = < / span> <span class = "n" >y< / span><span class = "o" >.< / span><span class = "n" >values< / span><span class = "p" >[:,< / span><span class = "bp" > None < / span><span class = "p" >]< / span> <span class = "n" >xx< / span> <span class = "o" > = < / span> <span class = "n" >X< / span><span class = "p" >[< / span><span class = "s" > 'log.Te' < / span><span class = "p" >]< / span><span class = "o" >.< / span><span class = "n" >values< / span><span class = "p" >[:,< / span><span class = "bp" > None < / span><span class = "p" >]< / span> |
In [59]:
1 2 3 4 5 6 7 | <span class = "o" > % < / span><span class = "k" >load_ext< / span> <span class = "n" >rmagic< / span> <span class = "o" > % < / span><span class = "k" >R< / span> <span class = "n" >library< / span><span class = "p" >(< / span><span class = "n" >robustbase< / span><span class = "p" >)< / span> <span class = "o" > % < / span><span class = "k" >Rpush< / span> <span class = "n" >yy< / span> <span class = "n" >xx< / span> <span class = "o" > % < / span><span class = "k" >R< / span> <span class = "n" >mod< / span> <span class = "o" >< - < / span> <span class = "n" >lmrob< / span><span class = "p" >(< / span><span class = "n" >yy< / span> <span class = "o" >~< / span> <span class = "n" >xx< / span><span class = "p" >);< / span> <span class = "o" > % < / span><span class = "k" >R< / span> <span class = "n" >params< / span> <span class = "o" >< - < / span> <span class = "n" >mod< / span><span class = "err" >$< / span><span class = "n" >coefficients< / span><span class = "p" >;< / span> <span class = "o" > % < / span><span class = "k" >Rpull< / span> <span class = "n" >params< / span> |
In [60]:
1 | <span class = "o" > % < / span><span class = "k" >R< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >mod< / span><span class = "p" >)< / span> |
In [61]:
1 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >params< / span><span class = "p" >)< / span> |
In [62]:
1 | <span class = "n" >abline_plot< / span><span class = "p" >(< / span><span class = "n" >intercept< / span><span class = "o" > = < / span><span class = "n" >params< / span><span class = "p" >[< / span><span class = "mi" > 0 < / span><span class = "p" >],< / span> <span class = "n" >slope< / span><span class = "o" > = < / span><span class = "n" >params< / span><span class = "p" >[< / span><span class = "mi" > 1 < / span><span class = "p" >],< / span> <span class = "n" >ax< / span><span class = "o" > = < / span><span class = "n" >ax< / span><span class = "p" >,< / span> <span class = "n" >color< / span><span class = "o" > = < / span><span class = "s" > 'green' < / span><span class = "p" >)< / span> |
Exercise: Breakdown points of M-estimator
In [63]:
1 2 3 4 5 6 7 8 | <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >random< / span><span class = "o" >.< / span><span class = "n" >seed< / span><span class = "p" >(< / span><span class = "mi" > 12345 < / span><span class = "p" >)< / span> <span class = "n" >nobs< / span> <span class = "o" > = < / span> <span class = "mi" > 200 < / span> <span class = "n" >beta_true< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >array< / span><span class = "p" >([< / span><span class = "mi" > 3 < / span><span class = "p" >,< / span> <span class = "mi" > 1 < / span><span class = "p" >,< / span> <span class = "mf" > 2.5 < / span><span class = "p" >,< / span> <span class = "mi" > 3 < / span><span class = "p" >,< / span> <span class = "o" > - < / span><span class = "mi" > 4 < / span><span class = "p" >])< / span> <span class = "n" >X< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >random< / span><span class = "o" >.< / span><span class = "n" >uniform< / span><span class = "p" >(< / span><span class = "o" > - < / span><span class = "mi" > 20 < / span><span class = "p" >,< / span><span class = "mi" > 20 < / span><span class = "p" >,< / span> <span class = "n" >size< / span><span class = "o" > = < / span><span class = "p" >(< / span><span class = "n" >nobs< / span><span class = "p" >,< / span> <span class = "nb" > len < / span><span class = "p" >(< / span><span class = "n" >beta_true< / span><span class = "p" >)< / span><span class = "o" > - < / span><span class = "mi" > 1 < / span><span class = "p" >))< / span> <span class = "c" > # stack a constant in front</span> <span class = "n" >X< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >add_constant< / span><span class = "p" >(< / span><span class = "n" >X< / span><span class = "p" >,< / span> <span class = "n" >prepend< / span><span class = "o" > = < / span><span class = "bp" > True < / span><span class = "p" >)< / span> <span class = "c" > # np.c_[np.ones(nobs), X]</span> <span class = "n" >mc_iter< / span> <span class = "o" > = < / span> <span class = "mi" > 500 < / span> <span class = "n" >contaminate< / span> <span class = "o" > = < / span> <span class = "o" >.< / span><span class = "mi" > 25 < / span> <span class = "c" > # percentage of response variables to contaminate</span> |
In [64]:
1 2 3 4 5 6 7 | <span class = "n" >all_betas< / span> <span class = "o" > = < / span> <span class = "p" >[]< / span> <span class = "k" > for < / span> <span class = "n" >i< / span> <span class = "ow" > in < / span> <span class = "nb" > range < / span><span class = "p" >(< / span><span class = "n" >mc_iter< / span><span class = "p" >):< / span> <span class = "n" >y< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >dot< / span><span class = "p" >(< / span><span class = "n" >X< / span><span class = "p" >,< / span> <span class = "n" >beta_true< / span><span class = "p" >)< / span> <span class = "o" > + < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >random< / span><span class = "o" >.< / span><span class = "n" >normal< / span><span class = "p" >(< / span><span class = "n" >size< / span><span class = "o" > = < / span><span class = "mi" > 200 < / span><span class = "p" >)< / span> <span class = "n" >random_idx< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >random< / span><span class = "o" >.< / span><span class = "n" >randint< / span><span class = "p" >(< / span><span class = "mi" > 0 < / span><span class = "p" >,< / span> <span class = "n" >nobs< / span><span class = "p" >,< / span> <span class = "n" >size< / span><span class = "o" > = < / span><span class = "nb" > int < / span><span class = "p" >(< / span><span class = "n" >contaminate< / span> <span class = "o" > * < / span> <span class = "n" >nobs< / span><span class = "p" >))< / span> <span class = "n" >y< / span><span class = "p" >[< / span><span class = "n" >random_idx< / span><span class = "p" >]< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >random< / span><span class = "o" >.< / span><span class = "n" >uniform< / span><span class = "p" >(< / span><span class = "o" > - < / span><span class = "mi" > 750 < / span><span class = "p" >,< / span> <span class = "mi" > 750 < / span><span class = "p" >)< / span> <span class = "n" >beta_hat< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >RLM< / span><span class = "p" >(< / span><span class = "n" >y< / span><span class = "p" >,< / span> <span class = "n" >X< / span><span class = "p" >)< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >()< / span><span class = "o" >.< / span><span class = "n" >params< / span> <span class = "n" >all_betas< / span><span class = "o" >.< / span><span class = "n" >append< / span><span class = "p" >(< / span><span class = "n" >beta_hat< / span><span class = "p" >)< / span> |
In [65]:
1 2 3 | <span class = "n" >all_betas< / span> <span class = "o" > = < / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >asarray< / span><span class = "p" >(< / span><span class = "n" >all_betas< / span><span class = "p" >)< / span> <span class = "n" >se_loss< / span> <span class = "o" > = < / span> <span class = "k" > lambda < / span> <span class = "n" >x< / span> <span class = "p" >:< / span> <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >linalg< / span><span class = "o" >.< / span><span class = "n" >norm< / span><span class = "p" >(< / span><span class = "n" >x< / span><span class = "p" >,< / span> <span class = "nb" > ord < / span><span class = "o" > = < / span><span class = "mi" > 2 < / span><span class = "p" >)< / span><span class = "o" > * * < / span><span class = "mi" > 2 < / span> <span class = "n" >se_beta< / span> <span class = "o" > = < / span> <span class = "n" >lmap< / span><span class = "p" >(< / span><span class = "n" >se_loss< / span><span class = "p" >,< / span> <span class = "n" >all_betas< / span> <span class = "o" > - < / span> <span class = "n" >beta_true< / span><span class = "p" >)< / span> |
Squared error loss
In [66]:
1 | <span class = "n" >np< / span><span class = "o" >.< / span><span class = "n" >array< / span><span class = "p" >(< / span><span class = "n" >se_beta< / span><span class = "p" >)< / span><span class = "o" >.< / span><span class = "n" >mean< / span><span class = "p" >()< / span> |
Out[66]:
In [67]:
1 | <span class = "n" >all_betas< / span><span class = "o" >.< / span><span class = "n" >mean< / span><span class = "p" >(< / span><span class = "mi" > 0 < / span><span class = "p" >)< / span> |
Out[67]:
In [68]:
1 | <span class = "n" >beta_true< / span> |
Out[68]:
In [69]:
1 | <span class = "n" >se_loss< / span><span class = "p" >(< / span><span class = "n" >all_betas< / span><span class = "o" >.< / span><span class = "n" >mean< / span><span class = "p" >(< / span><span class = "mi" > 0 < / span><span class = "p" >)< / span> <span class = "o" > - < / span> <span class = "n" >beta_true< / span><span class = "p" >)< / span> |
Out[69]:
Please login to continue.