Generalized Least Squares
1 2 3 4 | <span class = "kn" > from < / span> <span class = "nn" >__future__< / span> <span class = "kn" > import < / span> <span class = "n" >print_function< / span> <span class = "kn" > import < / span> <span class = "nn" >statsmodels.api< / span> <span class = "kn" >as< / span> <span class = "nn" >sm< / span> <span class = "kn" > import < / span> <span class = "nn" >numpy< / span> <span class = "kn" >as< / span> <span class = "nn" >np< / span> <span class = "kn" > from < / span> <span class = "nn" >statsmodels.iolib.table< / span> <span class = "kn" > import < / span> <span class = "p" >(< / span><span class = "n" >SimpleTable< / span><span class = "p" >,< / span> <span class = "n" >default_txt_fmt< / span><span class = "p" >)< / span> |
The Longley dataset is a time series dataset:
1 2 3 | <span class = "n" >data< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >datasets< / span><span class = "o" >.< / span><span class = "n" >longley< / span><span class = "o" >.< / span><span class = "n" >load< / span><span class = "p" >()< / span> <span class = "n" >data< / span><span class = "o" >.< / span><span class = "n" >exog< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >add_constant< / span><span class = "p" >(< / span><span class = "n" >data< / span><span class = "o" >.< / span><span class = "n" >exog< / span><span class = "p" >)< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >data< / span><span class = "o" >.< / span><span class = "n" >exog< / span><span class = "p" >[:< / span><span class = "mi" > 5 < / span><span class = "p" >])< / span> |
Let's assume that the data is heteroskedastic and that we know the nature of the heteroskedasticity. We can then define sigma
and use it to give us a GLS model
First we will obtain the residuals from an OLS fit
1 | <span class = "n" >ols_resid< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >OLS< / span><span class = "p" >(< / span><span class = "n" >data< / span><span class = "o" >.< / span><span class = "n" >endog< / span><span class = "p" >,< / span> <span class = "n" >data< / span><span class = "o" >.< / span><span class = "n" >exog< / span><span class = "p" >)< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >()< / span><span class = "o" >.< / span><span class = "n" >resid< / span> |
Assume that the error terms follow an AR(1) process with a trend:
$\epsilon_i = \beta_0 + \rho\epsilon_{i-1} + \eta_i$
where $\eta \sim N(0,\Sigma^2)$
and that $\rho$ is simply the correlation of the residual a consistent estimator for rho is to regress the residuals on the lagged residuals
1 2 3 | <span class = "n" >resid_fit< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >OLS< / span><span class = "p" >(< / span><span class = "n" >ols_resid< / span><span class = "p" >[< / span><span class = "mi" > 1 < / span><span class = "p" >:],< / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >add_constant< / span><span class = "p" >(< / span><span class = "n" >ols_resid< / span><span class = "p" >[:< / span><span class = "o" > - < / span><span class = "mi" > 1 < / span><span class = "p" >]))< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >()< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >resid_fit< / span><span class = "o" >.< / span><span class = "n" >tvalues< / span><span class = "p" >[< / span><span class = "mi" > 1 < / span><span class = "p" >])< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >resid_fit< / span><span class = "o" >.< / span><span class = "n" >pvalues< / span><span class = "p" >[< / span><span class = "mi" > 1 < / span><span class = "p" >])< / span> |
While we don't have strong evidence that the errors follow an AR(1) process we continue
1 | <span class = "n" >rho< / span> <span class = "o" > = < / span> <span class = "n" >resid_fit< / span><span class = "o" >.< / span><span class = "n" >params< / span><span class = "p" >[< / span><span class = "mi" > 1 < / span><span class = "p" >]< / span> |
As we know, an AR(1) process means that near-neighbors have a stronger relation so we can give this structure by using a toeplitz matrix
1 2 3 | <span class = "kn" > from < / span> <span class = "nn" >scipy.linalg< / span> <span class = "kn" > import < / span> <span class = "n" >toeplitz< / span> <span class = "n" >toeplitz< / span><span class = "p" >(< / span><span class = "nb" > range < / span><span class = "p" >(< / span><span class = "mi" > 5 < / span><span class = "p" >))< / span> |
1 | <span class = "n" >order< / span> <span class = "o" > = < / span> <span class = "n" >toeplitz< / span><span class = "p" >(< / span><span class = "nb" > range < / span><span class = "p" >(< / span><span class = "nb" > len < / span><span class = "p" >(< / span><span class = "n" >ols_resid< / span><span class = "p" >)))< / span> |
so that our error covariance structure is actually rho**order which defines an autocorrelation structure
1 2 3 | <span class = "n" >sigma< / span> <span class = "o" > = < / span> <span class = "n" >rho< / span><span class = "o" > * * < / span><span class = "n" >order< / span> <span class = "n" >gls_model< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >GLS< / span><span class = "p" >(< / span><span class = "n" >data< / span><span class = "o" >.< / span><span class = "n" >endog< / span><span class = "p" >,< / span> <span class = "n" >data< / span><span class = "o" >.< / span><span class = "n" >exog< / span><span class = "p" >,< / span> <span class = "n" >sigma< / span><span class = "o" > = < / span><span class = "n" >sigma< / span><span class = "p" >)< / span> <span class = "n" >gls_results< / span> <span class = "o" > = < / span> <span class = "n" >gls_model< / span><span class = "o" >.< / span><span class = "n" >fit< / span><span class = "p" >()< / span> |
Of course, the exact rho in this instance is not known so it it might make more sense to use feasible gls, which currently only has experimental support.
We can use the GLSAR model with one lag, to get to a similar result:
1 2 3 | <span class = "n" >glsar_model< / span> <span class = "o" > = < / span> <span class = "n" >sm< / span><span class = "o" >.< / span><span class = "n" >GLSAR< / span><span class = "p" >(< / span><span class = "n" >data< / span><span class = "o" >.< / span><span class = "n" >endog< / span><span class = "p" >,< / span> <span class = "n" >data< / span><span class = "o" >.< / span><span class = "n" >exog< / span><span class = "p" >,< / span> <span class = "mi" > 1 < / span><span class = "p" >)< / span> <span class = "n" >glsar_results< / span> <span class = "o" > = < / span> <span class = "n" >glsar_model< / span><span class = "o" >.< / span><span class = "n" >iterative_fit< / span><span class = "p" >(< / span><span class = "mi" > 1 < / span><span class = "p" >)< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >glsar_results< / span><span class = "o" >.< / span><span class = "n" >summary< / span><span class = "p" >())< / span> |
Comparing gls and glsar results, we see that there are some small differences in the parameter estimates and the resulting standard errors of the parameter estimate. This might be do to the numerical differences in the algorithm, e.g. the treatment of initial conditions, because of the small number of observations in the longley dataset.
1 2 3 4 | <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >gls_results< / span><span class = "o" >.< / span><span class = "n" >params< / span><span class = "p" >)< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >glsar_results< / span><span class = "o" >.< / span><span class = "n" >params< / span><span class = "p" >)< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >gls_results< / span><span class = "o" >.< / span><span class = "n" >bse< / span><span class = "p" >)< / span> <span class = "k" > print < / span><span class = "p" >(< / span><span class = "n" >glsar_results< / span><span class = "o" >.< / span><span class = "n" >bse< / span><span class = "p" >)< / span> |
Please login to continue.