Prediction (out of sample)

In [1]:
from __future__ import print_function
import numpy as np
import statsmodels.api as sm

Artificial data

In [2]:
nsample = 50
sig = 0.25
x1 = np.linspace(0, 20, nsample)
X = np.column_stack((x1, np.sin(x1), (x1 - 5)**2))
X = sm.add_constant(X)
beta = [5., 0.5, 0.5, -0.02]
y_true = np.dot(X, beta)
y = y_true + sig * np.random.normal(size=nsample)

Estimation

In [3]:
olsmod = sm.OLS(y, X)
olsres = olsmod.fit()
print(olsres.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.981
Model:                            OLS   Adj. R-squared:                  0.980
Method:                 Least Squares   F-statistic:                     808.7
Date:                Tue, 02 Dec 2014   Prob (F-statistic):           8.70e-40
Time:                        12:53:22   Log-Likelihood:                -3.7832
No. Observations:                  50   AIC:                             15.57
Df Residuals:                      46   BIC:                             23.21
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          4.8950      0.093     52.780      0.000         4.708     5.082
x1             0.5222      0.014     36.510      0.000         0.493     0.551
x2             0.3781      0.056      6.725      0.000         0.265     0.491
x3            -0.0216      0.001    -17.196      0.000        -0.024    -0.019
==============================================================================
Omnibus:                        8.517   Durbin-Watson:                   1.657
Prob(Omnibus):                  0.014   Jarque-Bera (JB):                7.697
Skew:                          -0.900   Prob(JB):                       0.0213
Kurtosis:                       3.677   Cond. No.                         221.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

In-sample prediction

In [4]:
ypred = olsres.predict(X)
print(ypred)
[  4.3551   4.8029   5.2188   5.5823   5.8801   6.1088   6.2748   6.394
   6.4886   6.5836   6.703    6.8653   7.081    7.3506   7.6644   8.0046
   8.3474   8.6679   8.9433   9.1572   9.3021   9.3806   9.4052   9.3957
   9.3768   9.3732   9.4061   9.4894   9.6276   9.8149  10.0365  10.2706
  10.492   10.6768  10.8056  10.8672  10.8601  10.7931  10.6835  10.5545
  10.4313  10.3369  10.2884  10.2944  10.3529  10.4526  10.5739  10.6925
  10.7836  10.8255]

Create a new sample of explanatory variables, Xnew, then predict and plot.

In [5]:
x1n = np.linspace(20.5, 25, 10)
Xnew = np.column_stack((x1n, np.sin(x1n), (x1n - 5)**2))
Xnew = sm.add_constant(Xnew)
ynewpred = olsres.predict(Xnew)  # predict out of sample
print(ynewpred)
[ 10.789   10.6494  10.4216  10.1393   9.847    9.589    9.3986   9.2898
   9.254    9.2621]

Plot comparison

In [6]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(x1, y, 'o', label="Data")
ax.plot(x1, y_true, 'b-', label="True")
ax.plot(np.hstack((x1, x1n)), np.hstack((ypred, ynewpred)), 'r', label="OLS prediction")
ax.legend(loc="best");

Predicting with Formulas

Using formulas can make both estimation and prediction a lot easier.

In [7]:
from statsmodels.formula.api import ols

data = {"x1": x1, "y": y}

res = ols("y ~ x1 + np.sin(x1) + I((x1-5)**2)", data=data).fit()

We use I() to indicate the identity transform; i.e., we don't want any formula expansion magic applied to **2.
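To see what that expansion magic would do, compare the design matrices built with and without I(). In a patsy formula, x1**2 is term expansion, the interaction of x1 with itself, and a term is a set of factors, so it collapses back to x1 and the squared column silently disappears (a sketch of that behavior, assuming the default patsy formula engine):

```python
import numpy as np
from statsmodels.formula.api import ols

np.random.seed(3)  # assumed seed; the data here are only for illustration
x1 = np.linspace(0, 20, 50)
data = {"x1": x1, "y": x1 + np.random.normal(size=50)}

with_I = ols("y ~ x1 + I(x1**2)", data=data).fit()
without_I = ols("y ~ x1 + x1**2", data=data).fit()

# with I():  Intercept, x1, I(x1 ** 2)      -> 3 columns
# without:   x1**2 expands to x1:x1 == x1   -> only Intercept and x1 survive
print(with_I.model.exog.shape)
print(without_I.model.exog.shape)
```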

In [8]:
res.params
Out[8]:
Intercept           4.895017
x1                  0.522221
np.sin(x1)          0.378110
I((x1 - 5) ** 2)   -0.021596
dtype: float64

Now we only have to pass the single variable x1, and the transformed right-hand-side variables are generated automatically.

In [9]:
res.predict(exog=dict(x1=x1n))
Out[9]:
array([ 10.789 ,  10.6494,  10.4216,  10.1393,   9.847 ,   9.589 ,
         9.3986,   9.2898,   9.254 ,   9.2621])