Dates in timeseries models

In [1]:
from __future__ import print_function
import statsmodels.api as sm
import numpy as np
import pandas as pd

Getting started

In [2]:
data = sm.datasets.sunspots.load()

Currently, an annual date series must be given as datetimes that fall on the last day of each year.

In [3]:
from datetime import datetime
dates = sm.tsa.datetools.dates_from_range('1700', length=len(data.endog))
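To make the year-end requirement concrete, here is a minimal sketch that builds such dates by hand with the standard library. The helper name `year_end_dates` is hypothetical and not part of statsmodels. It only assumes that each annual observation is stamped December 31, which is consistent with the `12-31` dates in the prediction output further down.

```python
from datetime import datetime


def year_end_dates(start_year, length):
    """Return `length` consecutive year-end datetimes starting at `start_year`.

    Hypothetical helper for illustration; not a statsmodels function.
    """
    return [datetime(start_year + i, 12, 31) for i in range(length)]


# Three annual observations starting in 1700, each dated December 31.
dates_sketch = year_end_dates(1700, 3)
for d in dates_sketch:
    print(d.date())
```

A list like this could be passed wherever the notebook uses `dates`, assuming the model accepts a plain list of datetimes as it does for the output of `dates_from_range`.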

Using Pandas

Make a pandas Series (called TimeSeries in older pandas versions) or DataFrame indexed by the dates.

In [4]:
# pd.Series works across pandas versions; pd.TimeSeries was an older alias for it.
endog = pd.Series(data.endog, index=dates)

Instantiate the model

In [5]:
ar_model = sm.tsa.AR(endog, freq='A')
pandas_ar_res = ar_model.fit(maxlag=9, method='mle', disp=-1)

Out-of-sample prediction

In [6]:
pred = pandas_ar_res.predict(start='2005', end='2015')
print(pred)
2005-12-31    20.003289
2006-12-31    24.703987
2007-12-31    20.026131
2008-12-31    23.473638
2009-12-31    30.858577
2010-12-31    61.335460
2011-12-31    87.024702
2012-12-31    91.321267
2013-12-31    79.921646
2014-12-31    60.799552
2015-12-31    40.374916
Freq: A-DEC, dtype: float64

Using explicit dates

In [7]:
ar_model = sm.tsa.AR(data.endog, dates=dates, freq='A')
ar_res = ar_model.fit(maxlag=9, method='mle', disp=-1)
pred = ar_res.predict(start='2005', end='2015')
print(pred)
[ 20.0033  24.704   20.0261  23.4736  30.8586  61.3355  87.0247  91.3213
  79.9216  60.7996  40.3749]

This returns a plain array rather than a pandas object. Because the model still has date information attached, you can recover the prediction dates, albeit in a roundabout way.

In [8]:
print(ar_res.data.predict_dates)
<class 'pandas.tseries.index.DatetimeIndex'>
[2005-12-31, ..., 2015-12-31]
Length: 11, Freq: A-DEC, Timezone: None

Note: This attribute only exists if predict has been called. It holds the dates associated with the last call to predict.
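Because the attribute may be missing, it can help to guard the lookup before pairing dates with predictions. The sketch below is one way to do that. The helper `label_predictions` and the `SimpleNamespace` stand-in for a fitted results object are hypothetical and used only so the example runs without fitting a model. The only thing taken from the notebook is the attribute path `results.data.predict_dates`.

```python
from datetime import datetime
from types import SimpleNamespace


def label_predictions(results, pred):
    """Pair each prediction with its date.

    Returns None when `predict_dates` is absent, i.e. when predict()
    has not been called yet. Hypothetical helper, not a statsmodels API.
    """
    pred_dates = getattr(results.data, "predict_dates", None)
    if pred_dates is None:
        return None
    return list(zip(pred_dates, pred))


# Stand-in for a fitted results object (assumption for illustration only).
fake_res = SimpleNamespace(data=SimpleNamespace())
print(label_predictions(fake_res, [20.0033, 24.704]))  # attribute not set yet

# Simulate the state after predict() has stored its dates.
fake_res.data.predict_dates = [datetime(2005, 12, 31), datetime(2006, 12, 31)]
for date, value in label_predictions(fake_res, [20.0033, 24.704]):
    print(date.date(), value)
```

With a real results object, the same helper would be called as `label_predictions(ar_res, pred)` right after `ar_res.predict(...)`, since the stored dates belong to the most recent call.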

doc_statsmodels
2025-01-10 15:47:30