5 Forecasting (prediction, extrapolation) using ARIMA models.

5.1 Introduction.

We have already met forecasting in one form or another. First, we extrapolated the deterministic components of regression models into the future, for trend, seasonal and cyclical models. Thus, in the model $y_t = x_t\beta + \epsilon_t$, where $\{\epsilon_t\}$ are uncorrelated with zero mean, the forecast of $y_{t+k} = x_{t+k}\beta + \epsilon_{t+k}$ based on $y_1,\dots,y_t$ is $x_{t+k}\hat{\beta}$, where $\hat{\beta}$ is the estimate of $\beta$ based on $y_1,\dots,y_t$.

This is still appropriate if such components are used in a model for a time series which includes an ARMA error structure, but now the ARMA part also needs to be extrapolated. For uncorrelated errors the extrapolated values would simply be zero; when the errors are correlated, however, the expected future values are not all zero.

We also came across the concept of prediction when introducing autoregressive models. These again exploit regression: regression on lagged past values to predict the future.

However, the approach in this chapter is model based. Forecasting simply asks where the process (that has been modelled) is going in the future, given that it has been observed up to the present. We investigate this now for models that can be expressed in the form

$$x_t = \sum_{j=0}^{\infty} \psi_j e_{t-j}.$$

5.2 The impact of innovations on the future.

To understand prediction of future values of the process we now investigate how the future values $x_{t+1}, x_{t+2}, \dots$ depend in part upon the past (and present) quantities $x_t, x_{t-1}, \dots$ and $e_t, e_{t-1}, \dots$ that are known at time $t$, and in part upon the future innovations $e_{t+1}, e_{t+2}, \dots$. We first look at particular simple examples, omitting $\mu_x$ for simplicity. The method is to write the model down with $t$ replaced by $t+1$, then $t+2$, and so on. If the model then has terms on its right-hand side which include any of the values $x_{t+1}, x_{t+2}, \dots$, we successively substitute for each of these until no such terms remain; only terms in $e_{t+1}, e_{t+2}, \dots$ remain on the right, besides those in $x_t, x_{t-1}, \dots$ and $e_t, e_{t-1}, \dots$.

5.2.1 Example: the MA(2) model:

$$x_{t+1} = e_{t+1} - \theta_1 e_t - \theta_2 e_{t-1}$$
$$x_{t+2} = e_{t+2} - \theta_1 e_{t+1} - \theta_2 e_t$$
$$x_{t+3} = e_{t+3} - \theta_1 e_{t+2} - \theta_2 e_{t+1}$$
$$x_{t+4} = e_{t+4} - \theta_1 e_{t+3} - \theta_2 e_{t+2}$$

So for $k>2$, $x_{t+k}$ depends only on future innovations, not at all on the past.
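To make this concrete, here is a minimal Python sketch (not from the original notes; the function name and numbers are purely illustrative) of the MA(2) forecasts obtained by replacing the future innovations with their expectation, zero:

```python
def ma2_forecast(theta1, theta2, e_t, e_tm1, k):
    """k-step-ahead forecast of a zero-mean MA(2): the future innovations
    e_{t+1}, e_{t+2}, ... are replaced by their expectation, zero."""
    if k == 1:
        return -theta1 * e_t - theta2 * e_tm1
    if k == 2:
        return -theta2 * e_t
    return 0.0  # for k > 2 the forecast is just the (zero) mean

print([ma2_forecast(0.5, 0.3, 1.2, -0.7, k) for k in (1, 2, 3, 4)])
```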

5.2.2 Example: the AR(1) model:

$$x_{t+1} = \phi x_t + e_{t+1}$$
$$x_{t+2} = \phi x_{t+1} + e_{t+2} = \phi^2 x_t + \phi e_{t+1} + e_{t+2}$$
$$x_{t+3} = \phi x_{t+2} + e_{t+3} = \phi^3 x_t + \phi^2 e_{t+1} + \phi e_{t+2} + e_{t+3}$$
$$x_{t+4} = \phi x_{t+3} + e_{t+4} = \phi^4 x_t + \phi^3 e_{t+1} + \phi^2 e_{t+2} + \phi e_{t+3} + e_{t+4}$$

Thus the dependence of $x_{t+k}$ on the past is given by the coefficient $\phi^k$ of $x_t$, which decays geometrically.

Now recall the infinite moving average representation of the AR(1) model, which after replacing $t$ by $t+k$ becomes

$$x_{t+k} = (1-\phi B)^{-1} e_{t+k} = (1 + \phi B + \phi^2 B^2 + \phi^3 B^3 + \cdots)\, e_{t+k} = e_{t+k} + \phi e_{t+k-1} + \phi^2 e_{t+k-2} + \phi^3 e_{t+k-3} + \cdots,$$

and note that this gives the dependence of $x_{t+k}$ upon $e_{t+1}, e_{t+2}, \dots$ that we observed in the previous equations.
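A corresponding Python sketch for the AR(1) case (again with illustrative values of our own) shows the forecasts $\phi^k x_t$, obtained by setting the future innovations to zero, decaying geometrically:

```python
def ar1_forecast(phi, x_t, k):
    """k-step-ahead forecast of a zero-mean AR(1): setting the future
    innovations to zero in the expansion above leaves phi**k * x_t."""
    return phi**k * x_t

phi, x_t = 0.8, 2.5
print([round(ar1_forecast(phi, x_t, k), 4) for k in range(1, 6)])
```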

5.2.3 Example: the IMA(1,1) model:

$$x_{t+1} = x_t + e_{t+1} - \theta e_t$$
$$x_{t+2} = x_{t+1} + e_{t+2} - \theta e_{t+1} = x_t + e_{t+2} + (1-\theta)e_{t+1} - \theta e_t$$
$$x_{t+3} = x_{t+2} + e_{t+3} - \theta e_{t+2} = x_t + e_{t+3} + (1-\theta)e_{t+2} + (1-\theta)e_{t+1} - \theta e_t$$
$$x_{t+4} = x_{t+3} + e_{t+4} - \theta e_{t+3} = x_t + e_{t+4} + (1-\theta)e_{t+3} + (1-\theta)e_{t+2} + (1-\theta)e_{t+1} - \theta e_t$$

Note now that the infinite moving average representation of $x_{t+k}$ for the IMA(1,1) model is formally


$$x_{t+k} = \frac{1-\theta B}{1-B}\, e_{t+k} = \bigl(1 + (1-\theta)B + (1-\theta)B^2 + (1-\theta)B^3 + \cdots\bigr)\, e_{t+k} = e_{t+k} + (1-\theta)e_{t+k-1} + (1-\theta)e_{t+k-2} + (1-\theta)e_{t+k-3} + \cdots.$$

We say ‘formally’ because this sum does not converge; nevertheless it supplies the correct coefficients, which the previous equations show govern how the innovations $e_{t+1}, e_{t+2}, \dots$ affect the future value $x_{t+k}$. In this sense we may still write

$$x_t = \psi(B) e_t = e_t + \psi_1 e_{t-1} + \psi_2 e_{t-2} + \cdots,$$

where for an ARIMA model

$$x_t = \frac{\theta(B)}{\nabla^d \phi(B)}\, e_t \qquad\Longrightarrow\qquad \psi(B) = \frac{\theta(B)}{\nabla^d \phi(B)}.$$
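The $\psi$-weights can be obtained numerically by expanding $\theta(B)/\{\nabla^d\phi(B)\}$ as a power series. The Python sketch below (our own illustration; the function name is not standard) uses the recursion $\psi_0 = 1$, $\psi_j = \sum_i \phi^*_i \psi_{j-i} - \theta_j$, where $\phi^*(B) = \nabla^d\phi(B) = 1 - \phi^*_1 B - \cdots$ is the generalised autoregressive operator and $\theta_j = 0$ for $j > q$:

```python
import numpy as np

def psi_weights(phi_star, theta, k):
    """First k psi-weights of psi(B) = theta(B) / phi_star(B), where
    phi_star(B) = 1 - phi_star[0] B - phi_star[1] B^2 - ... already
    includes the differencing factor (1 - B)^d, and
    theta(B) = 1 - theta[0] B - theta[1] B^2 - ..."""
    psi = np.zeros(k)
    psi[0] = 1.0
    for j in range(1, k):
        acc = -theta[j - 1] if j - 1 < len(theta) else 0.0
        for i in range(1, min(j, len(phi_star)) + 1):
            acc += phi_star[i - 1] * psi[j - i]
        psi[j] = acc
    return psi

print(psi_weights([1.0], [0.4], 6))  # IMA(1,1): 1, 1-theta, 1-theta, ...
print(psi_weights([0.8], [], 6))     # AR(1):    1, phi, phi^2, ...
```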

5.3 Past and future components.

Suppose we have observations of $x_t$ up to and including time $t=n$ and, through the process of estimating the model, values of $e_t$ also up to and including time $t=n$. We want to forecast the future values $x_{n+k}$ for $k=1,2,3,\dots$; we call these forecasts $\hat{x}_{n,k}$.

The general form of the infinite MA for $x_{n+k}$ may be written

$$x_{n+k} = e_{n+k} + \psi_1 e_{n+k-1} + \cdots + \psi_{k-1} e_{n+1} + \psi_k e_n + \psi_{k+1} e_{n-1} + \cdots.$$

This shows explicitly that the part of $x_{n+k}$ which depends on the future, unknown innovations is

$$e_{n,k} = e_{n+k} + \psi_1 e_{n+k-1} + \cdots + \psi_{k-1} e_{n+1},$$

and the remaining part of $x_{n+k}$ is known. In other words,

$$x_{n+k} = \sum_{j=0}^{\infty} \psi_j e_{n+k-j} = \sum_{j=k}^{\infty} \psi_j e_{n+k-j} + \sum_{j=0}^{k-1} \psi_j e_{n+k-j} = \hat{x}_{n,k} + e_{n,k}.$$

The forecast error variance is therefore:

$$V_k = \operatorname{Var}(e_{n,k}) = \operatorname{Var}(e_{n+k} + \psi_1 e_{n+k-1} + \cdots + \psi_{k-1} e_{n+1}) = \sigma_e^2\,(1 + \psi_1^2 + \psi_2^2 + \cdots + \psi_{k-1}^2).$$
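Given the $\psi$-weights, for instance from the psi_weights sketch above, $V_k$ is a one-line computation in Python:

```python
import numpy as np

def forecast_error_variance(psi, k, sigma2_e):
    """V_k = sigma_e^2 * (psi_0^2 + psi_1^2 + ... + psi_{k-1}^2),
    where psi[0] = psi_0 = 1."""
    return sigma2_e * float(np.sum(np.asarray(psi[:k]) ** 2))

psi = [1.0, 0.6, 0.6, 0.6]            # IMA(1,1) weights with theta = 0.4
print([forecast_error_variance(psi, k, 1.0) for k in (1, 2, 3, 4)])
```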

5.4 Forecast functions.

When forecasts made using ARIMA models are graphed for increasing lead time $k$ from a fixed forecast origin $n$, i.e.

$$\hat{x}_{n,k}, \qquad k = 1, 2, 3, \dots,$$

the resulting graph is called a forecast function. It is usual to graph the forecast function with limits of $\pm 2\sqrt{V_k}$ to indicate the range of possible future paths of the series.
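In practice a fitted model from a software package produces the forecast function and its limits directly. As a sketch, assuming the statsmodels library is available (the series here is artificial, and a 95% interval is used, which is close to the $\pm 2\sqrt{V_k}$ limits):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = rng.standard_normal(200).cumsum()      # artificial integrated series

res = ARIMA(y, order=(0, 1, 1)).fit()      # an IMA(1,1) fit
fc = res.get_forecast(steps=20)
lower, upper = fc.conf_int(alpha=0.05).T   # approximately +/- 2*sqrt(V_k)
print(fc.predicted_mean[:3], lower[:3], upper[:3])
```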

Any two such functions are usually different, because they depend on the latest values of the series. However, for a given model a similar pattern is usually evident in all forecast functions. To see why, consider the ARMA(p,q) model with $t = n+k$ where $k > q$, and set $e_{n+1} = e_{n+2} = \cdots = 0$ to obtain the forecasts:

$$x_{n+k} = \mu_x + \phi_1(x_{n+k-1} - \mu_x) + \cdots + \phi_p(x_{n+k-p} - \mu_x) + e_{n+k} - \theta_1 e_{n+k-1} - \cdots - \theta_q e_{n+k-q}$$
$$\hat{x}_{n,k} = \mu_x + \phi_1(\hat{x}_{n,k-1} - \mu_x) + \cdots + \phi_p(\hat{x}_{n,k-p} - \mu_x),$$

because all the terms $e_{n+k}, e_{n+k-1}, \dots, e_{n+k-q}$ have been set to zero (for $j \le 0$, $\hat{x}_{n,j}$ is simply the known value $x_{n+j}$). This may also be expressed as

$$\phi(B)\,(\hat{x}_{n,k} - \mu_x) = 0 \qquad\text{for } k > q.$$

This is the same recurrence relationship which generated the successive autocorrelations $\rho_{x,k}$ of the ARMA model. If $p=1$, a geometric decay towards the mean of the series results. If $p=2$ and $\phi(B)$ has complex factors, the values follow a damped cycle as they decay towards $\mu_x$.
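The damped cycle in the $p=2$ case is easy to see by iterating the recurrence directly. In this Python sketch (parameter values are our own choice) $\phi(B) = 1 - 1.5B + 0.75B^2$ has complex factors:

```python
# Iterate phi(B)(xhat_{n,k} - mu) = 0 for an AR(2) with complex factors:
# the forecast function is a damped cycle decaying towards mu.
phi1, phi2, mu = 1.5, -0.75, 10.0
xhat = [12.0, 13.0]                # x_{n-1} and x_n start the recurrence
for k in range(20):
    xhat.append(mu + phi1 * (xhat[-1] - mu) + phi2 * (xhat[-2] - mu))
print([round(v, 2) for v in xhat[2:]])
```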

Now consider ARIMA models. For example, the IMA(1,1) model with drift $\mu$:

$$x_{n+k} = x_{n+k-1} + \mu + e_{n+k} - \theta e_{n+k-1}$$
$$\hat{x}_{n,k} = \hat{x}_{n,k-1} + \mu \qquad\text{for } k > 1,$$

so that the forecast function follows a trend line with slope $\mu$. For this model we can show that $V_k = \sigma_e^2\{1 + (k-1)(1-\theta)^2\}$, so the limits gradually widen around the forecast function.
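This expression for $V_k$ follows directly from the formal infinite moving average of the IMA(1,1) model found earlier, for which $\psi_j = 1-\theta$ for all $j \ge 1$ (the drift does not affect the error variance):

$$V_k = \sigma_e^2\Bigl(1 + \sum_{j=1}^{k-1} \psi_j^2\Bigr) = \sigma_e^2\bigl\{1 + (k-1)(1-\theta)^2\bigr\}.$$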

5.5 Updating equations.

When new observations $x_{n+1}, x_{n+2}, \dots$ are made, the new innovations $e_{n+1}, e_{n+2}, \dots$ can be generated, and at each successive time point $t = n+1, n+2, \dots$ forecasts can be produced by the means already described. It is, however, possible to develop equations which update the forecast function in a convenient way; the EWMA updating equation is a good example. The idea is to update from $\hat{x}_{n,k}$ to $\hat{x}_{n+1,k}$. From the expression given earlier for the forecast errors we obtain

$$\hat{x}_{n+1,k} = \hat{x}_{n,k+1} + \psi_k e_{n+1}.$$

By writing $e_{n+1} = x_{n+1} - \hat{x}_{n,1}$ and expressing $\hat{x}_{n,k+1}$ in terms of $\hat{x}_{n,k}$, this can be rearranged to give the updating formula. For the IMA(1,1) model we start with $\psi_1 = 1-\theta$, so that

$$\hat{x}_{n+1,1} = \hat{x}_{n,2} + (1-\theta)\,e_{n+1},$$

and using $\hat{x}_{n,2} = \hat{x}_{n,1}$ and $e_{n+1} = x_{n+1} - \hat{x}_{n,1}$ gives

$$\hat{x}_{n+1,1} = \hat{x}_{n,1} + (1-\theta)(x_{n+1} - \hat{x}_{n,1}) = (1-\theta)\,x_{n+1} + \theta\,\hat{x}_{n,1}.$$
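In code the EWMA update is a single line. This Python sketch (names illustrative) applies it recursively as each new observation arrives:

```python
def ewma_update(theta, xhat_prev, x_new):
    """One-step-ahead forecast update for the IMA(1,1) model:
    xhat_{n+1,1} = (1 - theta) * x_{n+1} + theta * xhat_{n,1}."""
    return (1 - theta) * x_new + theta * xhat_prev

xhat = 0.0                                   # illustrative starting forecast
for x in [1.0, 0.5, 1.5, 2.0]:               # illustrative new observations
    xhat = ewma_update(0.4, xhat, x)
    print(round(xhat, 4))
```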

We conclude with some illustrations of forecast functions in Figure 26. In all of these the model is fitted to the data points before the forecasts start. The first of the four panels in Figure 26 shows forecasts of the weekly Financial Transaction series. The logarithms of the series are modelled by an IMA(1,1) model, but the forecasts and limits are transformed back to the original scale (which is why the forecasts do not appear constant). The second shows forecasts of the daily temperatures, using an AR(1) model. The reversion of the forecast function to the mean level can be seen, as the error limits widen.

The third and fourth panels show forecasts of the annual sunspot series: an AR(2) model is used in the third, and an AR(8) in the fourth. The forecasts from the AR(2) model damp down more quickly than those from the AR(8) model, which better captures the dynamics of the series. The error limits here are only 90% limits, i.e. plus or minus 1.65 standard deviations. The errors for increasing lead time are generally strongly correlated, so if the actual series values lie outside the limits at some point, it is quite likely that values close in time will also do so.

The following four figures show forecasts of the selected series using, respectively, IMA(1,1), AR(1), AR(2) and AR(8) models.

[Figure 26, first panel: IMA(1,1) forecasts of the weekly Financial Transaction series]

[Figure 26, second panel: AR(1) forecasts of the daily temperature series]

[Figure 26, third panel: AR(2) forecasts of the annual sunspot series]

[Figure 26, fourth panel: AR(8) forecasts of the annual sunspot series]

Forecasting is one of the most challenging applications of univariate time series analysis. There are many other applications, most of them involving smoothing or extracting components of a series, for which model-based approaches have many advantages. Classification of series is also important, with applications in medical diagnosis and seismic analysis. Multivariate time series opens up the further field of modelling the dependence between series, which also has widespread applications.