5 Forecasting (prediction, extrapolation) using ARIMA models.

5.1 Introduction.

We have already met forecasting in one form or another. First, we extrapolated the deterministic components of regression models into the future, for trend, seasonal and cyclical models. Thus, in the model $y_t = x_t\beta + \epsilon_t$, where $\{\epsilon_t\}$ are uncorrelated with zero mean, the forecast of $y_{t+k} = x_{t+k}\beta + \epsilon_{t+k}$ based on $y_1,\dots,y_t$ is $x_{t+k}\hat{\beta}$, where $\hat{\beta}$ is the estimate of $\beta$ based on $y_1,\dots,y_t$.

This is still appropriate if such components are used in a model for a time series which includes an ARMA error structure, but now the ARMA part also needs to be extrapolated. For uncorrelated errors the extrapolated values would simply be zero; when the errors are correlated, however, the expected future values are not all zero.

We also came across the concept of prediction when introducing autoregressive models. These again exploit regression: regression on lagged past values to predict the future.

However, the approach in this chapter is model based. Forecasting simply asks where the process (that has been modelled) is going in the future, given that it has been observed up to the present. We investigate this now for models that can be expressed in the form

$$x_t = \sum_{j=0}^{\infty} \psi_j e_{t-j}.$$

5.2 The impact of innovations on the future.

To understand prediction of future values of the process we now investigate how the future values $x_{t+1}, x_{t+2}, \dots$ depend in part upon the past (and present) quantities $x_t, x_{t-1}, \dots$ and $e_t, e_{t-1}, \dots$ that are known at time $t$, and in part upon the future innovations $e_{t+1}, e_{t+2}, \dots$. We first look at particular simple examples, omitting $\mu_x$ for simplicity. The method is to write the model down with $t$ replaced by $t+1$, then $t+2$, and so on. If the model then has terms on its right-hand side which include any of the values $x_{t+1}, x_{t+2}, \dots$, we successively substitute for each of these until no such terms remain; only terms in $e_{t+1}, e_{t+2}, \dots$ remain on the right, besides those in $x_t, x_{t-1}, \dots$ and $e_t, e_{t-1}, \dots$.

5.2.1 Example: the MA(2) model:

$$x_{t+1} = e_{t+1} - \theta_1 e_t - \theta_2 e_{t-1}$$
$$x_{t+2} = e_{t+2} - \theta_1 e_{t+1} - \theta_2 e_t$$
$$x_{t+3} = e_{t+3} - \theta_1 e_{t+2} - \theta_2 e_{t+1}$$
$$x_{t+4} = e_{t+4} - \theta_1 e_{t+3} - \theta_2 e_{t+2}$$

So for $k>2$, $x_{t+k}$ depends only on future innovations, not at all on the past.
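To make this concrete, here is a minimal Python sketch (not from the original notes; the function name and numbers are purely illustrative) of the MA(2) forecasts obtained by replacing the future innovations with their expectation, zero:

```python
def ma2_forecast(theta1, theta2, e_t, e_tm1, k):
    """k-step-ahead forecast of a zero-mean MA(2): the future innovations
    e_{t+1}, e_{t+2}, ... are replaced by their expectation, zero."""
    if k == 1:
        return -theta1 * e_t - theta2 * e_tm1
    if k == 2:
        return -theta2 * e_t
    return 0.0  # for k > 2 the forecast is just the (zero) mean

print([ma2_forecast(0.5, 0.3, 1.2, -0.7, k) for k in (1, 2, 3, 4)])
```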

5.2.2 Example: the AR(1) model:

$$x_{t+1} = \phi x_t + e_{t+1}$$
$$x_{t+2} = \phi x_{t+1} + e_{t+2} = \phi^2 x_t + \phi e_{t+1} + e_{t+2}$$
$$x_{t+3} = \phi x_{t+2} + e_{t+3} = \phi^3 x_t + \phi^2 e_{t+1} + \phi e_{t+2} + e_{t+3}$$
$$x_{t+4} = \phi x_{t+3} + e_{t+4} = \phi^4 x_t + \phi^3 e_{t+1} + \phi^2 e_{t+2} + \phi e_{t+3} + e_{t+4}$$

Thus the dependence of $x_{t+k}$ on the past is given by the coefficient $\phi^k$ of $x_t$, which decays geometrically.

Now recall the infinite moving average representation of the AR(1) model, which after replacing $t$ by $t+k$ becomes

$$x_{t+k} = (1-\phi B)^{-1} e_{t+k} = (1 + \phi B + \phi^2 B^2 + \phi^3 B^3 + \cdots)\, e_{t+k} = e_{t+k} + \phi e_{t+k-1} + \phi^2 e_{t+k-2} + \phi^3 e_{t+k-3} + \cdots,$$

and note that this gives the dependence of $x_{t+k}$ upon $e_{t+1}, e_{t+2}, \dots$ that we observed in the previous equations.
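A corresponding Python sketch for the AR(1) case (again with illustrative values of our own) shows the forecasts $\phi^k x_t$, obtained by setting the future innovations to zero, decaying geometrically:

```python
def ar1_forecast(phi, x_t, k):
    """k-step-ahead forecast of a zero-mean AR(1): setting the future
    innovations to zero in the expansion above leaves phi**k * x_t."""
    return phi**k * x_t

phi, x_t = 0.8, 2.5
print([round(ar1_forecast(phi, x_t, k), 4) for k in range(1, 6)])
```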

5.2.3 Example: the IMA(1,1) model:

$$x_{t+1} = x_t + e_{t+1} - \theta e_t$$
$$x_{t+2} = x_{t+1} + e_{t+2} - \theta e_{t+1} = x_t + e_{t+2} + (1-\theta)e_{t+1} - \theta e_t$$
$$x_{t+3} = x_{t+2} + e_{t+3} - \theta e_{t+2} = x_t + e_{t+3} + (1-\theta)e_{t+2} + (1-\theta)e_{t+1} - \theta e_t$$
$$x_{t+4} = x_{t+3} + e_{t+4} - \theta e_{t+3} = x_t + e_{t+4} + (1-\theta)e_{t+3} + (1-\theta)e_{t+2} + (1-\theta)e_{t+1} - \theta e_t$$

Note now that the infinite moving average representation of $x_{t+k}$ for the IMA(1,1) model is formally


$$x_{t+k} = \frac{1-\theta B}{1-B}\, e_{t+k} = \bigl(1 + (1-\theta)B + (1-\theta)B^2 + (1-\theta)B^3 + \cdots\bigr)\, e_{t+k} = e_{t+k} + (1-\theta)e_{t+k-1} + (1-\theta)e_{t+k-2} + (1-\theta)e_{t+k-3} + \cdots.$$

We say ‘formally’ because this sum does not converge; nevertheless it supplies the correct coefficients, which the previous equations show govern how the innovations $e_{t+1}, e_{t+2}, \dots$ affect the future value $x_{t+k}$. In this sense we may still write

$$x_t = \psi(B) e_t = e_t + \psi_1 e_{t-1} + \psi_2 e_{t-2} + \cdots,$$

where for an ARIMA model

$$x_t = \frac{\theta(B)}{\nabla^d \phi(B)}\, e_t \qquad\Longrightarrow\qquad \psi(B) = \frac{\theta(B)}{\nabla^d \phi(B)}.$$
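The $\psi$-weights can be obtained numerically by expanding $\theta(B)/\{\nabla^d\phi(B)\}$ as a power series. The Python sketch below (our own illustration; the function name is not standard) uses the recursion $\psi_0 = 1$, $\psi_j = \sum_i \phi^*_i \psi_{j-i} - \theta_j$, where $\phi^*(B) = \nabla^d\phi(B) = 1 - \phi^*_1 B - \cdots$ is the generalised autoregressive operator and $\theta_j = 0$ for $j > q$:

```python
import numpy as np

def psi_weights(phi_star, theta, k):
    """First k psi-weights of psi(B) = theta(B) / phi_star(B), where
    phi_star(B) = 1 - phi_star[0] B - phi_star[1] B^2 - ... already
    includes the differencing factor (1 - B)^d, and
    theta(B) = 1 - theta[0] B - theta[1] B^2 - ..."""
    psi = np.zeros(k)
    psi[0] = 1.0
    for j in range(1, k):
        acc = -theta[j - 1] if j - 1 < len(theta) else 0.0
        for i in range(1, min(j, len(phi_star)) + 1):
            acc += phi_star[i - 1] * psi[j - i]
        psi[j] = acc
    return psi

print(psi_weights([1.0], [0.4], 6))  # IMA(1,1): 1, 1-theta, 1-theta, ...
print(psi_weights([0.8], [], 6))     # AR(1):    1, phi, phi^2, ...
```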

5.3 Past and future components.

Suppose we have observations of $x_t$ up to and including time $t=n$ and, through the process of estimating the model, values of $e_t$ also up to and including time $t=n$. We want to forecast the future values $x_{n+k}$ for $k=1,2,3,\dots$; we call these forecasts $\hat{x}_{n,k}$.

The general form of the infinite MA for $x_{n+k}$ may be written

$$x_{n+k} = e_{n+k} + \psi_1 e_{n+k-1} + \cdots + \psi_{k-1} e_{n+1} + \psi_k e_n + \psi_{k+1} e_{n-1} + \cdots.$$

This shows explicitly that the part of $x_{n+k}$ which depends on the future, unknown innovations is

$$e_{n,k} = e_{n+k} + \psi_1 e_{n+k-1} + \cdots + \psi_{k-1} e_{n+1},$$

and the remaining part of $x_{n+k}$ is known. In other words,

$$x_{n+k} = \sum_{j=0}^{\infty} \psi_j e_{n+k-j} = \sum_{j=k}^{\infty} \psi_j e_{n+k-j} + \sum_{j=0}^{k-1} \psi_j e_{n+k-j} = \hat{x}_{n,k} + e_{n,k}.$$

The forecast error variance is therefore:

$$V_k = \operatorname{Var}(e_{n,k}) = \operatorname{Var}(e_{n+k} + \psi_1 e_{n+k-1} + \cdots + \psi_{k-1} e_{n+1}) = \sigma_e^2\,(1 + \psi_1^2 + \psi_2^2 + \cdots + \psi_{k-1}^2).$$
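Given the $\psi$-weights, for instance from the psi_weights sketch above, $V_k$ is a one-line computation in Python:

```python
import numpy as np

def forecast_error_variance(psi, k, sigma2_e):
    """V_k = sigma_e^2 * (psi_0^2 + psi_1^2 + ... + psi_{k-1}^2),
    where psi[0] = psi_0 = 1."""
    return sigma2_e * float(np.sum(np.asarray(psi[:k]) ** 2))

psi = [1.0, 0.6, 0.6, 0.6]            # IMA(1,1) weights with theta = 0.4
print([forecast_error_variance(psi, k, 1.0) for k in (1, 2, 3, 4)])
```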

5.4 Forecast functions.

When forecasts made using ARIMA models are graphed for increasing lead time $k$ from a fixed forecast origin $n$, i.e.

$$\hat{x}_{n,k}, \qquad k = 1, 2, 3, \dots,$$

the resulting graph is called a forecast function. It is usual to graph the forecast function with limits of $\pm 2\sqrt{V_k}$ to indicate the range of possible future paths of the series.
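In practice a fitted model from a software package produces the forecast function and its limits directly. As a sketch, assuming the statsmodels library is available (the series here is artificial, and a 95% interval is used, which is close to the $\pm 2\sqrt{V_k}$ limits):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = rng.standard_normal(200).cumsum()      # artificial integrated series

res = ARIMA(y, order=(0, 1, 1)).fit()      # an IMA(1,1) fit
fc = res.get_forecast(steps=20)
lower, upper = fc.conf_int(alpha=0.05).T   # approximately +/- 2*sqrt(V_k)
print(fc.predicted_mean[:3], lower[:3], upper[:3])
```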

Any two such functions are usually different, because they depend on the latest values of the series. However, for a given model a similar pattern is usually evident in all forecast functions. To see why, consider the ARMA(p,q) model with $t = n+k$ where $k > q$, and set $e_{n+1} = e_{n+2} = \cdots = 0$ to obtain the forecasts:

$$x_{n+k} = \mu_x + \phi_1(x_{n+k-1} - \mu_x) + \cdots + \phi_p(x_{n+k-p} - \mu_x) + e_{n+k} - \theta_1 e_{n+k-1} - \cdots - \theta_q e_{n+k-q}$$
$$\hat{x}_{n,k} = \mu_x + \phi_1(\hat{x}_{n,k-1} - \mu_x) + \cdots + \phi_p(\hat{x}_{n,k-p} - \mu_x),$$

because all the terms $e_{n+k}, e_{n+k-1}, \dots, e_{n+k-q}$ have been set to zero (for $j \le 0$, $\hat{x}_{n,j}$ is simply the known value $x_{n+j}$). This may also be expressed as

$$\phi(B)\,(\hat{x}_{n,k} - \mu_x) = 0 \qquad\text{for } k > q.$$

This is the same recurrence relationship which generated the successive autocorrelations $\rho_{x,k}$ of the ARMA model. If $p=1$, a geometric decay towards the mean of the series results. If $p=2$ and $\phi(B)$ has complex factors, the values follow a damped cycle as they decay towards $\mu_x$.
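The damped cycle in the $p=2$ case is easy to see by iterating the recurrence directly. In this Python sketch (parameter values are our own choice) $\phi(B) = 1 - 1.5B + 0.75B^2$ has complex factors:

```python
# Iterate phi(B)(xhat_{n,k} - mu) = 0 for an AR(2) with complex factors:
# the forecast function is a damped cycle decaying towards mu.
phi1, phi2, mu = 1.5, -0.75, 10.0
xhat = [12.0, 13.0]                # x_{n-1} and x_n start the recurrence
for k in range(20):
    xhat.append(mu + phi1 * (xhat[-1] - mu) + phi2 * (xhat[-2] - mu))
print([round(v, 2) for v in xhat[2:]])
```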

Now consider ARIMA models. For example, the IMA(1,1) model with drift $\mu$:

$$x_{n+k} = x_{n+k-1} + \mu + e_{n+k} - \theta e_{n+k-1}$$
$$\hat{x}_{n,k} = \hat{x}_{n,k-1} + \mu \qquad\text{for } k > 1,$$

so that the forecast function follows a trend line with slope $\mu$. For this model we can show that $V_k = \sigma_e^2\{1 + (k-1)(1-\theta)^2\}$, so the limits gradually widen around the forecast function.
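This expression for $V_k$ follows directly from the formal infinite moving average of the IMA(1,1) model found earlier, for which $\psi_j = 1-\theta$ for all $j \ge 1$ (the drift does not affect the error variance):

$$V_k = \sigma_e^2\Bigl(1 + \sum_{j=1}^{k-1} \psi_j^2\Bigr) = \sigma_e^2\bigl\{1 + (k-1)(1-\theta)^2\bigr\}.$$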

5.5 Updating equations.

When new observations $x_{n+1}, x_{n+2}, \dots$ are made, the new innovations $e_{n+1}, e_{n+2}, \dots$ can be generated, and at each successive time point $t = n+1, n+2, \dots$ forecasts can be produced by the means already described. It is, however, possible to develop equations which update the forecast function in a convenient way; the EWMA updating equation is a good example. The idea is to update from $\hat{x}_{n,k}$ to $\hat{x}_{n+1,k}$. From the expression given earlier for the forecast errors we obtain

$$\hat{x}_{n+1,k} = \hat{x}_{n,k+1} + \psi_k e_{n+1}.$$

By writing $e_{n+1} = x_{n+1} - \hat{x}_{n,1}$ and expressing $\hat{x}_{n,k+1}$ in terms of $\hat{x}_{n,k}$, this can be rearranged to give the updating formula. For the IMA(1,1) model we start with $\psi_1 = 1-\theta$, so that

$$\hat{x}_{n+1,1} = \hat{x}_{n,2} + (1-\theta)\,e_{n+1},$$

and using $\hat{x}_{n,2} = \hat{x}_{n,1}$ and $e_{n+1} = x_{n+1} - \hat{x}_{n,1}$ gives

$$\hat{x}_{n+1,1} = \hat{x}_{n,1} + (1-\theta)(x_{n+1} - \hat{x}_{n,1}) = (1-\theta)\,x_{n+1} + \theta\,\hat{x}_{n,1}.$$
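In code the EWMA update is a single line. This Python sketch (names illustrative) applies it recursively as each new observation arrives:

```python
def ewma_update(theta, xhat_prev, x_new):
    """One-step-ahead forecast update for the IMA(1,1) model:
    xhat_{n+1,1} = (1 - theta) * x_{n+1} + theta * xhat_{n,1}."""
    return (1 - theta) * x_new + theta * xhat_prev

xhat = 0.0                                   # illustrative starting forecast
for x in [1.0, 0.5, 1.5, 2.0]:               # illustrative new observations
    xhat = ewma_update(0.4, xhat, x)
    print(round(xhat, 4))
```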

We conclude with some illustrations of forecast functions in Figure 26. In all of these the model is fitted to the data points before the forecasts start. The first of the four panels in Figure 26 shows forecasts of the weekly Financial Transaction series. The logarithms of the series are modelled by an IMA(1,1) model, but the forecasts and limits are transformed back to the original scale (which is why the forecasts do not appear constant). The second shows forecasts of the daily temperatures, using an AR(1) model. The reversion of the forecast function to the mean level can be seen, as the error limits widen.

The third and fourth panels show forecasts of the annual sunspot series: an AR(2) model is used in the third, and an AR(8) in the fourth. The forecasts from the AR(2) model damp down more quickly than those from the AR(8) model, which better captures the dynamics of the series. The error limits here are only 90% limits, i.e. plus or minus 1.65 standard deviations. The errors for increasing lead time are generally strongly correlated, so if the actual series values lie outside the limits at some point, it is quite likely that values close in time will also do so.

The following four figures show forecasts of the selected series using, respectively, IMA(1,1), AR(1), AR(2) and AR(8) models.

[Figure 26, first panel: IMA(1,1) forecasts of the weekly Financial Transaction series]

[Figure 26, second panel: AR(1) forecasts of the daily temperature series]

[Figure 26, third panel: AR(2) forecasts of the annual sunspot series]

[Figure 26, fourth panel: AR(8) forecasts of the annual sunspot series]

Forecasting is one of the most challenging applications of univariate time series analysis. There are many other applications, most of them involving smoothing or extracting components of a series, for which model-based approaches have many advantages. Classification of series is also important, with applications in medical diagnosis and seismic analysis. Multivariate time series opens up the further field of modelling the dependence between series, which also has widespread applications.