3 Linear models for stationary time series.

In this chapter we show how autocorrelation patterns can arise from models based on white noise. These models can readily be fitted to time series data and used for a variety of applications, particularly forecasting.

3.1 Moving average models.

3.1.1 Example: The MA(1) model. Let

$$x_t = \mu + e_t - \theta e_{t-1}$$

where $\mu = \mu_x$ and $e_t$ is white noise. The reason for the minus sign will become evident.

The properties of this model are

$$E(x_t) = \mu \quad \text{and} \quad \mathrm{Var}(x_t) = (1+\theta^2)\sigma_e^2,$$
$$\rho_{x,1} = -\frac{\theta}{1+\theta^2} \quad \text{and} \quad \rho_{x,k} = 0 \ \text{for } k > 1.$$

Proof as an exercise.

Remarks. $|\rho_{x,1}| \le \frac{1}{2}$ follows from $(1 \pm \theta)^2 \ge 0 \;\Leftrightarrow\; (1+\theta^2) \ge \mp 2\theta$. Also

$$\frac{\theta}{1+\theta^2} = \frac{\theta^{-1}}{1+(\theta^{-1})^2}$$

so that replacing $\theta$ by $\theta^{-1}$ would result in the same acf. Provided $|\theta| \ne 1$ we therefore have a choice of values for $\theta$ when modelling an acf of this form. The convention is to choose the value that satisfies $|\theta| < 1$.

3.1.2 Definition. A time series $x_t$ follows a moving average model of order $q$ - MA($q$) - if it may be represented as

$$x_t = \mu + e_t - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q}$$

where $e_t$ is a white noise series.

Remarks. The model has $q$ moving average parameters $\theta_j$ and one variance parameter $\sigma_e^2$. For some purposes it is useful to have a slightly different notation and to write
3.1.3

$$x_t = \mu + \psi_0 e_t + \psi_1 e_{t-1} + \dots + \psi_q e_{t-q}$$

where $\psi_0 = 1$ and $\psi_j = -\theta_j$ for $j = 1, \dots, q$. The name moving average is more usually applied when one observed time series takes the place of $e_t$ in this equation, and a new series, taking the place of $x_t$, is constructed according to this equation. Our model contrasts with this in that $x_t$ is observed, and we are proposing a model which gives rise to its autocorrelation pattern.

This is a convenient point to introduce an operator notation for discrete time series.

3.1.4 Definition. The backward shift operator $B$ applied to a series $e_t$ replaces it by the shifted series with values $f_t = e_{t-1}$. Powers of $B$ are similarly defined, i.e.

$$Be_t = e_{t-1} \quad \text{and} \quad B^m e_t = e_{t-m}.$$

Using this notation the MA($q$) model of Definition 3.1.2 may be expressed as
3.1.5

$$\begin{aligned}
x_t &= \mu + e_t - \theta_1 B e_t - \theta_2 B^2 e_t - \dots - \theta_q B^q e_t \\
&= \mu + (1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q)e_t \\
&= \mu + \theta(B)e_t.
\end{aligned}$$

For example the MA(1) model is

$$x_t = \mu + (1 - \theta B)e_t.$$

Now we can formally rearrange this and, provided $|\theta| < 1$, expand:

$$\begin{aligned}
e_t = (1-\theta B)^{-1}(x_t-\mu) &= (1 + \theta B + \theta^2 B^2 + \theta^3 B^3 + \dots)(x_t-\mu) \\
&= (x_t-\mu) + \theta(x_{t-1}-\mu) + \theta^2(x_{t-2}-\mu) + \theta^3(x_{t-3}-\mu) + \dots.
\end{aligned}$$

We can directly verify from this expression that $e_t - \theta e_{t-1} = x_t - \mu$.
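As a quick numerical check of this inversion, here is a minimal Python sketch (the values $n = 500$, $\mu = 10$ and $\theta = 0.6$ are invented purely for illustration): simulate an MA(1) series and recover one of the $e_t$ values from the truncated expansion.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, theta = 500, 10.0, 0.6                     # illustrative values, not from the notes
e = rng.normal(0.0, 1.0, n)
e_lag = np.concatenate(([0.0], e[:-1]))           # e_{t-1}, with the pre-sample value taken as 0
x = mu + e - theta * e_lag                        # x_t = mu + e_t - theta e_{t-1}

# Recover e_t from the truncated expansion e_t = sum_j theta^j (x_{t-j} - mu)
t, J = 400, 50
e_hat = sum(theta**j * (x[t - j] - mu) for j in range(J + 1))
print(e[t], e_hat)                                # close agreement because |theta| < 1
```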

The fact that the unobserved series $e_t$ can be ‘recovered’ from the observed series $x_t$ is the reason for choosing $|\theta| < 1$. The general result is:

3.1.6 Definition. The MA($q$) model of 3.1.5 is said to be invertible if it is possible to express $e_t$ in terms of the present and past values of $x_t$ as

$$\begin{aligned}
e_t &= (x_t-\mu) - \pi_1(x_{t-1}-\mu) - \pi_2(x_{t-2}-\mu) - \dots \\
&= (1 - \pi_1 B - \pi_2 B^2 - \dots)(x_t-\mu) \\
&= \pi(B)(x_t-\mu).
\end{aligned}$$

Remark. The signs in this last equation are chosen for convenience on rewriting the expansion as
3.1.7

$$x_t = \mu + \pi_1(x_{t-1}-\mu) + \pi_2(x_{t-2}-\mu) + \dots + e_t$$

- a form of the model which will be useful for prediction.

3.1.8 Definition. The invertibility condition for model 3.1.5 is that the operator $\theta(B)$ satisfies

$$\theta(B) \ne 0 \quad \text{for } |B| \le 1.$$

Here we are treating $B$ as a real or complex number rather than an operator.

Remark. Consider an MA(2) model where we observe $\{x_t\}$ satisfying $x_t = e_t - 0.7e_{t-1} - 0.6e_{t-2}$. The roots of $\theta(B) = 1 - 0.7B - 0.6B^2 = 0$ are $B = 5/6$ and $B = -2$, so that $\theta(B)$ is not invertible.

The invertibility condition means that it is possible to expand

$$\theta(B)^{-1} = \pi(B) = 1 - \pi_1 B - \pi_2 B^2 - \dots$$

with $\pi_j \to 0$ in order to derive 3.1.6 from 3.1.5.

One way to derive the coefficients $\{\pi_j\}$ of $1/\theta(B)$ is to use partial fractions, a standard technique for power-series expansion.
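An alternative to partial fractions is to generate the $\pi_j$ numerically from the recursion for the power-series coefficients of $1/\theta(B)$. A minimal Python sketch (the MA(2) coefficients 0.5 and 0.3 are invented illustrative values, chosen to satisfy the invertibility condition):

```python
import numpy as np

def pi_weights(theta, K):
    """pi_1,...,pi_K in pi(B) = 1/theta(B) = 1 - pi_1 B - pi_2 B^2 - ...,
    where theta(B) = 1 - theta[0] B - ... - theta[q-1] B^q."""
    q = len(theta)
    c = np.zeros(K + 1)
    c[0] = 1.0                                # c_k: power-series coefficients of 1/theta(B)
    for k in range(1, K + 1):
        c[k] = sum(theta[i - 1] * c[k - i] for i in range(1, min(k, q) + 1))
    return -c[1:]                             # pi_k = -c_k for k >= 1

print(pi_weights([0.5, 0.3], 6).round(4))     # the weights decay towards zero, as required
```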

3.2 Selection of moving average models.

We now consider when it is appropriate to use a moving average model to represent a given series. This is based on the autocorrelation properties of the model:

3.2.1 Recall that $\psi_j = -\theta_j$, so that $\psi_0 = 1$, $\psi_1 = -\theta_1, \dots, \psi_q = -\theta_q$. The autocovariances of a moving average model are given in terms of the coefficients $\psi_j$ as

$$\gamma_{x,k} = \sigma_e^2(\psi_0\psi_k + \dots + \psi_{q-k}\psi_q) = \sigma_e^2\left(\sum_{i=0}^{q-k}\psi_i\psi_{k+i}\right) \quad \text{for } k = 0, \dots, q.$$

The characteristic property of the model is that
3.2.2

$$\rho_k = 0 \quad \text{for } k > q.$$

Continuing the previous example, taking $\sigma_e^2 = 10$, we get

$$\begin{aligned}
\sigma_x^2 = \gamma_{x,0} &= 10\,(1^2 + (-0.7)^2 + (-0.6)^2) = 18.5 \\
\gamma_{x,1} &= 10\,(1\times(-0.7) + (-0.7)\times(-0.6)) = -2.8 \\
\gamma_{x,2} &= 10\,(1\times(-0.6)) = -6.0
\end{aligned}$$

so that $\rho_1 = -2.8/18.5 = -0.15$ and $\rho_2 = -6.0/18.5 = -0.32$.
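These hand calculations are easily reproduced from the general formula 3.2.1; a minimal Python sketch using the same $\psi$ weights:

```python
import numpy as np

sigma2_e = 10.0
psi = np.array([1.0, -0.7, -0.6])          # psi_0, psi_1, psi_2 for the MA(2) example
q = len(psi) - 1

gamma = [sigma2_e * np.sum(psi[:q - k + 1] * psi[k:]) for k in range(q + 1)]
rho = np.array(gamma) / gamma[0]
print(np.round(gamma, 1))                   # [18.5 -2.8 -6. ]
print(rho.round(2))                         # [ 1.   -0.15 -0.32]
```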

Proof of 3.2.1. Start from

$$\mathrm{Cov}(x_t, x_{t+k}) = \mathrm{Cov}\left(\sum_{i=0}^{q}\psi_i e_{t-i},\ \sum_{j=0}^{q}\psi_j e_{t+k-j}\right).$$

Now each term $e_{t-i}$ in the first sum is correlated with at most one term in the second sum, namely the same term, which is $e_{t+k-j}$ with $j = k+i$. So the covariance reduces to

$$\sum_{i=0}^{q-k}\mathrm{Cov}(\psi_i e_{t-i}, \psi_{k+i}e_{t-i}) = \sigma_e^2\left(\sum_{i=0}^{q-k}\psi_i\psi_{k+i}\right).$$

Notice that the above formula holds even if $q = \infty$. In other words, if

$$x_t = \sum_{j=0}^{\infty}\psi_j e_{t-j},$$

where $\{e_t\}$ is white noise, then

$$\gamma_{x,k} = \sigma_e^2\left(\sum_{j=0}^{\infty}\psi_j\psi_{j+k}\right), \qquad \rho_{x,k} = \frac{\sum_{j=0}^{\infty}\psi_j\psi_{j+k}}{\sum_{j=0}^{\infty}\psi_j^2}.$$

There is a converse result: any stationary time series $x_t$ having the property 3.2.2 may be represented by a moving average model 3.1.2 satisfying the invertibility condition.

The characteristic property is recognised by inspection of the sample acf of the data series. The sampling properties of this acf are therefore required, and these will be given shortly. But first we look at two time series for which moving average models appear to be appropriate. In fact it is the successive changes, or first differences, of these series which appear to have the characteristics of moving average models.

Consider the ‘CABLE’ data, available on Moodle, which is clearly non-stationary. A good model for this data might be of the form $y_t = x_t e^{a+bt}$ where $\{x_t\}$ is stationary. We therefore take the logarithm and then the first differences of the series of weekly business transactions, as shown in Figure 6, to make it stationary.

Figure 6: The number of weekly business transactions, and their first difference.

Figure 7 shows the sample time series properties of the differenced series, which are typical of an MA(1) model:

Figure 7: Sample properties of the differences of the weekly transactions.

The model $x_t = \mu + e_t - 0.713e_{t-1}$ is estimated for this series, and it has the model properties shown in the following figures.

Unnumbered Figures: four plots showing the properties of the fitted MA(1) model.

The second plot shows the implied $\psi_k$ weights of the MA representation of 3.1.3, and the third plot shows the implied $\pi_k$ weights of the inverted (autoregressive) representation given in 3.1.7. The fourth plot shows the implied model spectrum, which we shall define in Section 3.12.

The second example is of the annual changes in daylength, measured in milliseconds, from the dataset ‘DAYLENGT’ available on Moodle. Part of what we see is almost certainly due to the fact that the original data have been smoothed to give the series shown in the left panel of Figure 9: each point has been replaced by its average with the previous and next point, and this procedure has then been repeated. This smoothing completely annihilates any cycle of period 3, as the sketch below illustrates.
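A minimal Python check of that claim (the series length of 200 is arbitrary): apply the repeated three-point average to a pure cycle of period 3 and see that it vanishes.

```python
import numpy as np

t = np.arange(200)
cycle = np.cos(2 * np.pi * t / 3)                     # pure cycle of period 3

def smooth3(y):
    """Replace each interior point by its average with the previous and next point."""
    z = y.copy()
    z[1:-1] = (y[:-2] + y[1:-1] + y[2:]) / 3
    return z

smoothed_twice = smooth3(smooth3(cycle))
print(np.abs(smoothed_twice[2:-2]).max())             # ~1e-16: the period-3 cycle is annihilated
```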

Figure 9: Series of annual changes in daylength and their first differences.

Figure 10 shows the sample time series properties of the differenced series. Among these, the sample spectrum is very low at frequency 1/3 as a consequence of the smoothing. The sample autocorrelations are characteristic of an MA(3) model:

Figure 10: Sample statistics of the differences in the annual daylength series.

The MA(3) model $x_t = \mu + e_t + 0.89e_{t-1} + 0.87e_{t-2} + 0.23e_{t-3}$ is estimated for this series, and it has the model properties shown in the following figures, which match the sample properties very well:

Unnumbered Figures: four plots showing the properties of the fitted MA(3) model.

In general the sample value $r_{x,k}$ is an approximately unbiased and consistent estimate of $\rho_{x,k}$ for any fixed lag $k$ as the sample size $n$ tends to infinity. In the particular case of the MA($q$) model there is a useful result for the standard error of the sample acf for lags greater than the order. This helps us to weigh the statistical evidence for a hypothesised value of $q$.

3.2.3 For a Gaussian time series of length $n$ with acf satisfying $\rho_k = 0$ for $k > q$,

$$SE(r_k) \approx \left\{\frac{1}{n}\left[1 + 2\sum_{j=1}^{q}\rho_j^2\right]\right\}^{\frac{1}{2}} \quad \text{for } k > q.$$

The simplest case, $q = 0$, gives the property under the hypothesis of white noise that

$$SE(r_k) \approx \frac{1}{\sqrt{n}}$$

so it is usual to plot the sample correlations on a graph with approximate 95% error limits drawn at $\pm 2/\sqrt{n}$ about the horizontal axis. If the sample acf values generally lie within these limits, with the only values outside them being at unremarkable lags, then white noise is a sensible conclusion. But these limits only apply for assessing the hypothesis of white noise. Under any other hypothesis the limits are wider.

The sample acf may in fact lie outside these limits for many lags. We are here interested in the case when a small number of values lie clearly outside the limits at low lags. We then have good reason to suppose that an MA($q$) model is appropriate, with the value of $q$ tentatively chosen as the ‘cut-off’ lag - the highest lag with a ‘significant’ acf value.

We need then to be aware that the sampling variability of the remaining acf values, at lags greater than the assumed MA order $q$, will be increased according to 3.2.3, so that rather more values will lie outside the white noise limits. The increased SE may be estimated from 3.2.3 using the sample acf values $r_j$ up to lag $q$ in place of the unknown $\rho_j$, as in the sketch below.
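A minimal sketch of that estimate, assuming the sample acf values up to lag $q$ are already to hand (the numbers below are invented for illustration):

```python
import numpy as np

n = 200                                  # series length (invented)
r = np.array([-0.42, 0.08])              # sample acf at lags 1..q, here q = 2 (invented values)

se_white_noise = 1 / np.sqrt(n)                      # SE(r_k) under the white-noise hypothesis
se_beyond_q = np.sqrt((1 + 2 * np.sum(r**2)) / n)    # SE(r_k) for k > q, from 3.2.3
print(round(se_white_noise, 4), round(se_beyond_q, 4))
```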

Another visual characteristic of the sample acf at these higher lags is that its statistical variability may follow a smoother pattern than for white noise. Even when the true model is MA($q$), these patterns may be deceptively suggestive of structure in the acf, i.e. non-zero values of $\rho_k$, which is not actually present at these higher lags.

Estimation of the parameters in an MA($q$) model may sometimes be achieved by solving the equations 3.2.1 for the unknown parameters $\sigma_e^2$ and $\psi_j = -\theta_j$, $j = 1, \dots, q$, using sample autocovariance values $C_{x,k}$ in place of $\gamma_k$. Unfortunately these equations may not have a real solution, though when they do it is easily found. For example in the MA(1) case the solution for $\theta$ is found by solving

$$r_{x,1} = -\frac{\theta}{1+\theta^2}$$

which has no real solution if $r_{x,1}$ lies outside the range $[-\frac{1}{2}, \frac{1}{2}]$, and this may well happen through statistical variability in $r_{x,1}$.
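A minimal sketch of this moment-based estimate for the MA(1) case: rearranging gives the quadratic $r_{x,1}\theta^2 + \theta + r_{x,1} = 0$, whose two real roots (when $|r_{x,1}| \le \frac{1}{2}$) are reciprocals of each other, so we keep the invertible one. The value of $r_{x,1}$ below is invented.

```python
import numpy as np

def ma1_theta_from_r1(r1):
    """Moment estimate of theta in x_t = mu + e_t - theta e_{t-1} from the lag-1 sample acf.
    Solves r1 = -theta / (1 + theta^2); returns None when |r1| > 1/2 (no real solution)."""
    if abs(r1) > 0.5:
        return None
    if r1 == 0.0:
        return 0.0
    disc = np.sqrt(1.0 - 4.0 * r1**2)
    roots = np.array([(-1.0 + disc) / (2.0 * r1), (-1.0 - disc) / (2.0 * r1)])
    small = roots[np.abs(roots) <= 1.0]          # the invertible (or boundary) root
    return float(small[0])

print(ma1_theta_from_r1(-0.35))                  # about 0.408; check: -0.408/(1 + 0.408^2) = -0.35
```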

3.3 Autoregressive models.

Consider the model
3.3.1

$$x_t = \mu_x + \phi(x_{t-1} - \mu_x) + e_t$$

which defines successive values of the series $x_t$ in terms of previous values and a white noise series $e_t$. This is the first order autoregressive model, commonly referred to as the AR(1) model. The essential distinction from prediction models is that in the prediction equation $e_t$ is only required to be uncorrelated with $x_{t-1}$ - in general it will not be white noise. For the AR(1) model, however, $e_t$ is assumed to be white noise and is, as a result, uncorrelated with $x_{t-1}$.

The properties of this model may be found by successively substituting:

$$\begin{aligned}
x_t - \mu_x &= e_t + \phi(x_{t-1} - \mu_x) = e_t + \phi(e_{t-1} + \phi(x_{t-2} - \mu_x)) = \dots \\
&= e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \dots + \phi^L e_{t-L} + \phi^{L+1}(x_{t-L-1} - \mu_x).
\end{aligned}$$

Now assuming both that $|\phi| < 1$ and that $x_t$ is bounded ‘in the infinite past’, we can let $L \to \infty$ and obtain the convergent sum:
3.3.2

$$x_t = \mu_x + e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \dots = \mu_x + \sum_{i=0}^{\infty}\phi^i e_{t-i}.$$

We can derive from this, for example, $E(x_t) = \mu_x$ and

$$\sigma_x^2 = \mathrm{Var}(x_t) = \sigma_e^2 + \phi^2\sigma_e^2 + \phi^4\sigma_e^2 + \dots = \frac{\sigma_e^2}{1-\phi^2}.$$

The representation 3.3.2 could in fact be obtained using the backward shift operator notation by writing 3.3.1 as $(1-\phi B)(x_t - \mu_x) = e_t$, so that

$$x_t - \mu_x = (1-\phi B)^{-1}e_t = (1 + \phi B + \phi^2 B^2 + \dots)e_t.$$

The variance and correlation properties can be derived directly from the model 3.3.1 without using this expansion. Because $x_{t-1}$ and $e_t$ are uncorrelated we get

$$\sigma_x^2 = \mathrm{Var}(x_t) = \mathrm{Var}(\phi x_{t-1} + e_t) = \phi^2\sigma_x^2 + \sigma_e^2 \quad\Rightarrow\quad (1-\phi^2)\sigma_x^2 = \sigma_e^2.$$

Similarly $e_t$ is uncorrelated with $x_{t-k}$ for any $k > 0$, so

$$\begin{aligned}
\gamma_{x,k} &= \mathrm{Cov}(x_t, x_{t-k}) \\
&= \mathrm{Cov}(\phi x_{t-1} + e_t,\ x_{t-k}) \\
&= \mathrm{Cov}(\phi x_{t-1},\ x_{t-k}) \\
&= \phi\,\mathrm{Cov}(x_{t-1}, x_{t-k}) \\
&= \phi\,\gamma_{x,k-1}.
\end{aligned}$$

It follows that

$$\gamma_{x,1} = \phi\sigma_x^2, \quad \gamma_{x,2} = \phi\gamma_{x,1} = \phi^2\sigma_x^2, \quad \dots, \quad \gamma_{x,k} = \phi^k\sigma_x^2.$$

Note therefore that $\rho_{x,1} = \phi$ and $\rho_{x,k} = \phi^k$ for $k > 0$.
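A minimal simulation check of this geometric acf ($\phi = 0.7$ and the sample size are invented illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, phi = 5000, 0.7                               # illustrative values
e = rng.normal(0.0, 1.0, n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]                 # AR(1) recursion 3.3.1 with mu_x = 0

def sample_acf(y, k):
    y = y - y.mean()
    return np.sum(y[k:] * y[:-k]) / np.sum(y**2)

for k in range(1, 5):
    print(k, round(sample_acf(x, k), 3), round(phi**k, 3))   # sample acf vs phi^k
```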

3.3.3 Definition. A time series follows an autoregressive model of order $p$ - AR($p$) - if it is generated by the process

$$x_t = \mu_x + \phi_1(x_{t-1} - \mu_x) + \phi_2(x_{t-2} - \mu_x) + \dots + \phi_p(x_{t-p} - \mu_x) + e_t$$

where $e_t$ is a white noise series. Using operator notation the model may be written:
3.3.4

$$(1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p)(x_t - \mu_x) = \phi(B)(x_t - \mu_x) = e_t.$$

We also require:
3.3.5 Definition. The stationarity condition for model 3.3.3 is that the operator $\phi(B)$ satisfies

$$\phi(B) \ne 0 \quad \text{for } |B| \le 1.$$

Remark. The condition is the same as that required of $\theta(B)$ for a moving average model to be invertible (see 3.1.8). In this case it means that it is possible to carry out the convergent expansion following from 3.3.4:

$$\begin{aligned}
\phi(B)(x_t - \mu_x) &= e_t \\
x_t - \mu_x &= \phi(B)^{-1}e_t \\
&= (1 + \psi_1 B + \psi_2 B^2 + \dots)e_t \\
x_t &= \mu_x + \sum_{i=0}^{\infty}\psi_i e_{t-i}
\end{aligned}$$

where $\psi_j \to 0$. This is an infinite moving average representation of the model (compare with the finite representation 3.1.3 for MA processes). We note one consequence which may be obvious from the way the series is generated, but which we can formally verify with this expansion: the fact that $e_t$ is uncorrelated with all past values of the series:
3.3.6

$$\mathrm{Cov}(e_t, x_{t-i}) = 0 \quad \text{for } i \ge 1.$$

This is because $x_{t-i} = \mu_x + e_{t-i} + \psi_1 e_{t-i-1} + \dots$, in which all terms are uncorrelated with $e_t$.

3.4 Properties of autoregressive models.

The main contrast with moving average models is that the acf does not have a cut off, but generally decays towards higher lags. The pattern of decay is obviously reflected in the behaviour of the time series generated from this model, and includes cyclical patterns. Historically, this was a reason for the introduction of these models.

3.4.1 The Yule-Walker equations relating the coefficients and acf of the AR($p$) model.

The coefficients $\phi_1, \dots, \phi_p$ determine, and are determined by, the acf values $\rho_{x,1}, \dots, \rho_{x,p}$, by the equations

$$\rho_{x,i} = \phi_1\rho_{x,i-1} + \dots + \phi_p\rho_{x,i-p} \quad \text{for } i = 1, \dots, p.$$

Displayed as a set of linear equations for $\phi_1, \dots, \phi_p$ these are

$$\begin{pmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{p-1} \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{p-2} \\
\rho_2 & \rho_1 & 1 & \cdots & \rho_{p-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho_{p-1} & \rho_{p-2} & \rho_{p-3} & \cdots & 1
\end{pmatrix}
\begin{pmatrix}\phi_1 \\ \phi_2 \\ \vdots \\ \phi_p\end{pmatrix}
=
\begin{pmatrix}\rho_1 \\ \rho_2 \\ \vdots \\ \rho_p\end{pmatrix}$$

where for simplicity the subscript $x$ of $\rho_{x,i}$ has been omitted. We estimate the parameters $\phi_1, \dots, \phi_p$ by replacing the acf $\{\rho_k\}$ by the sample acf $\{r_k\}$.
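A minimal sketch of this estimation step, assuming sample acf values $r_1, \dots, r_p$ are available (the numbers below are invented, with $p = 2$):

```python
import numpy as np

r = np.array([0.66, 0.30])                        # sample acf at lags 1..p (invented values)
p = len(r)

# Toeplitz matrix of acf values at lags 0..p-1, as in the Yule-Walker system above
R = np.array([[1.0 if i == j else r[abs(i - j) - 1] for j in range(p)] for i in range(p)])
phi_hat = np.linalg.solve(R, r)
print(phi_hat.round(3))                           # estimates of phi_1, ..., phi_p
```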

Proof.

We demonstrate the more general result:
3.4.2

$$\rho_{x,i} = \phi_1\rho_{x,i-1} + \dots + \phi_p\rho_{x,i-p} \quad \text{for all } i \ge 1.$$

The Yule-Walker equations are just the subset of these for $i = 1, \dots, p$.

From 3.3.6, for $i \ge 1$,

$$\begin{aligned}
0 &= \mathrm{Cov}(e_t, x_{t-i}) \\
&= \mathrm{Cov}(x_t - \phi_1 x_{t-1} - \dots - \phi_p x_{t-p},\ x_{t-i}) \\
&= \gamma_{x,i} - \phi_1\gamma_{x,i-1} - \dots - \phi_p\gamma_{x,i-p} \\
\Rightarrow\quad \gamma_{x,i} &= \phi_1\gamma_{x,i-1} + \dots + \phi_p\gamma_{x,i-p}.
\end{aligned}$$

Divide through by $\sigma_x^2 = \gamma_{x,0}$ for the required result. ∎

These equations are completed by one further useful relationship:
3.4.3

$$\frac{\sigma_e^2}{\sigma_x^2} = 1 - \phi_1\rho_{x,1} - \phi_2\rho_{x,2} - \dots - \phi_p\rho_{x,p}.$$
Proof.

In two steps. The first is that

$$\sigma_e^2 = \mathrm{Cov}(x_t - \phi_1 x_{t-1} - \dots - \phi_p x_{t-p},\ e_t) = \mathrm{Cov}(x_t, e_t)$$

because $e_t$ is uncorrelated with all past $x_{t-i}$. Extending this:

$$\begin{aligned}
\sigma_e^2 &= \mathrm{Cov}(x_t, e_t) \\
&= \mathrm{Cov}(x_t,\ x_t - \phi_1 x_{t-1} - \dots - \phi_p x_{t-p}) \\
&= \gamma_{x,0} - \phi_1\gamma_{x,1} - \dots - \phi_p\gamma_{x,p}.
\end{aligned}$$

Divide by $\sigma_x^2$ for the result. ∎

Figure 12 shows a good example of a time series that resembles a first order autoregressive process. It is a series of temperature measurements modified by wind speed and other variables for use in predicting energy demands. The sample properties, also shown in Figure 12, are typical of this process.

Figure 12: Series of daily temperature indicator.

The model fitted to this series was $x_t = \mu_x + 0.77(x_{t-1} - \mu_x) + e_t$. The properties of the model, presented in the next figures, correspond very well to the sample properties of the series.

Unnumbered Figures: five plots showing the properties of the fitted AR(1) model.

3.4.4 Example. The AR(2) model:

$$x_t = \mu_x + \phi_1(x_{t-1} - \mu_x) + \phi_2(x_{t-2} - \mu_x) + e_t.$$

This process has the capacity to represent a series with an irregular cycle and was put forward by Yule to model the sunspot cycle. The motivation was a discrete time version of the second order differential equation for a damped pendulum, including random fluctuations. For p=2 the Yule-Walker equations give:
3.4.5

$$\left.\begin{aligned}
\phi_1 + \phi_2\rho_{x,1} &= \rho_{x,1} \\
\rho_{x,1}\phi_1 + \phi_2 &= \rho_{x,2}
\end{aligned}\right\}
\quad\Longleftrightarrow\quad
\left.\begin{aligned}
\rho_{x,1} &= \phi_1/(1-\phi_2) \\
\rho_{x,2} &= \phi_1^2/(1-\phi_2) + \phi_2
\end{aligned}\right\}$$

If the model parameters $\phi_1, \phi_2$ are given, these equations supply $\rho_{x,1}, \rho_{x,2}$. Then 3.4.2 generates $\rho_{x,3}, \rho_{x,4}, \dots$ successively using $\rho_{x,i} = \phi_1\rho_{x,i-1} + \phi_2\rho_{x,i-2}$.
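A minimal sketch of this recursion, using as illustration the coefficients $\phi_1 = 1.09$, $\phi_2 = -0.42$ fitted below to the ocean-core series:

```python
import numpy as np

phi1, phi2 = 1.09, -0.42         # the AR(2) coefficients fitted below to the ocean-core series
K = 20

rho = np.zeros(K + 1)
rho[0] = 1.0
rho[1] = phi1 / (1.0 - phi2)                           # lag-1 value from 3.4.5
for k in range(2, K + 1):
    rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]     # recursion 3.4.2
print(rho.round(3))                                    # damped, cyclical decay
```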

Figure 14 shows a series which appears to follow an AR(2) model. It is an indicator of past ocean surface temperatures, obtained by measuring oxygen isotope ratios in a core from the ocean sediment. The time scale is in thousands of years. The series has an irregular cyclical appearance, reflected in its sample autocorrelations.

Figure 14: Indicator of past ocean surface temperatures.

The coefficients of the fitted AR(2) model were found to be $\phi_1 = 1.09$ and $\phi_2 = -0.42$. The next five figures show the properties of the fitted model, which are similar to the sample properties.

Unnumbered Figures: five plots showing the properties of the fitted AR(2) model.

3.5 Selection of autoregressive models.

We began the previous section by saying that the autocorrelation properties of autoregressive models are generally quite distinct from those of moving average models. However we also have

3.5.1 The characteristic property of the AR($p$) model is that its partial autocorrelation function has a ‘cut-off’ at lag $p$, i.e.

$$\rho_{|x,k} = 0 \quad \text{for } k > p.$$

This is because conditioning $x_t$ upon $x_{t-1}, x_{t-2}, \dots, x_{t-k+1}$ for any $k > p$ includes $x_{t-1}, x_{t-2}, \dots, x_{t-p}$. If $p$ lagged terms have been included in the predictor, the error $e_t$ is white noise and uncorrelated with all terms $x_{t-k}$ at higher lags than $p$. Including these further terms cannot improve the predictor; this implies that the partial correlation at these lags is zero.

Again there is a converse result, that any stationary time series satisfying the property 3.5.1 may be represented by the AR(p) model 3.3.4 with et being a white noise series, and ϕ(B) satisfying the stationarity condition.

The characteristic property is recognised by inspection of the sample pacf of the data series. The sampling properties of $r_{|x,k}$ are that it is an approximately unbiased and consistent estimate of $\rho_{|x,k}$ for any fixed lag as the sample size $n$ tends to infinity. When the series $x_t$ follows the AR($p$) model the following sampling property holds:

3.5.2

$$SE(r_{|x,k}) \approx \frac{1}{\sqrt{n}} \quad \text{for } k > p.$$

It is therefore usual to plot the sample pacf on a graph with approximate 95% error limits for zero pacf values of $\pm 2/\sqrt{n}$ about the horizontal axis. We are interested in the case when a small number of values lie clearly outside the limits at low lags. The highest lag with a significant pacf value then indicates the order $p$.

Remark. Unlike the acf, which may extend to high lags before decaying effectively to zero, the pacf for most practically occurring series does not extend to very high lags. This is because

$$\mathrm{Var}(x_t \mid x_{t-1}, x_{t-2}, \dots, x_{t-k}) = \mathrm{Var}(x_t)\,(1-\rho_{|x,1}^2)(1-\rho_{|x,2}^2)\cdots(1-\rho_{|x,k}^2)$$

measures the ability to predict the series from past values, which is usually rather limited. The first few pacf values usually ‘take up’ most of this. So in practice series which do not in actual fact follow an AR model may be well approximated by one of modest order. The true pacf may not actually ‘cut off’ at some lag, but may just become small compared with the SE of the sample value.

3.6 Autoregressive moving average models.

3.6.1 Definition. A time series follows an autoregressive moving average model of orders $p$ and $q$ - an ARMA($p,q$) model - if it is generated by the process

$$x_t = \mu_x + \phi_1(x_{t-1} - \mu_x) + \phi_2(x_{t-2} - \mu_x) + \dots + \phi_p(x_{t-p} - \mu_x) + e_t - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q}$$

where $e_t$ is a white noise series. Using operator notation the model may be written

$$\phi(B)(x_t - \mu_x) = \theta(B)e_t$$

where the operators $\phi(B)$ and $\theta(B)$ are as defined for the pure AR and MA models.

We again require conditions

  • for stationarity: $\phi(B) \ne 0$ for $|B| \le 1$

  • for invertibility: $\theta(B) \ne 0$ for $|B| \le 1$.

The first ensures that we can express $x_t$ in an infinite moving average form:
3.6.2

$$x_t = \mu_x + \phi(B)^{-1}\theta(B)e_t = \mu_x + \psi(B)e_t = \mu_x + e_t + \psi_1 e_{t-1} + \psi_2 e_{t-2} + \dots$$

and the second that we can recover $e_t$ from $x_t$ by

$$e_t = \theta(B)^{-1}\phi(B)(x_t - \mu_x) = \pi(B)(x_t - \mu_x) = (x_t - \mu_x) - \pi_1(x_{t-1} - \mu_x) - \pi_2(x_{t-2} - \mu_x) - \dots$$

which is simply rearranged to give the infinite autoregressive form
3.6.3

$$x_t = \mu_x + \pi_1(x_{t-1} - \mu_x) + \pi_2(x_{t-2} - \mu_x) + \dots + e_t.$$

Remark. The existence of these infinite forms means that in theory an ARMA process can be approximated with arbitrary accuracy by either a pure MA or AR model of finite order. Given a finite data series this can lead to some uncertainty as to which model to use.

3.6.4 Example. The ARMA(1,1) model of Definition 3.6.1 has

$$\psi(B) = \frac{1-\theta B}{1-\phi B} = (1-\theta B)(1 + \phi B + \phi^2 B^2 + \dots) = 1 + (\phi-\theta)B + (\phi-\theta)\phi B^2 + \dots$$

so that $\psi_i = (\phi-\theta)\phi^{i-1}$ for $i \ge 1$. Similarly

$$\pi(B) = \frac{1-\phi B}{1-\theta B} = (1-\phi B)(1 + \theta B + \theta^2 B^2 + \dots) = 1 - (\phi-\theta)B - (\phi-\theta)\theta B^2 - \dots$$

so that $\pi_i = (\phi-\theta)\theta^{i-1}$ for $i \ge 1$.
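A minimal numerical check of the $\psi$ expansion, using as illustration the values $\phi = 0.849$, $\theta = 0.655$ fitted below to the temperature series: the weights obtained by equating coefficients in $(1-\phi B)\psi(B) = 1-\theta B$ agree with the closed form $\psi_i = (\phi-\theta)\phi^{i-1}$.

```python
import numpy as np

phi, theta = 0.849, 0.655     # ARMA(1,1) coefficients fitted below to the temperature series
K = 6

# Equate coefficients in (1 - phi B) psi(B) = (1 - theta B):
# psi_0 = 1, psi_1 = phi - theta, and psi_k = phi * psi_{k-1} for k >= 2.
psi = np.zeros(K + 1)
psi[0], psi[1] = 1.0, phi - theta
for k in range(2, K + 1):
    psi[k] = phi * psi[k - 1]

closed_form = (phi - theta) * phi ** np.arange(0, K)   # psi_i = (phi - theta) phi^(i-1), i = 1..K
print(psi[1:].round(4))
print(closed_form.round(4))                            # identical
```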

Remark. Note that in this example if $\phi = \theta$ then simply $x_t = \mu_x + e_t$, because the factors $(1-\phi B)$ and $(1-\theta B)$ cancel. In general we require that $\phi(B)$ and $\theta(B)$ have no cancelling factors, otherwise the model would not be distinguishable from a simpler one. Problems can arise in model estimation from a finite data series when near-cancelling factors occur. The signal plus noise model can give rise to this situation if the noise almost swamps the signal.

Figure 16 shows a series of central England average temperatures that plausibly follows such a model. The coefficients of the fitted ARMA(1,1) model were found to be $\phi = 0.849$ and $\theta = 0.655$.

Figure 16: Series of annual central England average temperatures and its sample statistics.

The properties of the fitted model, shown in the next six figures, are similar to those of the sample properties. The last plot shows a smoothed pattern of the series, estimating the underlying level which might be represented by an AR(1) process.

Unnumbered Figures: six plots showing the properties of the fitted ARMA(1,1) model.

3.7 Selection of ARMA models.

The first point to make is that in general neither of the characteristic properties of the MA or AR model holds for ARMA models. This is a negative statement but nevertheless of value in suggesting that a mixed ARMA model is needed.

A second point is that even if it appears that a pure AR or MA model is appropriate from recognition of one characteristic property or the other, a model can sometimes be improved by adding an MA term to a pure AR model or vice versa. This is because inspection of the sample acf and pacf, though providing an indication of the basic model, may ‘lose’ other indications in the sampling fluctuations.

The main concern, though, is that there may be no simple guide to the choice of the model orders, which are needed in order to estimate the model parameters. The acf properties of the model are considered first. These may be derived from the infinite moving average form 3.6.2 by letting $q \to \infty$ in 3.2.1:
3.7.1

$$\gamma_{x,k} = \sigma_e^2(\psi_0\psi_k + \psi_1\psi_{k+1} + \dots) = \sigma_e^2\left(\sum_{i=0}^{\infty}\psi_i\psi_{k+i}\right) \quad \text{for } k > 0.$$

These infinite sums can be avoided by solving appropriate equations similar to the Yule-Walker equations, although the infinite moving average form is initially useful for deriving:
3.7.2

$$\mathrm{Cov}(e_t, x_{t-i}) = 0 \ \text{for } i > 0, \qquad \mathrm{Cov}(x_t, e_t) = \sigma_e^2, \qquad \mathrm{Cov}(x_t, e_{t-i}) = \psi_i\sigma_e^2 \ \text{for } i > 0.$$

3.7.3 Example. The ARMA(1,1) model has variance and autocorrelations given by

$$\sigma_x^2 = \frac{\sigma_e^2(1 - 2\theta\phi + \theta^2)}{1-\phi^2}, \qquad \rho_{x,k} = \frac{(\phi-\theta)(1-\theta\phi)}{1 - 2\theta\phi + \theta^2}\,\phi^{k-1} \ \text{for } k > 0.$$

Note that the pattern of autocorrelations is similar to that for the AR(1) in the sense that both exhibit geometric decay of the autocorrelation as a function of the lag k.

Proof. We use $\mathrm{Cov}(x_t, e_{t-1}) = \psi_1\sigma_e^2 = (\phi-\theta)\sigma_e^2$ to derive the first of the equations:

$$\begin{aligned}
\gamma_{x,0} &= \mathrm{Cov}(x_t, x_t) = \mathrm{Cov}(\phi x_{t-1} + e_t - \theta e_{t-1},\ x_t) = \phi\gamma_{x,1} + \sigma_e^2 - \theta(\phi-\theta)\sigma_e^2 \\
\gamma_{x,1} &= \mathrm{Cov}(x_t, x_{t-1}) = \mathrm{Cov}(\phi x_{t-1} + e_t - \theta e_{t-1},\ x_{t-1}) = \phi\gamma_{x,0} + 0 - \theta\sigma_e^2 \\
\gamma_{x,k} &= \mathrm{Cov}(x_t, x_{t-k}) = \mathrm{Cov}(\phi x_{t-1} + e_t - \theta e_{t-1},\ x_{t-k}) = \phi\gamma_{x,k-1} + 0 - 0 \quad \text{for } k > 1.
\end{aligned}$$

The first two can be solved for $\sigma_x^2$ and $\gamma_{x,1}$ as stated, and the third shows that $\gamma_{x,k} = \phi^{k-1}\gamma_{x,1}$.

For the general ARMA model the decay pattern of the acf is similar to that for the AR model with the same autoregressive part, but it is modified by the presence of the moving average part to allow greater flexibility of form. The pattern is determined by the equation relating successive acf values for the general ARMA model:
3.7.4

$$\rho_{x,k} = \phi_1\rho_{x,k-1} + \dots + \phi_p\rho_{x,k-p} \quad \text{for } k > q$$

which derives directly from the model:

$$\gamma_{x,k} = \mathrm{Cov}(x_t, x_{t-k}) = \mathrm{Cov}(\phi_1 x_{t-1} + \dots + \phi_p x_{t-p} + e_t - \theta_1 e_{t-1} - \dots - \theta_q e_{t-q},\ x_{t-k}) = \phi_1\gamma_{x,k-1} + \dots + \phi_p\gamma_{x,k-p} + 0 - 0 - \dots - 0 \quad \text{for } k > q.$$

This pattern of decay can help to indicate the appropriate AR order $p$. For example cyclical decay indicates that $p$ should be at least 2. The modification at low lags to the smooth pattern of decay associated with the pure AR model can also help to indicate the MA order $q$, as the ARMA(1,1) example illustrates. For the ARMA(1,2) model the geometric decay of the acf starts from lag 2.

The other statistical tools available for selecting the orders of an ARMA model are the sample pacf and sample spectrum. The sample pacf is not of great value for either the MA or ARMA model, but its pattern is generally similar to that of the $\pi_i$ coefficients which appear in the infinite AR representation of these models. For the ARMA(1,1) example this pattern is approximately a geometric decay by the factor $\theta$ from the lag 1 value of the pacf.

The remaining sections in this Chapter (3.8–3.12) are not examinable but will be needed for your project. We will discuss these sections after covering Chapters 4–5.

3.8 The sinusoidal or harmonic regression model.

A suitable model for a smooth cycle is
3.8.1

$$y_t = c + R\cos 2\pi(ft + g) + e_t$$

where

  • The Amplitude of the cycle is $R > 0$.

  • The Frequency, in cycles per unit time (that is, over $t \in [0,1]$), is $f$, where $0 < f < 0.5$.

  • The Phase, that is the fraction of a cycle by which it lags, is $g$, $0 \le g < 1$.

We restrict $R \ge 0$, $f \ge 0$ and $0 \le g < 1$.

We shall sometimes write $R\cos 2\pi(ft+g) = R\cos(\omega t + \phi)$, using the angular frequency $\omega = 2\pi f$ in radians per unit time and the angular phase $\phi = 2\pi g$ - this notation is more convenient at times.

Rather than estimate $R$ and $\phi$ directly we use

$$\cos(\omega t + \phi) = \cos\omega t\cos\phi - \sin\omega t\sin\phi$$

to rewrite the model as
3.8.2

$$y_t = c + A\cos\omega t + B\sin\omega t + e_t$$

where $A = R\cos\phi$ and $B = -R\sin\phi$. So if $\omega$ is known this is a linear regression model with coefficients $A$ and $B$.

Sometimes the fit to a smooth cycle is improved by adding further cycles known as harmonics, because their frequencies are multiples of the fundamental frequency $\omega$, e.g.

$$y_t = c + A_1\cos\omega t + B_1\sin\omega t + A_2\cos 2\omega t + B_2\sin 2\omega t + A_3\cos 3\omega t + B_3\sin 3\omega t + e_t.$$

Thus cycles which are asymmetric, with a slow rise and rapid fall, or sharp peaks with flat troughs, may be well fitted by including sufficient harmonics, provided the pattern repeats itself periodically.
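A minimal least-squares sketch of the regression 3.8.2 with a known frequency (here $f = 1/12$, one cycle every 12 observations); the data, and the true values $c = 5$, $R = 2$, $g = 0.2$, are invented. Harmonic columns at $2\omega$, $3\omega, \dots$ could be appended to the design matrix in exactly the same way.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 144
t = np.arange(1, n + 1)
omega = 2 * np.pi / 12                                     # known angular frequency

y = 5 + 2 * np.cos(omega * t + 2 * np.pi * 0.2) + rng.normal(0, 0.5, n)   # invented series

X = np.column_stack([np.ones(n), np.cos(omega * t), np.sin(omega * t)])   # regression 3.8.2
c_hat, A_hat, B_hat = np.linalg.lstsq(X, y, rcond=None)[0]

R_hat = np.hypot(A_hat, B_hat)                     # amplitude, since A = R cos(phi), B = -R sin(phi)
g_hat = np.arctan2(-B_hat, A_hat) / (2 * np.pi)    # phase as a fraction of a cycle
print(round(c_hat, 2), round(R_hat, 2), round(g_hat, 2))   # close to 5, 2 and 0.2
```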

3.9 Harmonic regression for seasonal models.

We have previously used the seasonal factor model $\alpha_1 M_{1,t} + \dots + \alpha_{12}M_{12,t}$ to represent a periodic component of a monthly series, based on a non-smooth model. An equivalent smooth model is one which treats this component in terms of cycles of period 12 and uses instead, as regression variables, the set of 12 sinusoids defined for $t = 1, \dots, n$ by
3.9.1

$$\begin{aligned}
c_{0,t} &= 1 \\
c_{j,t} = \cos 2\pi f_j t, \quad s_{j,t} &= \sin 2\pi f_j t \qquad \text{for } j = 1, \dots, 5 \\
c_{6,t} &= (-1)^t
\end{aligned}$$

where $f_j = j/12$ is the frequency of the $j$th harmonic of the fundamental frequency $1/12$. For a more general period $s$ the pairs of cosine and sine regressors are present for $j = 1, \dots, \frac{1}{2}s - 1$ when $s$ is even. When $s$ is odd there are only the pairs $j = 1, \dots, \frac{1}{2}(s-1)$. The final, alternating, term $c_{\frac{1}{2}s,t}$ is only present when $s$ is even. Note that $0 \le f_j \le 1/2$.

The fit to the data using these regressors will be exactly the same as that given by the seasonal factor. The relative advantages arise when it may be possible to obtain a good fit using fewer than the full set of regressors. This can happen if the seasonal effect is confined to just a few months in the year for the factor model, and if the seasonal cycle is very smooth in the case of the harmonic regression. Figure 18 shows the results of fitting the model to the CO2 series, but using just 2 pairs of harmonics. The residual mean square of 0.1496 was slightly lower than the value of 0.1541 obtained using the monthly factor model.

Figure 18: Fit to Monthly CO2 of a model with trend and two pairs of harmonic terms.

3.10 The Nyquist frequency.

3.10.1 Lemma

Any cycle $R\cos 2\pi(ft+g)$ of frequency $f > \frac{1}{2}$, when sampled at integer time points $t$, is indistinguishable from a cycle with frequency lying in the range $0 \le f' \le \frac{1}{2}$.

Proof. Any frequency $f$ can be expressed as $f = K \pm f'$ where $K$ is an integer and $f'$ is unique in the interval $0 \le f' \le \frac{1}{2}$. Then for integer $t$

$$R\cos 2\pi([K \pm f']t + g) = R\cos 2\pi(\pm f't + g) = R\cos 2\pi(f't \mp g).$$

The frequency $f'$ is called an alias of $f$.

This Lemma leads to the
3.10.2 Convention. Assume that for a discrete, regularly sampled time series the frequency of any cyclical component lies in the range $0 \le f \le \frac{1}{2}$ cycles per sampling interval.

We usually abbreviate this just to $0 \le f \le \frac{1}{2}$. The frequency of $\frac{1}{2}$ cycle per sampling interval is called the Nyquist frequency. Of course the assumption may be wrong - see Michael Faraday’s Royal Society article ‘On a curious class of optical deceptions’. But the convention does ensure that if the sampling interval is sufficiently short the correct interpretation will be made. The possibility that, under this convention, the assumed frequency is an alias of a higher frequency is a consideration in the choice of sampling interval.

This Lemma explains why no more than six harmonics are used to describe the seasonal pattern, with the highest frequency being $f_6 = \frac{1}{2}$.

3.11 The periodogram

This is a tool for investigating whether a time series contains cycles, and for finding their amplitude and frequency.

3.11.1 Definition. The periodogram of a time series $y_1, \dots, y_n$ is defined for $0 \le f \le \frac{1}{2}$ by

$$I(f) = \frac{2}{n}\left\{\left(\sum_{t=1}^{n}y_t\cos 2\pi ft\right)^2 + \left(\sum_{t=1}^{n}y_t\sin 2\pi ft\right)^2\right\}$$

Remark. This is motivated by the problem of estimating $R$ and $f$ in the regression

$$y_t = R\cos 2\pi(ft+g) + e_t = A\cos 2\pi ft + B\sin 2\pi ft + e_t.$$

The least squares equations for $\hat{A}$ and $\hat{B}$ may be approximated, provided $f \ne 0$ or $\frac{1}{2}$, as

$$\begin{pmatrix}
\sum_1^n(\cos 2\pi ft)^2 & \sum_1^n\cos 2\pi ft\sin 2\pi ft \\
\sum_1^n\sin 2\pi ft\cos 2\pi ft & \sum_1^n(\sin 2\pi ft)^2
\end{pmatrix}
\begin{pmatrix}\hat{A} \\ \hat{B}\end{pmatrix}
\approx
\begin{pmatrix}\tfrac{1}{2}n & 0 \\ 0 & \tfrac{1}{2}n\end{pmatrix}
\begin{pmatrix}\hat{A} \\ \hat{B}\end{pmatrix}
=
\begin{pmatrix}\sum_1^n y_t\cos 2\pi ft \\ \sum_1^n y_t\sin 2\pi ft\end{pmatrix}$$

so that

$$\hat{A} \approx \frac{2}{n}\sum_1^n y_t\cos 2\pi ft, \qquad \hat{B} \approx \frac{2}{n}\sum_1^n y_t\sin 2\pi ft$$

and

$$I(f) \approx \tfrac{1}{2}n(\hat{A}^2 + \hat{B}^2) = \tfrac{1}{2}n\hat{R}^2.$$
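A minimal sketch of computing $I(f)$ directly from Definition 3.11.1 over a grid of frequencies (the test series below is invented: one cycle of amplitude 1.5 at $f = 0.2$ plus noise):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
t = np.arange(1, n + 1)
y = 1.5 * np.cos(2 * np.pi * (0.2 * t + 0.3)) + rng.normal(0, 1, n)    # invented series
y = y - y.mean()                                     # mean-correct before the periodogram

def periodogram(y, f):
    t = np.arange(1, len(y) + 1)
    c = np.sum(y * np.cos(2 * np.pi * f * t))
    s = np.sum(y * np.sin(2 * np.pi * f * t))
    return (2 / len(y)) * (c**2 + s**2)

freqs = np.arange(1, n // 2 + 1) / n                 # Fourier frequencies in (0, 1/2]
I = np.array([periodogram(y, f) for f in freqs])
print(freqs[np.argmax(I)], I.max().round(1))         # a peak near f = 0.2 of height about n R^2 / 2
```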

3.11.2 Interpretation of the periodogram. There are efficient methods for calculating the periodogram over a fine grid of frequencies spanning $0 \le f \le \frac{1}{2}$. The scaling factor in the definition is chosen so that the area under the graph of $I(f)$ is the mean square value of the series:

$$\int_{0}^{1/2} I(f)\,df = \frac{1}{n}\sum_{t=1}^{n}y_t^2.$$

Usually the series is mean corrected before calculating the periodogram so this becomes the sample variance.

A single cycle of amplitude $R$ and frequency $f$ will have a periodogram with a peak at frequency $f$ of height close to $\frac{1}{2}nR^2$ and width $\frac{1}{n}$. The main peak will be neighboured by smaller peaks of diminishing height, called side-lobes, at frequencies of $f \pm \frac{1}{n}, f \pm \frac{2}{n}, \dots$

If the series consists of several such cycles then, provided their frequencies are separated by at least $\frac{2}{n}$, i.e. they differ by at least one cycle in the sample length, each cycle will usually give rise to a distinct peak at its own frequency. However, if the amplitude of one cycle is much less than another, the frequency separation will have to be greater to avoid a side-lobe of the larger peak masking the smaller peak. Figure 19 shows a series consisting of two cycles added together, and their periodogram.

Figure 19: The sum of two cycles and its periodogram.

The periodogram of an irregular cycle with varying amplitude and frequency will typically appear as a collection of peaks over a broader frequency range.

A trend in a time series will appear as a peak at frequency zero, often much larger than any other peak, so it is wise to remove any strong trend by regression before examining the periodogram. The plots in the next four figures show (A) the periodogram for the raw (mean corrected) CO2 series, (B) the periodogram after trend correction - now the seasonal cycles are more noticeable, (C) the periodogram of the residuals after fitting the trend and the cycle with period 12 - the next harmonic is now clearer. Finally, (D) shows the periodogram of the residuals after fitting the trend and the harmonics at periods 12 and 6. What is seen is the typical periodogram of a correlated series - but we can also notice some evidence for a harmonic at period 4.

Unnumbered Figures: the four periodograms (A)-(D) described above.

In general the periodogram should be viewed as a transformation of the data. It has particular value for detecting regular cycles, but is also useful for understanding the irregular or random features. An important case is when the series is white noise. Then, whatever the frequency $0 \le f \le \frac{1}{2}$, using properties of sine and cosine functions the estimates of the above regression will have the property that

$$\hat{A} \text{ and } \hat{B} \text{ are independent Normal with means } 0 \text{ and variance } \tfrac{2}{n}\sigma_e^2$$

so that

$$I(f) = \tfrac{n}{2}(\hat{A}^2 + \hat{B}^2) \sim \sigma_e^2\chi_2^2 = \mathrm{Exponential}(2\sigma_e^2)$$

where here the parameter $\theta$ of $\mathrm{Exponential}(\theta)$ is the mean of the exponential distribution, with pdf $f(x) = \theta^{-1}\exp(-x/\theta)$.

Moreover the values at different frequencies are close to being independent random variables provided the frequencies are separated by at least $\frac{1}{n}$. Because the central 90% of this distribution ranges from about one twentieth to three times its mean, the general picture of the periodogram in this case is very variable, with many ‘peaks’. These are just due to natural statistical fluctuation but are easily misinterpreted as revealing regular cycles in the data. In a sufficiently large data set the peak of height $\frac{1}{2}nR^2$ of a regular cycle would stand clear of these fluctuations due to noise, which have mean $2\sigma_e^2$. Figure 21 illustrates these points with an artificial series of random Gaussian noise, and its periodogram.

Figure 21: A random Gaussian series and its periodogram.

3.12 The spectrum of a stationary time series.

3.12.1 Introduction. The periodogram defined in 3.11.1, for a (mean corrected) time series sample $x_1, x_2, \dots, x_n$ (using now $x_t$ rather than $y_t$), may be shown to be related to the sample acf by

$$I(f) = 2s_x^2\left\{1 + 2\sum_{k=1}^{n-1}r_k\cos 2\pi kf\right\} = 2s_x^2\left\{1 + 2\left[r_1\cos 2\pi f + r_2\cos 2\pi 2f + r_3\cos 2\pi 3f + \dots + r_{n-1}\cos 2\pi(n-1)f\right]\right\}.$$

Now, assuming that $x_t$ is a stationary time series, the following definition is motivated by replacing the sample acf $r_k$ with $\rho_k$:
3.12.2 The spectrum of $x_t$ is

$$S(f) = 2\sigma_x^2\left\{1 + 2\sum_{k=1}^{\infty}\rho_k\cos 2\pi kf\right\} = 2\sigma_x^2\left\{1 + 2\left[\rho_1\cos 2\pi f + \rho_2\cos 2\pi 2f + \rho_3\cos 2\pi 3f + \dots\right]\right\}.$$

We shall call the periodogram the sample spectrum and also write $I(f) = \hat{S}(f)$ when it is believed that the data arise from a stationary time series. The periodogram by its very definition is non-negative, and the same property may be proved for the spectrum:
3.12.3

$$S(f) \ge 0 \quad \text{for } 0 \le f \le \tfrac{1}{2}.$$

The simplest example is that of white noise, for which the spectrum has the constant, or uniform, value $2\sigma_x^2$. In general the spectrum may be thought of as describing how the variance of the series is spread over the frequency range, so that, corresponding to the result in 3.11.2, we have

$$\int_{0}^{1/2} S(f)\,df = \sigma_x^2.$$
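A minimal numerical illustration of Definition 3.12.2 and this area property, using the geometric acf $\rho_k = \phi^k$ of an AR(1) ($\phi = 0.7$ and $\sigma_x^2 = 1$ are invented; the infinite sum is truncated at a large lag):

```python
import numpy as np

phi, sigma2_x = 0.7, 1.0                       # illustrative AR(1) parameters
K = 200                                         # truncation lag for the spectrum sum
f = np.linspace(0.0, 0.5, 2001)
k = np.arange(1, K + 1)

rho = phi ** k                                  # rho_k = phi^k for the AR(1)
S = 2 * sigma2_x * (1 + 2 * (rho[:, None] * np.cos(2 * np.pi * np.outer(k, f))).sum(axis=0))

print(S.min().round(3))                         # non-negative everywhere, as 3.12.3 requires
print((S.mean() * 0.5).round(3))                # ~ sigma2_x: the spectrum integrates to the variance
```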

The next simplest example is of a series for which $\rho_1 = \rho \ne 0$ but $\rho_k = 0$ for $k > 1$, so that

$$S(f) = 2\sigma^2\{1 + 2\rho\cos 2\pi f\}.$$

For this example the extreme values of $S(f)$, found at the endpoints $f = 0$ and $f = \frac{1}{2}$ of the range, are $2\sigma^2(1+2\rho)$ and $2\sigma^2(1-2\rho)$. For the spectrum to be non-negative both of these must be non-negative, so we obtain the constraint $|\rho| \le \frac{1}{2}$. In general the spectrum can take any positive values over its range.

One expression for the spectrum of an MA($q$) model follows from the characteristic property, by stopping the series in 3.12.2 after $q$ terms:
3.12.4

$$S(f) = 2\sigma_x^2\left\{1 + 2\sum_{k=1}^{q}\rho_k\cos 2\pi kf\right\} = 2\sigma_x^2\left\{1 + 2\left[\rho_1\cos 2\pi f + \rho_2\cos 2\pi 2f + \dots + \rho_q\cos 2\pi qf\right]\right\}.$$

Now the range of frequency $0 \le f \le \frac{1}{2}$ corresponds monotonically to the range $-1 \le c \le 1$ for $c = -\cos 2\pi f$. It is useful to think of plotting the spectrum against $c$; it is only a slight distortion of the horizontal axis. In that case $\cos 2\pi kf$ is a polynomial of degree $k$ in $c$, so that for the MA($q$) model the spectrum can be considered as a polynomial of degree $q$. Any such polynomial is a valid spectrum provided it is non-negative.

The statistical properties of the sample spectrum for a stationary time series generalise those stated earlier for white noise. They are that, for large $n$,
3.12.5

$$\hat{S}(f) \sim \mathrm{Exponential}\{S(f)\}$$

with values at frequencies separated by more than $\frac{1}{n}$ being approximately independent. The sample spectra that were shown in Figures 4 and 5 can be smoothed, as shown in Figure 22, to estimate the true spectra about which the sample spectra vary with an exponential distribution.

Figure 22: Smoothed sample spectrum of the Monthly CO2 residuals and the random Gaussian series.

For selection of ARMA models, the sample spectrum is of most value in confirming AR features of the model by the presence of spectrum peaks. For example a peak at $f = 0$ indicates at least a first order model with $p = 1$. If in addition there is a peak within the range $0 < f < \frac{1}{2}$ then another two AR terms are needed, giving $p = 3$. The general form of the spectrum for an ARMA($p,q$) model is that of the ratio of two polynomials in $c = -\cos 2\pi f$: the numerator, of degree $q$, depends on the MA part, and the denominator, of degree $p$, on the AR part. Rational functions such as these have great flexibility in approximating continuous functions, and this is one reason why the ARMA model is able to fit observed stationary time series properties so well.

The following three figures show the plot for ocean core data, and properties of a fitted AR(6) model.

Unnumbered Figures: three plots showing the ocean core data and the properties of the fitted AR(6) model.