In this chapter we show how autocorrelation patterns can arise from models based on white noise. These models can readily be fitted to time series data and used for a variety of applications, particularly forecasting.
3.1.1 Example: The MA(1) model. Let
$$X_t = e_t - \theta e_{t-1}$$
where $\theta$ is a fixed coefficient and $e_t$ is white noise. The reason for the minus sign will become evident.
The properties of this model are
$$E(X_t) = 0, \qquad \gamma_0 = (1 + \theta^2)\,\sigma_e^2, \qquad \rho_1 = \frac{-\theta}{1 + \theta^2}, \qquad \rho_k = 0 \ \text{for } k > 1.$$
Proof as an exercise.
Remarks. $|\rho_1| \le \tfrac12$ follows from $(1 - |\theta|)^2 \ge 0$. Also
$$\rho_1(1/\theta) = \frac{-1/\theta}{1 + 1/\theta^2} = \frac{-\theta}{1 + \theta^2} = \rho_1(\theta),$$
so that replacing $\theta$ by $1/\theta$ would result in the same acf. Provided $|\rho_1| < \tfrac12$ we therefore have a choice of two values for $\theta$ when modelling an acf of this form. The convention is to choose the value that satisfies $|\theta| < 1$.
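As a quick numerical check of this remark, here is a minimal sketch in Python with numpy (the function name is ours, not part of the notes):

    import numpy as np

    def ma1_rho1(theta):
        # Lag-1 autocorrelation of the MA(1) model X_t = e_t - theta * e_{t-1}
        return -theta / (1.0 + theta**2)

    theta = 0.4
    print(ma1_rho1(theta), ma1_rho1(1.0 / theta))   # both print -0.3448...: the same acf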
3.1.2 Definition. A time series $X_t$ follows a moving average model of order $q$, MA($q$), if it may be represented as
$$X_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q}$$
where $e_t$ is a white noise series.
Remarks. The model has $q$ moving average parameters $\theta_1, \dots, \theta_q$ and one variance parameter $\sigma_e^2$. For some purposes it is useful to have a slightly different notation and to write
3.1.3
$$X_t = \sum_{k=0}^{q} \psi_k e_{t-k}$$
where $\psi_0 = 1$ and $\psi_k = -\theta_k$ for $k = 1, \dots, q$. The name moving average is more usually applied when one observed time series takes the place of $e_t$ in this equation, and a new series, taking the place of $X_t$, is constructed according to this equation. Our model contrasts with this in that $X_t$ is observed, and we are proposing a model which gives rise to its autocorrelation pattern.
This is a convenient point to introduce an operator notation for discrete time series.
3.1.4 Definition. The backward shift operator $B$ applied to a series $X_t$ replaces it by the shifted series $B X_t$ with values $X_{t-1}$. Powers of $B$ are similarly defined, i.e. $B^k X_t = X_{t-k}$.
Using this notation the MA($q$) model may be expressed from Definition 3.1.2 as
3.1.5
$$X_t = \theta(B)\, e_t, \qquad \theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q.$$
For example the MA(1) model is $X_t = (1 - \theta B)\, e_t$.
Now we can formally re-arrange this, and provided $|\theta| < 1$ expand:
$$e_t = (1 - \theta B)^{-1} X_t = X_t + \theta X_{t-1} + \theta^2 X_{t-2} + \cdots$$
We can directly verify from this expression that substituting $X_{t-k} = e_{t-k} - \theta e_{t-k-1}$ on the right hand side causes all terms except $e_t$ to cancel.
The fact that the unobserved series $e_t$ can be ‘recovered’ from the observed series $X_t$ is the reason for choosing $|\theta| < 1$. The general result is:
3.1.6 Definition. The MA($q$) model of 3.1.5 is said to be invertible if it is possible to express $e_t$ in terms of the present and past values of $X_t$ as
$$e_t = X_t - \pi_1 X_{t-1} - \pi_2 X_{t-2} - \cdots$$
Remark. The signs in this last equation are chosen for convenience on rewriting the expansion as
3.1.7
$$X_t = \pi_1 X_{t-1} + \pi_2 X_{t-2} + \cdots + e_t$$
- a form of the model which will be useful for prediction.
3.1.8 Definition. The invertibility condition for model 3.1.5 is that the operator $\theta(B)$ satisfies
$$\theta(z) \neq 0 \quad \text{for all } |z| \le 1,$$
i.e. all roots of $\theta(z) = 0$ lie outside the unit circle. Here, we are treating the argument $z$ of $\theta(\cdot)$ as a real or complex number rather than an operator.
Remark. Consider an MA(2) model where we observe $X_t$ satisfying the stated equation. The roots of $\theta(z) = 0$ can be computed, and at least one lies on or inside the unit circle, so that the model is not invertible.
The invertibility condition means that it is possible to expand $\theta(z)^{-1}$ as a convergent power series in $z$ for $|z| \le 1$, in order to derive 3.1.6 from 3.1.5. One way to derive the coefficient of $z^k$ in this expansion is to use partial fractions, which is standard in power-series expansion.
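A numerical alternative to partial fractions is straightforward power-series (long) division. The sketch below (Python/numpy; the helper name is ours) computes the coefficients of $B^k$ in $\theta(B)^{-1}$ from the recursion implied by $\theta(B)\,\theta(B)^{-1} = 1$:

    import numpy as np

    def inverse_weights(theta, n_terms=20):
        # Coefficients c_k in the expansion 1/theta(B) = sum_k c_k B^k, where
        # theta(B) = 1 - theta[0]*B - ... - theta[q-1]*B^q (assumed invertible).
        # The recursion c_k = sum_j theta_j * c_{k-j} follows from theta(B) * sum_k c_k B^k = 1.
        q = len(theta)
        c = np.zeros(n_terms)
        c[0] = 1.0
        for k in range(1, n_terms):
            c[k] = sum(theta[j - 1] * c[k - j] for j in range(1, min(k, q) + 1))
        return c

    # MA(1) with theta = 0.6: the expansion coefficients should be 0.6**k
    print(inverse_weights([0.6], 6))    # [1.0, 0.6, 0.36, 0.216, 0.1296, 0.07776]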
We now consider when it is appropriate to use a moving average model to
represent a given series. This is based on the autocorrelation properties of the
model:
3.2.1 Recall that $X_t = \sum_{k=0}^{q} \psi_k e_{t-k}$, so that $\psi_0 = 1$ and $\psi_k = -\theta_k$ for $k = 1, \dots, q$. The autocovariances of a moving average model are given in terms of the $\psi$ coefficients as
$$\gamma_k = \sigma_e^2 \sum_{i=0}^{q-k} \psi_i \psi_{i+k}, \qquad k = 0, 1, \dots, q.$$
The characteristic property of the model is that
3.2.2
$$\rho_k = 0 \quad \text{for } k > q.$$
Continuing the previous example, substituting its coefficient values into 3.2.1 gives the autocovariances, and hence the values of $\rho_1$ and $\rho_2$, with $\rho_k = 0$ for $k > 2$.
Proof of 3.2.1. Start from
$$\gamma_k = \operatorname{Cov}(X_t, X_{t+k}) = \operatorname{Cov}\!\left(\sum_{i=0}^{q} \psi_i e_{t-i},\ \sum_{j=0}^{q} \psi_j e_{t+k-j}\right).$$
Now each term in the first sum is correlated with at most one term in the second sum, i.e. the same term of $e$, which is the term with $j = i + k$. So the covariance reduces to
$$\gamma_k = \sigma_e^2 \sum_{i=0}^{q-k} \psi_i \psi_{i+k}.$$
∎
Notice that the above formula holds even if $q = \infty$. In other words, if
$$X_t = \sum_{k=0}^{\infty} \psi_k e_{t-k}, \qquad \sum_{k=0}^{\infty} \psi_k^2 < \infty,$$
where $e_t$ is white noise, then
$$\gamma_k = \sigma_e^2 \sum_{i=0}^{\infty} \psi_i \psi_{i+k}.$$
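The result 3.2.1 is easy to evaluate numerically. The following sketch (Python/numpy; the function name is ours) computes the autocovariances of an MA($q$) model from the $\psi$ coefficients and shows the cut-off after lag $q$:

    import numpy as np

    def ma_autocovariances(theta, sigma2=1.0, max_lag=10):
        # Autocovariances gamma_k of X_t = e_t - theta_1 e_{t-1} - ... - theta_q e_{t-q},
        # from 3.2.1 with psi_0 = 1 and psi_k = -theta_k.
        psi = np.concatenate(([1.0], -np.asarray(theta, dtype=float)))
        q = len(psi) - 1
        gamma = np.zeros(max_lag + 1)
        for k in range(min(q, max_lag) + 1):
            gamma[k] = sigma2 * np.sum(psi[: q - k + 1] * psi[k:])
        # gamma_k stays 0 for k > q: the characteristic cut-off of 3.2.2
        return gamma

    print(ma_autocovariances([0.5, -0.3], max_lag=5))
    # [ 1.34 -0.65  0.3   0.    0.    0.  ]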
There is a converse result, that any stationary time series having the property 3.2.2 may be represented by a moving average model 3.1.2 satisfying the invertibility condition.
The characteristic property is recognised by inspection of the sample acf of the data series. The sampling properties of this acf are therefore required, and these will be given shortly. But first we have a look at two time series for which moving average models appear to be appropriate. In fact it is the successive changes, or first differences, of these series which appear to have the characteristics of moving average models.
Consider the ‘CABLE’ data, available on Moodle, which is clearly non-stationary. A good model for this data might be one in which the first differences of the logged series are stationary. We therefore take the logarithm and then the first differences of the series of weekly business transactions, as shown in Figure 6, to make it stationary.
Figure 7 shows the sample time series properties of the differenced series, which are typical of an MA(1) model:
The MA(1) model is estimated for this series, and it has the model properties shown in the following figures.
[Four unnumbered figures: properties of the fitted MA(1) model.]
The second plot shows the implied weights of the moving average representation of 3.1.3, and the third plot shows the implied weights of the autoregressive representation given in 3.1.7. The fourth plot shows the implied model spectrum, which we shall define in Section 3.12.
The second example is of the annual changes in daylength, measured in milliseconds, from the dataset ‘DAYLENGT’ available on Moodle. Part of what we see is almost certainly due to the fact that the original data have been smoothed to give the series shown in the left panel of Figure 9: each point has been replaced by its average with the previous and next points, and this procedure has then been repeated. This procedure has the property of completely annihilating any cycle of period 3.
Figure 10 shows the sample time series properties of the differenced series. Among these, the sample spectrum is very low at frequency 1/3 as a consequence of the smoothing. The sample autocorrelations are characteristic of an MA(3) model:
The MA(3) model is estimated for this series, and it has the model properties shown in the following figures, which match the sample properties very well:
[Four unnumbered figures: properties of the fitted MA(3) model.]
In general the sample value $r_k$ is an approximately unbiased and consistent estimate of $\rho_k$ for any fixed lag $k$ as the sample size $n$ tends to infinity. In the particular case of the MA($q$) model there is a useful result for the standard error of the sample acf at lags greater than the order $q$. This helps us to weigh the statistical evidence for a hypothesised value of $q$.
3.2.3 For a Gaussian time series of length $n$ with acf satisfying $\rho_k = 0$ for $k > q$,
$$\operatorname{Var}(r_k) \approx \frac{1}{n}\left(1 + 2\sum_{i=1}^{q} \rho_i^2\right) \quad \text{for } k > q.$$
The simplest case, $q = 0$, gives the property under the hypothesis of white noise, that
$$r_k \approx N(0, 1/n) \quad \text{for } k \ge 1,$$
so it is usual to plot the sample correlations on a graph with approximate error limits drawn at $\pm 2/\sqrt{n}$ about the horizontal axis. If the sample acf values generally lie within these limits, with the only values outside them being at unremarkable lags, then white noise is a sensible conclusion. But these limits only apply for assessing the hypothesis of white noise. Under any other hypothesis, the limits are wider.
The sample acf may in fact lie outside these limits for many lags. We are here interested in the case when a small number of values lie clearly outside the limits at low lags. We then have good reason to suppose that an MA($q$) model is appropriate, with the value of $q$ being tentatively chosen as the ‘cut-off’ lag: the highest lag with a ‘significant’ acf value.
We need then to be aware of the fact that the sampling variability of the remaining acf values, at lags greater than the assumed MA order $q$, will be increased according to 3.2.3, so that rather more values will lie outside the white noise limits. The increased standard error may be estimated from 3.2.3 using the sample acf values up to lag $q$ in place of the unknown $\rho_i$.
Another visual characteristic of the sample acf at these higher lags is that its statistical variability may follow a smoother pattern than for white noise. Even when the true model is MA($q$) these patterns may be deceptively suggestive of structure in the acf, i.e. non-zero values of $\rho_k$, which is not actually present at these higher lags.
Estimation of the parameters in an MA($q$) model may sometimes be achieved by solving the equations 3.2.1 for the unknown parameters, using sample autocovariance values in place of $\gamma_k$. Unfortunately these equations may not have a real solution, though when they do it is easily found. For example in the MA(1) case the solution for $\theta$ is found by solving
$$r_1 = \frac{-\theta}{1 + \theta^2},$$
which has no real solution if $r_1$ is outside the range $(-\tfrac12, \tfrac12)$, and this may well happen through statistical variability in $r_1$.
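A minimal sketch of this moment estimator for the MA(1) case (Python/numpy; the function name is ours); it returns the invertible root when one exists:

    import numpy as np

    def ma1_moment_estimate(r1):
        # Solve r1 = -theta / (1 + theta**2) for the invertible root |theta| < 1.
        # Returns None when |r1| >= 0.5, in which case no invertible real solution exists.
        if abs(r1) >= 0.5:
            return None
        if r1 == 0.0:
            return 0.0
        return (-1.0 + np.sqrt(1.0 - 4.0 * r1**2)) / (2.0 * r1)

    print(ma1_moment_estimate(-0.40))   # 0.5
    print(ma1_moment_estimate(0.55))    # None: r1 outside (-1/2, 1/2)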
Consider the model
3.3.1
$$X_t = \phi X_{t-1} + e_t$$
which defines successive values of the series in terms of previous values and a white noise series $e_t$. This is the first order autoregressive model, commonly referred to as the AR(1) model. The essential distinction from prediction models is that for the prediction equation the error is only required to be uncorrelated with $X_{t-1}$; in general it will not be white noise. For the AR(1) model, however, $e_t$ is assumed to be white noise and will, as a result of that, be uncorrelated with all past values of the series.
The properties of this model may be found by successively substituting for the lagged values:
$$X_t = e_t + \phi e_{t-1} + \phi^2 X_{t-2} = \cdots = e_t + \phi e_{t-1} + \cdots + \phi^{k-1} e_{t-k+1} + \phi^k X_{t-k}.$$
Now assuming that both $|\phi| < 1$ and $X_t$ is bounded ‘in the infinite past’ we can let $k \to \infty$ and obtain the convergent sum:
3.3.2
$$X_t = \sum_{k=0}^{\infty} \phi^k e_{t-k}.$$
We can derive from this, for example, $E(X_t) = 0$ and $\operatorname{Var}(X_t) = \sigma_e^2/(1 - \phi^2)$.
The representation 3.3.2 could in fact be obtained using the backward shift operator notation by writing 3.3.1 as $(1 - \phi B) X_t = e_t$, so that
$$X_t = (1 - \phi B)^{-1} e_t = (1 + \phi B + \phi^2 B^2 + \cdots)\, e_t.$$
The variance and correlation properties can be derived directly from the model 3.3.1 without using this expansion. Because $X_{t-1}$ and $e_t$ are uncorrelated we get
$$\gamma_0 = \phi^2 \gamma_0 + \sigma_e^2, \qquad \text{i.e.} \quad \gamma_0 = \frac{\sigma_e^2}{1 - \phi^2}.$$
Similarly $e_t$ is uncorrelated with $X_{t-k}$ for any $k \ge 1$, so
$$\gamma_k = \operatorname{Cov}(X_t, X_{t-k}) = \phi\, \operatorname{Cov}(X_{t-1}, X_{t-k}) = \phi\, \gamma_{k-1}.$$
It follows that
$$\gamma_k = \phi^k \gamma_0, \qquad k \ge 0.$$
Note therefore that $\rho_1 = \phi$ and $\rho_k = \phi^k$ for $k \ge 0$.
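A short simulation (Python/numpy; illustrative values only) confirms the geometric decay of the AR(1) acf by comparing the sample acf of a long simulated series with $\phi^k$:

    import numpy as np

    rng = np.random.default_rng(1)
    phi, n = 0.7, 5000
    e = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]            # the AR(1) recursion 3.3.1

    def sample_acf(x, max_lag):
        # Simple sample acf r_k = c_k / c_0 with c_k = (1/n) * sum_t x_t x_{t+k}
        x = x - x.mean()
        n = len(x)
        c0 = np.sum(x * x) / n
        return np.array([np.sum(x[: n - k] * x[k:]) / (n * c0) for k in range(max_lag + 1)])

    print(np.round(sample_acf(x, 4), 2))        # close to the theoretical values below
    print(np.round(phi ** np.arange(5), 2))     # 1, 0.7, 0.49, 0.34, 0.24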
3.3.3 Definition. A time series $X_t$ follows an autoregressive model of order $p$, AR($p$), if it is generated by the process
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + e_t$$
where $e_t$ is a white noise series. Using operator notation the model may be written:
3.3.4
$$\phi(B)\, X_t = e_t, \qquad \phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p.$$
We also require:
3.3.5 Definition. The stationarity condition for model 3.3.3 is that the operator $\phi(B)$ satisfies
$$\phi(z) \neq 0 \quad \text{for all } |z| \le 1.$$
Remark. The condition is the same as that required of $\theta(B)$ for a moving average model to be invertible (see 3.1.8). In this case it means that it is possible to carry out the convergent expansion following from 3.3.4:
$$X_t = \phi(B)^{-1} e_t = \sum_{k=0}^{\infty} \psi_k e_{t-k},$$
where the coefficients $\psi_k$ are given by the power series expansion of $\phi(z)^{-1}$, with $\psi_0 = 1$. This is an infinite moving average
representation of the model (compare with the finite representation of 3.1.3 for MA processes). We note one consequence which may be obvious from the way the series is generated, but which we can formally verify with this expansion; the fact that $e_t$ is uncorrelated with all past values of the series:
3.3.6
$$\operatorname{Cov}(e_t, X_{t-k}) = 0 \quad \text{for } k \ge 1.$$
This is because $X_{t-k} = \sum_{j=0}^{\infty} \psi_j e_{t-k-j}$, in which all terms are uncorrelated with $e_t$.
The main contrast with moving average models is that the acf does not have a cut off, but generally decays towards higher lags. The pattern of decay is obviously reflected in the behaviour of the time series generated from this model, and includes cyclical patterns. Historically, this was a reason for the introduction of these models.
3.4.1 The Yule-Walker equations relating the coefficients and acf of the AR($p$) model.
The coefficients $\phi_1, \dots, \phi_p$ determine, and are determined by, the acf values $\rho_1, \dots, \rho_p$, by the equations
$$\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \cdots + \phi_p \rho_{k-p}, \qquad k = 1, \dots, p,$$
with $\rho_0 = 1$ and $\rho_{-k} = \rho_k$. Displayed as a set of linear equations for $\phi_1, \dots, \phi_p$ these are
$$\begin{pmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_p \end{pmatrix} = \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{p-1} \\ \rho_1 & 1 & \cdots & \rho_{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p-1} & \rho_{p-2} & \cdots & 1 \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{pmatrix},$$
where for simplicity the series subscript of $\rho$ has been omitted. We estimate the parameters by replacing the acf $\rho_k$ by the sample acf $r_k$.
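The Yule-Walker estimates are obtained by solving this linear system with sample autocorrelations on the right-hand side. A minimal sketch (Python/numpy; the function name is ours):

    import numpy as np

    def yule_walker(r):
        # Solve the Yule-Walker equations for an AR(p) model given
        # r = (r_1, ..., r_p), the (sample) autocorrelations.
        r = np.asarray(r, dtype=float)
        p = len(r)
        # Toeplitz matrix with 1 on the diagonal and r_1, ..., r_{p-1} off it
        R = np.array([[1.0 if i == j else r[abs(i - j) - 1] for j in range(p)]
                      for i in range(p)])
        return np.linalg.solve(R, r)

    # Check on an AR(2) with phi = (0.5, 0.3): here rho_1 = phi_1/(1 - phi_2)
    # and rho_2 = phi_1*rho_1 + phi_2.
    rho1 = 0.5 / (1.0 - 0.3)
    rho2 = 0.3 + 0.5 * rho1
    print(yule_walker([rho1, rho2]))            # recovers [0.5, 0.3]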
We demonstrate the more general result:
3.4.2
$$\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \cdots + \phi_p \rho_{k-p} \quad \text{for all } k \ge 1.$$
The Yule-Walker equations are just the subset of these for $k = 1, \dots, p$.
Proof. From 3.3.6, $e_t$ is uncorrelated with $X_{t-k}$, and so for $k \ge 1$
$$\gamma_k = \operatorname{Cov}(X_t, X_{t-k}) = \phi_1 \gamma_{k-1} + \phi_2 \gamma_{k-2} + \cdots + \phi_p \gamma_{k-p}.$$
Divide through by $\gamma_0$ for the required result. ∎
These equations are completed by one further useful relationship:
3.4.3
$$\sigma_e^2 = \gamma_0\left(1 - \phi_1 \rho_1 - \phi_2 \rho_2 - \cdots - \phi_p \rho_p\right).$$
Proof. In two steps. The first is that
$$\operatorname{Cov}(X_t, e_t) = \sigma_e^2$$
because $e_t$ is uncorrelated with all past values $X_{t-1}, \dots, X_{t-p}$ appearing in the model equation. Extending this,
$$\gamma_0 = \operatorname{Var}(X_t) = \phi_1 \gamma_1 + \phi_2 \gamma_2 + \cdots + \phi_p \gamma_p + \sigma_e^2.$$
Divide by $\gamma_0$ and rearrange for the result. ∎
Figure 12 shows a good example of a time series that resembles a first order autoregressive process. It is a series of temperature measurements modified by wind speed and other variables for use in predicting energy demands. The sample properties, also shown in Figure 12, are typical of this process.
The AR(1) model was fitted to this series. The properties of the fitted model, presented in the next figures, correspond very well to the sample properties of the series.
[Five unnumbered figures: properties of the fitted AR(1) model.]
3.4.4 Example. The AR(2) model:
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + e_t.$$
This process has the capacity to represent a series with an irregular cycle
and was put forward by Yule to model the sunspot cycle. The motivation was
a discrete time version of the second order differential equation for a
damped pendulum, including random fluctuations. For $p = 2$ the Yule-Walker equations give:
3.4.5
$$\rho_1 = \phi_1 + \phi_2 \rho_1, \qquad \rho_2 = \phi_1 \rho_1 + \phi_2.$$
If the model parameters are given, these equations supply $\rho_1$ and $\rho_2$. Then 3.4.2 generates $\rho_3, \rho_4, \dots$ successively using $\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2}$.
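A small sketch (Python/numpy; the function name is ours) implementing this recursion for the AR(2) model; with complex roots of $\phi(z)$ the computed acf shows the damped cyclical pattern mentioned above:

    import numpy as np

    def ar2_acf(phi1, phi2, max_lag=20):
        # acf of a stationary AR(2): rho_1 from the Yule-Walker equations 3.4.5,
        # then the recursion rho_k = phi1*rho_{k-1} + phi2*rho_{k-2} from 3.4.2.
        rho = np.ones(max_lag + 1)
        rho[1] = phi1 / (1.0 - phi2)
        for k in range(2, max_lag + 1):
            rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]
        return rho

    # Complex roots of 1 - phi1*z - phi2*z**2 produce a damped, cyclical acf
    print(np.round(ar2_acf(1.4, -0.7, 12), 3))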
Figure 14 shows a series which appears to follow an AR(2) model. It is an indicator of past ocean surface temperatures, obtained by measuring oxygen isotope ratios in a core from the ocean sediment. The time scale is in thousands of years. The series has an irregular cyclical appearance, reflected in its sample autocorrelations.
The coefficients of the fitted AR(2) model were estimated from the series. The next five figures show the properties of the fitted model, which are similar to the sample properties.
[Five unnumbered figures: properties of the fitted AR(2) model.]
We began the previous section by saying that the autocorrelation properties of autoregressive models are generally quite distinct from those of moving average models. However we also have
3.5.1 The characteristic property of the AR($p$) model is that its partial autocorrelation function has a ‘cut-off’ at lag $p$, i.e.
$$\phi_{k,k} = 0 \quad \text{for } k > p.$$
This is because the predictor of $X_t$ conditioned upon $X_{t-1}, \dots, X_{t-k}$ for any $k \ge p$ includes the model's own regression on $X_{t-1}, \dots, X_{t-p}$. If $p$ lagged terms have been included in the predictor, the error is white noise and uncorrelated with all terms at lags higher than $p$. Including these further terms cannot improve the predictor; this implies that the partial correlation at these lags is zero.
Again there is a converse result, that any stationary time series satisfying the property 3.5.1 may be represented by the AR($p$) model 3.3.4 with $e_t$ being a white noise series, and $\phi(B)$ satisfying the stationarity condition.
The characteristic property is recognised by inspection of the sample pacf of the data series. The sampling properties of the sample pacf values $\hat\phi_{k,k}$ are that they are approximately unbiased and consistent estimates of $\phi_{k,k}$ for any fixed lag $k$ as the sample size tends to infinity. When the series follows the AR($p$) model the following sampling property holds:
3.5.2
$$\hat\phi_{k,k} \approx N(0, 1/n) \quad \text{for } k > p.$$
It is therefore usual to plot the sample pacf on a graph with approximate error limits for zero pacf values of $\pm 2/\sqrt{n}$ about the horizontal axis. We are interested in the case when a small number of values lie clearly outside the limits at low lags. The highest lag with a significant pacf value then indicates the order $p$.
Remark. Unlike the acf, which may extend to high lags before decaying effectively to zero, the pacf for most practically occurring series does not extend to very high lags. This is because the pacf measures the additional ability to predict the series from values further back in the past, and this ability is usually rather limited. The first few pacf values usually ‘take up’ most of it. So in practice series which do not in actual fact follow an AR model may be well approximated by one of modest order. The true pacf may not actually ‘cut off’ at some lag, but may just become small compared with the SE of the sample value.
3.6.1 Definition. A time series $X_t$ follows an autoregressive moving average model of orders $p$ and $q$, an ARMA($p,q$) model, if it is generated by the process
$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + e_t - \theta_1 e_{t-1} - \cdots - \theta_q e_{t-q}$$
where $e_t$ is a white noise series. Using operator notation the model may be written
$$\phi(B)\, X_t = \theta(B)\, e_t,$$
where the operators $\phi(B)$ and $\theta(B)$ are as defined for the pure AR and MA models.
We again require the conditions
$$\phi(z) \neq 0 \ \text{for } |z| \le 1 \quad \text{for stationarity}, \qquad \theta(z) \neq 0 \ \text{for } |z| \le 1 \quad \text{for invertibility}.$$
The first ensures that we can express $X_t$ in an infinite moving average form:
3.6.2
$$X_t = \phi(B)^{-1}\theta(B)\, e_t = \sum_{k=0}^{\infty} \psi_k e_{t-k},$$
and the second that we can recover $e_t$ from $X_t$ by
$$e_t = \theta(B)^{-1}\phi(B)\, X_t,$$
which is simply rearranged to give the infinite autoregressive form
3.6.3
$$X_t = \pi_1 X_{t-1} + \pi_2 X_{t-2} + \cdots + e_t.$$
Remark. The existence of these infinite forms means that in theory an ARMA process can be approximated with arbitrary accuracy by either a pure MA or AR model of finite order. Given a finite data series this can lead to some uncertainty as to which model to use.
3.6.4 Example. The ARMA(1,1) model $(1 - \phi B) X_t = (1 - \theta B)\, e_t$ has, for the form 3.6.2,
$$\phi(B)^{-1}\theta(B) = (1 - \phi B)^{-1}(1 - \theta B) = 1 + (\phi - \theta) B + (\phi - \theta)\phi B^2 + \cdots$$
so that $\psi_k = (\phi - \theta)\phi^{k-1}$ for $k \ge 1$. Similarly, for the form 3.6.3,
$$\theta(B)^{-1}\phi(B) = (1 - \theta B)^{-1}(1 - \phi B) = 1 - (\phi - \theta) B - (\phi - \theta)\theta B^2 - \cdots$$
so that $\pi_k = (\phi - \theta)\theta^{k-1}$ for $k \ge 1$.
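These weights are easy to generate numerically; the sketch below (Python/numpy; the function name is ours) applies the recursions for the ARMA(1,1) case and agrees with the closed forms just derived:

    import numpy as np

    def arma11_weights(phi, theta, n_terms=6):
        # psi and pi weights of (1 - phi*B) X_t = (1 - theta*B) e_t.
        # Entry 0 of each array is the leading coefficient 1; entries k >= 1 are
        # psi_k (infinite MA form 3.6.2) and pi_k (infinite AR form 3.6.3).
        psi = np.zeros(n_terms)
        pi = np.zeros(n_terms)
        psi[0] = pi[0] = 1.0
        if n_terms > 1:
            psi[1] = pi[1] = phi - theta
        for k in range(2, n_terms):
            psi[k] = phi * psi[k - 1]           # closed form (phi - theta) * phi**(k-1)
            pi[k] = theta * pi[k - 1]           # closed form (phi - theta) * theta**(k-1)
        return psi, pi

    psi, pi = arma11_weights(0.8, 0.4)
    print(np.round(psi, 4))     # 1, 0.4, 0.32, 0.256, 0.2048, 0.1638
    print(np.round(pi, 4))      # 1, 0.4, 0.16, 0.064, 0.0256, 0.0102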
Remark. Note that in this example if $\theta = \phi$ then simply $X_t = e_t$, because the factors $(1 - \phi B)$ and $(1 - \theta B)$ cancel. In general we require that $\phi(B)$ and $\theta(B)$ have no cancelling factors, else the model would not be distinguishable from a simpler one. The problem can come with model estimation from finite data series, when near-cancelling factors can occur. The signal plus noise model can give this situation if the noise almost swamps the signal.
Figure 16 on the next page shows a series of central England average temperatures that plausibly follows such a model. The coefficients of an ARMA(1,1) model were estimated for this series.
The properties of the fitted model, shown in the next six figures, are similar to those of the sample properties. The last plot shows a smoothed pattern of the series, estimating the underlying level which might be represented by an AR(1) process.
[Six unnumbered figures: properties of the fitted ARMA(1,1) model, including a smoothed estimate of the underlying level.]
The first point to make is that in general neither of the characteristic properties of the MA or AR model holds for ARMA models. This is a negative statement but nevertheless of value in suggesting that a mixed ARMA model is needed.
A second point is that even if it appears that a pure AR or MA model is appropriate from recognition of one characteristic property or the other, a model can sometimes be improved by adding an MA term to a pure AR model or vice versa. This is because inspection of the sample acf and pacf, though providing an indication of the basic model, may ‘lose’ other indications in the sampling fluctuations.
The main concern though is that there may be no simple guide as to the choice of the model orders $p$ and $q$, which are needed in order to estimate the model parameters. The acf properties of the model are considered first. These may be derived from the infinite moving average form 3.6.2 by letting $q \to \infty$ in 3.2.1:
3.7.1
$$\gamma_k = \sigma_e^2 \sum_{i=0}^{\infty} \psi_i \psi_{i+k}.$$
These infinite sums can be avoided by solving appropriate equations similar to the Yule-Walker equations, although the infinite moving average is initially useful for deriving:
3.7.2
$$\operatorname{Cov}(X_t, e_{t-k}) = \psi_k \sigma_e^2 \ \text{for } k \ge 0, \qquad \operatorname{Cov}(X_t, e_{t+k}) = 0 \ \text{for } k \ge 1.$$
3.7.3 Example. The ARMA(1,1) model has variance and autocorrelations given by
$$\gamma_0 = \sigma_e^2\,\frac{1 + \theta^2 - 2\phi\theta}{1 - \phi^2}, \qquad \rho_1 = \frac{(\phi - \theta)(1 - \phi\theta)}{1 + \theta^2 - 2\phi\theta}, \qquad \rho_k = \phi\,\rho_{k-1} \ \text{for } k \ge 2.$$
Note that the pattern of autocorrelations is similar to that for the AR(1) model in the sense that both exhibit geometric decay of the autocorrelation as a function of the lag $k$.
Proof. We use 3.7.2 to derive the first of the equations:
$$\gamma_0 = \phi\gamma_1 + \sigma_e^2\{1 - \theta(\phi - \theta)\}, \qquad \gamma_1 = \phi\gamma_0 - \theta\sigma_e^2, \qquad \gamma_k = \phi\gamma_{k-1} \ \text{for } k \ge 2.$$
The first two can be solved for $\gamma_0$ and $\rho_1$ as stated, and the third shows that $\rho_k = \phi\rho_{k-1}$ for $k \ge 2$. ∎
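A short sketch (Python/numpy; the function name is ours) evaluating these ARMA(1,1) formulas, with the geometric decay continued by the recursion $\rho_k = \phi\rho_{k-1}$:

    import numpy as np

    def arma11_acf(phi, theta, sigma2=1.0, max_lag=10):
        # Variance and acf of X_t = phi*X_{t-1} + e_t - theta*e_{t-1}, using 3.7.3
        gamma0 = sigma2 * (1.0 + theta**2 - 2.0 * phi * theta) / (1.0 - phi**2)
        rho = np.ones(max_lag + 1)
        rho[1] = (phi - theta) * (1.0 - phi * theta) / (1.0 + theta**2 - 2.0 * phi * theta)
        for k in range(2, max_lag + 1):
            rho[k] = phi * rho[k - 1]           # geometric decay from lag 1 onwards
        return gamma0, rho

    gamma0, rho = arma11_acf(0.8, 0.4)
    print(round(gamma0, 4), np.round(rho[:5], 4))   # 1.4444 and 1, 0.5231, 0.4185, ...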
For the general ARMA model the decay pattern of the acf is similar to that for the AR model with the same autoregressive part, but it is modified by the presence of the moving average part to allow greater flexibility of form. The pattern is determined by the equation relating successive acf values for the general ARMA model:
3.7.4
$$\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \cdots + \phi_p \rho_{k-p} \quad \text{for } k > q,$$
which derives directly from the model: for $k > q$ the moving average terms $e_t, \dots, e_{t-q}$ are all uncorrelated with $X_{t-k}$, so taking covariances with $X_{t-k}$ in the model equation leaves only the autoregressive terms.
This pattern of decay can help to indicate the appropriate AR order $p$. For example cyclical decay indicates that $p$ should be at least 2. The modification at low lags to the smooth pattern of decay associated with the pure AR model can also help to indicate the MA order $q$, as the ARMA(1,1) example illustrates. For the ARMA(1,1) model the geometric decay of the acf starts from lag 1, rather than from lag 0 as for the AR(1).
The other statistical tools available for selecting the orders of an ARMA model are the sample pacf and sample spectrum. The sample pacf is not of great value for either the MA or ARMA model, but its pattern is generally similar to that of the coefficients $\pi_k$ which appear in the infinite AR representation of these models. For the ARMA(1,1) example this pattern is approximately a geometric decay by the factor $\theta$ from the lag 1 value of the pacf.
The remaining sections in this Chapter (3.8–3.12) are not examinable but will be needed for your project. We will discuss these sections after covering Chapters 4–5.
A suitable model for a smooth cycle is
3.8.1
$$X_t = R\cos\{2\pi(f t - \phi)\} + e_t$$
where
The Amplitude of the cycle is $R$.
The Frequency in cycles per unit time (that is, one cycle over a period of $1/f$ time units) is $f$.
The Phase, that is the fraction of a cycle by which it lags, is $\phi$, $0 \le \phi < 1$.
We restrict $R > 0$ and $0 < f \le \tfrac12$.
We shall sometimes write $X_t = R\cos(\omega t - \varphi) + e_t$ using the angular frequency $\omega = 2\pi f$ radians per unit time and angular phase $\varphi = 2\pi\phi$ - the notation is more convenient at times.
Rather than estimate $R$ and $\phi$ directly we use
$$R\cos\{2\pi(f t - \phi)\} = R\cos(2\pi\phi)\cos(2\pi f t) + R\sin(2\pi\phi)\sin(2\pi f t)$$
to rewrite the model as
3.8.2
$$X_t = A\cos(2\pi f t) + B\sin(2\pi f t) + e_t$$
where $A = R\cos(2\pi\phi)$ and $B = R\sin(2\pi\phi)$. So if $f$ is known this is a linear regression model with coefficients $A$ and $B$.
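A minimal illustration of this regression (Python/numpy; the simulated data and variable names are ours): given a known frequency, $A$ and $B$ are estimated by least squares and converted back to amplitude and phase:

    import numpy as np

    rng = np.random.default_rng(2)
    n, f = 200, 0.1                  # f is assumed known, 0 < f < 1/2
    t = np.arange(1, n + 1)
    R, phase = 2.0, 0.3              # true amplitude and phase (fraction of a cycle)
    x = R * np.cos(2 * np.pi * (f * t - phase)) + rng.normal(scale=0.5, size=n)

    # Linear regression on the cosine and sine regressors of 3.8.2
    Z = np.column_stack([np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)])
    A, B = np.linalg.lstsq(Z, x, rcond=None)[0]
    R_hat = np.hypot(A, B)
    phase_hat = (np.arctan2(B, A) / (2 * np.pi)) % 1.0
    print(round(R_hat, 2), round(phase_hat, 2))     # close to 2.0 and 0.3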
Sometimes the fit to a smooth cycle is improved by adding further cycles known as harmonics, because their frequencies are multiples of the fundamental frequency $f$, e.g.
$$X_t = \sum_{j=1}^{m}\left\{A_j\cos(2\pi j f t) + B_j\sin(2\pi j f t)\right\} + e_t.$$
Thus cycles which are asymmetric, with a slow rise and rapid fall, or sharp peaks with flat troughs, may be well fitted by including sufficient harmonics provided the pattern repeats itself periodically.
We have previously used the seasonal factor model to represent a periodic component of a monthly series based on a nonsmooth model. An equivalent smooth model is one which treats this component in terms of cycles of period 12 and uses instead as regression variables the set of sinusoids defined for $j = 1, \dots, 6$ by
3.9.1
$$C_{j,t} = \cos(2\pi f_j t), \qquad S_{j,t} = \sin(2\pi f_j t),$$
where $f_j = j/12$ is the frequency of the $j$th harmonic of the fundamental frequency $1/12$. For a more general period $s$ the pairs of cosine and sine regressors are present for $j = 1, \dots, s/2 - 1$ when $s$ is even; when $s$ is odd there are only $(s-1)/2$ such pairs. The final, alternating, term $\cos(\pi t) = (-1)^t$ is only present when $s$ is even. Note that $\sin(\pi t) = 0$, so this term has no sine partner.
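The following sketch (Python/numpy; the function name is ours) constructs this set of regressors for a given period, including the alternating term when the period is even:

    import numpy as np

    def seasonal_harmonics(n, period=12):
        # Regressor matrix of the sinusoids 3.9.1 for a given seasonal period:
        # cosine/sine pairs at frequencies j/period, plus the alternating term
        # cos(pi*t) = (-1)**t when the period is even (its sine partner is identically 0).
        t = np.arange(1, n + 1)
        n_pairs = period // 2 - 1 if period % 2 == 0 else (period - 1) // 2
        cols = []
        for j in range(1, n_pairs + 1):
            cols.append(np.cos(2 * np.pi * j * t / period))
            cols.append(np.sin(2 * np.pi * j * t / period))
        if period % 2 == 0:
            cols.append(np.cos(np.pi * t))
        return np.column_stack(cols)

    Z = seasonal_harmonics(120)
    print(Z.shape)    # (120, 11): 11 regressors, equivalent to 12 monthly factors less a mean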
The fit to the data using these regressors will be exactly the same as that given by the seasonal factor. The relative advantages arise when it may be possible to obtain a good fit using fewer than the full set of regressors. This can happen if the seasonal effect is confined to just a few months in the year for the factor model, and if the seasonal cycle is very smooth in the case of the harmonic regression. Figure 18 shows the results of fitting the model to the CO2 series, but using just 2 pairs of harmonics. The residual mean square of 0.1496 was slightly lower than the value of 0.1541 obtained using the monthly factor model.
3.10.1 Lemma
Any cycle of frequency $f$, when sampled at integer time points, is indistinguishable from a cycle with a frequency $f'$ lying in the range $0 \le f' \le \tfrac12$.
Proof. Any frequency $f$ can be expressed as $f = k \pm f'$ where $k$ is an integer and $f'$ is unique in the interval $[0, \tfrac12]$. Then for integer $t$
$$\cos\{2\pi(k \pm f')t + \varphi\} = \cos(2\pi f' t \pm \varphi).$$
The frequency $f'$ is called an alias of $f$.
This Lemma leads to the
3.10.2 Convention. Assume that for a discrete regularly sampled time series the frequency of any cyclical component lies in the range
$$0 \le f \le \tfrac12$$
cycles per sampling interval.
We usually abbreviate this just to $f \le \tfrac12$. The frequency of $\tfrac12$ is called the Nyquist frequency. Of course the assumption may be wrong - see Michael Faraday’s Royal Society article ‘On a curious class of optical deceptions’. But the convention does ensure that if the sampling interval is sufficiently short the correct interpretation will be made. The possibility that, under this convention, the assumed frequency is an alias of a higher frequency is a consideration in the choice of sampling interval.
This Lemma explains why no more than six harmonics are used to describe the seasonal pattern, with the highest frequency being $6/12 = \tfrac12$.
This is a tool for investigating whether a time series contains cycles, and
for finding their amplitude and frequency.
3.11.1 Definition. The periodogram of a time series $X_1, \dots, X_n$ is defined for $0 \le f \le \tfrac12$ by
$$I(f) = \frac{2}{n}\left[\left\{\sum_{t=1}^{n} X_t\cos(2\pi f t)\right\}^2 + \left\{\sum_{t=1}^{n} X_t\sin(2\pi f t)\right\}^2\right].$$
Remark. This is motivated by the problem of estimating $A$ and $B$ in the regression
$$X_t = A\cos(2\pi f t) + B\sin(2\pi f t) + e_t.$$
The least squares equations for $A$ and $B$ may be approximated, provided $f$ is not too close to $0$ or $\tfrac12$, as
$$\hat A = \frac{2}{n}\sum_{t=1}^{n} X_t\cos(2\pi f t), \qquad \hat B = \frac{2}{n}\sum_{t=1}^{n} X_t\sin(2\pi f t),$$
so that
$$\hat R^2 = \hat A^2 + \hat B^2$$
and
$$I(f) = \frac{n}{2}\left(\hat A^2 + \hat B^2\right) = \frac{n}{2}\,\hat R^2.$$
3.11.2 Interpretation of the periodogram. There are efficient methods for calculating the periodogram over a fine grid of frequencies spanning $0 \le f \le \tfrac12$. The scaling factor in the definition is chosen so that the area under the graph of $I(f)$ is the mean square value of the series:
$$\int_0^{1/2} I(f)\,df \approx \frac{1}{n}\sum_{t=1}^{n} X_t^2.$$
Usually the series is mean corrected before calculating the periodogram so this becomes the sample variance.
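A direct implementation of the definition (Python/numpy; the function name is ours); the final line checks numerically that the area under the periodogram is approximately the mean square (here the variance, after mean correction):

    import numpy as np

    def periodogram(x, freqs):
        # Periodogram 3.11.1 of a (mean corrected) series at the given frequencies
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        n = len(x)
        t = np.arange(1, n + 1)
        I = np.empty(len(freqs))
        for i, f in enumerate(freqs):
            a = np.sum(x * np.cos(2 * np.pi * f * t))
            b = np.sum(x * np.sin(2 * np.pi * f * t))
            I[i] = 2.0 * (a**2 + b**2) / n
        return I

    rng = np.random.default_rng(3)
    x = rng.normal(size=512)
    freqs = np.arange(1, 257) / 512          # Fourier frequencies up to the Nyquist 1/2
    I = periodogram(x, freqs)
    print(np.sum(I) / len(x), np.var(x))     # area under the periodogram ~ sample variance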
A single cycle of amplitude $R$ and frequency $f_0$ will have a periodogram which has a peak at frequency $f_0$ with height close to $nR^2/2$ and width of order $1/n$. The main peak will be neighboured by smaller peaks of diminishing height, called side-lobes, at frequencies spaced approximately $1/n$ apart on either side of it.
If the series consists of several such cycles then, provided their frequencies are separated by at least $1/n$, i.e. they differ by at least one cycle in the sample length, each cycle will usually give rise to a distinct peak at its own frequency. However, if the amplitude of one cycle is much less than another's, the frequency separation will have to be greater, to avoid a side-lobe of the larger peak masking the smaller peak. Figure 19 shows a series consisting of two cycles added together, and their periodogram.
The periodogram of an irregular cycle with varying amplitude and frequency will typically appear as a collection of peaks over a broader frequency range.
A trend in a time series will appear as a peak at frequency zero, often much larger than any other peak, so it is wise to remove any strong trend by regression before examining the periodogram. The plots in the next four figures show (A) the periodogram for the raw (mean corrected) CO2 series, (B) the periodogram after trend correction - now the seasonal cycles are more noticeable, (C) the periodogram of the residuals after fitting the trend and the cycle with period 12 - the next harmonic is now clearer. Finally, (D) shows the periodogram of the residuals after fitting the trend and the harmonics at periods 12 and 6. What is seen is the typical periodogram of a correlated series - but we can also notice some evidence for a harmonic at period 4.
[Four unnumbered figures: periodograms (A)-(D) of the CO2 series as described above.]
In general the periodogram should be viewed as a transformation of the data. It has particular value for detecting regular cycles, but is also useful for understanding the irregular or random features. An important case is when the series is white noise with variance $\sigma_e^2$. Then, whatever the frequency $f$, using the orthogonality properties of the sine and cosine functions, the estimates of the above regression will have the property
$$\hat A, \hat B \approx N\!\left(0, \tfrac{2\sigma_e^2}{n}\right), \ \text{approximately independently},$$
so that
$$I(f) = \frac{n}{2}\left(\hat A^2 + \hat B^2\right) \approx \sigma_e^2\,\chi_2^2,$$
an exponential distribution, where here the parameter $\mu = 2\sigma_e^2$ is the mean of the exponential distribution with pdf $\mu^{-1}e^{-x/\mu}$.
Moreover the values at different frequencies are close to being independent random variables provided the frequencies are separated by at least $1/n$. Because the central 90% of this distribution ranges from about one twentieth to three times its mean, the general picture of the periodogram in this case is very variable with many ‘peaks’. These are just due to natural statistical fluctuation but are easily misinterpreted as revealing regular cycles in the data. In a sufficiently large data set the peak of height $nR^2/2$ of a regular cycle would stand clear of these fluctuations due to noise, which have mean $2\sigma_e^2$. Figure 21 illustrates these points with an artificial series of random Gaussian noise, and its periodogram.
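Continuing the white-noise case with the periodogram function from the sketch above (illustrative values only): the ordinates average about $2\sigma_e^2$, and roughly $e^{-3} \approx 5\%$ of them exceed three times that mean, which is the source of the spurious ‘peaks’:

    import numpy as np

    # Reuses periodogram() from the earlier sketch
    rng = np.random.default_rng(4)
    sigma = 1.5
    x = rng.normal(scale=sigma, size=2000)
    freqs = np.arange(1, 1001) / 2000
    I = periodogram(x, freqs)
    print(np.mean(I), 2 * sigma**2)          # ordinates average about 2*sigma^2
    print(np.mean(I > 3 * 2 * sigma**2))     # roughly exp(-3) ~ 5% exceed three times the mean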
3.12.1 Introduction. The periodogram defined in 3.11.1, for a (mean corrected) time series sample, may be shown to be related to the sample acf by
$$I(f) = 2 c_0\left\{1 + 2\sum_{k=1}^{n-1} r_k\cos(2\pi f k)\right\},$$
expressed now in terms of the sample variance $c_0$ and the sample acf $r_k$ rather than the observations themselves.
Now assuming that $X_t$ is a stationary time series, the following definition is motivated by replacing the sample acf with $\rho_k$:
3.12.2 Definition. The spectrum of $X_t$ is
$$S(f) = 2\gamma_0\left\{1 + 2\sum_{k=1}^{\infty} \rho_k\cos(2\pi f k)\right\}, \qquad 0 \le f \le \tfrac12.$$
We shall call the periodogram the sample spectrum, and also write $\hat S(f) = I(f)$, when it is believed that the data arise from a stationary time series. The periodogram by its very definition is non-negative, and the same property may be proved for the spectrum:
3.12.3
$$S(f) \ge 0 \quad \text{for all } f.$$
The simplest example is that of white noise, for which the spectrum has the constant, or uniform, value $2\sigma_e^2$. In general the spectrum may be thought of as describing how the variance of the series is spread over the frequency range, so that corresponding to 3.10.2 we have
$$\int_0^{1/2} S(f)\,df = \gamma_0.$$
The next simplest example is of a series for which $\rho_1 \ne 0$ but $\rho_k = 0$ for $k > 1$, so that
$$S(f) = 2\gamma_0\left\{1 + 2\rho_1\cos(2\pi f)\right\}.$$
For this example the extreme values of $S(f)$, found at the endpoints $f = 0$ and $f = \tfrac12$ of the range, are $2\gamma_0(1 + 2\rho_1)$ and $2\gamma_0(1 - 2\rho_1)$. For the spectrum to be positive both of these must be positive, so that we obtain the constraint $|\rho_1| < \tfrac12$. In general the spectrum can have any positive values over its range.
One expression for the spectrum of an MA($q$) model follows from the characteristic property, by stopping the series 2.4.2 after $q$ terms:
3.12.4
$$S(f) = 2\gamma_0\left\{1 + 2\sum_{k=1}^{q} \rho_k\cos(2\pi f k)\right\}.$$
Now the range of frequency $0 \le f \le \tfrac12$ corresponds monotonically to the range $1 \ge x \ge -1$ for $x = \cos(2\pi f)$. It is useful to think of plotting the spectrum against $x$; it is only a slight distortion of the horizontal axis. In that case $\cos(2\pi f k)$ is a polynomial of degree $k$ in $x$ (the Chebyshev polynomial), so that for the MA($q$) model the spectrum can be considered as a polynomial of degree $q$ in $x$. Any such polynomial is a valid spectrum provided it is non-negative.
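The expression 3.12.4 is simple to evaluate; the sketch below (Python/numpy; the function name is ours) computes the MA($q$) spectrum from the model coefficients and checks that it is non-negative and integrates to $\gamma_0$:

    import numpy as np

    def ma_spectrum(theta, sigma2=1.0, n_freq=201):
        # Spectrum of X_t = e_t - theta_1 e_{t-1} - ... - theta_q e_{t-q} via 3.12.4:
        # a finite cosine series, i.e. a polynomial of degree q in cos(2*pi*f).
        psi = np.concatenate(([1.0], -np.asarray(theta, dtype=float)))
        q = len(psi) - 1
        gamma = np.array([sigma2 * np.sum(psi[: q - k + 1] * psi[k:]) for k in range(q + 1)])
        freqs = np.linspace(0.0, 0.5, n_freq)
        k = np.arange(1, q + 1)
        S = 2.0 * (gamma[0] + 2.0 * np.cos(2 * np.pi * np.outer(freqs, k)) @ gamma[1:])
        return freqs, S

    freqs, S = ma_spectrum([0.6])
    df = freqs[1] - freqs[0]
    area = np.sum((S[1:] + S[:-1]) / 2.0) * df
    print(round(area, 3))        # close to gamma_0 = 1 + 0.6**2 = 1.36
    print(S.min() >= 0.0)        # the spectrum is non-negative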
The statistical properties of the sample spectrum for a stationary time series generalise those stated earlier for white noise. They are that, for large $n$,
3.12.5
$$\hat S(f) \approx S(f)\,\chi_2^2/2, \quad \text{i.e. exponential with mean } S(f),$$
with values at frequencies separated by more than $1/n$ being approximately independent. The sample spectra that were shown in Figures 4 and 5 can be smoothed, as shown in Figure 22, to estimate the true spectra about which the sample spectra vary with an exponential distribution.
For selection of ARMA models, the sample spectrum is of most value in confirming AR features of the model by the presence of spectrum peaks. For example a peak at $f = 0$ indicates at least a first order model with $\phi_1 > 0$. If in addition there is a peak within the range $0 < f < \tfrac12$ then another two AR terms are needed, giving $p = 3$. The general form of the spectrum for an ARMA($p,q$) model is that of the ratio of two polynomials in $x = \cos(2\pi f)$. The numerator, of degree $q$, depends on the MA part, and the denominator, of degree $p$, on the AR part. Rational polynomials such as these have great flexibility in approximating continuous functions and this is one reason why the ARMA model is able to fit observed stationary time series properties so well.
The following three figures show the plots for the ocean core data and the properties of a fitted AR(6) model.
[Three unnumbered figures: sample spectrum of the ocean core data and properties of the fitted AR(6) model.]