In this chapter we show how autocorrelation patterns can arise from models based on white noise. These models can readily be fitted to time series data and used for a variety of applications, particularly forecasting.
3.1.1 Example: The MA(1) model. Let
$$X_t = e_t - \theta e_{t-1}$$
where $\theta$ is a fixed coefficient and $e_t$ is white noise. The reason for the minus sign will become evident.
The properties of this model are
$$E(X_t) = 0, \qquad \gamma_0 = (1 + \theta^2)\,\sigma_e^2, \qquad \rho_1 = \frac{-\theta}{1 + \theta^2}, \qquad \rho_k = 0 \ \text{for } k > 1.$$
Proof as an exercise.
Remarks. $|\rho_1| \le \tfrac12$ follows from $(1 - |\theta|)^2 \ge 0$. Also
$$\rho_1(1/\theta) = \frac{-1/\theta}{1 + 1/\theta^2} = \frac{-\theta}{1 + \theta^2} = \rho_1(\theta),$$
so that replacing $\theta$ by $1/\theta$ would result in the same acf. Provided $|\rho_1| < \tfrac12$ we therefore have a choice of two values for $\theta$ when modelling an acf of this form. The convention is to choose the value that satisfies $|\theta| < 1$.
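As a quick numerical check of this remark, here is a minimal sketch in Python with numpy (the function name is ours, not part of the notes):

    import numpy as np

    def ma1_rho1(theta):
        # Lag-1 autocorrelation of the MA(1) model X_t = e_t - theta * e_{t-1}
        return -theta / (1.0 + theta**2)

    theta = 0.4
    print(ma1_rho1(theta), ma1_rho1(1.0 / theta))   # both print -0.3448...: the same acf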
3.1.2 Definition. A time series $X_t$ follows a moving average model of order $q$, MA($q$), if it may be represented as
$$X_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q}$$
where $e_t$ is a white noise series.
Remarks. The model has $q$ moving average parameters $\theta_1, \dots, \theta_q$ and one variance parameter $\sigma_e^2$. For some purposes it is useful to have a slightly different notation and to write
3.1.3
$$X_t = \sum_{k=0}^{q} \psi_k e_{t-k}$$
where $\psi_0 = 1$ and $\psi_k = -\theta_k$ for $k = 1, \dots, q$. The name moving average is more usually applied when one observed time series takes the place of $e_t$ in this equation, and a new series, taking the place of $X_t$, is constructed according to this equation. Our model contrasts with this in that $X_t$ is observed, and we are proposing a model which gives rise to its autocorrelation pattern.
This is a convenient point to introduce an operator notation for discrete time series.
3.1.4 Definition. The backward shift operator $B$ applied to a series $X_t$ replaces it by the shifted series $B X_t$ with values $X_{t-1}$. Powers of $B$ are similarly defined, i.e. $B^k X_t = X_{t-k}$.
Using this notation the MA($q$) model may be expressed from Definition 3.1.2 as
3.1.5
$$X_t = \theta(B)\, e_t, \qquad \theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q.$$
For example the MA(1) model is $X_t = (1 - \theta B)\, e_t$.
Now we can formally re-arrange this, and provided $|\theta| < 1$ expand:
$$e_t = (1 - \theta B)^{-1} X_t = X_t + \theta X_{t-1} + \theta^2 X_{t-2} + \cdots$$
We can directly verify from this expression that substituting $X_{t-k} = e_{t-k} - \theta e_{t-k-1}$ on the right hand side causes all terms except $e_t$ to cancel.
The fact that the unobserved series $e_t$ can be ‘recovered’ from the observed series $X_t$ is the reason for choosing $|\theta| < 1$. The general result is:
3.1.6 Definition. The MA($q$) model of 3.1.5 is said to be invertible if it is possible to express $e_t$ in terms of the present and past values of $X_t$ as
$$e_t = X_t - \pi_1 X_{t-1} - \pi_2 X_{t-2} - \cdots$$
Remark. The signs in this last equation are chosen for convenience on rewriting the expansion as
3.1.7
$$X_t = \pi_1 X_{t-1} + \pi_2 X_{t-2} + \cdots + e_t$$
- a form of the model which will be useful for prediction.
3.1.8 Definition. The invertibility condition for model 3.1.5 is that the operator $\theta(B)$ satisfies
$$\theta(z) \neq 0 \quad \text{for all } |z| \le 1,$$
i.e. all roots of $\theta(z) = 0$ lie outside the unit circle. Here, we are treating the argument $z$ of $\theta(\cdot)$ as a real or complex number rather than an operator.
Remark. Consider an MA(2) model where we observe $X_t$ satisfying the stated equation. The roots of $\theta(z) = 0$ can be computed, and at least one lies on or inside the unit circle, so that the model is not invertible.
The invertibility condition means that it is possible to expand $\theta(z)^{-1}$ as a convergent power series in $z$ for $|z| \le 1$, in order to derive 3.1.6 from 3.1.5. One way to derive the coefficient of $z^k$ in this expansion is to use partial fractions, which is standard in power-series expansion.
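A numerical alternative to partial fractions is straightforward power-series (long) division. The sketch below (Python/numpy; the helper name is ours) computes the coefficients of $B^k$ in $\theta(B)^{-1}$ from the recursion implied by $\theta(B)\,\theta(B)^{-1} = 1$:

    import numpy as np

    def inverse_weights(theta, n_terms=20):
        # Coefficients c_k in the expansion 1/theta(B) = sum_k c_k B^k, where
        # theta(B) = 1 - theta[0]*B - ... - theta[q-1]*B^q (assumed invertible).
        # The recursion c_k = sum_j theta_j * c_{k-j} follows from theta(B) * sum_k c_k B^k = 1.
        q = len(theta)
        c = np.zeros(n_terms)
        c[0] = 1.0
        for k in range(1, n_terms):
            c[k] = sum(theta[j - 1] * c[k - j] for j in range(1, min(k, q) + 1))
        return c

    # MA(1) with theta = 0.6: the expansion coefficients should be 0.6**k
    print(inverse_weights([0.6], 6))    # [1.0, 0.6, 0.36, 0.216, 0.1296, 0.07776]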
We now consider when it is appropriate to use a moving average model to
represent a given series. This is based on the autocorrelation properties of the
model:
3.2.1 Recall that $X_t = \sum_{k=0}^{q} \psi_k e_{t-k}$, so that $\psi_0 = 1$ and $\psi_k = -\theta_k$ for $k = 1, \dots, q$. The autocovariances of a moving average model are given in terms of the $\psi$ coefficients as
$$\gamma_k = \sigma_e^2 \sum_{i=0}^{q-k} \psi_i \psi_{i+k}, \qquad k = 0, 1, \dots, q.$$
The characteristic property of the model is that
3.2.2
$$\rho_k = 0 \quad \text{for } k > q.$$
Continuing the previous example, substituting its coefficient values into 3.2.1 gives the autocovariances, and hence the values of $\rho_1$ and $\rho_2$, with $\rho_k = 0$ for $k > 2$.
Proof of 3.2.1. Start from
$$\gamma_k = \operatorname{Cov}(X_t, X_{t+k}) = \operatorname{Cov}\!\left(\sum_{i=0}^{q} \psi_i e_{t-i},\ \sum_{j=0}^{q} \psi_j e_{t+k-j}\right).$$
Now each term in the first sum is correlated with at most one term in the second sum, i.e. the same term of $e$, which is the term with $j = i + k$. So the covariance reduces to
$$\gamma_k = \sigma_e^2 \sum_{i=0}^{q-k} \psi_i \psi_{i+k}.$$
∎
Notice that the above formula holds even if $q = \infty$. In other words, if
$$X_t = \sum_{k=0}^{\infty} \psi_k e_{t-k}, \qquad \sum_{k=0}^{\infty} \psi_k^2 < \infty,$$
where $e_t$ is white noise, then
$$\gamma_k = \sigma_e^2 \sum_{i=0}^{\infty} \psi_i \psi_{i+k}.$$
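The result 3.2.1 is easy to evaluate numerically. The following sketch (Python/numpy; the function name is ours) computes the autocovariances of an MA($q$) model from the $\psi$ coefficients and shows the cut-off after lag $q$:

    import numpy as np

    def ma_autocovariances(theta, sigma2=1.0, max_lag=10):
        # Autocovariances gamma_k of X_t = e_t - theta_1 e_{t-1} - ... - theta_q e_{t-q},
        # from 3.2.1 with psi_0 = 1 and psi_k = -theta_k.
        psi = np.concatenate(([1.0], -np.asarray(theta, dtype=float)))
        q = len(psi) - 1
        gamma = np.zeros(max_lag + 1)
        for k in range(min(q, max_lag) + 1):
            gamma[k] = sigma2 * np.sum(psi[: q - k + 1] * psi[k:])
        # gamma_k stays 0 for k > q: the characteristic cut-off of 3.2.2
        return gamma

    print(ma_autocovariances([0.5, -0.3], max_lag=5))
    # [ 1.34 -0.65  0.3   0.    0.    0.  ]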
There is a converse result, that any stationary time series having the property 3.2.2 may be represented by a moving average model 3.1.2 satisfying the invertibility condition.
The characteristic property is recognised by inspection of the sample acf of the data series. The sampling properties of this acf are therefore required, and these will be given shortly. But first we have a look at two time series for which moving average models appear to be appropriate. In fact it is the successive changes, or first differences, of these series which appear to have the characteristics of moving average models.
Consider the ‘CABLE’ data, available on Moodle, which is clearly non-stationary. A good model for this data might be one in which the first differences of the logged series are stationary. We therefore take the logarithm and then the first differences of the series of weekly business transactions, as shown in Figure 6, to make it stationary.
Figure 7 shows the sample time series properties of the differenced series, which are typical of an MA(1) model:
The MA(1) model is estimated for this series, and it has the model properties shown in the following figures.
[Four unnumbered figures: properties of the fitted MA(1) model.]
The second plot shows the implied weights of the moving average representation of 3.1.3, and the third plot shows the implied weights of the autoregressive representation given in 3.1.7. The fourth plot shows the implied model spectrum, which we shall define in Section 3.12.
The second example is of the annual changes in daylength, measured in milliseconds, from the dataset ‘DAYLENGT’ available on Moodle. Part of what we see is almost certainly due to the fact that the original data have been smoothed to give the series shown in the left panel of Figure 9: each point has been replaced by its average with the previous and next points, and this procedure has then been repeated. This procedure has the property of completely annihilating any cycle of period 3.
Figure 10 shows the sample time series properties of the differenced series. Among these, the sample spectrum is very low at frequency 1/3 as a consequence of the smoothing. The sample autocorrelations are characteristic of an MA(3) model:
The MA(3) model is estimated for this series, and it has the model properties shown in the following figures, which match the sample properties very well:
[Four unnumbered figures: properties of the fitted MA(3) model.]
In general the sample value $r_k$ is an approximately unbiased and consistent estimate of $\rho_k$ for any fixed lag $k$ as the sample size $n$ tends to infinity. In the particular case of the MA($q$) model there is a useful result for the standard error of the sample acf at lags greater than the order $q$. This helps us to weigh the statistical evidence for a hypothesised value of $q$.
3.2.3 For a Gaussian time series of length $n$ with acf satisfying $\rho_k = 0$ for $k > q$,
$$\operatorname{Var}(r_k) \approx \frac{1}{n}\left(1 + 2\sum_{i=1}^{q} \rho_i^2\right) \quad \text{for } k > q.$$
The simplest case, $q = 0$, gives the property under the hypothesis of white noise, that
$$r_k \approx N(0, 1/n) \quad \text{for } k \ge 1,$$
so it is usual to plot the sample correlations on a graph with approximate error limits drawn at $\pm 2/\sqrt{n}$ about the horizontal axis. If the sample acf values generally lie within these limits, with the only values outside them being at unremarkable lags, then white noise is a sensible conclusion. But these limits only apply for assessing the hypothesis of white noise. Under any other hypothesis, the limits are wider.
The sample acf may in fact lie outside these limits for many lags. We are here interested in the case when a small number of values lie clearly outside the limits at low lags. We then have good reason to suppose that an MA($q$) model is appropriate, with the value of $q$ being tentatively chosen as the ‘cut-off’ lag: the highest lag with a ‘significant’ acf value.
We need then to be aware of the fact that the sampling variability of the remaining acf values, at lags greater than the assumed MA order $q$, will be increased according to 3.2.3, so that rather more values will lie outside the white noise limits. The increased standard error may be estimated from 3.2.3 using the sample acf values up to lag $q$ in place of the unknown $\rho_i$.
Another visual characteristic of the sample acf at these higher lags is that its statistical variability may follow a smoother pattern than for white noise. Even when the true model is MA($q$) these patterns may be deceptively suggestive of structure in the acf, i.e. non-zero values of $\rho_k$, which is not actually present at these higher lags.
Estimation of the parameters in an MA($q$) model may sometimes be achieved by solving the equations 3.2.1 for the unknown parameters, using sample autocovariance values in place of $\gamma_k$. Unfortunately these equations may not have a real solution, though when they do it is easily found. For example in the MA(1) case the solution for $\theta$ is found by solving
$$r_1 = \frac{-\theta}{1 + \theta^2},$$
which has no real solution if $r_1$ is outside the range $(-\tfrac12, \tfrac12)$, and this may well happen through statistical variability in $r_1$.
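A minimal sketch of this moment estimator for the MA(1) case (Python/numpy; the function name is ours); it returns the invertible root when one exists:

    import numpy as np

    def ma1_moment_estimate(r1):
        # Solve r1 = -theta / (1 + theta**2) for the invertible root |theta| < 1.
        # Returns None when |r1| >= 0.5, in which case no invertible real solution exists.
        if abs(r1) >= 0.5:
            return None
        if r1 == 0.0:
            return 0.0
        return (-1.0 + np.sqrt(1.0 - 4.0 * r1**2)) / (2.0 * r1)

    print(ma1_moment_estimate(-0.40))   # 0.5
    print(ma1_moment_estimate(0.55))    # None: r1 outside (-1/2, 1/2)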
Consider the model
3.3.1
$$X_t = \phi X_{t-1} + e_t$$
which defines successive values of the series in terms of previous values and a white noise series $e_t$. This is the first order autoregressive model, commonly referred to as the AR(1) model. The essential distinction from prediction models is that for the prediction equation the error is only required to be uncorrelated with $X_{t-1}$; in general it will not be white noise. For the AR(1) model, however, $e_t$ is assumed to be white noise and will, as a result of that, be uncorrelated with all past values of the series.
The properties of this model may be found by successively substituting for the lagged values:
$$X_t = e_t + \phi e_{t-1} + \phi^2 X_{t-2} = \cdots = e_t + \phi e_{t-1} + \cdots + \phi^{k-1} e_{t-k+1} + \phi^k X_{t-k}.$$
Now assuming that both $|\phi| < 1$ and $X_t$ is bounded ‘in the infinite past’ we can let $k \to \infty$ and obtain the convergent sum:
3.3.2
$$X_t = \sum_{k=0}^{\infty} \phi^k e_{t-k}.$$
We can derive from this, for example, $E(X_t) = 0$ and $\operatorname{Var}(X_t) = \sigma_e^2/(1 - \phi^2)$.
The representation 3.3.2 could in fact be obtained using the backward shift operator notation by writing 3.3.1 as $(1 - \phi B) X_t = e_t$, so that
$$X_t = (1 - \phi B)^{-1} e_t = (1 + \phi B + \phi^2 B^2 + \cdots)\, e_t.$$
The variance and correlation properties can be derived directly from the model 3.3.1 without using this expansion. Because $X_{t-1}$ and $e_t$ are uncorrelated we get
$$\gamma_0 = \phi^2 \gamma_0 + \sigma_e^2, \qquad \text{i.e.} \quad \gamma_0 = \frac{\sigma_e^2}{1 - \phi^2}.$$
Similarly $e_t$ is uncorrelated with $X_{t-k}$ for any $k \ge 1$, so
$$\gamma_k = \operatorname{Cov}(X_t, X_{t-k}) = \phi\, \operatorname{Cov}(X_{t-1}, X_{t-k}) = \phi\, \gamma_{k-1}.$$
It follows that
$$\gamma_k = \phi^k \gamma_0, \qquad k \ge 0.$$
Note therefore that $\rho_1 = \phi$ and $\rho_k = \phi^k$ for $k \ge 0$.
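A short simulation (Python/numpy; illustrative values only) confirms the geometric decay of the AR(1) acf by comparing the sample acf of a long simulated series with $\phi^k$:

    import numpy as np

    rng = np.random.default_rng(1)
    phi, n = 0.7, 5000
    e = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]            # the AR(1) recursion 3.3.1

    def sample_acf(x, max_lag):
        # Simple sample acf r_k = c_k / c_0 with c_k = (1/n) * sum_t x_t x_{t+k}
        x = x - x.mean()
        n = len(x)
        c0 = np.sum(x * x) / n
        return np.array([np.sum(x[: n - k] * x[k:]) / (n * c0) for k in range(max_lag + 1)])

    print(np.round(sample_acf(x, 4), 2))        # close to the theoretical values below
    print(np.round(phi ** np.arange(5), 2))     # 1, 0.7, 0.49, 0.34, 0.24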
3.3.3 Definition. A time series $X_t$ follows an autoregressive model of order $p$, AR($p$), if it is generated by the process
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + e_t$$
where $e_t$ is a white noise series. Using operator notation the model may be written:
3.3.4
$$\phi(B)\, X_t = e_t, \qquad \phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p.$$
We also require:
3.3.5 Definition. The stationarity condition for model 3.3.3 is that the operator $\phi(B)$ satisfies
$$\phi(z) \neq 0 \quad \text{for all } |z| \le 1.$$
Remark. The condition is the same as that required of $\theta(B)$ for a moving average model to be invertible (see 3.1.8). In this case it means that it is possible to carry out the convergent expansion following from 3.3.4:
$$X_t = \phi(B)^{-1} e_t = \sum_{k=0}^{\infty} \psi_k e_{t-k},$$
where the coefficients $\psi_k$ are given by the power series expansion of $\phi(z)^{-1}$, with $\psi_0 = 1$. This is an infinite moving average
representation of the model (compare with the finite representation of 3.1.3 for MA processes). We note one consequence which may be obvious from the way the series is generated, but which we can formally verify with this expansion; the fact that $e_t$ is uncorrelated with all past values of the series:
3.3.6
$$\operatorname{Cov}(e_t, X_{t-k}) = 0 \quad \text{for } k \ge 1.$$
This is because $X_{t-k} = \sum_{j=0}^{\infty} \psi_j e_{t-k-j}$, in which all terms are uncorrelated with $e_t$.
The main contrast with moving average models is that the acf does not have a cut off, but generally decays towards higher lags. The pattern of decay is obviously reflected in the behaviour of the time series generated from this model, and includes cyclical patterns. Historically, this was a reason for the introduction of these models.
3.4.1 The Yule-Walker equations relating the coefficients and acf of the AR($p$) model.
The coefficients $\phi_1, \dots, \phi_p$ determine, and are determined by, the acf values $\rho_1, \dots, \rho_p$, by the equations
$$\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \cdots + \phi_p \rho_{k-p}, \qquad k = 1, \dots, p,$$
with $\rho_0 = 1$ and $\rho_{-k} = \rho_k$. Displayed as a set of linear equations for $\phi_1, \dots, \phi_p$ these are
$$\begin{pmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_p \end{pmatrix} = \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{p-1} \\ \rho_1 & 1 & \cdots & \rho_{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p-1} & \rho_{p-2} & \cdots & 1 \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_p \end{pmatrix},$$
where for simplicity the series subscript of $\rho$ has been omitted. We estimate the parameters by replacing the acf $\rho_k$ by the sample acf $r_k$.
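The Yule-Walker estimates are obtained by solving this linear system with sample autocorrelations on the right-hand side. A minimal sketch (Python/numpy; the function name is ours):

    import numpy as np

    def yule_walker(r):
        # Solve the Yule-Walker equations for an AR(p) model given
        # r = (r_1, ..., r_p), the (sample) autocorrelations.
        r = np.asarray(r, dtype=float)
        p = len(r)
        # Toeplitz matrix with 1 on the diagonal and r_1, ..., r_{p-1} off it
        R = np.array([[1.0 if i == j else r[abs(i - j) - 1] for j in range(p)]
                      for i in range(p)])
        return np.linalg.solve(R, r)

    # Check on an AR(2) with phi = (0.5, 0.3): here rho_1 = phi_1/(1 - phi_2)
    # and rho_2 = phi_1*rho_1 + phi_2.
    rho1 = 0.5 / (1.0 - 0.3)
    rho2 = 0.3 + 0.5 * rho1
    print(yule_walker([rho1, rho2]))            # recovers [0.5, 0.3]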
We demonstrate the more general result:
3.4.2
$$\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \cdots + \phi_p \rho_{k-p} \quad \text{for all } k \ge 1.$$
The Yule-Walker equations are just the subset of these for $k = 1, \dots, p$.
Proof. From 3.3.6, $e_t$ is uncorrelated with $X_{t-k}$, and so for $k \ge 1$
$$\gamma_k = \operatorname{Cov}(X_t, X_{t-k}) = \phi_1 \gamma_{k-1} + \phi_2 \gamma_{k-2} + \cdots + \phi_p \gamma_{k-p}.$$
Divide through by $\gamma_0$ for the required result. ∎
These equations are completed by one further useful relationship:
3.4.3
$$\sigma_e^2 = \gamma_0\left(1 - \phi_1 \rho_1 - \phi_2 \rho_2 - \cdots - \phi_p \rho_p\right).$$
Proof. In two steps. The first is that
$$\operatorname{Cov}(X_t, e_t) = \sigma_e^2$$
because $e_t$ is uncorrelated with all past values $X_{t-1}, \dots, X_{t-p}$ appearing in the model equation. Extending this,
$$\gamma_0 = \operatorname{Var}(X_t) = \phi_1 \gamma_1 + \phi_2 \gamma_2 + \cdots + \phi_p \gamma_p + \sigma_e^2.$$
Divide by $\gamma_0$ and rearrange for the result. ∎
Figure 12 shows a good example of a time series that resembles a first order autoregressive process. It is a series of temperature measurements modified by wind speed and other variables for use in predicting energy demands. The sample properties, also shown in Figure 12, are typical of this process.
The AR(1) model was fitted to this series. The properties of the fitted model, presented in the next figures, correspond very well to the sample properties of the series.
[Five unnumbered figures: properties of the fitted AR(1) model.]
3.4.4 Example. The AR(2) model:
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + e_t.$$
This process has the capacity to represent a series with an irregular cycle
and was put forward by Yule to model the sunspot cycle. The motivation was
a discrete time version of the second order differential equation for a
damped pendulum, including random fluctuations. For $p = 2$ the Yule-Walker equations give:
3.4.5
$$\rho_1 = \phi_1 + \phi_2 \rho_1, \qquad \rho_2 = \phi_1 \rho_1 + \phi_2.$$
If the model parameters are given, these equations supply $\rho_1$ and $\rho_2$. Then 3.4.2 generates $\rho_3, \rho_4, \dots$ successively using $\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2}$.
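A small sketch (Python/numpy; the function name is ours) implementing this recursion for the AR(2) model; with complex roots of $\phi(z)$ the computed acf shows the damped cyclical pattern mentioned above:

    import numpy as np

    def ar2_acf(phi1, phi2, max_lag=20):
        # acf of a stationary AR(2): rho_1 from the Yule-Walker equations 3.4.5,
        # then the recursion rho_k = phi1*rho_{k-1} + phi2*rho_{k-2} from 3.4.2.
        rho = np.ones(max_lag + 1)
        rho[1] = phi1 / (1.0 - phi2)
        for k in range(2, max_lag + 1):
            rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]
        return rho

    # Complex roots of 1 - phi1*z - phi2*z**2 produce a damped, cyclical acf
    print(np.round(ar2_acf(1.4, -0.7, 12), 3))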
Figure 14 shows a series which appears to follow an AR(2) model. It is an indicator of past ocean surface temperatures, obtained by measuring oxygen isotope ratios in a core from the ocean sediment. The time scale is in thousands of years. The series has an irregular cyclical appearance, reflected in its sample autocorrelations.
The coefficients of the fitted AR(2) model were estimated from the series. The next five figures show the properties of the fitted model, which are similar to the sample properties.
[Five unnumbered figures: properties of the fitted AR(2) model.]
We began the previous section by saying that the autocorrelation properties of autoregressive models are generally quite distinct from those of moving average models. However we also have
3.5.1 The characteristic property of the AR($p$) model is that its partial autocorrelation function has a ‘cut-off’ at lag $p$, i.e.
$$\phi_{k,k} = 0 \quad \text{for } k > p.$$
This is because the predictor of $X_t$ conditioned upon $X_{t-1}, \dots, X_{t-k}$ for any $k \ge p$ includes the model's own regression on $X_{t-1}, \dots, X_{t-p}$. If $p$ lagged terms have been included in the predictor, the error is white noise and uncorrelated with all terms at lags higher than $p$. Including these further terms cannot improve the predictor; this implies that the partial correlation at these lags is zero.
Again there is a converse result, that any stationary time series satisfying the property 3.5.1 may be represented by the AR($p$) model 3.3.4 with $e_t$ being a white noise series, and $\phi(B)$ satisfying the stationarity condition.
The characteristic property is recognised by inspection of the sample pacf of the data series. The sampling properties of the sample pacf values $\hat\phi_{k,k}$ are that they are approximately unbiased and consistent estimates of $\phi_{k,k}$ for any fixed lag $k$ as the sample size tends to infinity. When the series follows the AR($p$) model the following sampling property holds:
3.5.2
$$\hat\phi_{k,k} \approx N(0, 1/n) \quad \text{for } k > p.$$
It is therefore usual to plot the sample pacf on a graph with approximate error limits for zero pacf values of $\pm 2/\sqrt{n}$ about the horizontal axis. We are interested in the case when a small number of values lie clearly outside the limits at low lags. The highest lag with a significant pacf value then indicates the order $p$.
Remark. Unlike the acf, which may extend to high lags before decaying effectively to zero, the pacf for most practically occurring series does not extend to very high lags. This is because the pacf measures the additional ability to predict the series from values further back in the past, and this ability is usually rather limited. The first few pacf values usually ‘take up’ most of it. So in practice series which do not in actual fact follow an AR model may be well approximated by one of modest order. The true pacf may not actually ‘cut off’ at some lag, but may just become small compared with the SE of the sample value.
3.6.1 Definition. A time series $X_t$ follows an autoregressive moving average model of orders $p$ and $q$, an ARMA($p,q$) model, if it is generated by the process
$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + e_t - \theta_1 e_{t-1} - \cdots - \theta_q e_{t-q}$$
where $e_t$ is a white noise series. Using operator notation the model may be written
$$\phi(B)\, X_t = \theta(B)\, e_t,$$
where the operators $\phi(B)$ and $\theta(B)$ are as defined for the pure AR and MA models.
We again require the conditions
$$\phi(z) \neq 0 \ \text{for } |z| \le 1 \quad \text{for stationarity}, \qquad \theta(z) \neq 0 \ \text{for } |z| \le 1 \quad \text{for invertibility}.$$
The first ensures that we can express $X_t$ in an infinite moving average form:
3.6.2
$$X_t = \phi(B)^{-1}\theta(B)\, e_t = \sum_{k=0}^{\infty} \psi_k e_{t-k},$$
and the second that we can recover $e_t$ from $X_t$ by
$$e_t = \theta(B)^{-1}\phi(B)\, X_t,$$
which is simply rearranged to give the infinite autoregressive form
3.6.3
$$X_t = \pi_1 X_{t-1} + \pi_2 X_{t-2} + \cdots + e_t.$$
Remark. The existence of these infinite forms means that in theory an ARMA process can be approximated with arbitrary accuracy by either a pure MA or AR model of finite order. Given a finite data series this can lead to some uncertainty as to which model to use.
3.6.4 Example. The ARMA(1,1) model $(1 - \phi B) X_t = (1 - \theta B)\, e_t$ has, for the form 3.6.2,
$$\phi(B)^{-1}\theta(B) = (1 - \phi B)^{-1}(1 - \theta B) = 1 + (\phi - \theta) B + (\phi - \theta)\phi B^2 + \cdots$$
so that $\psi_k = (\phi - \theta)\phi^{k-1}$ for $k \ge 1$. Similarly, for the form 3.6.3,
$$\theta(B)^{-1}\phi(B) = (1 - \theta B)^{-1}(1 - \phi B) = 1 - (\phi - \theta) B - (\phi - \theta)\theta B^2 - \cdots$$
so that $\pi_k = (\phi - \theta)\theta^{k-1}$ for $k \ge 1$.
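These weights are easy to generate numerically; the sketch below (Python/numpy; the function name is ours) applies the recursions for the ARMA(1,1) case and agrees with the closed forms just derived:

    import numpy as np

    def arma11_weights(phi, theta, n_terms=6):
        # psi and pi weights of (1 - phi*B) X_t = (1 - theta*B) e_t.
        # Entry 0 of each array is the leading coefficient 1; entries k >= 1 are
        # psi_k (infinite MA form 3.6.2) and pi_k (infinite AR form 3.6.3).
        psi = np.zeros(n_terms)
        pi = np.zeros(n_terms)
        psi[0] = pi[0] = 1.0
        if n_terms > 1:
            psi[1] = pi[1] = phi - theta
        for k in range(2, n_terms):
            psi[k] = phi * psi[k - 1]           # closed form (phi - theta) * phi**(k-1)
            pi[k] = theta * pi[k - 1]           # closed form (phi - theta) * theta**(k-1)
        return psi, pi

    psi, pi = arma11_weights(0.8, 0.4)
    print(np.round(psi, 4))     # 1, 0.4, 0.32, 0.256, 0.2048, 0.1638
    print(np.round(pi, 4))      # 1, 0.4, 0.16, 0.064, 0.0256, 0.0102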
Remark. Note that in this example if $\theta = \phi$ then simply $X_t = e_t$, because the factors $(1 - \phi B)$ and $(1 - \theta B)$ cancel. In general we require that $\phi(B)$ and $\theta(B)$ have no cancelling factors, else the model would not be distinguishable from a simpler one. The problem can come with model estimation from finite data series, when near-cancelling factors can occur. The signal plus noise model can give this situation if the noise almost swamps the signal.
Figure 16 on the next page shows a series of central England average temperatures that plausibly follows such a model. The coefficients of an ARMA(1,1) model were estimated for this series.
The properties of the fitted model, shown in the next six figures, are similar to those of the sample properties. The last plot shows a smoothed pattern of the series, estimating the underlying level which might be represented by an AR(1) process.
[Six unnumbered figures: properties of the fitted ARMA(1,1) model, including a smoothed estimate of the underlying level.]
The first point to make is that in general neither of the characteristic properties of the MA or AR model holds for ARMA models. This is a negative statement but nevertheless of value in suggesting that a mixed ARMA model is needed.
A second point is that even if it appears that a pure AR or MA model is appropriate from recognition of one characteristic property or the other, a model can sometimes be improved by adding an MA term to a pure AR model or vice versa. This is because inspection of the sample acf and pacf, though providing an indication of the basic model, may ‘lose’ other indications in the sampling fluctuations.
The main concern though is that there may be no simple guide as to the choice of the model orders $p$ and $q$, which are needed in order to estimate the model parameters. The acf properties of the model are considered first. These may be derived from the infinite moving average form 3.6.2 by letting $q \to \infty$ in 3.2.1:
3.7.1
$$\gamma_k = \sigma_e^2 \sum_{i=0}^{\infty} \psi_i \psi_{i+k}.$$
These infinite sums can be avoided by solving appropriate equations similar to the Yule-Walker equations, although the infinite moving average is initially useful for deriving:
3.7.2
$$\operatorname{Cov}(X_t, e_{t-k}) = \psi_k \sigma_e^2 \ \text{for } k \ge 0, \qquad \operatorname{Cov}(X_t, e_{t+k}) = 0 \ \text{for } k \ge 1.$$
3.7.3 Example. The ARMA(1,1) model has variance and autocorrelations given by
$$\gamma_0 = \sigma_e^2\,\frac{1 + \theta^2 - 2\phi\theta}{1 - \phi^2}, \qquad \rho_1 = \frac{(\phi - \theta)(1 - \phi\theta)}{1 + \theta^2 - 2\phi\theta}, \qquad \rho_k = \phi\,\rho_{k-1} \ \text{for } k \ge 2.$$
Note that the pattern of autocorrelations is similar to that for the AR(1) model in the sense that both exhibit geometric decay of the autocorrelation as a function of the lag $k$.
Proof. We use 3.7.2 to derive the first of the equations:
$$\gamma_0 = \phi\gamma_1 + \sigma_e^2\{1 - \theta(\phi - \theta)\}, \qquad \gamma_1 = \phi\gamma_0 - \theta\sigma_e^2, \qquad \gamma_k = \phi\gamma_{k-1} \ \text{for } k \ge 2.$$
The first two can be solved for $\gamma_0$ and $\rho_1$ as stated, and the third shows that $\rho_k = \phi\rho_{k-1}$ for $k \ge 2$. ∎
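A short sketch (Python/numpy; the function name is ours) evaluating these ARMA(1,1) formulas, with the geometric decay continued by the recursion $\rho_k = \phi\rho_{k-1}$:

    import numpy as np

    def arma11_acf(phi, theta, sigma2=1.0, max_lag=10):
        # Variance and acf of X_t = phi*X_{t-1} + e_t - theta*e_{t-1}, using 3.7.3
        gamma0 = sigma2 * (1.0 + theta**2 - 2.0 * phi * theta) / (1.0 - phi**2)
        rho = np.ones(max_lag + 1)
        rho[1] = (phi - theta) * (1.0 - phi * theta) / (1.0 + theta**2 - 2.0 * phi * theta)
        for k in range(2, max_lag + 1):
            rho[k] = phi * rho[k - 1]           # geometric decay from lag 1 onwards
        return gamma0, rho

    gamma0, rho = arma11_acf(0.8, 0.4)
    print(round(gamma0, 4), np.round(rho[:5], 4))   # 1.4444 and 1, 0.5231, 0.4185, ...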
For the general ARMA model the decay pattern of the acf is similar to that for the AR model with the same autoregressive part, but it is modified by the presence of the moving average part to allow greater flexibility of form. The pattern is determined by the equation relating successive acf values for the general ARMA model:
3.7.4
$$\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2} + \cdots + \phi_p \rho_{k-p} \quad \text{for } k > q,$$
which derives directly from the model: for $k > q$ the moving average terms $e_t, \dots, e_{t-q}$ are all uncorrelated with $X_{t-k}$, so taking covariances with $X_{t-k}$ in the model equation leaves only the autoregressive terms.
This pattern of decay can help to indicate the appropriate AR order $p$. For example cyclical decay indicates that $p$ should be at least 2. The modification at low lags to the smooth pattern of decay associated with the pure AR model can also help to indicate the MA order $q$, as the ARMA(1,1) example illustrates. For the ARMA(1,1) model the geometric decay of the acf starts from lag 1, rather than from lag 0 as for the AR(1).
The other statistical tools available for selecting the orders of an ARMA model are the sample pacf and sample spectrum. The sample pacf is not of great value for either the MA or ARMA model, but its pattern is generally similar to that of the coefficients $\pi_k$ which appear in the infinite AR representation of these models. For the ARMA(1,1) example this pattern is approximately a geometric decay by the factor $\theta$ from the lag 1 value of the pacf.
The remaining sections in this Chapter (3.8–3.12) are not examinable but will be needed for your project. We will discuss these sections after covering Chapters 4–5.
A suitable model for a smooth cycle is
3.8.1
$$X_t = R\cos\{2\pi(f t - \phi)\} + e_t$$
where
The Amplitude of the cycle is $R$.
The Frequency in cycles per unit time (that is, one cycle over a period of $1/f$ time units) is $f$.
The Phase, that is the fraction of a cycle by which it lags, is $\phi$, $0 \le \phi < 1$.
We restrict $R > 0$ and $0 < f \le \tfrac12$.
We shall sometimes write $X_t = R\cos(\omega t - \varphi) + e_t$ using the angular frequency $\omega = 2\pi f$ radians per unit time and angular phase $\varphi = 2\pi\phi$ - the notation is more convenient at times.
Rather than estimate $R$ and $\phi$ directly we use
$$R\cos\{2\pi(f t - \phi)\} = R\cos(2\pi\phi)\cos(2\pi f t) + R\sin(2\pi\phi)\sin(2\pi f t)$$
to rewrite the model as
3.8.2
$$X_t = A\cos(2\pi f t) + B\sin(2\pi f t) + e_t$$
where $A = R\cos(2\pi\phi)$ and $B = R\sin(2\pi\phi)$. So if $f$ is known this is a linear regression model with coefficients $A$ and $B$.
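A minimal illustration of this regression (Python/numpy; the simulated data and variable names are ours): given a known frequency, $A$ and $B$ are estimated by least squares and converted back to amplitude and phase:

    import numpy as np

    rng = np.random.default_rng(2)
    n, f = 200, 0.1                  # f is assumed known, 0 < f < 1/2
    t = np.arange(1, n + 1)
    R, phase = 2.0, 0.3              # true amplitude and phase (fraction of a cycle)
    x = R * np.cos(2 * np.pi * (f * t - phase)) + rng.normal(scale=0.5, size=n)

    # Linear regression on the cosine and sine regressors of 3.8.2
    Z = np.column_stack([np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)])
    A, B = np.linalg.lstsq(Z, x, rcond=None)[0]
    R_hat = np.hypot(A, B)
    phase_hat = (np.arctan2(B, A) / (2 * np.pi)) % 1.0
    print(round(R_hat, 2), round(phase_hat, 2))     # close to 2.0 and 0.3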
Sometimes the fit to a smooth cycle is improved by adding further cycles known as harmonics, because their frequencies are multiples of the fundamental frequency $f$, e.g.
$$X_t = \sum_{j=1}^{m}\left\{A_j\cos(2\pi j f t) + B_j\sin(2\pi j f t)\right\} + e_t.$$
Thus cycles which are asymmetric, with a slow rise and rapid fall, or sharp peaks with flat troughs, may be well fitted by including sufficient harmonics provided the pattern repeats itself periodically.
We have previously used the seasonal factor model to represent a periodic component of a monthly series based on a nonsmooth model. An equivalent smooth model is one which treats this component in terms of cycles of period 12 and uses instead as regression variables the set of sinusoids defined for $j = 1, \dots, 6$ by
3.9.1
$$C_{j,t} = \cos(2\pi f_j t), \qquad S_{j,t} = \sin(2\pi f_j t),$$
where $f_j = j/12$ is the frequency of the $j$th harmonic of the fundamental frequency $1/12$. For a more general period $s$ the pairs of cosine and sine regressors are present for $j = 1, \dots, s/2 - 1$ when $s$ is even; when $s$ is odd there are only $(s-1)/2$ such pairs. The final, alternating, term $\cos(\pi t) = (-1)^t$ is only present when $s$ is even. Note that $\sin(\pi t) = 0$, so this term has no sine partner.
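The following sketch (Python/numpy; the function name is ours) constructs this set of regressors for a given period, including the alternating term when the period is even:

    import numpy as np

    def seasonal_harmonics(n, period=12):
        # Regressor matrix of the sinusoids 3.9.1 for a given seasonal period:
        # cosine/sine pairs at frequencies j/period, plus the alternating term
        # cos(pi*t) = (-1)**t when the period is even (its sine partner is identically 0).
        t = np.arange(1, n + 1)
        n_pairs = period // 2 - 1 if period % 2 == 0 else (period - 1) // 2
        cols = []
        for j in range(1, n_pairs + 1):
            cols.append(np.cos(2 * np.pi * j * t / period))
            cols.append(np.sin(2 * np.pi * j * t / period))
        if period % 2 == 0:
            cols.append(np.cos(np.pi * t))
        return np.column_stack(cols)

    Z = seasonal_harmonics(120)
    print(Z.shape)    # (120, 11): 11 regressors, equivalent to 12 monthly factors less a mean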
The fit to the data using these regressors will be exactly the same as that given by the seasonal factor. The relative advantages arise when it may be possible to obtain a good fit using fewer than the full set of regressors. This can happen if the seasonal effect is confined to just a few months in the year for the factor model, and if the seasonal cycle is very smooth in the case of the harmonic regression. Figure 18 shows the results of fitting the model to the CO2 series, but using just 2 pairs of harmonics. The residual mean square of 0.1496 was slightly lower than the value of 0.1541 obtained using the monthly factor model.
3.10.1 Lemma
Any cycle of frequency $f$, when sampled at integer time points, is indistinguishable from a cycle with a frequency $f'$ lying in the range $0 \le f' \le \tfrac12$.
Proof. Any frequency $f$ can be expressed as $f = k \pm f'$ where $k$ is an integer and $f'$ is unique in the interval $[0, \tfrac12]$. Then for integer $t$
$$\cos\{2\pi(k \pm f')t + \varphi\} = \cos(2\pi f' t \pm \varphi).$$
The frequency $f'$ is called an alias of $f$.
This Lemma leads to the
3.10.2 Convention. Assume that for a discrete regularly sampled time series the frequency of any cyclical component lies in the range
$$0 \le f \le \tfrac12$$
cycles per sampling interval.
We usually abbreviate this just to $f \le \tfrac12$. The frequency of $\tfrac12$ is called the Nyquist frequency. Of course the assumption may be wrong - see Michael Faraday’s Royal Society article ‘On a curious class of optical deceptions’. But the convention does ensure that if the sampling interval is sufficiently short the correct interpretation will be made. The possibility that, under this convention, the assumed frequency is an alias of a higher frequency is a consideration in the choice of sampling interval.
This Lemma explains why no more than six harmonics are used to describe the seasonal pattern, with the highest frequency being $6/12 = \tfrac12$.
This is a tool for investigating whether a time series contains cycles, and
for finding their amplitude and frequency.
3.11.1 Definition. The periodogram of a time series $X_1, \dots, X_n$ is defined for $0 \le f \le \tfrac12$ by
$$I(f) = \frac{2}{n}\left[\left\{\sum_{t=1}^{n} X_t\cos(2\pi f t)\right\}^2 + \left\{\sum_{t=1}^{n} X_t\sin(2\pi f t)\right\}^2\right].$$
Remark. This is motivated by the problem of estimating $A$ and $B$ in the regression
$$X_t = A\cos(2\pi f t) + B\sin(2\pi f t) + e_t.$$
The least squares equations for $A$ and $B$ may be approximated, provided $f$ is not too close to $0$ or $\tfrac12$, as
$$\hat A = \frac{2}{n}\sum_{t=1}^{n} X_t\cos(2\pi f t), \qquad \hat B = \frac{2}{n}\sum_{t=1}^{n} X_t\sin(2\pi f t),$$
so that
$$\hat R^2 = \hat A^2 + \hat B^2$$
and
$$I(f) = \frac{n}{2}\left(\hat A^2 + \hat B^2\right) = \frac{n}{2}\,\hat R^2.$$
3.11.2 Interpretation of the periodogram. There are efficient methods for calculating the periodogram over a fine grid of frequencies spanning $0 \le f \le \tfrac12$. The scaling factor in the definition is chosen so that the area under the graph of $I(f)$ is the mean square value of the series:
$$\int_0^{1/2} I(f)\,df \approx \frac{1}{n}\sum_{t=1}^{n} X_t^2.$$
Usually the series is mean corrected before calculating the periodogram so this becomes the sample variance.
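A direct implementation of the definition (Python/numpy; the function name is ours); the final line checks numerically that the area under the periodogram is approximately the mean square (here the variance, after mean correction):

    import numpy as np

    def periodogram(x, freqs):
        # Periodogram 3.11.1 of a (mean corrected) series at the given frequencies
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        n = len(x)
        t = np.arange(1, n + 1)
        I = np.empty(len(freqs))
        for i, f in enumerate(freqs):
            a = np.sum(x * np.cos(2 * np.pi * f * t))
            b = np.sum(x * np.sin(2 * np.pi * f * t))
            I[i] = 2.0 * (a**2 + b**2) / n
        return I

    rng = np.random.default_rng(3)
    x = rng.normal(size=512)
    freqs = np.arange(1, 257) / 512          # Fourier frequencies up to the Nyquist 1/2
    I = periodogram(x, freqs)
    print(np.sum(I) / len(x), np.var(x))     # area under the periodogram ~ sample variance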
A single cycle of amplitude $R$ and frequency $f_0$ will have a periodogram which has a peak at frequency $f_0$ with height close to $nR^2/2$ and width of order $1/n$. The main peak will be neighboured by smaller peaks of diminishing height, called side-lobes, at frequencies spaced approximately $1/n$ apart on either side of it.
If the series consists of several such cycles then, provided their frequencies are separated by at least $1/n$, i.e. they differ by at least one cycle in the sample length, each cycle will usually give rise to a distinct peak at its own frequency. However, if the amplitude of one cycle is much less than another's, the frequency separation will have to be greater, to avoid a side-lobe of the larger peak masking the smaller peak. Figure 19 shows a series consisting of two cycles added together, and their periodogram.
The periodogram of an irregular cycle with varying amplitude and frequency will typically appear as a collection of peaks over a broader frequency range.
A trend in a time series will appear as a peak at frequency zero, often much larger than any other peak, so it is wise to remove any strong trend by regression before examining the periodogram. The plots in the next four figures show (A) the periodogram for the raw (mean corrected) CO2 series, (B) the periodogram after trend correction - now the seasonal cycles are more noticeable, (C) the periodogram of the residuals after fitting the trend and the cycle with period 12 - the next harmonic is now clearer. Finally, (D) shows the periodogram of the residuals after fitting the trend and the harmonics at periods 12 and 6. What is seen is the typical periodogram of a correlated series - but we can also notice some evidence for a harmonic at period 4.
[Four unnumbered figures: periodograms (A)-(D) of the CO2 series as described above.]
In general the periodogram should be viewed as a transformation of the data. It has particular value for detecting regular cycles, but is also useful for understanding the irregular or random features. An important case is when the series is white noise with variance $\sigma_e^2$. Then, whatever the frequency $f$, using the orthogonality properties of the sine and cosine functions, the estimates of the above regression will have the property
$$\hat A, \hat B \approx N\!\left(0, \tfrac{2\sigma_e^2}{n}\right), \ \text{approximately independently},$$
so that
$$I(f) = \frac{n}{2}\left(\hat A^2 + \hat B^2\right) \approx \sigma_e^2\,\chi_2^2,$$
an exponential distribution, where here the parameter $\mu = 2\sigma_e^2$ is the mean of the exponential distribution with pdf $\mu^{-1}e^{-x/\mu}$.
Moreover the values at different frequencies are close to being independent random variables provided the frequencies are separated by at least $1/n$. Because the central 90% of this distribution ranges from about one twentieth to three times its mean, the general picture of the periodogram in this case is very variable with many ‘peaks’. These are just due to natural statistical fluctuation but are easily misinterpreted as revealing regular cycles in the data. In a sufficiently large data set the peak of height $nR^2/2$ of a regular cycle would stand clear of these fluctuations due to noise, which have mean $2\sigma_e^2$. Figure 21 illustrates these points with an artificial series of random Gaussian noise, and its periodogram.
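Continuing the white-noise case with the periodogram function from the sketch above (illustrative values only): the ordinates average about $2\sigma_e^2$, and roughly $e^{-3} \approx 5\%$ of them exceed three times that mean, which is the source of the spurious ‘peaks’:

    import numpy as np

    # Reuses periodogram() from the earlier sketch
    rng = np.random.default_rng(4)
    sigma = 1.5
    x = rng.normal(scale=sigma, size=2000)
    freqs = np.arange(1, 1001) / 2000
    I = periodogram(x, freqs)
    print(np.mean(I), 2 * sigma**2)          # ordinates average about 2*sigma^2
    print(np.mean(I > 3 * 2 * sigma**2))     # roughly exp(-3) ~ 5% exceed three times the mean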
3.12.1 Introduction. The periodogram defined in 3.11.1, for a (mean corrected) time series sample, may be shown to be related to the sample acf by
$$I(f) = 2 c_0\left\{1 + 2\sum_{k=1}^{n-1} r_k\cos(2\pi f k)\right\},$$
expressed now in terms of the sample variance $c_0$ and the sample acf $r_k$ rather than the observations themselves.
Now assuming that $X_t$ is a stationary time series, the following definition is motivated by replacing the sample acf with $\rho_k$:
3.12.2 Definition. The spectrum of $X_t$ is
$$S(f) = 2\gamma_0\left\{1 + 2\sum_{k=1}^{\infty} \rho_k\cos(2\pi f k)\right\}, \qquad 0 \le f \le \tfrac12.$$
We shall call the periodogram the sample spectrum, and also write $\hat S(f) = I(f)$, when it is believed that the data arise from a stationary time series. The periodogram by its very definition is non-negative, and the same property may be proved for the spectrum:
3.12.3
$$S(f) \ge 0 \quad \text{for all } f.$$
The simplest example is that of white noise, for which the spectrum has the constant, or uniform, value $2\sigma_e^2$. In general the spectrum may be thought of as describing how the variance of the series is spread over the frequency range, so that corresponding to 3.10.2 we have
$$\int_0^{1/2} S(f)\,df = \gamma_0.$$
The next simplest example is of a series for which $\rho_1 \ne 0$ but $\rho_k = 0$ for $k > 1$, so that
$$S(f) = 2\gamma_0\left\{1 + 2\rho_1\cos(2\pi f)\right\}.$$
For this example the extreme values of $S(f)$, found at the endpoints $f = 0$ and $f = \tfrac12$ of the range, are $2\gamma_0(1 + 2\rho_1)$ and $2\gamma_0(1 - 2\rho_1)$. For the spectrum to be positive both of these must be positive, so that we obtain the constraint $|\rho_1| < \tfrac12$. In general the spectrum can have any positive values over its range.
One expression for the spectrum of an MA($q$) model follows from the characteristic property, by stopping the series 2.4.2 after $q$ terms:
3.12.4
$$S(f) = 2\gamma_0\left\{1 + 2\sum_{k=1}^{q} \rho_k\cos(2\pi f k)\right\}.$$
Now the range of frequency $0 \le f \le \tfrac12$ corresponds monotonically to the range $1 \ge x \ge -1$ for $x = \cos(2\pi f)$. It is useful to think of plotting the spectrum against $x$; it is only a slight distortion of the horizontal axis. In that case $\cos(2\pi f k)$ is a polynomial of degree $k$ in $x$ (the Chebyshev polynomial), so that for the MA($q$) model the spectrum can be considered as a polynomial of degree $q$ in $x$. Any such polynomial is a valid spectrum provided it is non-negative.
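The expression 3.12.4 is simple to evaluate; the sketch below (Python/numpy; the function name is ours) computes the MA($q$) spectrum from the model coefficients and checks that it is non-negative and integrates to $\gamma_0$:

    import numpy as np

    def ma_spectrum(theta, sigma2=1.0, n_freq=201):
        # Spectrum of X_t = e_t - theta_1 e_{t-1} - ... - theta_q e_{t-q} via 3.12.4:
        # a finite cosine series, i.e. a polynomial of degree q in cos(2*pi*f).
        psi = np.concatenate(([1.0], -np.asarray(theta, dtype=float)))
        q = len(psi) - 1
        gamma = np.array([sigma2 * np.sum(psi[: q - k + 1] * psi[k:]) for k in range(q + 1)])
        freqs = np.linspace(0.0, 0.5, n_freq)
        k = np.arange(1, q + 1)
        S = 2.0 * (gamma[0] + 2.0 * np.cos(2 * np.pi * np.outer(freqs, k)) @ gamma[1:])
        return freqs, S

    freqs, S = ma_spectrum([0.6])
    df = freqs[1] - freqs[0]
    area = np.sum((S[1:] + S[:-1]) / 2.0) * df
    print(round(area, 3))        # close to gamma_0 = 1 + 0.6**2 = 1.36
    print(S.min() >= 0.0)        # the spectrum is non-negative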
The statistical properties of the sample spectrum for a stationary time series generalise those stated earlier for white noise. They are that, for large $n$,
3.12.5
$$\hat S(f) \approx S(f)\,\chi_2^2/2, \quad \text{i.e. exponential with mean } S(f),$$
with values at frequencies separated by more than $1/n$ being approximately independent. The sample spectra that were shown in Figures 4 and 5 can be smoothed, as shown in Figure 22, to estimate the true spectra about which the sample spectra vary with an exponential distribution.
For selection of ARMA models, the sample spectrum is of most value in confirming AR features of the model by the presence of spectrum peaks. For example a peak at $f = 0$ indicates at least a first order model with $\phi_1 > 0$. If in addition there is a peak within the range $0 < f < \tfrac12$ then another two AR terms are needed, giving $p = 3$. The general form of the spectrum for an ARMA($p,q$) model is that of the ratio of two polynomials in $x = \cos(2\pi f)$. The numerator, of degree $q$, depends on the MA part, and the denominator, of degree $p$, on the AR part. Rational polynomials such as these have great flexibility in approximating continuous functions and this is one reason why the ARMA model is able to fit observed stationary time series properties so well.
The following three figures show the plots for the ocean core data and the properties of a fitted AR(6) model.
[Three unnumbered figures: sample spectrum of the ocean core data and properties of the fitted AR(6) model.]