2.1.1 Introduction. Many time series are evidently not IID - they may have long runs of values of the same sign, or show other patterns of association between successive values. Yet they often show a statistical similarity of appearance throughout their sample length. This may be the case after trends, seasonality and strong cyclical features have been removed by regression.
Differencing a time series, i.e. looking at the change from one sample point to another, may also lead to such an appearance, an example being inflation levels - monthly price increases - which are generally of more concern than the absolute price which continues its trend inexorably.
In this section we shall make simple assumptions about the structure of a single, or univariate, time series, so we shall now use the notation $X_t$ rather than the $Y_t$ which was more appropriate for the regression context.
The assumption that the statistical behaviour of a series is not changing as time progresses is called stationarity. Its simplest description or measure uses the correlations between values in the series and leads to the following:
2.1.2 Definition. A time series $\{X_t\}$ is second order (weakly) stationary if

(i) $\mathrm{E}(X_t) = \mu$ and $\mathrm{Var}(X_t) = \sigma^2$ are the same for all $t$;

(ii) for each $k \geq 1$, $\mathrm{Cov}(X_t, X_{t+k}) = \gamma_k$ are the same for all $t$.

Note that $\gamma_0 = \mathrm{Var}(X_t) = \sigma^2$. If we also consider negative lags, $\gamma_{-k} = \gamma_k$.

The set of values $\{\gamma_k\}$ is called the autocovariance function (of the lag $k$) for the series $\{X_t\}$.
Note: strict stationarity means that the joint pdf of any set of values of the series is the same as that of the values all shifted in time by the same lag.
Also, a series is said to be Gaussian if this joint pdf is Normal. The assumptions of second order stationarity and Gaussianity together imply strict stationarity.
2.1.3 Definition. The autocorrelation function (acf) of a stationary time series is
$$\rho_k = \frac{\gamma_k}{\gamma_0} = \frac{\mathrm{Cov}(X_t, X_{t+k})}{\mathrm{Var}(X_t)}.$$
Again it is at times useful to think of $\rho_k$ as being defined also for negative $k$, such that $\rho_{-k} = \rho_k$. Obviously $\rho_0 = 1$ always.
2.1.4 Examples.
(a) If $\{X_t\}$ is white noise then by definition $\rho_k = 0$ for $k \neq 0$.

(b) Take $\{e_t\}$ to be white noise with variance $\sigma_e^2$ and consider the constructed series $X_t = e_t + e_{t-1}$. Then
$$\gamma_0 = \mathrm{Var}(X_t) = 2\sigma_e^2, \qquad \gamma_1 = \mathrm{Cov}(X_t, X_{t+1}) = \sigma_e^2, \qquad \rho_1 = \tfrac{1}{2}.$$
It is similarly found that $\gamma_k = 0$ for $k > 1$, so $\rho_k = 0$ for $k > 1$.
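As a numerical illustration of Example (b), the sketch below (assuming NumPy is available, and taking the constructed series to be $X_t = e_t + e_{t-1}$ as above) simulates a long realisation and checks that the sample lag-1 autocorrelation is close to $1/2$ while the lag-2 value is near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
e = rng.standard_normal(n + 1)      # white noise e_t
x = e[1:] + e[:-1]                  # constructed series X_t = e_t + e_{t-1}

d = x - x.mean()
# sample autocovariances c_k with divisor n
c = [np.sum(d[: n - k] * d[k:]) / n for k in range(3)]
r1, r2 = c[1] / c[0], c[2] / c[0]
print(r1, r2)   # r1 should be close to 0.5, r2 close to 0
```

The sampling variability of $r_k$ here is of order $1/\sqrt{n}$, so with $n = 100{,}000$ the agreement with the theoretical values is close.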
Given a time series data set $x_1, x_2, \ldots, x_n$, the following estimates of the above quantities are used. Note however that they differ from the usual sample estimates used in statistics because they are not based on independent observations.
2.1.5 Time series sample estimates.
sample mean, an estimate of $\mu$: $\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t$

sample variance, an estimate of $\sigma^2$: $c_0 = \frac{1}{n}\sum_{t=1}^{n} (x_t - \bar{x})^2$

sample autocovariance, an estimate of $\gamma_k$: $c_k = \frac{1}{n}\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})$ for $0 \leq k \leq n-1$, and $0$ if $k \geq n$.

sample autocorrelation, an estimate of $\rho_k$: $r_k = c_k / c_0$
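These estimates are straightforward to compute directly. The sketch below (assuming NumPy; the function name `sample_acf` is our own) implements the definitions of 2.1.5 with the divisor $n$ throughout.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Return (c, r): sample autocovariances c_k and autocorrelations
    r_k = c_k / c_0 for k = 0, ..., max_lag, using the divisor n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    c = np.array([np.sum(d[: n - k] * d[k:]) / n for k in range(max_lag + 1)])
    return c, c / c[0]

# small worked example: c_0 is the sample variance, r_0 is always 1
c, r = sample_acf([1.0, 2.0, 3.0, 4.0], 2)
```

For the series $1, 2, 3, 4$ this gives $c_0 = 1.25$ and $r_1 = 0.25$, which can be checked by hand from the definitions.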
Remark. The divisor of $n$ is used for defining $c_0$ and $c_k$. Some authors use the divisor $n - k$ when defining $c_k$.
Warning. These sample quantities can be automatically generated for any data set so the assumption of stationarity requires some check - usually a visual inspection of the series - before they are used.
In particular if the data contain trends, seasonality or cycles which appear deterministic and have not been removed by regression, then they will affect, and tend to dominate, the pattern of the sample acf, obscuring other statistical features. However, series may appear trend-like in a short sample yet stationary in the long term. The appearance depends on sample length. Figure 4 shows the sample autocorrelations of the residuals from the harmonic model fitted to the CO2 series. The figure also shows the series, the partial autocorrelations, and the sample spectral density (known as the periodogram), which we shall define later in this course. Figure 5 shows similar plots for the random series.
Consider $n$ successive time series values as a vector of random variables
$$X = (X_1, X_2, \ldots, X_n)'$$
where $'$ indicates transpose. For a stationary series, the autocorrelation function determines their correlation matrix as
$$P_n = \begin{pmatrix} 1 & \rho_1 & \rho_2 & \cdots & \rho_{n-1} \\ \rho_1 & 1 & \rho_1 & \cdots & \rho_{n-2} \\ \rho_2 & \rho_1 & 1 & \cdots & \rho_{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{n-1} & \rho_{n-2} & \rho_{n-3} & \cdots & 1 \end{pmatrix} \tag{2.2.1}$$
and the covariance matrix is defined by $\Gamma_n = \sigma^2 P_n$.
The matrices $P_n$ and $\Gamma_n$ have special structure: the elements are the same down any diagonal. Such matrices are called Toeplitz matrices.
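To make the Toeplitz structure of 2.2.1 concrete, a minimal sketch (assuming NumPy; `corr_matrix` is our own name) builds the correlation matrix from a vector of autocorrelations, so that element $(s, t)$ is $\rho_{|s-t|}$.

```python
import numpy as np

def corr_matrix(rho):
    """Build the Toeplitz correlation matrix from
    rho = [rho_0, rho_1, ..., rho_{n-1}] with rho_0 = 1:
    element (s, t) is rho[|s - t|], the same down any diagonal."""
    rho = np.asarray(rho, dtype=float)
    n = len(rho)
    lags = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    return rho[lags]

P = corr_matrix([1.0, 0.5, 0.2])
# P has 1s on the diagonal, 0.5 on the first off-diagonals,
# and 0.2 in the two corners
```

Multiplying the result by $\sigma^2$ gives the corresponding covariance matrix.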
Covariance matrices are useful for calculating the variance of a linear combination of variables such as
$$Y = a'X = \sum_{t=1}^{n} a_t X_t$$
where $a = (a_1, a_2, \ldots, a_n)'$, as
$$\mathrm{Var}(Y) = a' \Gamma_n a = \sum_{s=1}^{n}\sum_{t=1}^{n} a_s a_t \gamma_{|s-t|}. \tag{2.2.2}$$
A simple example is to evaluate $\mathrm{Var}(\bar{x})$ by taking $a = (1/n, 1/n, \ldots, 1/n)'$. The result is just $\sigma^2 / n^2$ multiplied by the sum of the elements of $P_n$:
$$\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n^2} \sum_{s=1}^{n} \sum_{t=1}^{n} \rho_{|s-t|}. \tag{2.2.3}$$
We can deduce, by collecting the elements of $P_n$ by diagonal,
$$\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n} \left\{ 1 + 2 \sum_{k=1}^{n-1} \left(1 - \frac{k}{n}\right) \rho_k \right\}.$$
For large $n$ this may be approximated by
$$\mathrm{Var}(\bar{x}) \approx \frac{\sigma^2}{n} \sum_{k=-\infty}^{\infty} \rho_k. \tag{2.2.4}$$
A similar formula holds for the sample variance $c_0$ if the series is Gaussian, such that
$$\mathrm{Var}(c_0) \approx \frac{2\sigma^4}{n} \sum_{k=-\infty}^{\infty} \rho_k^2. \tag{2.2.5}$$
Remark. For highly autocorrelated series precise estimation of the mean and variance is very difficult because of the high magnitude of $\sum_k \rho_k$ and $\sum_k \rho_k^2$ appearing in the variances.
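As a numerical check of the finite-$n$ variance of the sample mean in 2.2.3 against the large-$n$ approximation 2.2.4, the sketch below (assuming NumPy) uses a hypothetical geometric acf $\rho_k = \phi^{|k|}$, for which the infinite sum in 2.2.4 has the closed form $(1+\phi)/(1-\phi)$.

```python
import numpy as np

phi, sigma2, n = 0.6, 1.0, 500        # hypothetical acf rho_k = phi**|k|
k = np.arange(1, n)

# finite-n variance of the sample mean: the double sum of 2.2.3
# collected by diagonal of P_n
exact = (sigma2 / n) * (1 + 2 * np.sum((1 - k / n) * phi**k))

# large-n approximation 2.2.4: (sigma2/n) times the sum of rho_k
# over all integer lags
approx = (sigma2 / n) * (1 + phi) / (1 - phi)

print(exact, approx)   # the two agree closely for n = 500
```

Note that both values are roughly four times $\sigma^2/n$ here, illustrating the Remark above: positive autocorrelation substantially inflates the variance of the sample mean.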
The autocorrelation measures the direct relationship between two values of a time series at different lags. In time series analysis an alternative measure is the strength of association conditional upon the values in between.
2.3.1 Definition. The partial autocorrelation (pacf) at lag $k$ for a stationary time series is
$$\phi_{k,k} = \mathrm{Corr}(X_t, X_{t+k} \mid X_{t+1}, \ldots, X_{t+k-1}).$$
This may be calculated from the covariance matrix of $(X_t, X_{t+1}, \ldots, X_{t+k})$ and so it depends only on $\rho_1, \ldots, \rho_k$. The same quantities are in fact used when the Gaussian assumption is not appropriate, and we shall see later their value and interpretation in the context of model selection.
2.3.2 The sample partial autocorrelations.
Sample values of the pacf are calculated simply by using the sample values $r_k$ in place of the acf $\rho_k$ in the calculations.
A similar warning to that given regarding sample autocorrelations also applies here, that any strong trends and other deterministic components not removed from the series will tend to dominate the appearance of the sample pacf.
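One common way to carry out the calculation of 2.3.2 is the Durbin-Levinson recursion, which obtains $\phi_{k,k}$ from the sample autocorrelations $r_1, \ldots, r_k$. The sketch below (assuming NumPy; the function name is our own, and the recursion is a standard method rather than one the notes specify) implements it and illustrates the behaviour on a simulated first-order autoregression, where the pacf should cut off after lag 1.

```python
import numpy as np

def sample_pacf(x, max_lag):
    """Sample partial autocorrelations phi_{k,k} for k = 1..max_lag,
    computed from the sample acf r_k via the Durbin-Levinson recursion."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    r = np.array([np.sum(d[: n - k] * d[k:]) for k in range(max_lag + 1)])
    r = r / r[0]                          # sample autocorrelations r_k
    phi = np.zeros((max_lag + 1, max_lag + 1))
    pacf = np.zeros(max_lag)
    for k in range(1, max_lag + 1):
        prev = phi[k - 1, 1:k]            # coefficients from order k-1
        num = r[k] - prev @ r[1:k][::-1]
        den = 1.0 - prev @ r[1:k]
        phi[k, k] = num / den
        phi[k, 1:k] = prev - phi[k, k] * prev[::-1]
        pacf[k - 1] = phi[k, k]
    return pacf

# illustration on a simulated AR(1) series x_t = 0.7 x_{t-1} + e_t
rng = np.random.default_rng(1)
n = 20_000
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + e[t]
p = sample_pacf(x, 3)   # p[0] near 0.7; p[1], p[2] near 0
```

The lag-1 sample pacf equals $r_1$ by construction, and for this series the higher-lag values are of order $1/\sqrt{n}$, consistent with the cut-off property that will make the pacf useful for model selection.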