1.1.1 Definition. A time series is a set of measurements of a variable, say $x$, recorded as time progresses.
We shall assume that these are obtained at equally spaced points in time, to give a sample $x_1, x_2, \ldots, x_n$, where the data is collectively called a discrete time series.
We will denote $t$ as the time index. The adjective discrete applies to the time, such that $t = 1, 2, \ldots, n$, not to the numerical values of $x_t$, which in this course can take any real value such that $-\infty < x_t < \infty$.
Objectives of time series analysis are
to describe or model the structure of the data and estimate the model parameters,
to use the model for forecasting, simulation and decision making.
A statistical approach is needed in order to reveal and model the structure. We therefore view the data as values of random variables, so that the sequence $\{x_t\}$ is properly known as a stochastic process. Although there are many sophisticated models for such processes, the limitations of the data usually mean that relatively simple linear models are used most of the time, and these are the subject of this course.
In practice we use $x_t$ both for the data and for the random variables, with the context being sufficient to distinguish the meaning, e.g. $x_t$ is usually a ‘value’ - a number - but in $\mathrm{E}(x_t)$ we mean $x_t$ to be taken as a random variable.
Figures 1 and 2 show a selection of observed series.
The visually evident features or structure in a time series are usually described as trends, seasonality and cycles. We start by presenting deterministic models for these features, and fit them using linear regression. We shall later consider models which allow for the fact that these features are not always regular.
Deterministic functions of time are fitted to the data by ordinary least squares (OLS) regression. Provided that the residuals of the fit are small, the data may reasonably be predicted by extrapolating the fitted deterministic components.
However, the statistical assumptions underlying OLS are that the model errors satisfy the following conditions, which in a time series context have a specific name.
1.2.1 Definition. A series $\varepsilon_t$ is said to be white noise if
E$(\varepsilon_t) = 0$ for all $t$,
Var$(\varepsilon_t) = \sigma^2$ is constant for all $t$,
Cov$(\varepsilon_t, \varepsilon_s) = 0$ for all $t \neq s$.
These conditions are commonly made more stringent by requiring that the $\varepsilon_t$ are independent and identically distributed (IID) and, even further, that they are Normally distributed, $\varepsilon_t \sim N(0, \sigma^2)$.
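The white-noise conditions can be checked empirically on simulated data. A minimal sketch in Python (the sample size and variance are illustrative choices, not from the notes):

```python
import numpy as np

# Simulate Gaussian white noise and check the three defining properties
# empirically: zero mean, constant variance, zero covariance across lags.
rng = np.random.default_rng(0)
n, sigma = 10_000, 2.0
eps = rng.normal(loc=0.0, scale=sigma, size=n)   # IID N(0, sigma^2)

mean_hat = eps.mean()                            # should be close to 0
var_hat = eps.var()                              # should be close to sigma^2 = 4
# Lag-1 sample autocovariance: should be close to 0 for white noise.
acov1 = np.mean((eps[1:] - mean_hat) * (eps[:-1] - mean_hat))

print(mean_hat, var_hat, acov1)
```

With $n$ this large, the sample mean and lag-1 autocovariance are both near zero and the sample variance is near $\sigma^2 = 4$.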
1.2.2 Polynomial trend models. Commonly used are the constant, linear and quadratic trends
$$x_t = \alpha + \varepsilon_t, \qquad x_t = \alpha + \beta t + \varepsilon_t, \qquad x_t = \alpha + \beta t + \gamma t^2 + \varepsilon_t,$$
where the unknown coefficients are to be estimated. It is sometimes useful to write them in different but equivalent forms, e.g. for the linear trend
$$x_t = a + b(t - \bar{t}) + \varepsilon_t$$
with $a = \alpha + \beta\bar{t}$ and $b = \beta$,
in which case the estimates are given by the relatively simple expressions
$$\hat{a} = \bar{x}, \qquad \hat{b} = \frac{\sum_{t=1}^{n} (t - \bar{t})\, x_t}{\sum_{t=1}^{n} (t - \bar{t})^2}. \qquad (1.2.3)$$
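A minimal sketch, assuming the linear trend is written in centred form $x_t = a + b(t - \bar{t}) + \varepsilon_t$ (notation mine): OLS then gives $\hat{a} = \bar{x}$ and $\hat{b} = \sum (t - \bar{t}) x_t \big/ \sum (t - \bar{t})^2$, which the code below verifies against a general least squares fit. The simulated series and its parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
t = np.arange(1, n + 1)
x = 5.0 + 0.3 * t + rng.normal(0, 1.0, n)        # linear trend plus noise

tbar = t.mean()                                  # equals (n + 1) / 2
a_hat = x.mean()                                 # estimate of level at t = tbar
b_hat = np.sum((t - tbar) * x) / np.sum((t - tbar) ** 2)   # slope estimate

# Cross-check against a general least-squares fit of x on t:
b_ref, c_ref = np.polyfit(t, x, 1)               # slope, intercept
print(a_hat, b_hat)
```

The slope agrees with `np.polyfit` to machine precision, and $\hat{a}$ equals the fitted line evaluated at $\bar{t}$.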
Trend models may be more appropriately fitted to some series after applying a logarithmic transformation, which converts exponential growth to linear. Such transformations may also have the benefit of making the error variance more homogeneous throughout the series. The square root and other power transformations are also used.
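To illustrate why the logarithmic transformation works: a series growing exponentially, $x_t = A e^{g t}$ (where $A$ and $g$ are invented here), is exactly linear in $t$ on the log scale, so a straight-line fit to $\log x_t$ recovers $g$ and $\log A$:

```python
import numpy as np

n = 50
t = np.arange(1, n + 1)
A, g = 100.0, 0.05
x = A * np.exp(g * t)                            # deterministic exponential growth

# Fitting a linear trend to the log series recovers the growth parameters:
slope, intercept = np.polyfit(t, np.log(x), 1)   # slope ~ g, intercept ~ log(A)
print(slope, intercept)
```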
1.2.4 Seasonal regression models. A seasonal component of a series is
one which repeats itself at regular intervals of integer period $s$.
This may be in addition to a trend, e.g. a monthly series, for which $s = 12$,
might be modelled by
$$x_t = \alpha + \beta t + \theta_{m(t)} + \varepsilon_t, \qquad (1.2.5)$$
where $m(t) \in \{1, \ldots, 12\}$ is the month in which time $t$ falls.
The coefficients $\theta_1, \ldots, \theta_{12}$ are fixed monthly effects. We say
that month is a factor with $12$ levels. Using dummy variables this is a
linear regression model
$$x_t = \alpha + \beta t + \sum_{j=1}^{12} \theta_j d_{t,j} + \varepsilon_t, \qquad (1.2.6)$$
where
$$d_{t,j} = \begin{cases} 1 & \text{if time } t \text{ falls in month } j, \\ 0 & \text{otherwise.} \end{cases} \qquad (1.2.7)$$
If the constant term is in the model, one of the dummy variables must be removed to avoid collinearity - sometimes called aliasing - since the dummy variables for the twelve months sum to one, duplicating the constant.
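A minimal sketch of fitting a trend-plus-seasonal model of the form (1.2.6) by least squares, with one dummy dropped as just described; the simulated data and parameter values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 120                                          # 10 years of monthly data
t = np.arange(1, n + 1)
month = (t - 1) % 12                             # 0 = Jan, ..., 11 = Dec

# Invented monthly effects (Jan effect set to 0) plus a linear trend:
theta = np.array([0, 1, 2, 3, 2, 1, 0, -1, -2, -3, -2, -1], float)
x = 10.0 + 0.05 * t + theta[month] + rng.normal(0, 0.5, n)

# Dummy variables d_{t,j} = 1 if time t falls in month j, else 0;
# drop the January column to avoid collinearity with the constant.
dummies = (month[:, None] == np.arange(12)[None, :]).astype(float)[:, 1:]
X = np.column_stack([np.ones(n), t, dummies])    # constant, trend, 11 dummies

beta_hat, *_ = np.linalg.lstsq(X, x, rcond=None)
print(beta_hat[:2])                              # estimated constant and trend
```

The remaining dummy coefficients estimate each month's effect relative to the dropped month (here January).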
Figure 3 shows the fit of this model to the CO2 series, together with the residuals from the fit. Note that these are far from being IID - there are long runs of positive, or negative, values.