1 Introduction and Regression models.

1.1 Time series structure.

1.1.1 Definition. A time series is a set of measurements of a variable, say Y, recorded as time progresses.

We shall assume that these are obtained at equally spaced points in time, giving a sample in which the data are collectively called a discrete time series:

$$y_1, y_2, \ldots, y_n.$$

We will denote $t$ as the time index. The adjective discrete applies to the time, $t \in \mathbb{Z}$, not to the numerical values of $Y$, which in this course can take any real value, $y_t \in \mathbb{R}$.

Objectives of time series analysis are

  • to describe or model the structure of the data and estimate the model parameters,

  • to use the model for forecasting, simulation, decision making.

A statistical approach is needed in order to reveal and model the structure. We therefore view data $y_t$ as values of random variables $Y_t$, so that the sequence $\{Y_t\}$ is properly known as a stochastic process. Although there are many sophisticated models for such processes, the limitations of the data usually mean that relatively simple linear models are used most of the time, and these are the subject of this course.

In practice we use $y_t$ both for the data and for the random variables, with the context being sufficient to distinguish the meaning, e.g. $\frac{1}{n}\sum_{t=1}^{n} y_t$ is usually a ‘value’, a number, but in $E(y_t)$ we mean $y_t$ to be taken as a random variable.

Figures 1 and 2 show a selection of observed series.

Figure 1: An environmental time series and a medical time series.
Figure 2: An economic time series and three financial time series.

The visually evident features or structure in a time series are usually described as trends, seasonality and cycles. We start by presenting deterministic models for these features, and fit them using linear regression. We shall later consider models which allow for the fact that these features are not always regular.

1.2 Simple Linear Regression models for time series.

Deterministic functions of time $t$ are fitted to the data by ordinary least squares (OLS) regression. Provided that the residuals of the fit are small, the data may reasonably be predicted by extrapolating the fitted deterministic components.

However, the statistical assumption underlying OLS is that the model errors satisfy the following conditions, which in a time series context have a specific name.

1.2.1 Definition. A series $e_t$ is said to be white noise if

  (i) $E(e_t) = 0$ for all $t$,

  (ii) $\operatorname{Var}(e_t) = \sigma_e^2$ is constant for all $t$,

  (iii) $\operatorname{Cov}(e_t, e_s) = 0$ for all $s \neq t$.

These conditions are commonly made more stringent by requiring that the $e_t$ are independent and identically distributed (IID) and, even further, that they are Normally distributed, $e_t \sim \mathrm{NID}(0, \sigma_e^2)$.
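The three defining properties can be checked empirically on simulated Gaussian white noise. A minimal sketch (the sample size and $\sigma_e$ below are illustrative choices, not from the notes):

```python
import numpy as np

# Simulate Gaussian white noise e_t ~ NID(0, sigma_e^2) and estimate the
# three defining moments from a long sample.
rng = np.random.default_rng(0)   # fixed seed for reproducibility
n, sigma_e = 10_000, 2.0
e = rng.normal(loc=0.0, scale=sigma_e, size=n)

sample_mean = e.mean()   # estimates E(e_t), should be near 0
sample_var = e.var()     # estimates Var(e_t), should be near sigma_e^2 = 4
# lag-1 sample autocovariance, estimates Cov(e_t, e_{t-1}), should be near 0
lag1_cov = np.mean((e[1:] - sample_mean) * (e[:-1] - sample_mean))

print(sample_mean, sample_var, lag1_cov)
```

With $n = 10{,}000$ draws the sample mean and lag-1 autocovariance are close to zero and the sample variance close to 4, as the definition requires.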

1.2.2 Polynomial trend models. Commonly used are:

$$\begin{aligned}
\text{constant level:} \quad & y_t = c + e_t \\
\text{linear trend:} \quad & y_t = c + bt + e_t \\
\text{quadratic trend:} \quad & y_t = c + bt + dt^2 + e_t
\end{aligned}$$

where the unknown coefficients $c, b, d$ are to be estimated. It is sometimes useful to write them in different but equivalent forms, e.g. for the linear trend, with $\bar{t} = (n+1)/2$ and $m = c + b(n+1)/2$,

$$y_t = m + b(t - \bar{t}) + e_t$$

in which case the estimates are given by the relatively simple expressions
1.2.3

$$\hat{m} = \bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t; \qquad \hat{b} = \frac{\sum_{t=1}^{n} y_t (t - \bar{t})}{\sum_{t=1}^{n} (t - \bar{t})^2}.$$
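The closed-form estimates in 1.2.3 are easy to verify numerically. A sketch on a synthetic linear trend (the coefficient values and noise level are invented for the example):

```python
import numpy as np

# Simulate y_t = m + b(t - t_bar) + e_t with white noise errors, then
# apply the closed-form estimates of 1.2.3.
rng = np.random.default_rng(1)
n = 200
t = np.arange(1, n + 1)
t_bar = (n + 1) / 2
m_true, b_true = 50.0, 0.3
y = m_true + b_true * (t - t_bar) + rng.normal(0, 1.0, size=n)

m_hat = y.mean()                                             # m-hat = y-bar
b_hat = np.sum(y * (t - t_bar)) / np.sum((t - t_bar) ** 2)   # slope estimate

print(m_hat, b_hat)
```

The centring by $\bar{t}$ is what makes the two estimates decouple: the columns $1$ and $(t - \bar{t})$ of the design matrix are orthogonal, so $\hat{m}$ is just the sample mean.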

Trend models may be more appropriately fitted to some series after applying a logarithmic transformation, which converts exponential growth to linear. Such transformations may also have the benefit of making the error variance more homogeneous throughout the series. The square root and other power transformations are also used.
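To see the effect of the log transformation, consider a noiseless exponential series $y_t = A e^{gt}$: taking logs gives $\log y_t = \log A + gt$, a linear trend whose slope is the growth rate. A sketch ($A$ and $g$ are illustrative values):

```python
import numpy as np

# Exponential growth y_t = A * exp(g t); after a log transform the series
# is exactly linear in t, so the slope formula from 1.2.3 recovers g.
n = 100
t = np.arange(1, n + 1)
A, g = 5.0, 0.02
y = A * np.exp(g * t)

z = np.log(y)                 # = log A + g t, linear in t
t_bar = (n + 1) / 2
g_hat = np.sum(z * (t - t_bar)) / np.sum((t - t_bar) ** 2)
print(g_hat)                  # recovers g exactly here, as there is no noise
```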

1.2.4 Seasonal regression models. A seasonal component of a series is one which repeats itself at regular intervals of integer period $s$. This may be in addition to a trend, e.g. a monthly series, for which $s = 12$, might be modelled by
1.2.5

$$y_t = c + bt + \begin{cases} \alpha_1 & \text{in each January} \\ \alpha_2 & \text{in each February} \\ \vdots & \\ \alpha_{12} & \text{in each December} \end{cases} + e_t$$

The coefficients $\alpha_1, \ldots, \alpha_{12}$ are fixed monthly effects. We say that month is a factor with 12 levels. Using dummy variables this is a linear regression model
1.2.6

$$y_t = c + bt + M_{t,1}\alpha_1 + \cdots + M_{t,12}\alpha_{12} + e_t$$

where
1.2.7

$$M_{t,1} = \begin{cases} 1 & \text{in each January} \\ 0 & \text{in other months} \end{cases} \qquad \ldots \qquad M_{t,12} = \begin{cases} 1 & \text{in each December} \\ 0 & \text{in other months} \end{cases}$$

If the constant term $c$ is in the model, one of the dummy variables must be removed to avoid collinearity, sometimes called aliasing.
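The dummy-variable model 1.2.6 can be fitted directly by least squares. A sketch for monthly data, dropping the January dummy so the intercept can stay in the model (the coefficients used to simulate the data are invented for illustration):

```python
import numpy as np

# Simulate a trend-plus-seasonal monthly series and fit model 1.2.6 by OLS,
# with the January dummy removed to avoid aliasing with the intercept.
rng = np.random.default_rng(2)
n_years, s = 10, 12
n = n_years * s
t = np.arange(1, n + 1)
month = (t - 1) % s          # 0 = January, ..., 11 = December

alpha = np.array([0.0, 1.5, -2.0, 0.5, 3.0, -1.0,
                  2.0, 0.0, -0.5, 1.0, -1.5, 2.5])   # monthly effects
c_true, b_true = 10.0, 0.05
y = c_true + b_true * t + alpha[month] + rng.normal(0, 0.1, size=n)

# Design matrix: intercept, trend t, and dummies M_{t,2}, ..., M_{t,12}
X = np.column_stack([np.ones(n), t]
                    + [(month == j).astype(float) for j in range(1, s)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(coef[1], coef[2])   # trend estimate; February effect relative to January
```

Because January's dummy is dropped, the remaining seasonal coefficients are estimated relative to January, which here plays the role of the baseline month absorbed into the intercept.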

Figure 3 shows the fit of this to the CO2 series, together with the residuals from the fit. Note that these are far from being IID - there are long runs of positive, or negative, values.

Figure 3: Fit to Monthly CO2 of the model with trend and seasonal factor.