1.1.1 Definition. A time series is a set of measurements of a variable, say $x$, recorded as time progresses.
We shall assume that these are obtained at equally spaced points in time, to give a sample $x_1, x_2, \ldots, x_n$, where the data is collectively called a discrete time series.
We will denote $t$ as the time index. The adjective discrete applies to the time, such that $t = 1, 2, \ldots, n$, not to the numerical values of $x_t$, which in this course can take any real value such that $-\infty < x_t < \infty$.
Objectives of time series analysis are
to describe or model the structure of the data and estimate the model parameters,
to use the model for forecasting, simulation and decision making.
A statistical approach is needed in order to reveal and model the structure. We therefore view the data as values of random variables, so that the sequence $\{x_t\}$ is properly known as a stochastic process. Although there are many sophisticated models for such processes, the limitations of the data usually mean that relatively simple linear models are used most of the time, and these are the subject of this course.
In practice we use $x_t$ both for the data and for the random variables, with the context being sufficient to distinguish the meaning, e.g. $x_t$ is usually a ‘value’ - a number - but in $\mathrm{E}(x_t)$ we mean $x_t$ to be taken as a random variable.
Figures 1 and 2 show a selection of observed series.
The visually evident features or structure in a time series are usually described as trends, seasonality and cycles. We start by presenting deterministic models for these features, and fit them using linear regression. We shall later consider models which allow for the fact that these features are not always regular.
Deterministic functions of time are fitted to the data by ordinary least squares (OLS) regression. Provided that the residuals of the fit are small, the data may reasonably be predicted by extrapolating the fitted deterministic components.
However, the statistical assumptions underlying OLS are that the model errors satisfy the following conditions, which in a time series context have a specific name.
1.2.1 Definition. A series $\varepsilon_t$ is said to be white noise if
E$(\varepsilon_t) = 0$ for all $t$,
Var$(\varepsilon_t) = \sigma^2$ is constant for all $t$,
Cov$(\varepsilon_t, \varepsilon_s) = 0$ for all $t \neq s$.
These conditions are commonly made more stringent by requiring that the $\varepsilon_t$ are independent and identically distributed (IID) and, even further, that they are Normally distributed, $\varepsilon_t \sim N(0, \sigma^2)$.
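The white-noise conditions can be checked empirically on simulated data. A minimal sketch in Python (the sample size and variance are illustrative choices, not from the notes):

```python
import numpy as np

# Simulate Gaussian white noise and check the three defining properties
# empirically: zero mean, constant variance, zero covariance across lags.
rng = np.random.default_rng(0)
n, sigma = 10_000, 2.0
eps = rng.normal(loc=0.0, scale=sigma, size=n)   # IID N(0, sigma^2)

mean_hat = eps.mean()                            # should be close to 0
var_hat = eps.var()                              # should be close to sigma^2 = 4
# Lag-1 sample autocovariance: should be close to 0 for white noise.
acov1 = np.mean((eps[1:] - mean_hat) * (eps[:-1] - mean_hat))

print(mean_hat, var_hat, acov1)
```

With $n$ this large, the sample mean and lag-1 autocovariance are both near zero and the sample variance is near $\sigma^2 = 4$.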
1.2.2 Polynomial trend models. Commonly used are the constant, linear and quadratic trends
$$x_t = \alpha + \varepsilon_t, \qquad x_t = \alpha + \beta t + \varepsilon_t, \qquad x_t = \alpha + \beta t + \gamma t^2 + \varepsilon_t,$$
where the unknown coefficients are to be estimated. It is sometimes useful to write them in different but equivalent forms, e.g. for the linear trend
$$x_t = a + b(t - \bar{t}) + \varepsilon_t$$
with $a = \alpha + \beta\bar{t}$ and $b = \beta$,
in which case the estimates are given by the relatively simple expressions
$$\hat{a} = \bar{x}, \qquad \hat{b} = \frac{\sum_{t=1}^{n} (t - \bar{t})\, x_t}{\sum_{t=1}^{n} (t - \bar{t})^2}. \qquad (1.2.3)$$
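A minimal sketch, assuming the linear trend is written in centred form $x_t = a + b(t - \bar{t}) + \varepsilon_t$ (notation mine): OLS then gives $\hat{a} = \bar{x}$ and $\hat{b} = \sum (t - \bar{t}) x_t \big/ \sum (t - \bar{t})^2$, which the code below verifies against a general least squares fit. The simulated series and its parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
t = np.arange(1, n + 1)
x = 5.0 + 0.3 * t + rng.normal(0, 1.0, n)        # linear trend plus noise

tbar = t.mean()                                  # equals (n + 1) / 2
a_hat = x.mean()                                 # estimate of level at t = tbar
b_hat = np.sum((t - tbar) * x) / np.sum((t - tbar) ** 2)   # slope estimate

# Cross-check against a general least-squares fit of x on t:
b_ref, c_ref = np.polyfit(t, x, 1)               # slope, intercept
print(a_hat, b_hat)
```

The slope agrees with `np.polyfit` to machine precision, and $\hat{a}$ equals the fitted line evaluated at $\bar{t}$.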
Trend models may be more appropriately fitted to some series after applying a logarithmic transformation, which converts exponential growth to linear. Such transformations may also have the benefit of making the error variance more homogeneous throughout the series. The square root and other power transformations are also used.
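To illustrate why the logarithmic transformation works: a series growing exponentially, $x_t = A e^{g t}$ (where $A$ and $g$ are invented here), is exactly linear in $t$ on the log scale, so a straight-line fit to $\log x_t$ recovers $g$ and $\log A$:

```python
import numpy as np

n = 50
t = np.arange(1, n + 1)
A, g = 100.0, 0.05
x = A * np.exp(g * t)                            # deterministic exponential growth

# Fitting a linear trend to the log series recovers the growth parameters:
slope, intercept = np.polyfit(t, np.log(x), 1)   # slope ~ g, intercept ~ log(A)
print(slope, intercept)
```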
1.2.4 Seasonal regression models. A seasonal component of a series is
one which repeats itself at regular intervals of integer period $s$.
This may be in addition to a trend, e.g. a monthly series, for which $s = 12$,
might be modelled by
$$x_t = \alpha + \beta t + \theta_{m(t)} + \varepsilon_t, \qquad (1.2.5)$$
where $m(t) \in \{1, \ldots, 12\}$ is the month in which time $t$ falls.
The coefficients $\theta_1, \ldots, \theta_{12}$ are fixed monthly effects. We say
that month is a factor with $12$ levels. Using dummy variables this is a
linear regression model
$$x_t = \alpha + \beta t + \sum_{j=1}^{12} \theta_j d_{t,j} + \varepsilon_t, \qquad (1.2.6)$$
where
$$d_{t,j} = \begin{cases} 1 & \text{if time } t \text{ falls in month } j, \\ 0 & \text{otherwise.} \end{cases} \qquad (1.2.7)$$
If the constant term is in the model, one of the dummy variables must be removed to avoid collinearity - sometimes called aliasing - since the dummy variables for the twelve months sum to one, duplicating the constant.
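A minimal sketch of fitting a trend-plus-seasonal model of the form (1.2.6) by least squares, with one dummy dropped as just described; the simulated data and parameter values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 120                                          # 10 years of monthly data
t = np.arange(1, n + 1)
month = (t - 1) % 12                             # 0 = Jan, ..., 11 = Dec

# Invented monthly effects (Jan effect set to 0) plus a linear trend:
theta = np.array([0, 1, 2, 3, 2, 1, 0, -1, -2, -3, -2, -1], float)
x = 10.0 + 0.05 * t + theta[month] + rng.normal(0, 0.5, n)

# Dummy variables d_{t,j} = 1 if time t falls in month j, else 0;
# drop the January column to avoid collinearity with the constant.
dummies = (month[:, None] == np.arange(12)[None, :]).astype(float)[:, 1:]
X = np.column_stack([np.ones(n), t, dummies])    # constant, trend, 11 dummies

beta_hat, *_ = np.linalg.lstsq(X, x, rcond=None)
print(beta_hat[:2])                              # estimated constant and trend
```

The remaining dummy coefficients estimate each month's effect relative to the dropped month (here January).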
Figure 3 shows the fit of this model to the CO2 series, together with the residuals from the fit. Note that these are far from being IID - there are long runs of positive, or negative, values.