
Chapter 6 Linear predictor and model formula

We have already used model formulae in some of the R examples. Here we give a fuller explanation in terms of linear algebra.

An experiment has a response variable $Y$, measured on each of $n$ units, giving the observed response vector $\mathbf{y}$, with associated explanatory variables.

The design matrix $X$ codes the information about how the explanatory variables are assigned to the units of the study. The explanatory variables may be treatments, which have fixed values, or covariates, which report the value of a potentially related variable.

The explanatory variables are related to $Y$ by taking the linear predictor to be a linear combination of the columns of the design matrix,

$$
\boldsymbol{\eta}=X\boldsymbol{\beta}.
$$

For example, an experiment with 6 units compares two treatments in the presence of a possible confounding variable, $x$. Treatment 1 is applied to units 2, 5 and 6, and treatment 2 to units 1, 3 and 4. One way to code this information is

$$
\begin{array}{rccc}
\mbox{unit} & \mathbf{t}_{1} & \mathbf{t}_{2} & \mathbf{x} \\
1 & 0 & 1 & .1 \\
2 & 1 & 0 & .2 \\
3 & 0 & 1 & .3 \\
4 & 0 & 1 & .4 \\
5 & 1 & 0 & .5 \\
6 & 1 & 0 & .6
\end{array}
\quad\mathrm{and}\quad
X=\left[\begin{array}{ccc}
0 & 1 & .1 \\
1 & 0 & .2 \\
0 & 1 & .3 \\
0 & 1 & .4 \\
1 & 0 & .5 \\
1 & 0 & .6
\end{array}\right]
$$

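is the design matrix. The vectors $\mathbf{t}_{1}$ and $\mathbf{t}_{2}$ are indicator vectors: entry $i$ of $\mathbf{t}_{j}$ is 1 if unit $i$ received treatment $j$ and 0 otherwise. Together with $\mathbf{x}$, they are vectors in 6-dimensional space.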
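The coding above can be reproduced mechanically from the treatment assignment. The following is a minimal sketch in plain Python (used here only for illustration; the variable names `treatment`, `t1`, `t2` and `X` are assumptions, not part of the notes), building the indicator vectors and stacking them with the covariate to form the design matrix.

```python
# Treatment received by units 1..6, as described in the text:
# treatment 1 on units 2, 5, 6 and treatment 2 on units 1, 3, 4.
treatment = [2, 1, 2, 2, 1, 1]

# Covariate values for units 1..6, taken from the table.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

# Indicator vectors: entry i is 1 if unit i+1 received that treatment, else 0.
t1 = [1 if t == 1 else 0 for t in treatment]
t2 = [1 if t == 2 else 0 for t in treatment]

# Design matrix: one row per unit, with columns (t1, t2, x).
X = [list(row) for row in zip(t1, t2, x)]

for unit, row in enumerate(X, start=1):
    print(unit, row)
```

Each row of `X` corresponds to one unit and each column to one explanatory variable, matching the layout of the matrix in the text.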

A model that relates the mean value of the response variable $\mathbb{E}[Y]$ linearly to these three variables is

$$
\mathbb{E}[\mathbf{Y}]=X\boldsymbol{\beta}=\beta_{1}\mathbf{t}_{1}+\beta_{2}\mathbf{t}_{2}+\beta_{3}\mathbf{x},
$$

which, written out unit by unit, gives:

$$
\begin{aligned}
\mathbb{E}[Y_{1}] &= 0\beta_{1}+1\beta_{2}+.1\beta_{3} \\
\mathbb{E}[Y_{2}] &= 1\beta_{1}+0\beta_{2}+.2\beta_{3} \\
&\;\;\vdots \\
\mathbb{E}[Y_{6}] &= 1\beta_{1}+0\beta_{2}+.6\beta_{3}
\end{aligned}
$$
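To make the unit-by-unit equations concrete, here is a small sketch in plain Python that evaluates $X\boldsymbol{\beta}$ row by row. The coefficient values in `beta` are hypothetical, chosen only for illustration, and the helper name `linear_predictor` is likewise an assumption.

```python
# Design matrix from the example: columns are (t1, t2, x), one row per unit.
X = [
    [0, 1, 0.1],
    [1, 0, 0.2],
    [0, 1, 0.3],
    [0, 1, 0.4],
    [1, 0, 0.5],
    [1, 0, 0.6],
]

# Hypothetical coefficients (beta1, beta2, beta3); not estimated from any data.
beta = [1.0, 2.0, 10.0]


def linear_predictor(X, beta):
    """Return X times beta: one linear combination of the coefficients per row."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


eta = linear_predictor(X, beta)
for unit, value in enumerate(eta, start=1):
    print(f"E[Y_{unit}] = {value:.1f}")
```

With these illustrative values, unit 1 has mean $0\cdot 1+1\cdot 2+.1\cdot 10=3$, which is the first equation above with numbers substituted.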

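This specification of the linear combination is sensible, but not unique. For example, let $\mathbf{1}$ be the vector containing 6 ones. Each unit receives exactly one treatment, so $\mathbf{t}_{1}+\mathbf{t}_{2}=\mathbf{1}$, and substituting $\mathbf{t}_{1}=\mathbf{1}-\mathbf{t}_{2}$ transforms the model to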

$$
\mathbb{E}[\mathbf{Y}]=\beta_{1}\mathbf{1}+(\beta_{2}-\beta_{1})\mathbf{t}_{2}+\beta_{3}\mathbf{x}=\alpha_{1}\mathbf{1}+\alpha_{2}\mathbf{t}_{2}+\alpha_{3}\mathbf{x},
$$

and the coefficient of $\mathbf{t}_{2}$ now measures the difference between the two treatment effects.
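The equivalence of the two parametrisations can be checked numerically. The sketch below, in plain Python, uses hypothetical values for $\boldsymbol{\beta}$ (an assumption made only for illustration) and sets $\alpha_{1}=\beta_{1}$, $\alpha_{2}=\beta_{2}-\beta_{1}$, $\alpha_{3}=\beta_{3}$ as in the equation above; the helper name `combine` is also an assumption.

```python
# Columns from the example.
t1 = [0, 1, 0, 0, 1, 1]
t2 = [1, 0, 1, 1, 0, 0]
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
ones = [1] * 6  # the vector of 6 ones

# Hypothetical coefficients for the (t1, t2, x) parametrisation.
beta = [1.0, 2.0, 10.0]

# Corresponding coefficients for the (1, t2, x) parametrisation.
alpha = [beta[0], beta[1] - beta[0], beta[2]]


def combine(coefs, columns):
    """Return the linear combination sum_j coefs[j] * columns[j], entry by entry."""
    n = len(columns[0])
    return [sum(c * col[i] for c, col in zip(coefs, columns)) for i in range(n)]


eta_beta = combine(beta, [t1, t2, x])
eta_alpha = combine(alpha, [ones, t2, x])

# The two parametrisations describe the same mean vector (up to rounding).
same = all(abs(a - b) < 1e-9 for a, b in zip(eta_beta, eta_alpha))
print("same linear predictor:", same)
print("alpha2 (difference between treatment effects):", alpha[1])
```

Both coefficient vectors give the same mean for every unit; only the interpretation of the coefficients changes, with `alpha[1]` equal to $\beta_{2}-\beta_{1}$.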

6.1 Elements of linear algebra

6.2 Model formulae for continuous variables

6.3 Factors for categorical variables

6.4 Interaction