Chapter 6 Linear predictor and model formula

We have already used model formulas in some of the R examples. Here we provide a fuller explanation in terms of linear algebra.

An experiment has a response variable Y, measured on each of n units to give the observed response vector 𝐲, together with associated explanatory variables.

The design matrix X codes the information about how the explanatory variables are assigned to the units of the study. The explanatory variables may be treatments, with fixed values, or covariates, which record the value of a potentially related variable.

The explanatory variables are related to Y by taking the linear predictor to be a linear combination of the columns of the design matrix

𝜼=X𝜷.

For example, an experiment with 6 units compares two treatments in the presence of a possible confounding variable, x. Treatment 1 is applied to units 2, 5, 6 and treatment 2 to units 1, 3, 4. One way to code this information is

\[
\begin{array}{c|ccc}
\text{unit} & \mathbf{t}_1 & \mathbf{t}_2 & \mathbf{x} \\
\hline
1 & 0 & 1 & .1 \\
2 & 1 & 0 & .2 \\
3 & 0 & 1 & .3 \\
4 & 0 & 1 & .4 \\
5 & 1 & 0 & .5 \\
6 & 1 & 0 & .6
\end{array}
\qquad \text{and} \qquad
X = \begin{bmatrix}
0 & 1 & .1 \\
1 & 0 & .2 \\
0 & 1 & .3 \\
0 & 1 & .4 \\
1 & 0 & .5 \\
1 & 0 & .6
\end{bmatrix}
\]

is the design matrix. The vectors 𝐭1 and 𝐭2 are indicator vectors; together with 𝐱 they are vectors in 6-dimensional space.
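In R, a design matrix of this form can be produced with model.matrix(). The sketch below is illustrative only: the object names treatment and x are our own choices, and the formula ~ 0 + treatment + x suppresses the intercept so that each treatment level receives its own indicator column, reproducing the columns 𝐭1, 𝐭2, 𝐱 of X above.

```r
## Illustrative reconstruction of the 6-unit example; the names
## `treatment` and `x` are ours, not fixed by the text.
treatment <- factor(c(2, 1, 2, 2, 1, 1))  # treatment 1 on units 2, 5, 6; treatment 2 on units 1, 3, 4
x <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)      # covariate value on each unit

## `~ 0 + treatment + x` drops the intercept, so each treatment level
## gets its own indicator column alongside the covariate.
X <- model.matrix(~ 0 + treatment + x)
X
```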

A model that relates the mean value of the response variable 𝔼[Y] linearly to these three variables is

𝔼[𝐘]=X𝜷=β1𝐭1+β2𝐭2+β3𝐱,

which, when written out for each unit, becomes:

\[
\begin{aligned}
\mathbb{E}[Y_1] &= 0\,\beta_1 + 1\,\beta_2 + .1\,\beta_3 \\
\mathbb{E}[Y_2] &= 1\,\beta_1 + 0\,\beta_2 + .2\,\beta_3 \\
&\;\;\vdots \\
\mathbb{E}[Y_6] &= 1\,\beta_1 + 0\,\beta_2 + .6\,\beta_3
\end{aligned}
\]
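Continuing the sketch above, the matrix product X𝜷 reproduces these per-unit expressions; the coefficient values below are purely illustrative.

```r
beta <- c(2, 5, 10)   # illustrative values for (beta1, beta2, beta3)

## Row i of X combined with beta gives the written-out expression for E[Y_i],
## e.g. unit 1: 0*2 + 1*5 + 0.1*10.
drop(X %*% beta)
```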

This specification of the linear combination is sensible, but not unique. For example, if 𝟏 is the vector of 6 ones, then 𝐭1+𝐭2=𝟏 and we may transform to

𝔼[𝐘]=β1𝟏+(β2-β1)𝐭2+β3𝐱=α1𝟏+α2𝐭2+α3𝐱,

and the coefficient of 𝐭2 now measures the difference between the two treatment effects.
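This is the coding R uses by default with treatment contrasts: keeping the intercept in the formula replaces the pair of indicators 𝐭1, 𝐭2 by 𝟏 and 𝐭2. A minimal sketch, reusing the objects defined above:

```r
## With the intercept retained, the indicator for the first treatment level
## is dropped, leaving the columns 1, t2 and x of the reparameterised model
## alpha1*1 + alpha2*t2 + alpha3*x.
model.matrix(~ treatment + x)

## In a fit such as lm(y ~ treatment + x) (for some hypothetical response y),
## the coefficient labelled "treatment2" estimates the difference between
## the two treatment effects.
```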