We have already used model formulae in some of the R examples. Here we give a fuller explanation in terms of linear algebra.
An experiment has a response variable $y$, measured on each of $n$ units, giving the observed response vector $\mathbf{y} = (y_1, \dots, y_n)^T$, with associated explanatory variables.
The design matrix $X$ codes how the explanatory variables are assigned to the units of the study. The explanatory variables may be treatments, with fixed values, or covariates, which report the value of a potentially related variable.
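The explanatory variables are related to $\mathbf{y}$ by taking the linear predictor $\boldsymbol{\eta}$ to be a linear combination of the columns of the design matrix $X$:

$$\boldsymbol{\eta} = X\boldsymbol{\beta},$$

where $\boldsymbol{\beta}$ is the vector of coefficients, one for each column of $X$.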
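For example, an experiment with 6 units compares two treatments in the presence of a possible confounding variable, $x$. Treatment 1 is applied to units 2, 5, 6 and treatment 2 to units 1, 3, 4. One way to code this information is

$$X = \begin{pmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \mathbf{x} \end{pmatrix} = \begin{pmatrix} 0 & 1 & x_1 \\ 1 & 0 & x_2 \\ 0 & 1 & x_3 \\ 0 & 1 & x_4 \\ 1 & 0 & x_5 \\ 1 & 0 & x_6 \end{pmatrix},$$

where $X$ is the design matrix. The vectors $\mathbf{a}_1$, $\mathbf{a}_2$ are indicator vectors: $\mathbf{a}_j$ has a 1 in the position of each unit that received treatment $j$ and a 0 elsewhere. They and $\mathbf{x}$ are vectors in 6-dimensional space.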
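The coding of this example can be reproduced in R with `model.matrix`. The following is a minimal sketch: the covariate values in `x` are made up purely for illustration, and the variable names are our own choice.

```r
# Treatment assignment from the example:
# treatment 1 on units 2, 5, 6; treatment 2 on units 1, 3, 4.
treatment <- factor(c(2, 1, 2, 2, 1, 1))

# Hypothetical covariate values (not taken from any real experiment).
x <- c(1.2, 0.7, 3.1, 2.4, 1.9, 0.5)

# "0 +" drops the intercept, so each treatment level gets its own
# indicator column, followed by the covariate column.
X <- model.matrix(~ 0 + treatment + x)
print(X)
```

The columns `treatment1` and `treatment2` are the two indicator vectors, and the third column is the covariate.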
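A model that relates the mean value of the response variable linearly to these three variables, $\mathbf{a}_1$, $\mathbf{a}_2$ and $\mathbf{x}$, is

$$E(\mathbf{y}) = \beta_1 \mathbf{a}_1 + \beta_2 \mathbf{a}_2 + \beta_3 \mathbf{x},$$

which, when written out for each unit, gives

$$\begin{aligned}
E(y_1) &= \beta_2 + \beta_3 x_1, \\
E(y_2) &= \beta_1 + \beta_3 x_2, \\
E(y_3) &= \beta_2 + \beta_3 x_3, \\
E(y_4) &= \beta_2 + \beta_3 x_4, \\
E(y_5) &= \beta_1 + \beta_3 x_5, \\
E(y_6) &= \beta_1 + \beta_3 x_6.
\end{aligned}$$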
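The equivalence between the matrix form of the model and its unit-by-unit form can be checked numerically. This is only a sketch: the covariate values and the coefficient values below are hypothetical, chosen for illustration, and the names `a1`, `a2`, `beta` are our own.

```r
# Indicator vectors for treatment 1 (units 2, 5, 6) and treatment 2 (units 1, 3, 4).
a1 <- c(0, 1, 0, 0, 1, 1)
a2 <- c(1, 0, 1, 1, 0, 0)

# Hypothetical covariate values and coefficients (illustration only).
x    <- c(1.2, 0.7, 3.1, 2.4, 1.9, 0.5)
beta <- c(10, 12, 2)  # coefficients of a1, a2 and x, in that order

# Matrix form: a linear combination of the columns of the design matrix.
X <- cbind(a1, a2, x)
eta_matrix <- drop(X %*% beta)

# Unit-by-unit form: each unit gets its own treatment coefficient
# plus the covariate term.
eta_units <- ifelse(a1 == 1, beta[1], beta[2]) + beta[3] * x

# Compare the two forms up to floating-point tolerance.
print(all.equal(eta_matrix, eta_units))
```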
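This specification of the linear combination is sensible, but not unique. For example, if $\mathbf{1}$ is the vector containing 6 ones, then $\mathbf{a}_1 + \mathbf{a}_2 = \mathbf{1}$, so the linear combination $\beta_1 \mathbf{a}_1 + \beta_2 \mathbf{a}_2 + \beta_3 \mathbf{x}$ may be transformed to

$$\beta_1 \mathbf{1} + (\beta_2 - \beta_1)\,\mathbf{a}_2 + \beta_3 \mathbf{x},$$

and the coefficient of $\mathbf{a}_2$ now measures the difference between the two treatment effects.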
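R's default treatment contrasts produce this second coding when the intercept is kept in the formula. The sketch below compares the two codings and checks that they give the same linear predictor once the coefficients are transformed accordingly. The covariate and coefficient values are hypothetical.

```r
# Treatment 1 on units 2, 5, 6; treatment 2 on units 1, 3, 4.
treatment <- factor(c(2, 1, 2, 2, 1, 1))
x <- c(1.2, 0.7, 3.1, 2.4, 1.9, 0.5)  # hypothetical covariate values

# Coding 1: two indicator columns and the covariate (no intercept).
X_ind <- model.matrix(~ 0 + treatment + x)

# Coding 2: a column of ones, the indicator of treatment 2, and the covariate.
X_int <- model.matrix(~ treatment + x)
print(colnames(X_int))

# Hypothetical coefficients for coding 1: two treatment effects and a slope.
b1 <- 10; b2 <- 12; b3 <- 2

# Under coding 2 the treatment column carries the difference b2 - b1.
eta_ind <- drop(X_ind %*% c(b1, b2, b3))
eta_int <- drop(X_int %*% c(b1, b2 - b1, b3))

# Compare the two linear predictors up to floating-point tolerance.
print(all.equal(eta_ind, eta_int))
```

Both design matrices span the same column space, so the choice between them changes the interpretation of the coefficients, not the fitted means.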