6 Linear predictor and model formula

6.2 Model formulae for continuous variables

The linear predictor is a linear combination of the explanatory variables, defined by $\boldsymbol{\eta} = \beta_1\mathbf{x}_1 + \beta_2\mathbf{x}_2 + \cdots + \beta_p\mathbf{x}_p$. A more concise notation suppresses reference to the coefficients of the combination and writes

$$\boldsymbol{\eta} \in \operatorname{span}(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_p).$$

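The specification $\eta = \alpha + \beta x$, or in vector form $\boldsymbol{\eta} = \alpha\mathbf{1} + \beta\mathbf{x}$, can be written as

$$\boldsymbol{\eta} \in \operatorname{span}(\mathbf{1}, \mathbf{x}).$$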
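The statement that the linear predictor belongs to $\operatorname{span}(\mathbf{1}, \mathbf{x})$ can be checked numerically. The sketch below is illustrative only: the data, the coefficient values and the helper `in_span` are assumptions introduced here, not part of the notes. It tests membership by projecting a vector onto the column space and checking that nothing is left over.

```python
import numpy as np

# Hypothetical data: five observations of one explanatory variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ones = np.ones_like(x)

# The columns of this matrix span the model subspace span(1, x).
design = np.column_stack([ones, x])

# A linear predictor with alpha = 2 and beta = 0.5 (illustrative values).
eta = 2.0 * ones + 0.5 * x


def in_span(basis, vector, tol=1e-10):
    """Return True if `vector` lies in the column space of `basis`."""
    coef, *_ = np.linalg.lstsq(basis, vector, rcond=None)
    return bool(np.linalg.norm(basis @ coef - vector) < tol)


eta_in_model = in_span(design, eta)        # eta is a combination of 1 and x
square_in_model = in_span(design, x ** 2)  # x squared is not
```

The second check suggests why a quadratic term enlarges the model: the vector of squared values is not already in the span of the constant and $\mathbf{x}$.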

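Now, it may be that the relationship between the expected response $\mathbb{E}[Y]$ and the explanatory variable $x$ is more complicated than this. A reasonable procedure might be to see whether enlarging the model to include a quadratic term improves the fit, i.e.

$$\boldsymbol{\eta} \in \operatorname{span}(\mathbf{1}, \mathbf{x}, \mathbf{x}^2)$$

where $\mathbf{x}^2 = \mathbf{x} \cdot \mathbf{x}$ denotes the element-wise product, so that each entry of $\mathbf{x}^2$ is the square of the corresponding entry of $\mathbf{x}$.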

The notation for these subspaces can be streamlined by writing $X = \operatorname{span}(\mathbf{1}, \mathbf{x})$ and $X^2 = \operatorname{span}(\mathbf{1}, \mathbf{x}^2)$. (A note of caution: $X$ is also commonly used for a random variable or a design matrix.) The quadratic model above can then be written as the sum of two subspaces:

$$\boldsymbol{\eta} \in X + X^2.$$
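To see whether enlarging the model improves the fit, one can compare residual sums of squares for the two subspaces. The following is a minimal sketch that assumes a normal linear model with identity link fitted by ordinary least squares. The simulated data, the seed and the function name `residual_sum_of_squares` are hypothetical choices made for illustration.

```python
import numpy as np

# Simulated data with genuine curvature (illustrative assumption).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 30)
y = 1.0 + 0.5 * x + 2.0 * x ** 2 + rng.normal(scale=0.1, size=x.size)

ones = np.ones_like(x)
X_linear = np.column_stack([ones, x])            # spans X
X_quadratic = np.column_stack([ones, x, x * x])  # spans X + X^2


def residual_sum_of_squares(design, response):
    """Least-squares fit of `response` on the columns of `design`."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ coef
    return float(resid @ resid)


rss_linear = residual_sum_of_squares(X_linear, y)
rss_quadratic = residual_sum_of_squares(X_quadratic, y)
```

Because the linear model is a subspace of the quadratic one, `rss_quadratic` can never exceed `rss_linear`. Whether the reduction is large enough to justify the extra term is a separate question of model comparison.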

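In general, if $X_j = \operatorname{span}(\mathbf{1}, \mathbf{x}_j)$, then the model $\boldsymbol{\eta} = \beta_0\mathbf{1} + \beta_1\mathbf{x}_1 + \cdots + \beta_p\mathbf{x}_p$ is equivalent to $\boldsymbol{\eta} \in X_1 + X_2 + \cdots + X_p$. The reason for requiring $\mathbf{1} \in X_j$ for each $j = 1, \ldots, p$ concerns indicator variables and will emerge later. This notation highlights the view of a linear model as the specification of a subspace to which the linear predictor belongs.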
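One consequence of every $X_j$ containing the constant vector is that the sum $X_1 + \cdots + X_p$ has dimension $p + 1$ rather than $2p$, since the shared vector $\mathbf{1}$ is counted only once. The sketch below illustrates this with randomly generated explanatory variables, an assumption made purely for the example, so that the $\mathbf{x}_j$ are linearly independent of each other and of $\mathbf{1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10, 3
xs = rng.normal(size=(n, p))  # hypothetical explanatory variables
ones = np.ones(n)

# A spanning set for each X_j = span(1, x_j).
subspace_bases = [np.column_stack([ones, xs[:, j]]) for j in range(p)]

# Stacking all spanning sets gives 2p columns, with 1 repeated p times.
stacked = np.hstack(subspace_bases)
dim_sum = int(np.linalg.matrix_rank(stacked))

# The minimal spanning set (1, x_1, ..., x_p) spans the same subspace.
minimal = np.column_stack([ones, xs])
dim_minimal = int(np.linalg.matrix_rank(minimal))
```

Both ranks equal $p + 1 = 4$ here, which matches the number of coefficients $\beta_0, \beta_1, \ldots, \beta_p$ in the coefficient form of the model.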

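Standard models

| Model | Model formula |
|---|---|
| Simple linear regression | $X$ |
| Quadratic regression | $X + X^2$ |
| Polynomial regression | $X + X^2 + \cdots + X^k$ |
| Regression through the origin | $\operatorname{span}(\mathbf{x})$ |
| Multiple regression | $X_1 + X_2$ |
| Multiple regression | $X_1 + X_2 + \cdots + X_p$ |

Definition 6.2.1.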

The degrees of freedom of a model $\mathcal{M}$ is $\operatorname{df}(\mathcal{M}) = n - \dim(\mathcal{M})$, where $\dim(\mathcal{M})$ is the minimum number of vectors required to span $\mathcal{M}$ and $n$ is the number of observations.
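As a hedged illustration of this definition, the dimension of a model can be computed as the rank of any matrix whose columns span it. The helper `degrees_of_freedom` and the data below are hypothetical and only sketch the calculation.

```python
import numpy as np


def degrees_of_freedom(*vectors):
    """n minus the dimension of the span of the given vectors."""
    design = np.column_stack(vectors)
    n = design.shape[0]
    return n - int(np.linalg.matrix_rank(design))


x = np.arange(1.0, 9.0)  # n = 8 hypothetical observations
ones = np.ones_like(x)

df_simple = degrees_of_freedom(ones, x)                   # 8 - 2 = 6
df_quadratic = degrees_of_freedom(ones, x, x ** 2)        # 8 - 3 = 5
df_origin = degrees_of_freedom(x)                         # 8 - 1 = 7
df_redundant = degrees_of_freedom(ones, x, 2 * x + ones)  # still 6
```

The last case shows why the definition refers to the minimum number of spanning vectors: adding a vector that is already in the subspace does not change the model or its degrees of freedom.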

 
Exercise 6.47
Consider the example of predicting timber volume from measured tree height and trunk radius. Define a linear predictor based on the volume of a cylinder.

If a tree is tall then it must have a wide trunk to support its height. Knowledge of one therefore provides insight into the other, so the two variables are likely to be correlated. How does this influence the definition of the linear predictor?

 

Lattice diagrams provide a convenient format to summarise which models have been fitted. The diagram below gives all submodels for a linear predictor based on three variables.

[Figure: lattice diagram of all submodels for a linear predictor based on three variables.]

With an increasing number of variables these lattices rapidly become complex.
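The growth in complexity can be made concrete by enumerating the nodes and edges of such a lattice. The sketch below assumes that every subset of the variables defines a submodel, that the empty subset corresponds to the constant-only model (written `1` here), and that an edge joins two models differing by exactly one variable. The function names are hypothetical.

```python
from itertools import combinations


def submodels(variables):
    """Every subset of `variables`, ordered from smallest to largest."""
    return [
        subset
        for size in range(len(variables) + 1)
        for subset in combinations(variables, size)
    ]


def formula(subset):
    """Model formula for a subset; the empty subset is the constant model."""
    return " + ".join(subset) if subset else "1"


def lattice_edges(variables):
    """Pairs (smaller, larger) where `larger` adds exactly one variable."""
    models = submodels(variables)
    return [
        (a, b)
        for a in models
        for b in models
        if len(b) == len(a) + 1 and set(a) <= set(b)
    ]


three = ["X1", "X2", "X3"]
formulae = [formula(m) for m in submodels(three)]
edges = lattice_edges(three)
n_models_ten = len(submodels([f"X{i}" for i in range(1, 11)]))
```

Under these assumptions three variables give 8 models and 12 edges, while ten variables already give $2^{10} = 1024$ models.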