6.4.1 Examples

Example 6.4.5 Birth weights cont.

The response variable Yi is birth weight. There are two explanatory variables, gender (factor) and gestational age (continuous). Let xi,1 and xi,2 be indicator variables for males and females respectively and xi,3 be gestational age.

One possible model is

𝔼[Yi]=β1xi,1+β2xi,2+β3xi,3 (6.2)

This assumes a different intercept for males (β1) and females (β2), but a common slope for gestational age (β3). It does not include an overall intercept term - we will see later why this is.

A second possible model has a common intercept, but allows for separate slopes for males and females; this is an interaction between gender and age.

𝔼[Yi]=β1+β2xi,1xi,3+β3xi,2xi,3 (6.3)

What are the interpretations of β1, β2 and β3?

  1. β1 is the expected birth weight of a baby born at 0 weeks gestation, regardless of gender;

  2. β2 is the expected change in birth weight for a male with every extra week of gestation;

  3. β3 is the expected change in birth weight for a female with every extra week of gestation.

The design matrix for model 6.2 has three columns; the first is the indicator for males, the second the indicator column for females and the third contains gestational age.

Describe the design matrix for model 6.3.

The design matrix for model 6.3 has three columns. The first is a column of 1’s for the intercept. The second is the product of the indicator variable for males and gestational age. The third is the product of the indicator variable for females and gestational age.
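The two design matrices described above can be sketched in a few lines of NumPy. The data values and the variable names (`male`, `age`, `X_62`, `X_63`) are made up for illustration and are not taken from the birth weight data set.

```python
import numpy as np

# Hypothetical toy data (not from the notes): 1 = male, 0 = female.
male = np.array([1, 0, 1, 0, 1])
age = np.array([36.0, 38.0, 40.0, 35.0, 41.0])  # gestational age in weeks

x1 = male        # indicator variable for males
x2 = 1 - male    # indicator variable for females
x3 = age         # gestational age

# Model 6.2: separate intercepts, common slope, no overall intercept.
X_62 = np.column_stack([x1, x2, x3])

# Model 6.3: common intercept, separate slopes (gender-by-age interaction).
X_63 = np.column_stack([np.ones_like(age), x1 * x3, x2 * x3])

print(X_62)
print(X_63)
```

In each row of `X_63`, exactly one of the last two entries is non-zero, and it equals that baby's gestational age.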

A third possible model, combining the first two, includes separate intercepts and separate slopes for the two genders.

𝔼[Yi]=β1xi,1+β2xi,2+β3xi,1xi,3+β4xi,2xi,3 (6.4)
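Because model 6.4 gives each gender its own intercept and its own slope, its least squares fit should coincide with fitting a separate straight line to each gender. The sketch below checks this numerically on simulated data; the sample size, the coefficients used to simulate and the noise level are all assumptions chosen for illustration, not values from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical values, for illustration only).
n = 40
male = np.arange(n) % 2                 # alternate female/male
age = rng.uniform(35.0, 42.0, size=n)   # gestational age in weeks
y = -1000.0 + 100.0 * age + 150.0 * male + rng.normal(0.0, 200.0, size=n)

x1 = male        # indicator for males
x2 = 1 - male    # indicator for females

# Design matrix for model 6.4: two intercept columns, two slope columns.
X_64 = np.column_stack([x1, x2, x1 * age, x2 * age])
beta_64, *_ = np.linalg.lstsq(X_64, y, rcond=None)

# Separate straight-line fits; np.polyfit returns (slope, intercept).
slope_m, int_m = np.polyfit(age[male == 1], y[male == 1], 1)
slope_f, int_f = np.polyfit(age[male == 0], y[male == 0], 1)

print(beta_64)
print(int_m, int_f, slope_m, slope_f)
```

The two printed lines should agree up to rounding error, with the entries of `beta_64` ordered as (β1, β2, β3, β4).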

Plots of the three model fits are shown in Figures 6.8, 6.9 and 6.10.

Fig. 6.8: Birthweight (grams) against gestational age (weeks), split by gender. Straight lines show fit of model 6.2.

Fig. 6.9: Birthweight (grams) against gestational age (weeks), split by gender. Straight lines show fit of model 6.3.

Fig. 6.10: Birthweight (grams) against gestational age (weeks), split by gender. Straight lines show fit of model 6.4.

Remark.

How can we choose which of models 6.2, 6.3 and 6.4 fits the data best? Intuitively model 6.3 seems sensible - all babies start at the same weight, but gender may affect the rate of growth. However, since our data only covers births from 35 weeks gestation onwards, we should only think about the model which best reflects growth during this period.

We will look at issues of model selection later.

Example 6.4.6 Gas consumption

Continuing Example 6.4.2, the response Yi is gas consumption. There are two explanatory variables: outside temperature (continuous) and before/after cavity wall insulation (factor).

Let xi,1 be outside temperature and xi,2 be an indicator variable for after cavity wall insulation, i.e.

xi,2 = 1 if observation i is after insulation,
xi,2 = 0 if observation i is before insulation.

The modelling approach is as follows. Example 6.2.2 gave a regression on outside temperature only,

𝔼[Yi]=β1+β2xi,1. (6.5)

The figures labelled gas_scatter2 and gas_scatter3 suggest that the rate of change of gas consumption with outside temperature was altered following insulation. There is also evidence of a difference in intercepts before and after insulation. We could include this information in the model as follows,

𝔼[Yi]=β1+β2xi,1+β3xi,2+β4xi,1xi,2. (6.6)

What are the interpretations of β1 and β2 in this model?

  1. β1 is the expected gas consumption when the outside temperature is 0°C, before insulation;

  2. β2 is the change in expected gas consumption for a 1°C change in outside temperature, before insulation.

To interpret β3 and β4:

  1. β1+β3 is the expected gas consumption when the outside temperature is 0°C, after insulation;

  2. β2+β4 is the change in expected gas consumption for a 1°C change in outside temperature, after insulation.

β3 tells us about the change in intercept following insulation; β4 tells us how the relationship between gas consumption and outside temperature is altered following insulation.
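These interpretations can be checked numerically by evaluating model 6.6 at 0°C and 1°C, before and after insulation. In the sketch below the helper `design_66` and the coefficient vector `beta` are hypothetical, chosen only to make the arithmetic visible; they are not estimates from the gas consumption data.

```python
import numpy as np

def design_66(temp, after):
    """Design matrix for model 6.6: intercept, temperature,
    after-insulation indicator, and their product (interaction)."""
    temp = np.asarray(temp, dtype=float)
    after = np.asarray(after, dtype=float)
    return np.column_stack([np.ones_like(temp), temp, after, temp * after])

# Hypothetical coefficients (beta1, beta2, beta3, beta4), not fitted values.
beta = np.array([7.0, -0.4, -2.0, 0.1])

# Evaluate at 0 and 1 degrees C, before (0) and after (1) insulation.
temp = np.array([0.0, 1.0, 0.0, 1.0])
after = np.array([0, 0, 1, 1])
mean = design_66(temp, after) @ beta

intercept_before = mean[0]            # equals beta1
slope_before = mean[1] - mean[0]      # equals beta2
intercept_after = mean[2]             # equals beta1 + beta3
slope_after = mean[3] - mean[2]       # equals beta2 + beta4

print(intercept_before, slope_before)
print(intercept_after, slope_after)
```

The differences `intercept_after - intercept_before` and `slope_after - slope_before` recover β3 and β4 respectively.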

Examples 6.4.5 and 6.4.6 show two different ways of including factors in linear models. In Example 6.4.5, indicator variables for both levels of the factor are included, but there is no intercept. In Example 6.4.6, there is an intercept term, but the indicator variable for only one of the two levels of the factor is included.

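In general, we include an intercept term and indicator variables for p-1 levels of a p-level factor. If indicators for all p levels were included alongside the intercept, the indicator columns would sum to the column of 1’s, so the columns of the design matrix X would be linearly dependent. Using p-1 indicators ensures that the columns of X are linearly independent - even if we include two or more factors in the model.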
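The linear independence claim can be illustrated by comparing matrix ranks. The sketch below uses two made-up factors, `f` with 3 levels and `g` with 2 levels, observed once at every combination of levels; the data and names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical factor levels, one observation per (f, g) combination.
f = np.array([0, 1, 2, 0, 1, 2])   # 3-level factor
g = np.array([0, 0, 0, 1, 1, 1])   # 2-level factor

F = np.eye(3)[f]                   # indicator columns for all levels of f
G = np.eye(2)[g]                   # indicator columns for all levels of g
ones = np.ones((len(f), 1))

# Intercept plus all 3 indicators of f: the indicators sum to the intercept.
X_all = np.hstack([ones, F])

# No intercept, all indicators of both factors: each block sums to 1.
X_two = np.hstack([F, G])

# Intercept plus p-1 indicators per factor (level 0 is the baseline).
X_base = np.hstack([ones, F[:, 1:], G[:, 1:]])

for name, X in [("X_all", X_all), ("X_two", X_two), ("X_base", X_base)]:
    print(name, "columns:", X.shape[1], "rank:", np.linalg.matrix_rank(X))
```

Only `X_base` has rank equal to its number of columns. Dropping the intercept, as in Example 6.4.5, works for a single factor but no longer gives independent columns once a second factor is added with all of its indicators.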

For interpretation, one level of the factor is set as a ‘baseline’ (in our example this was before insulation) and the regression coefficients for the remaining levels of the factor can be used to report the additional effect of the remaining levels on top of the baseline.