7.1 Estimation of regression coefficients β

7.1.1 Examples

Example 7.1.1 Birth weights cont.

We return to the birth weight data in Example 6.2.1. The full data set is given in Table 7.1. We will fit the simple linear regression for birth weight $Y_i$ with gestational age $x_i$ as the explanatory variable,

$$\mathbb{E}[Y_i] = \beta_1 + \beta_2 x_i.$$

The response vector and design matrix are

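$$y = \begin{bmatrix} 2968 \\ 2795 \\ 3163 \\ 2925 \\ \vdots \\ 2875 \\ 3231 \end{bmatrix}$$

and

$$X = \begin{bmatrix} 1 & 40 \\ 1 & 38 \\ 1 & 40 \\ 1 & 35 \\ \vdots & \vdots \\ 1 & 39 \\ 1 & 40 \end{bmatrix}.$$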

Obtain the least squares estimate $\hat{\beta}$.

To find $\hat{\beta}$ we use the formula

$$\hat{\beta} = (X^\top X)^{-1} X^\top y.$$

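From above,

  1. $X^\top X = \begin{bmatrix} 24 & 925 \\ 925 & 35727 \end{bmatrix}$,

  2. $(X^\top X)^{-1} = \begin{bmatrix} 19.6 & -0.507 \\ -0.507 & 0.0132 \end{bmatrix}$,

  3. $X^\top y = \begin{bmatrix} 71224 \\ 2753867 \end{bmatrix}$.

Therefore,

$$\hat{\beta} = (X^\top X)^{-1} X^\top y = \begin{bmatrix} -1485 \\ 116 \end{bmatrix}.$$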
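The inverse and the final product can be checked with the standard formula for the inverse of a 2×2 matrix. The following is a worked sketch; the determinant 1823 and the intermediate products are computed here from the matrices quoted above and are not taken from the notes.

```latex
\begin{align*}
\det(X^\top X) &= 24 \times 35727 - 925^2 = 857448 - 855625 = 1823, \\
(X^\top X)^{-1} &= \frac{1}{1823}
  \begin{bmatrix} 35727 & -925 \\ -925 & 24 \end{bmatrix}
  \approx \begin{bmatrix} 19.6 & -0.507 \\ -0.507 & 0.0132 \end{bmatrix}, \\
\hat{\beta} &= \frac{1}{1823}
  \begin{bmatrix} 35727 \times 71224 - 925 \times 2753867 \\
                  -925 \times 71224 + 24 \times 2753867 \end{bmatrix}
  = \frac{1}{1823} \begin{bmatrix} -2707127 \\ 210608 \end{bmatrix}
  \approx \begin{bmatrix} -1485 \\ 116 \end{bmatrix}.
\end{align*}
```

Note that the two terms in each entry nearly cancel, so multiplying by the three-figure rounded inverse does not reproduce $\hat{\beta}$ accurately; the exact fractions (or software) should be used.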

The fitted model for birth weight, given gestational age at birth, is

$$\mathbb{E}[Y_i] = -1485 + 116 x_i.$$

We can interpret this as follows,

  • For every additional week of gestation, expected birth weight increases by 116 grams.

  • If a child were born at zero weeks of gestation, their expected birth weight would be -1485 grams.

Why does the second result not make sense?
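One way to explore this question is to evaluate the fitted line at the ends of the observed range and at zero. The sketch below uses the rounded estimates from this example; the helper name `fitted_weight` is hypothetical, and the range of 35 to 42 weeks is read off Table 7.1.

```r
# Rounded estimates from Example 7.1.1: intercept and slope (grams per week).
beta_hat <- c(-1485, 116)

# Hypothetical helper: expected birth weight (grams) at a given gestational age.
fitted_weight <- function(age_weeks) beta_hat[1] + beta_hat[2] * age_weeks

# Gestational ages in Table 7.1 run from 35 to 42 weeks.
inside <- fitted_weight(c(35, 42))  # within the range of the data
outside <- fitted_weight(0)         # far outside the range of the data

print(inside)
print(outside)
```

Within the observed range the fitted values are plausible birth weights; at zero weeks the line is being extended well beyond any age present in the data.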

Because the matrices involved can be quite large, whether due to a large sample size n, a large number p of explanatory variables, or both, it is useful to be able to calculate parameter estimates using computer software. In R, we can do this ‘by hand’ (treating R as a calculator), or we can make use of the function lm which will carry out the entire model fit. We illustrate both ways.

Example 7.1.2 Birth weight model in R

Load the data set bwt into R. To obtain the size of the data set,

> dim(bwt)
[1] 24  3

This tells us that there are 24 subjects and 3 variables. The variables are,

> names(bwt)
[1] "Age"    "Weight" "Gender"

To fit the simple linear regression of the previous example ‘by hand’,

  1. Set up the design matrix,

    > X <- matrix(cbind(rep(1,24),bwt$Age),ncol=2)

  2. Calculate $\hat{\beta}$ using equation (7.3),

    > beta <- solve(t(X)%*%X)%*%t(X)%*%bwt$Weight

  3. View the results,

    > beta
               [,1]
    [1,] -1484.9846
    [2,]   115.5283
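If the bwt data set is not to hand, the same calculation can be reproduced from Table 7.1. The following is a minimal sketch, assuming the values are typed in directly; the object names `age`, `weight`, `XtX`, `Xty` and `beta_hat` are illustrative. It solves the normal equations with `solve(A, b)` rather than forming the inverse explicitly, which is a swapped-in but equivalent route to the formula used above.

```r
# Data copied from Table 7.1 (gestational age in weeks, birth weight in grams).
age <- c(40, 38, 40, 35, 36, 37, 41, 40, 37, 38, 40, 38,
         40, 36, 40, 38, 42, 39, 40, 37, 36, 38, 39, 40)
weight <- c(2968, 2795, 3163, 2925, 2625, 2847, 3292, 3473, 2628, 3176, 3421, 2975,
            3317, 2729, 2935, 2754, 3210, 2817, 3126, 2539, 2412, 2991, 2875, 3231)

# Design matrix: a column of ones for the intercept, then gestational age.
X <- cbind(1, age)

# Normal equations (X'X) beta = X'y.
XtX <- crossprod(X)          # same as t(X) %*% X
Xty <- crossprod(X, weight)  # same as t(X) %*% weight

# Solve the linear system directly instead of inverting X'X.
beta_hat <- solve(XtX, Xty)
print(beta_hat)
```

The intermediate objects `XtX` and `Xty` correspond to the matrices quoted in Example 7.1.1.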

To fit the same model using lm,

  1. Specify the required model. Note that R includes an intercept term by default, so it need not be specified explicitly,

    > bwtlm <- lm(bwt$Weight~bwt$Age)

  2. To view the estimates $\hat{\beta}$,

    > bwtlm$coefficients
    (Intercept)     bwt$Age
     -1484.9846    115.5283
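
A common alternative is to pass a data frame through the `data` argument of `lm`, so that the formula refers to column names only. The sketch below assumes the values of Table 7.1 are entered by hand into a data frame called `bwt_demo` (a hypothetical name), and checks that `lm` agrees with the matrix calculation.

```r
# Data copied from Table 7.1; bwt_demo is a stand-in for the bwt data set.
bwt_demo <- data.frame(
  Age = c(40, 38, 40, 35, 36, 37, 41, 40, 37, 38, 40, 38,
          40, 36, 40, 38, 42, 39, 40, 37, 36, 38, 39, 40),
  Weight = c(2968, 2795, 3163, 2925, 2625, 2847, 3292, 3473, 2628, 3176, 3421, 2975,
             3317, 2729, 2935, 2754, 3210, 2817, 3126, 2539, 2412, 2991, 2875, 3231)
)

# Fit the simple linear regression; the intercept is included automatically.
fit <- lm(Weight ~ Age, data = bwt_demo)
print(coef(fit))

# Recover the design matrix that lm built and repeat the calculation by hand.
X <- model.matrix(fit)
by_hand <- solve(crossprod(X), crossprod(X, bwt_demo$Weight))
print(all.equal(unname(coef(fit)), as.vector(by_hand)))
```

With this form the coefficient names are `(Intercept)` and `Age` rather than `bwt$Age`.
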
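
| Child | Gestational age (weeks) | Birth weight (grams) | Gender |
|---|---|---|---|
| 1 | 40 | 2968 | M |
| 2 | 38 | 2795 | M |
| 3 | 40 | 3163 | M |
| 4 | 35 | 2925 | M |
| 5 | 36 | 2625 | M |
| 6 | 37 | 2847 | M |
| 7 | 41 | 3292 | M |
| 8 | 40 | 3473 | M |
| 9 | 37 | 2628 | M |
| 10 | 38 | 3176 | M |
| 11 | 40 | 3421 | M |
| 12 | 38 | 2975 | M |
| 13 | 40 | 3317 | F |
| 14 | 36 | 2729 | F |
| 15 | 40 | 2935 | F |
| 16 | 38 | 2754 | F |
| 17 | 42 | 3210 | F |
| 18 | 39 | 2817 | F |
| 19 | 40 | 3126 | F |
| 20 | 37 | 2539 | F |
| 21 | 36 | 2412 | F |
| 22 | 38 | 2991 | F |
| 23 | 39 | 2875 | F |
| 24 | 40 | 3231 | F |

Table 7.1: Gestational age at birth (weeks), birth weight (grams) and gender of 24 individuals.
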
Example 7.1.3 Gas consumption cont.

Recall Example 6.4.2, in which we investigated the relationship between gas consumption and external temperature. To measure the effect of changes in the external temperature on gas consumption, we fit the multiple linear regression model 6.6. We will allow a different relationship between gas consumption and outside temperature before and after the installation of cavity wall insulation. The model has four regression coefficients,

$$\mathbb{E}[Y_i] = \beta_1 + \beta_2 x_{i,1} + \beta_3 x_{i,2} + \beta_4 x_{i,1} x_{i,2}.$$

Here $x_{i,1}$ is outside temperature and $x_{i,2}$ is an indicator variable taking the value 1 after installation and 0 before.
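Because the indicator takes only the values 0 and 1, the model can be read as two separate straight lines. Substituting each value in turn gives the following (a worked restatement of the model above, not an additional assumption):

```latex
\[
\mathbb{E}[Y_i] =
\begin{cases}
\beta_1 + \beta_2 x_{i,1}, & x_{i,2} = 0 \text{ (before insulation)}, \\
(\beta_1 + \beta_3) + (\beta_2 + \beta_4)\, x_{i,1}, & x_{i,2} = 1 \text{ (after insulation)}.
\end{cases}
\]
```

So $\beta_3$ is the change in intercept and $\beta_4$ the change in slope associated with insulation.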

The data are shown in Table 7.2.

To estimate the parameters by hand, we first set up the response vector and design matrix,

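$$y = \begin{bmatrix} 7.2 \\ 6.9 \\ 6.4 \\ \vdots \\ 2.6 \\ 4.8 \\ \vdots \\ 3.5 \\ 3.4 \end{bmatrix}$$

and

$$X = \begin{bmatrix}
1 & -0.8 & 0 & 0 \\
1 & -0.7 & 0 & 0 \\
1 & 0.4 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
1 & 10.2 & 0 & 0 \\
1 & -0.7 & 1 & -0.7 \\
\vdots & \vdots & \vdots & \vdots \\
1 & 4.7 & 1 & 4.7 \\
1 & 4.9 & 1 & 4.9
\end{bmatrix}.$$

Since $X^\top X$ will be a $4 \times 4$ matrix, it is easier to do our calculations in R. First load the data set gas.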

> names(gas)
[1] "Insulate"  "Temp"      "Gas"       "Insulate2"

  • Insulate contains Before or After to indicate whether or not cavity wall insulation has taken place;

  • Temp contains outside temperature (°C);

  • Gas contains gas consumption (1000’s cubic feet);

  • Insulate2 contains a 0 or 1 to indicate before (0) or after (1) cavity wall insulation.

To set up the design matrix,

> X <- matrix(cbind(rep(1,44),gas$Temp,gas$Insulate2,
    gas$Insulate2*gas$Temp),ncol=4)

Then to obtain $\hat{\beta}$,

> beta <- solve(t(X)%*%X)%*%(t(X)%*%gas$Gas)
> beta
           [,1]
[1,]  6.8538277
[2,] -0.3932388
[3,] -2.2632102
[4,]  0.143612
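The calculation can also be reproduced without the gas data set by entering Table 7.2 directly. The following is a minimal sketch under that assumption; the object names `temp`, `gas_use`, `insulate2` and `beta_hat` are illustrative, and the normal equations are solved with `solve(A, b)` instead of an explicit inverse.

```r
# Data copied from Table 7.2: 26 observations before insulation, then 18 after.
temp <- c(-0.8, -0.7, 0.4, 2.5, 2.9, 3.2, 3.6, 3.9, 4.2, 4.3, 5.4, 6.0, 6.0,
          6.0, 6.2, 6.3, 6.9, 7.0, 7.4, 7.5, 7.5, 7.6, 8.0, 8.5, 9.1, 10.2,
          -0.7, 0.8, 1.0, 1.4, 1.5, 1.6, 2.3, 2.5, 2.5, 3.1, 3.9, 4.0, 4.0,
          4.2, 4.3, 4.6, 4.7, 4.9)
gas_use <- c(7.2, 6.9, 6.4, 6.0, 5.8, 5.8, 5.6, 4.7, 5.8, 5.2, 4.9, 4.9, 4.3,
             4.4, 4.5, 4.6, 3.7, 3.9, 4.2, 4.0, 3.9, 3.5, 4.0, 3.6, 3.1, 2.6,
             4.8, 4.6, 4.7, 4.0, 4.2, 4.2, 4.1, 4.0, 3.5, 3.2, 3.9, 3.5, 3.7,
             3.5, 3.5, 3.7, 3.5, 3.4)

# Indicator: 0 before insulation, 1 after.
insulate2 <- rep(c(0, 1), times = c(26, 18))

# Columns: intercept, temperature, indicator, temperature x indicator.
X <- cbind(1, temp, insulate2, insulate2 * temp)

# Solve the normal equations (X'X) beta = X'y.
beta_hat <- solve(crossprod(X), crossprod(X, gas_use))
print(beta_hat)
```

The four entries of `beta_hat` are in the same order as the columns of `X`, matching $\beta_1, \ldots, \beta_4$ in the model.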

Thus the fitted model is

$$\mathbb{E}[Y_i] = 6.85 - 0.393 x_{i,1} - 2.26 x_{i,2} + 0.144 x_{i,1} x_{i,2}.$$

  • Before cavity wall insulation, when the outside temperature is 0°C, the expected gas consumption is 6.85 thousand cubic feet.

  • Before cavity wall insulation, for every increase in temperature of 1°C, the expected gas consumption decreases by 0.393 thousand cubic feet.

  • After cavity wall insulation, for every increase in temperature of 1°C, the expected gas consumption decreases by 0.249 thousand cubic feet.

Where does the figure 0.249 come from?

Substitute $x_{i,2}=1$ into the fitted model; $-0.393+0.144=-0.249$ is the overall rate of change of gas consumption with temperature after insulation.

What is the expected gas consumption after cavity wall insulation, when the outside temperature is 0°C?

$6.85-2.26=4.59$ thousand cubic feet.
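The same arithmetic can be carried out on the unrounded estimates. The sketch below takes the four values printed by R above and combines them into a before line and an after line; the names `before` and `after` are illustrative.

```r
# Estimates as printed by R above, in the order (beta1, beta2, beta3, beta4).
beta <- c(6.8538277, -0.3932388, -2.2632102, 0.143612)

# Before insulation (indicator = 0): intercept beta1, slope beta2.
before <- c(intercept = beta[1], slope = beta[2])

# After insulation (indicator = 1): intercept beta1 + beta3, slope beta2 + beta4.
after <- c(intercept = beta[1] + beta[3], slope = beta[2] + beta[4])

print(round(rbind(before, after), 4))
```

Working from the unrounded estimates gives an after-insulation slope of about -0.2496; the figure 0.249 in the text comes from adding the coefficients after rounding them to three significant figures.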

We can alternatively fit this model in R using lm,

> gaslm <- lm(gas$Gas~gas$Temp*gas$Insulate2)
> coefficients(gaslm)
           (Intercept)               gas$Temp
             6.8538277             -0.3932388
         gas$Insulate2 gas$Temp:gas$Insulate2
            -2.2632102              0.143612

Remark.

We have used the operator * between two explanatory variables in the model formula. R then includes an intercept, a term for each of the explanatory variables and the interaction between the two explanatory variables. We will look at interactions in more detail later.
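To see which terms R generates from *, the formula can be inspected with `terms` before any data are involved. This is a small sketch; the formulas use the column names of the gas data set without the `gas$` prefix, and `f_star`, `f_long` are illustrative names.

```r
# The shorthand formula and its written-out equivalent.
f_star <- Gas ~ Temp * Insulate2
f_long <- Gas ~ Temp + Insulate2 + Temp:Insulate2

# Term labels that R derives from each formula.
labels_star <- attr(terms(f_star), "term.labels")
labels_long <- attr(terms(f_long), "term.labels")

print(labels_star)
print(identical(labels_star, labels_long))

# The "intercept" attribute is 1 when an intercept is included.
print(attr(terms(f_star), "intercept"))
```

The term written `Temp:Insulate2` is the interaction, which corresponds to the product column in the design matrix built by hand.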

Remark.

The model suggests that cavity wall insulation decreases gas consumption when the outside temperature is 0°C. Further, the rate of increase in gas consumption as the outside temperature decreases is smaller when the cavity wall is insulated. Are these differences significant?

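| Observation | Insulation | Outside temp. (°C) | Gas consumption (1000’s cubic feet) |
|---|---|---|---|
| 1 | Before | -0.8 | 7.2 |
| 2 | Before | -0.7 | 6.9 |
| 3 | Before | 0.4 | 6.4 |
| 4 | Before | 2.5 | 6.0 |
| 5 | Before | 2.9 | 5.8 |
| 6 | Before | 3.2 | 5.8 |
| 7 | Before | 3.6 | 5.6 |
| 8 | Before | 3.9 | 4.7 |
| 9 | Before | 4.2 | 5.8 |
| 10 | Before | 4.3 | 5.2 |
| 11 | Before | 5.4 | 4.9 |
| 12 | Before | 6.0 | 4.9 |
| 13 | Before | 6.0 | 4.3 |
| 14 | Before | 6.0 | 4.4 |
| 15 | Before | 6.2 | 4.5 |
| 16 | Before | 6.3 | 4.6 |
| 17 | Before | 6.9 | 3.7 |
| 18 | Before | 7.0 | 3.9 |
| 19 | Before | 7.4 | 4.2 |
| 20 | Before | 7.5 | 4.0 |
| 21 | Before | 7.5 | 3.9 |
| 22 | Before | 7.6 | 3.5 |
| 23 | Before | 8.0 | 4.0 |
| 24 | Before | 8.5 | 3.6 |
| 25 | Before | 9.1 | 3.1 |
| 26 | Before | 10.2 | 2.6 |
| 27 | After | -0.7 | 4.8 |
| 28 | After | 0.8 | 4.6 |
| 29 | After | 1.0 | 4.7 |
| 30 | After | 1.4 | 4.0 |
| 31 | After | 1.5 | 4.2 |
| 32 | After | 1.6 | 4.2 |
| 33 | After | 2.3 | 4.1 |
| 34 | After | 2.5 | 4.0 |
| 35 | After | 2.5 | 3.5 |
| 36 | After | 3.1 | 3.2 |
| 37 | After | 3.9 | 3.9 |
| 38 | After | 4.0 | 3.5 |
| 39 | After | 4.0 | 3.7 |
| 40 | After | 4.2 | 3.5 |
| 41 | After | 4.3 | 3.5 |
| 42 | After | 4.6 | 3.7 |
| 43 | After | 4.7 | 3.5 |
| 44 | After | 4.9 | 3.4 |

Table 7.2: Outside temperature (°C), gas consumption (1000’s cubic feet) and whether or not cavity wall insulation has been installed.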