7.2 Predicted values

7.2.1 Estimation of residual variance $\sigma^2$

From the definition of the linear regression model there is one other parameter to be estimated: the residual variance $\sigma^2$. We estimate this using the variance of the estimated residuals.

The estimated residuals are defined as,

$$\hat{\epsilon}_i = y_i - \hat{\mu}_i = y_i - \hat{\beta}_1 x_{i,1} - \cdots - \hat{\beta}_p x_{i,p}, \qquad (7.5)$$

and we estimate $\sigma^2$ by

$$\hat{\sigma}^2 = \frac{1}{n-p} \sum_{i=1}^{n} \hat{\epsilon}_i^2.$$

The heuristic reason for dividing by $n-p$, rather than $n$, is that although the sum is over $n$ residuals, these are not independent, since each is a function of the $p$ parameter estimates $\hat{\beta}_1, \ldots, \hat{\beta}_p$. Dividing by $n-p$ then gives an unbiased estimate of the residual variance. This is the same reason that we divide by $n-1$, rather than $n$, to get the sample variance. The square root of the residual variance, $\sigma$, is referred to as the residual standard error.
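As a rough illustration of the estimator, the following sketch fits a simple linear regression to simulated data and computes $\hat{\sigma}^2$ by hand. The data and the names `n`, `x`, `y`, `fit`, `eps_hat` and `sigma2_hat` are hypothetical and are not part of the birth weight example; the model has an intercept and a slope, so $p = 2$ is assumed.

```r
# Hypothetical simulated data, used only to illustrate the formula
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n, sd = 2)    # true residual variance is 4

fit <- lm(y ~ x)                      # intercept and slope
p <- length(coef(fit))                # number of estimated parameters (2)

eps_hat <- y - fitted(fit)            # estimated residuals, as in (7.5)
sigma2_hat <- sum(eps_hat^2) / (n - p)

# Compare with the square of the residual standard error stored by summary()
c(by_hand = sigma2_hat, from_summary = summary(fit)$sigma^2)
```

The two numbers should agree, since `summary` uses the same $n-p$ divisor.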

Example 7.2.2 Birth weights cont.

We return to the simple linear regression on birth weight. To calculate the residuals we subtract the fitted birth weights from the observed birth weights. The fitted birth weights are

  1. $\hat{\mu}_1 = -1485 + 116 \times 40 = 3155$,

  2. $\hat{\mu}_2 = -1485 + 116 \times 38 = 2923$,

  3. $\ldots$

What are the residuals?

The estimated residuals are

  1. $\hat{\epsilon}_1 = y_1 - \hat{\mu}_1 = 2968 - 3155 = -187$,

  2. $\hat{\epsilon}_2 = y_2 - \hat{\mu}_2 = 2795 - 2923 = -128$,

  3. $\ldots$

What is the estimate of the residual variance?

Since $n = 24$ and $p = 2$,

$$\hat{\sigma}^2 = \frac{1}{n-p} \sum_{i=1}^{n} \hat{\epsilon}_i^2 = \frac{1}{24-2} \left[ (-187)^2 + (-128)^2 + \cdots \right] = 37455.09.$$
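The hand calculation for the first two babies can be reproduced with a few lines of R. This is only a sketch: the vector names `age`, `weight`, `beta_hat`, `mu_hat` and `eps_hat` are illustrative, and only the two observations quoted in the example are included, so the full sum over all 24 residuals is not computed here.

```r
# First two observations quoted in the example; the other 22 are not listed
age    <- c(40, 38)         # values of the Age covariate
weight <- c(2968, 2795)     # observed birth weights

beta_hat <- c(-1485, 116)   # rounded intercept and slope used in the example

mu_hat  <- beta_hat[1] + beta_hat[2] * age   # fitted birth weights
eps_hat <- weight - mu_hat                   # estimated residuals

mu_hat
eps_hat

# Contribution of these two residuals to the sum of squares
sum(eps_hat^2)
```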

The estimated residuals can also be obtained from the lm fit in R,

> bwtlm$residuals

So we can calculate the residual variance as

> sum(bwtlm$residuals^2)/22

Why is this estimate slightly different to the one obtained previously?

We used rounded values of $\hat{\beta}$ to calculate the estimates. In fact, when we look at the residual standard error ($\hat{\sigma}$), the error made by using rounded estimates is much smaller.
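To get a feel for the size of the rounding error on each scale, one can compare the hand-calculated variance 37455.09 with the residual standard error of 192.6 that `lm` reports for this fit. This is a sketch under the assumption that 192.6 is itself a rounded value, so the comparison is only approximate; the variable names are illustrative.

```r
sigma2_rounded <- 37455.09   # variance estimate from the rounded coefficients
sigma_lm       <- 192.6      # residual standard error reported by lm (rounded)

sigma_rounded <- sqrt(sigma2_rounded)   # standard error implied by the hand calculation
sigma2_lm     <- sigma_lm^2             # variance implied by the reported standard error

# Absolute and relative differences on the variance and standard error scales
abs_diff <- c(variance  = sigma2_rounded - sigma2_lm,
              std_error = sigma_rounded - sigma_lm)
rel_diff <- c(variance  = (sigma2_rounded - sigma2_lm) / sigma2_lm,
              std_error = (sigma_rounded - sigma_lm) / sigma_lm)
abs_diff
rel_diff
```

On the standard error scale the discrepancy is under one unit, while on the variance scale it is several hundred.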

Finally, lm also gives the residual standard error directly, via the summary function,

> summary(bwtlm)
Call:
lm(formula = bwt$Weight ~ bwt$Age)
Residuals:
    Min      1Q  Median      3Q     Max
-262.03 -158.29    8.35   88.15  366.50
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1485.0      852.6  -1.742   0.0955 .
bwt$Age        115.5       22.1   5.228 3.04e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 192.6 on 22 degrees of freedom
Multiple R-squared: 0.554,      Adjusted R-squared: 0.5338
F-statistic: 27.33 on 1 and 22 DF,  p-value: 3.04e-05

Remark.

summary is a very useful command; for example, it allows you to view the parameter estimates of a fitted model. We will use it more later in the course.
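The individual pieces of a `summary` can also be extracted rather than read off the printout. The sketch below uses R's built-in `cars` data as a stand-in, since the birth weight data are not reproduced here; the object names `fit` and `s` are hypothetical.

```r
# Stand-in for bwtlm: a simple linear regression on the built-in cars data
fit <- lm(dist ~ speed, data = cars)
s <- summary(fit)

coef(s)     # matrix of estimates, standard errors, t values and p-values
s$sigma     # residual standard error
s$df[2]     # residual degrees of freedom, n - p

# The residual standard error agrees with the by-hand formula
sigma_by_hand <- sqrt(sum(residuals(fit)^2) / df.residual(fit))
sigma_by_hand
```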