
10.1 The F test

The F-test provides a formal statistical procedure for choosing between two nested models. It is based on a comparison of the residual sums of squares of the two models.

Suppose that model 1 has \(p_1\) explanatory variables, model 2 has \(p_2 > p_1\) explanatory variables, and model 1 is nested in model 2. Let model 1 have design matrix \(X\) and parameters \(\beta\), and let model 2 have design matrix \(A\) and parameters \(\gamma\).

First we show formally that adding explanatory variables can never worsen the model fit, in the sense that the residual sum of squares for the fitted model cannot increase.

Let \(SS_1 = (y - X\hat{\beta})^T(y - X\hat{\beta})\) and \(SS_2 = (y - A\hat{\gamma})^T(y - A\hat{\gamma})\) be the residual sums of squares for models 1 and 2 respectively. Then

\[SS_2 \le SS_1.\]

Why does this last inequality hold?

Because of the nesting, we can always find a value \(\tilde{\gamma}\) such that

\[X\hat{\beta} = A\tilde{\gamma}.\]

Recalling the definition of the sum of squares,

\[SS_2 = (y - A\hat{\gamma})^T(y - A\hat{\gamma}) \le (y - A\tilde{\gamma})^T(y - A\tilde{\gamma}),\]

since \(\hat{\gamma}\) is the least squares estimate for model 2. So by definition of \(\tilde{\gamma}\),

\[SS_2 \le (y - A\tilde{\gamma})^T(y - A\tilde{\gamma}) = (y - X\hat{\beta})^T(y - X\hat{\beta}) = SS_1.\]
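This inequality can also be checked numerically. The sketch below uses simulated data (all variable names are illustrative): a model with one covariate is nested in a model with two, and its residual sum of squares can never be smaller.

```r
# Check numerically that adding a covariate cannot increase the residual SS:
# the model y ~ x1 is nested in y ~ x1 + x2
set.seed(1)
x1 <- rnorm(30); x2 <- rnorm(30)
y  <- 1 + 2 * x1 + rnorm(30)

m1 <- lm(y ~ x1)        # smaller (nested) model
m2 <- lm(y ~ x1 + x2)   # larger model
sum(m1$residuals^2) >= sum(m2$residuals^2)   # TRUE, as the argument shows
```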

To carry out the F-test we must decide whether the difference between SS1 and SS2 is sufficiently large to merit the inclusion of the additional explanatory variables in model 2.

Consider the following hypothesis test:

\(H_0\): Model 1 is the best fit

vs.

\(H_1\): Model 2 is the best fit.
Remark.

We do not say that ‘Model 1 is the true model’ or ‘Model 2 is the true model’. All models, be they probabilistic or deterministic, are a simplification of real life. No model can exactly describe a real-life process. But some models can describe the truth ‘better’ than others. George Box (1919-2013), British statistician: ‘essentially, all models are wrong, but some are useful’.

To test H0 against H1, first calculate the test statistic

\[F = \frac{(SS_1 - SS_2)/(p_2 - p_1)}{SS_2/(n - p_2)}. \qquad (10.1)\]

Now compare the test statistic to the \(F_{p_2 - p_1,\, n - p_2}\) distribution, and reject \(H_0\) if the test statistic exceeds the critical value (equivalently, if the p-value is too small).

The critical value from the \(F_{p_2 - p_1,\, n - p_2}\) distribution can either be evaluated in R or obtained from statistical tables.
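In R the critical value and p-value come from `qf()` and `pf()`. The 5% level and the degrees of freedom \((1, 55)\) below are illustrative choices, not fixed by the text.

```r
# 5% critical value of the F distribution with (p2 - p1, n - p2) = (1, 55)
# degrees of freedom (level and degrees of freedom chosen for illustration)
qf(0.95, df1 = 1, df2 = 55)

# p-value for a hypothetical observed test statistic F = 2.67
1 - pf(2.67, df1 = 1, df2 = 55)
```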

Example 10.1.1 (Brain weights, continued).

We proposed three models for log(brain weight) with the following explanatory variables:

  1. L1: log(body weight)

  2. L2: hours sleep per day

  3. L3: log(body weight) + hours sleep per day

Which of these models can we use the F-test to decide between?

The F-test does not allow us to choose between models L1 and L2, since these are not nested. However, it does give us a way to choose between either the pair L1 and L3, or the pair L2 and L3.
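For reference, the three fits might be produced as below. The data frame `mammals` and its column names are assumptions, filled with toy data here so that the sketch runs stand-alone; the numbers will not match the course data.

```r
# Sketch of the three model fits; `mammals` and its columns are assumptions
set.seed(10)
mammals <- data.frame(logbody = rnorm(58), sleep = runif(58, 2, 20))
mammals$logbrain <- 2.15 + 0.76 * mammals$logbody + rnorm(58, sd = 0.7)

L1 <- lm(logbrain ~ logbody,         data = mammals)
L2 <- lm(logbrain ~ sleep,           data = mammals)
L3 <- lm(logbrain ~ logbody + sleep, data = mammals)
coef(summary(L1))   # estimates, standard errors, t values and p-values
```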

To choose between L1 and L2, we instead take a more ad hoc approach, looking to see which of the explanatory variables is ‘more significant’ than the other when we test

\(H_0: \beta_2 = 0\)

vs.

\(H_1: \beta_2 \neq 0.\)

Using summary(L1) and summary(L2), we see that the p-value for \(\beta_2\) in L1 is <2e-16, while for \(\beta_2\) in L2 it is 4.30e-06. As we saw earlier, both of these indicate highly significant relationships between the response and the explanatory variable in question.

Which of the single covariate models is preferable?

Since the p-value for log(body weight) in model L1 is lower, our preferred single covariate model is L1.

We can now use the F-test to choose between our preferred single covariate model L1 and the two covariate model L3:

\(H_0\): L1 is the best fit

vs.

\(H_1\): L3 is the best fit.

We first find the sums of squares for both models. For L1, using the definition of the residual sum of squares,

\[SS(L1) = \sum_{i=1}^{58} \hat{\epsilon}_i^2 = \sum_{i=1}^{58} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_{i,1})^2 = \sum_{i=1}^{58} (y_i - 2.15 - 0.759\, x_{i,1})^2.\]

To calculate this in R,

> sum(L1$residuals^2)
[1] 28.00023

So SS(L1)=28.0.

For L3,

\[SS(L3) = \sum_{i=1}^{58} \hat{\epsilon}_i^2 = \sum_{i=1}^{58} (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_{i,1} - \hat{\beta}_3 x_{i,2})^2 = \sum_{i=1}^{58} (y_i - 2.60 - 0.728\, x_{i,1} + 0.0386\, x_{i,2})^2.\]

To calculate this in R,

> sum(L3$residuals^2)
[1] 26.70658

So SS(L3)=26.7.
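As an aside, `deviance()` applied to an lm object returns the same residual sum of squares, so it could replace the `sum(...^2)` computations above. A quick check on R's built-in `cars` data:

```r
# deviance() of an lm fit equals the residual sum of squares
fit <- lm(dist ~ speed, data = cars)   # built-in example data set
all.equal(deviance(fit), sum(fit$residuals^2))   # TRUE
```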

Next, we find the degrees of freedom for the two models. Since \(n = 58\):

  • L1 has \(p_1 = 2\) regression coefficients, so the degrees of freedom are \(n - p_1 = 58 - 2 = 56\).

  • L3 has \(p_2 = 3\) regression coefficients, so the degrees of freedom are \(n - p_2 = 58 - 3 = 55\).

Finally we calculate the F-statistic given in equation (10.1),

\[F = \frac{[SS(L1) - SS(L3)]/(p_2 - p_1)}{SS(L3)/(n - p_2)} = \frac{(28.000 - 26.707)/(3 - 2)}{26.707/(58 - 3)} = \frac{1.294}{0.486} = 2.66.\]

The test statistic \(F = 2.66\) is then compared to the F distribution with \((p_2 - p_1,\, n - p_2) = (1, 55)\) degrees of freedom. From tables, the critical value is just above 4.00; from R it is 4.02.

What can we conclude from this?

Since \(2.66 < 4.02\), we conclude that there is insufficient evidence to reject \(H_0\). There is no evidence to support choosing the more complicated model, and so the best fitting model is L1.
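The whole calculation can be assembled in R from the residual sums of squares found earlier:

```r
# F test of L1 against L3, using the residual sums of squares found earlier
SS1 <- 28.00023; SS3 <- 26.70658
n <- 58; p1 <- 2; p2 <- 3

F_stat <- ((SS1 - SS3) / (p2 - p1)) / (SS3 / (n - p2))
crit   <- qf(0.95, df1 = p2 - p1, df2 = n - p2)   # 5% critical value
F_stat
crit
F_stat > crit   # FALSE: do not reject H0
```

With the fitted model objects available, `anova(L1, L3)` reports the same residual sums of squares, F statistic and p-value in a single table.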

Remark.

We should not be too surprised by this result, since we have already seen that the coefficient for total sleep time is not significantly different from zero in model L3. Once we have accounted for body weight, there is no extra information in total sleep time to explain any remaining variability in brain weights.