The F-test gives a formal statistical test for choosing between two nested models. It is based on a comparison of the residual sums of squares of the two models.
Suppose that model 1 has $p_1$ explanatory variables, model 2 has $p_2 > p_1$ explanatory variables, and model 1 is nested in model 2. Let model 1 have design matrix $X_1$ and parameters $\beta_1$; model 2 has design matrix $X_2$ and parameters $\beta_2$.
First we show formally that adding additional explanatory variables can only improve model fit, in the sense that the residual sum of squares of the fitted model cannot increase.
If $\mathrm{RSS}_1$ and $\mathrm{RSS}_2$ are the residual sums of squares for models 1 and 2 respectively, then
$$\mathrm{RSS}_1 \geq \mathrm{RSS}_2.$$
Why does this last inequality hold?
Because of the nesting, we can always find a value $\beta^*$ such that
$$X_2 \beta^* = X_1 \hat{\beta}_1.$$
Recalling the definition of the sum of squares,
$$\mathrm{RSS}_2 = \| y - X_2 \hat{\beta}_2 \|^2 \leq \| y - X_2 \beta^* \|^2$$
by definition of the LSE. So by definition of $\beta^*$,
$$\| y - X_2 \beta^* \|^2 = \| y - X_1 \hat{\beta}_1 \|^2 = \mathrm{RSS}_1,$$
and combining the two displays gives $\mathrm{RSS}_2 \leq \mathrm{RSS}_1$, as claimed.
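This fact is easy to check numerically. Below is a minimal sketch using simulated data (all object names and the data itself are illustrative, not from the notes): the residual sum of squares cannot increase when a covariate is added, even a pure-noise covariate.

```r
# Nesting in practice: adding a covariate never increases the RSS.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)                 # pure noise, unrelated to y
y  <- 1 + 2 * x1 + rnorm(n)

m1 <- lm(y ~ x1)               # model 1, nested in model 2
m2 <- lm(y ~ x1 + x2)          # model 2

rss1 <- sum(resid(m1)^2)
rss2 <- sum(resid(m2)^2)
rss1 >= rss2                   # TRUE: the fit can only improve
```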
To carry out the F-test we must decide whether the difference between $\mathrm{RSS}_1$ and $\mathrm{RSS}_2$ is sufficiently large to merit the inclusion of the additional explanatory variables in model 2.
Consider the following hypothesis test:
$$H_0: \text{model 1 is adequate} \quad \text{vs.} \quad H_1: \text{model 2 is a significant improvement over model 1}.$$
We do not say that ‘Model 1 is the true model’ or ‘Model 2 is the true model’. All models, be they probabilistic or deterministic, are a simplification of real life. No model can exactly describe a real-life process, but some models can describe the truth ‘better’ than others. As the British statistician George Box (1919–2013) put it: ‘essentially, all models are wrong, but some are useful’.
To test $H_0$ against $H_1$, first calculate the test statistic
$$F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/(p_2 - p_1)}{\mathrm{RSS}_2/(n - p_2 - 1)}. \qquad (10.1)$$
Now compare the test statistic to the $F_{p_2 - p_1,\, n - p_2 - 1}$ distribution, and reject $H_0$ if the test statistic exceeds the critical value (equivalently, if the $p$-value is too small).
The critical value from the $F_{p_2 - p_1,\, n - p_2 - 1}$ distribution can either be evaluated in R or obtained from statistical tables.
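As a sketch of the whole procedure (object names and simulated data are illustrative, not from the notes), the statistic in (10.1), the critical value from `qf()`, and R's built-in `anova()` comparison can be put side by side:

```r
# Carrying out the F-test for two nested models "by hand",
# then checking against R's anova().
set.seed(2)
n  <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + x1 + rnorm(n)

m1 <- lm(y ~ x1)                       # p1 = 1 explanatory variable
m2 <- lm(y ~ x1 + x2)                  # p2 = 2 explanatory variables

rss1 <- sum(resid(m1)^2)
rss2 <- sum(resid(m2)^2)
df1  <- 2 - 1                          # p2 - p1
df2  <- n - 2 - 1                      # n - p2 - 1

Fstat <- ((rss1 - rss2) / df1) / (rss2 / df2)
crit  <- qf(0.95, df1, df2)            # 5% critical value
pval  <- pf(Fstat, df1, df2, lower.tail = FALSE)

anova(m1, m2)                          # same F statistic and p-value
```

In practice `anova(m1, m2)` does the whole calculation in one step; computing the pieces by hand simply makes the correspondence with (10.1) explicit.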
We proposed three models for log(brain weight) with the following explanatory variables:
L1: log(body weight)
L2: hours sleep per day
L3: log(body weight) + hours sleep per day
Which of these models can we use the $F$-test to decide between?
The $F$-test does not allow us to choose between models L1 and L2, since these are not nested. However, it does give us a way to choose between either the pair L1 and L3, or the pair L2 and L3.
To choose between L1 and L2, we use a more ad hoc approach, looking to see which of the explanatory variables is ‘more significant’ than the other when we test
$$H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0,$$
where $\beta_j$ is the coefficient of the explanatory variable in question.
Using summary(L1) and summary(L2), we see that the $p$-values for log(body weight) in L1 and for hours sleep per day in L2 are both very small. As we saw earlier, both of these indicate highly significant relationships between the response and the explanatory variable in question.
Which of the single covariate models is preferable?
Since the $p$-value for log(body weight) in model L1 is lower, our preferred single covariate model is L1.
We can now use the $F$-test to choose between our preferred single covariate model L1 and the two covariate model L3, testing
$$H_0: \text{L1 is adequate} \quad \text{vs.} \quad H_1: \text{L3 is a significant improvement over L1}.$$
We first find the residual sum of squares for both models. For L1, using the definition of the least squares estimate,
$$\mathrm{RSS}_1 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \| y - X_1 \hat{\beta}_1 \|^2.$$
To calculate this in R, take the sum of the squared residuals from the fitted lm object.
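A sketch of that calculation, using simulated stand-ins for the variables (the real data frame and column names are not reproduced here, so everything below is illustrative):

```r
# Simulated stand-ins for log(body weight), hours of sleep and log(brain weight).
set.seed(3)
n        <- 60
logBody  <- rnorm(n)
sleepHrs <- rnorm(n, mean = 10, sd = 3)
logBrain <- 2 + 0.75 * logBody + rnorm(n, sd = 0.5)

L1 <- lm(logBrain ~ logBody)             # single covariate model
L3 <- lm(logBrain ~ logBody + sleepHrs)  # two covariate model

RSS1 <- sum(resid(L1)^2)   # residual sum of squares for L1
RSS3 <- sum(resid(L3)^2)   # residual sum of squares for L3
# deviance(L1) returns the same quantity for an lm fit
```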
This gives the value of $\mathrm{RSS}_1$; the same calculation for model L3 gives $\mathrm{RSS}_3$, which, as expected, is smaller.
Next, we find the degrees of freedom for the two models. With $n$ observations:
L1 has 2 regression coefficients (intercept and slope), so the degrees of freedom are $n - 2$.
L3 has 3 regression coefficients, so the degrees of freedom are $n - 3$.
Finally we calculate the $F$-statistic given in equation (10.1),
$$F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_3)/1}{\mathrm{RSS}_3/(n - 3)}.$$
The test statistic is then compared to the $F$ distribution with $(1, n - 3)$ degrees of freedom. From tables, the critical value is just above 4.00; from R it is 4.02.
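The quoted critical value can be checked with `qf()`. The degrees of freedom below are illustrative: the 5% point of the $F_{1,d}$ distribution sits just above 4 for any moderate residual degrees of freedom $d$.

```r
# 5% critical values of the F(1, d) distribution for a few values of d.
sapply(c(30, 50, 55, 100), function(d) qf(0.95, 1, d))
# All are a little above 4, e.g.:
qf(0.95, 1, 55)
```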
What can we conclude from this?
Since the test statistic is smaller than the critical value, we conclude that there is no evidence to reject $H_0$. There is no reason to prefer the more complicated model, and so our chosen model is L1.
We should not be too surprised by this result, since we have already seen that the coefficient for total sleep time is not significantly different from zero in model L3. Once we have accounted for body weight, there is no extra information in total sleep time to explain any remaining variability in brain weights.