Home page for accesible maths 10.2 Link to one-way ANOVA 11 Diagnostics

Style control - access keys in brackets

Font (2 3) - + Letter spacing (4 5) - + Word spacing (6 7) - + Line spacing (8 9) - +

10.3 Summary

{mdframed}

1

Covariate selection is the process of deciding which covariates explain a significant amount of the variability in the response.
2
Two nested linear regression models can be compared using an $F$ -test. Take two models
1. 1
  
  $Y=X\beta+\epsilon_{2}$ ,
2. 2
  
  $Y=A\gamma+\epsilon_{2}$ ,
where $X$ and $A$ have the same number of rows, but the number of columns in $X$ is less than that in $A$ . Provided $X$ and $A$ are of full rank, then the first model is nested in the second if $X$ is a subspace of $A$ .
3

A simple example of nesting is when the more complicated model contains all the explanatory variables in the first model, plus one or more additional ones.
4

Introducing extra explanatory variables will always reduce the residual sum of squares.
5

To compare two nested models with sums of squares $SS_{1}$ (simpler model) and $SS_{2}$ (more complicated model), calculate

$F=\frac{(SS_{1}-SS_{2})/(p_{2}-p_{1})}{SS_{2}/(n-p_{2})}$

and compare this to the $F_{p_{2}-p_{1},n-p_{2}},$ distribution.
6

The one-way ANOVA can be shown to be a special case of the multiple linear regression model, in which all the explanatory variables are taken to be indicator functions denoting membership of each of the $m$ groups.

Chapter 11 Diagnostics

Even if a valid statistical method, such as the $F$ -test, has been used to select our preferred linear model, checks should still be made to ensure that this model fits the data well. After all, it could be that all the models that we tried to fit actually described the data poorly, so that we have just made the best of a bad job. If this is the case, we need to go back and think about the underlying physical processes generating the data to suggest a better model.

Diagnostics refers to a set of tools which can be used to check how well the model describes (or fits) the data. We will use diagnostics to check that

1

The estimated residuals $\hat{\epsilon}_{i}$ follow a $\operatorname{Normal}(0,\sigma^{2})$ distribution;
2

The estimated residuals are independent of the covariates used in the model;
3

The estimated residuals are independent of the fitted values $\hat{\mu}_{i}$ ; None of the observations is an outlier (perhaps due to measurement error);
4

No observation has undue influence on the estimated model parameters.