Outliers can have an unduly large influence on the model fit, but this is not necessarily the case. Conversely, some points that are not outliers may have a disproportionate influence on the model fit. One way to measure the influence of an observation on the overall model fit is to refit the model without that observation.
Cook’s distance summarises the difference between the parameter vector $\hat{\beta}$ estimated using the full data set and the parameter vector $\hat{\beta}_{(i)}$ obtained using all the data except observation $i$.
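The refit-based idea can be sketched directly in code. This is a minimal sketch, assuming an ordinary least squares fit with a design matrix `X` that already contains the intercept column, and assuming the conventional scaling of the squared difference between the two parameter vectors by $p s^2$, where $p$ is the number of parameters and $s^2$ is the residual variance estimate from the full fit. The function name `cooks_distance_by_refit` is hypothetical.

```python
import numpy as np

def cooks_distance_by_refit(X, y):
    """Cook's distance from its definition: refit without each observation in turn."""
    n, p = X.shape
    # Parameter vector estimated from the full data set.
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_full
    s2 = resid @ resid / (n - p)      # residual variance estimate from the full fit
    XtX = X.T @ X
    d = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i      # all the data except observation i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        diff = beta_i - beta_full
        # Scaled squared distance between the two parameter vectors.
        d[i] = diff @ XtX @ diff / (p * s2)
    return d
```

Each entry of the returned array is non-negative, and a larger value means that deleting the observation moves the parameter estimates further.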
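The formula for calculating Cook’s distance for observation $i$ is
$$D_i = \frac{r_i^2}{p} \cdot \frac{h_{ii}}{1 - h_{ii}},$$
where $r_i$ is the studentized residual, $h_{ii}$ is the leverage of observation $i$ and $p$ is the number of parameters in the model.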
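The closed-form expression avoids refitting the model for every observation. The sketch below assumes ordinary least squares, internally studentized residuals $r_i = e_i / (s\sqrt{1 - h_{ii}})$, leverages $h_{ii}$ taken from the diagonal of the hat matrix, and $p$ equal to the number of columns of the design matrix. The function name `cooks_distance` is hypothetical.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance from studentized residuals and leverages (OLS)."""
    n, p = X.shape
    # Hat matrix H = X (X^T X)^{-1} X^T, assuming X has full column rank.
    H = X @ np.linalg.pinv(X)
    h = np.diag(H)                       # leverages h_ii
    resid = y - H @ y                    # raw residuals e_i
    s2 = resid @ resid / (n - p)         # residual variance estimate
    r = resid / np.sqrt(s2 * (1.0 - h))  # internally studentized residuals
    return r**2 / p * h / (1.0 - h)
```

The formula shows that an observation needs both a sizeable studentized residual and non-negligible leverage to obtain a large Cook’s distance.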
It is not straightforward to derive the sampling distribution of this statistic. Instead, it is common practice to use the following guidelines.
First, look for observations with large $D_i$, since if these observations are removed, the estimates of the model parameters will change considerably.
If $D_i$ is considerably less than 1 for all observations, none of the cases have an unduly large influence on the parameter estimates.
For every influential observation identified, the model should be refitted without this observation and the changes to the model noted.
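The guidelines above can be sketched as a small routine that flags observations and records how the estimates change when each one is removed. This is only an illustration: the cutoff `threshold=1.0` is an assumption based on the rule of thumb of comparing Cook’s distance with 1, the fit is assumed to be ordinary least squares, and the function name `refit_without_influential` is hypothetical.

```python
import numpy as np

def refit_without_influential(X, y, threshold=1.0):
    """Map each flagged observation index to the change in the OLS estimates
    obtained when the model is refitted without that observation."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    h = np.diag(X @ np.linalg.pinv(X))   # leverages
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)         # residual variance estimate
    # Cook's distance, written in terms of raw residuals and leverages.
    d = resid**2 * h / (p * s2 * (1.0 - h)**2)
    changes = {}
    for i in np.flatnonzero(d >= threshold):   # assumed cutoff, see lead-in
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        changes[int(i)] = beta_i - beta        # change in each parameter estimate
    return changes
```

An empty result means that no observation reached the cutoff. Otherwise, each entry shows how far every parameter estimate moves once that observation is left out.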
We calculate Cook’s distance for the outlying observation (number 12). From the previous example , and . Therefore,
Since this is considerably less than 1, we conclude that whilst observation 12 is an outlier, it does not appear to have an unduly large influence on the parameter estimates.