Suppose two models, $M_0$ and $M_1$, are under consideration. Then we say $M_0$ is nested in $M_1$ if the parameter space of $M_0$ is a subspace of the parameter space of $M_1$. Often, this can be demonstrated by showing that setting parameters in $M_1$ to particular values results in a model of the form $M_0$.
Nested model examples
The exponential model is nested within a Weibull model. The Weibull pdf is
$$f(x; \lambda, k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}$$
for $x \geq 0$, $\lambda > 0$, $k > 0$, where $\lambda$ is the scale parameter and $k$ is the shape parameter. If we set $k = 1$ we recover the exponential pdf $f(x; \lambda) = \frac{1}{\lambda} e^{-x/\lambda}$.
The exponential model is nested within a gamma model. The gamma pdf is
$$f(x; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}$$
for $x > 0$, $\alpha > 0$, $\beta > 0$, where $\beta$ is the rate parameter and $\alpha$ is the shape parameter. Now, if we set $\alpha = 1$ then we recover the exponential pdf (albeit with a slightly different parameterisation: let $\theta = \beta$, giving $f(x; \theta) = \theta e^{-\theta x}$).
The Uniform[0,1] distribution is nested within the beta distribution. The beta pdf is
$$f(x; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}$$
for $0 \leq x \leq 1$, $\alpha > 0$, $\beta > 0$, where $\alpha$ and $\beta$ are both shape parameters. If we set $\alpha = \beta = 1$ this is the Uniform[0,1] distribution (NB $\Gamma(1) = \Gamma(2) = 1$).
The negative binomial distribution gives the number of failures observed until $r$ successes are observed. The geometric distribution is nested within the negative binomial. The negative binomial pmf is
$$p(k; r, p) = \binom{k + r - 1}{k} p^r (1 - p)^k, \qquad k = 0, 1, 2, \ldots$$
Setting $r = 1$ yields the geometric distribution $p(k; p) = p(1 - p)^k$.
The normal distribution with known variance is nested within a normal distribution with unknown variance.
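Each of these nestings can be checked numerically in R by comparing built-in densities at a few points (a quick sanity check; the evaluation points and parameter values below are arbitrary):

```r
x <- c(0.2, 1, 3.7)   # arbitrary evaluation points
theta <- 0.5          # arbitrary rate parameter

# Weibull with shape 1 is exponential (scale = 1/rate)
all.equal(dweibull(x, shape = 1, scale = 1 / theta), dexp(x, rate = theta))

# Gamma with shape 1 is exponential
all.equal(dgamma(x, shape = 1, rate = theta), dexp(x, rate = theta))

# Beta(1, 1) is Uniform[0, 1]
u <- c(0.1, 0.5, 0.9)
all.equal(dbeta(u, 1, 1), dunif(u))

# Negative binomial with size 1 is geometric
k <- 0:5
all.equal(dnbinom(k, size = 1, prob = 0.3), dgeom(k, prob = 0.3))
```

Each comparison returns TRUE, confirming the densities coincide at the restricted parameter values.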
If we are comparing two nested models for data $x$ we can use the likelihood ratio test. The set-up is to view the choice between the two models as a hypothesis test. We take $M_0$ as a nested model within $M_1$ and take $\Theta_i$ to be the parameter space of $M_i$. Then $\Theta_0 \subset \Theta_1$.
Theorem 13: Likelihood Ratio Test for Model Comparison.
Suppose we have the two models $M_0$ and $M_1$ as described above. Our hypotheses are
$H_0$: The simpler model adequately describes $x$, i.e. $\theta \in \Theta_0$,
$H_1$: The more complex model is required, i.e. $\theta \in \Theta_1 \setminus \Theta_0$.
Then the deviance (likelihood ratio test statistic) for model comparison is
$$D = 2\left\{ \ell(\hat{\theta}_1; x) - \ell(\hat{\theta}_0; x) \right\},$$
where $\hat{\theta}_i$ denotes the MLE under model $M_i$. Under the usual regularity conditions, if $M_0$ has $p_0$ unknown parameters and $M_1$ has $p_1$ unknown parameters (of course $p_1 > p_0$) then under $H_0$, asymptotically as $n \to \infty$,
$$D \sim \chi^2_{p_1 - p_0}.$$
Remarks:
It should not surprise you to learn that this theorem is closely related to Theorems 2, 7 and 12. In fact, all three of these are corollaries to the above result. Convincing yourself of this would be a good way of enhancing your understanding.
It is always the case that $D \geq 0$. To see this, remember that $\hat{\theta}_0$ is a valid parameter choice for model $M_1$, and $\hat{\theta}_1$ must have at least as high a likelihood under $M_1$ by virtue of it being the MLE.
Theorem 13 suggests that we should evaluate $D$ and compare it to the corresponding critical value from the $\chi^2_{p_1 - p_0}$ distribution. If $D$ falls below the critical value we do not reject $H_0$ and take the simpler model as adequate.
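In R the whole procedure reduces to a few lines. The following is a generic sketch, where the maximised log-likelihoods and parameter counts are illustrative values, not from any real data set:

```r
# illustrative values (not from a real data set)
l0 <- -104.2; p0 <- 1   # maximised log-likelihood and parameter count, M0
l1 <- -100.9; p1 <- 3   # maximised log-likelihood and parameter count, M1

D <- 2 * (l1 - l0)                  # deviance
crit <- qchisq(0.95, df = p1 - p0)  # 5% critical value
D > crit                            # TRUE => reject H0, prefer M1
```

Here $D = 6.6$ exceeds the $\chi^2_2$ critical value of about 5.99, so in this illustration we would reject $H_0$.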
Example 6.1: Failure times, ctd.
We revisit the failure times example first encountered in Section 2.6. There we fitted an exponential model to the failure time data; should a gamma model be preferred?
From earlier, the exponential log-likelihood is
$$\ell(\theta; x) = n \log \theta - \theta \sum_{i=1}^n x_i = n \log \theta - \theta n \bar{x},$$
with MLE $\hat{\theta} = 1/\bar{x}$.
Using R we evaluate this at the MLE:
```r
fail <- c(90, 255, 40, 143, 30, 239, 484, 28, 39, 15)
exploglhd <- function(theta, x) {
  n <- length(x)
  return(n * log(theta) - theta * n * mean(x))
}
exploglhd(1 / mean(fail), fail)
```
This gives $\ell(\hat{\theta}; x) \approx -59.149$, with $\hat{\theta} = 1/136.3 \approx 0.00734$.
Now we consider the likelihood under a gamma distribution. Recall from Example 3.7 that maximising the gamma log-likelihood reduces to solving
$$n \log \alpha - n \log \bar{x} + \sum_{i=1}^n \log x_i - n \psi(\alpha) = 0$$
for $\alpha$, where $\psi$ is the digamma function, and then setting
$$\hat{\beta} = \frac{\hat{\alpha}}{\bar{x}}.$$
The first equation can only be solved numerically.
Using R,
```r
f <- function(alpha, x) {
  n <- length(x)
  return(n * log(alpha) - n * log(mean(x)) + sum(log(x)) - n * digamma(alpha))
}
uniroot(f, lower = 0.001, upper = 10, x = fail)
```
giving $\hat{\alpha} = 1.012568$, and so $\hat{\beta} = \hat{\alpha}/\bar{x} = 0.007428966$. (Note that the digamma function in R is the derivative of the log of the gamma function.)
Again, we evaluate the gamma log likelihood at its MLE in R.
```r
gamloglhd <- function(alpha, beta, x) {
  n <- length(x)
  return(n * alpha * log(beta) + (alpha - 1) * sum(log(x))
         - beta * sum(x) - n * log(gamma(alpha)))
}
gamloglhd(alpha = 1.012568, beta = 0.007428966, x = fail)
```
giving $\ell(\hat{\alpha}, \hat{\beta}; x) \approx -59.148$, only very slightly larger than under the exponential model. This gives a deviance of $D = 2\{\ell(\hat{\alpha}, \hat{\beta}; x) - \ell(\hat{\theta}; x)\} \approx 0.001$.
This is much less than the critical value of the $\chi^2_1$ distribution, 3.84, hence we retain the simpler exponential model. We should not be surprised at our conclusion because $\hat{\alpha} = 1.013$, which is very close to 1, the value that would imply an exponential distribution.
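Putting the two fitted log-likelihoods together, the test itself can be carried out directly in R. The functions are redefined here so the snippet is self-contained, and the gamma MLEs are those obtained above:

```r
fail <- c(90, 255, 40, 143, 30, 239, 484, 28, 39, 15)
exploglhd <- function(theta, x) {
  n <- length(x)
  n * log(theta) - theta * n * mean(x)
}
gamloglhd <- function(alpha, beta, x) {
  n <- length(x)
  n * alpha * log(beta) + (alpha - 1) * sum(log(x)) -
    beta * sum(x) - n * log(gamma(alpha))
}

# deviance: twice the gain in log-likelihood from the richer (gamma) model
D <- 2 * (gamloglhd(1.012568, 0.007428966, fail) -
          exploglhd(1 / mean(fail), fail))
D > qchisq(0.95, df = 1)   # FALSE => retain the exponential model
```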
Example 6.2: Surgical Mortality Rates at Hospitals
The table below gives mortality levels for cardiac surgery on babies at 12 hospitals; $r/m$ means $r$ deaths in $m$ operations.
A 0/47 | B 18/148 | C 8/119 | D 46/810 | E 8/211 | F 13/196 |
G 9/148 | H 31/215 | I 14/207 | J 8/97 | K 29/256 | L 24/360 |
Let $\theta_i$ be the mortality rate for hospital $i$. We would like to know whether mortality rates vary between hospitals. This can be expressed as the hypothesis test
$H_0$: $\theta_i = \theta$ for all $i$,
$H_1$: Each $\theta_i$ is allowed to be different.
Clearly the model suggested by $H_0$ is nested within the model for $H_1$.
Letting $r_i$ denote the number of deaths in hospital $i$ and $m_i$ the total number of operations carried out at hospital $i$, the likelihood is
$$L(\theta_1, \ldots, \theta_{12}; r, m) = \prod_{i=1}^{12} \binom{m_i}{r_i} \theta_i^{r_i} (1 - \theta_i)^{m_i - r_i}.$$
Under $H_1$ this breaks down into 12 separate Binomial distributions, one for each hospital, so we have
$$\hat{\theta}_i = \frac{r_i}{m_i}$$
for $i = 1, \ldots, 12$.
Under $H_0$ this collapses into a single Binomial distribution where operations over all hospitals are added together, yielding
$$\hat{\theta} = \frac{\sum_{i=1}^{12} r_i}{\sum_{i=1}^{12} m_i}.$$
We use R to evaluate the likelihood under each scenario as follows:
```r
#### hospital mortality example
r <- c(0, 18, 8, 46, 8, 13, 9, 31, 14, 8, 29, 24)
m <- c(47, 148, 119, 810, 211, 196, 148, 215, 207, 97, 256, 360)
hosploglhd <- function(theta, r, m) {
  return(log(prod(dbinom(r, m, theta))))
}
# H0:
hosploglhd(theta = sum(r) / sum(m), r, m)
# H1:
hosploglhd(theta = r / m, r, m)
```
This gives a log-likelihood of $-44.15$ for $H_0$ and $-24.88$ for $H_1$. The likelihood ratio statistic is then
$$D = 2\{-24.88 - (-44.15)\} = 38.54.$$
The model under $H_1$ has 11 parameters more than the model under $H_0$; this gives a critical value of qchisq(0.95, df=11) = 19.68. Since $D = 38.54 > 19.68$ we reject $H_0$ and conclude that mortality rates vary between hospitals.
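Equivalently, we can report a p-value rather than comparing against a critical value. A quick check in R, using the two log-likelihoods computed above:

```r
# deviance from the fitted log-likelihoods under H0 and H1
D <- 2 * (-24.88 - (-44.15))
D                                        # 38.54
pchisq(D, df = 11, lower.tail = FALSE)   # p-value, far below 0.05
```

The p-value is tiny, so the conclusion (reject $H_0$) matches the critical-value comparison, as it must.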
What is the interpretation of this?
- Different case mix encountered by hospitals (e.g. a specialised hospital taking more complex cases).
- Some hospitals are not as good as others.
Monitoring surgical mortality between hospitals (and between surgeons) is important to ensure that those performing below par are identified quickly.