6 Model Choice

Nested Models and Likelihood Ratio Test

Suppose two models, f0(,θ0) and f1(,θ1) are under consideration. Then we say f0 is nested in f1 if the parameter space of f0 is a subspace of the parameter space of f1. Often, this can be demonstrated by showing that setting parameters in f1 to particular values results in a model of the form f0.

Nested model examples

  • The exponential model is nested within a Weibull model. The Weibull pdf is

    f(x;λ,k)=kλ(xλ)k-1e-(x/λ)k

    for x0, λ>0, k>0, where λ is the scale parameter and k is the shape parameter. However, if we set k=1 we recover the exponential pdf.

  • The exponential model is nested within a gamma model. The gamma pdf is

    f(x;α,β)=βαΓ(α)xα-1e-βx,

    for x0,α>0,β>0, where β is the rate paramter and α is the shape parameter. Now, if we set α=1 then we recover the exponential pdf (albeit with a slightly different parameterisation: let λ=β-1).

  • The Uniform[0,1] distribution is nested within the beta distribution. The beta pdf is

    f(x;α,β)=xα-1(1-x)β-1B(α,β)

    for 0x1, α>0, β>0, where α and β are both shape parameters. If we set α=β=1 this is the Uniform[0,1] distribution (NB B(1,1)=1).

  • The negative binomial distribution gives the number of failures x observed until r successes are observed. The geometric distribution is nested within the negative binomial. The negative binomial pmf is

    p(x;r,p)=(x+r-1x)(1-p)rpx.

    Setting r=1 yields the geometric distribution.

  • The normal distribution with known variance is nested within a normal distribution with unknown variance.

If we are comparing two nested models for data x we can use the likelihood ratio test. The set-up is to view the choice between the two models as a hypothesis test. We take f0 as a nested model within f1 and take Ωi to be the parameter space of fi. Then Ω0Ω1.

Theorem 13: Likelihood Ratio Test for Model Comparison.

Suppose we have the two models f0 and f1 as described above. Our hypotheses are

H0: The simpler model adequately describes x, i.e. θ0Ω0,
H1: The more complex model is required, i.e. θ0{Ω1Ω0}.

Then the deviance (likelihood ratio test) for model comparison is

D(f1,f0)=2{1(θ^1)-0(θ^0)}.

Under the usual regularity conditions, if f1 has p1 unknown parameters and f0 has p0 unknown parameters (of course p1>p0) then under H0, asymptotically as n,

D(f1,f0)χp1-p02.

Remarks:

  • It should not surprise you to learn that this theorem is closely related to Theorems 2, 7 and 12. In fact, all three of these are corollaries to the above result. Convincing yourself of this would be a good way of enhancing your understanding.

  • It is always the case that 1(θ^1)>0(θ^0). To see this, remember that θ^0 is a valid parameter choice for model f1, and θ^1 must have at least as high a likelihood under f1 by virtue of it being the MLE.

Theorem 13 suggests that we should evaluate D(f1,f0) and compare to the corresponding critical value, zc2 from the χ2 distribution. If D(f1,f0)<zc2 we do not reject H0 and take the simpler model f0 as adequate.

Example 6.1:  Failure times, ctd.

We revisit the failure times example first encountered in Section 2.6. There we fitted an exponential model to the failure time data; should a gamma model be preferred?

From earlier,

(θ)=nlogθ-θnx¯;θ^=x¯-1.

Using R we evaluate this at the MLE:

fail<-c(90,255,40,143,30,239,484,28,39,15)
exploglhd<-function(theta,x){
 n<-length(x)
 return(n*log(theta) - theta*n*mean(x))
}

exploglhd(1/mean(fail),fail)

This gives (θ^)=-59.14858, with θ^=0.00734.

Now we consider the likelihood under a gamma distribution. Recall from Example 3.7 that maximising the gamma distribution reduces to solving

nlogα^-nlogx¯+i=1nlogxi-nγ(α^),

and then

β^=α^x¯.

The first equation can only be solved numerically.

Using R,

f<-function(alpha,x){
   n<-length(x)
   return(n*log(alpha)-n*log(mean(x))+sum(log(x))-n*digamma(alpha))
}

uniroot(f,lower=0.001,upper=10,x=fail)

giving α^=1.013, and so β^=0.00743. (Note that the digamma function in R is the derivative of the log of the gamma function.)

Again, we evaluate the gamma log likelihood at its MLE in R.

gamloglhd<-function(alpha,beta,x){
    n<-length(x)
    return(n*alpha*log(beta) + (alpha-1)*sum(log(x))
Ψ- beta*sum(x) - n*log(gamma(alpha)))
}

gamloglhd(alpha=1.012568,beta=0.007428966,x=fail)

giving (α^,β^)=-59.14808, only very slightly larger than the exponential distribution. This gives a deviance of 2(-59.14808--59.14858)=0.001.

This is much less than the critical value of χ12 distribution, 3.84, hence we retain the simpler exponential model. We should not be surprised at our conclusion because α^=1.013, which is very close to 1, the value that would imply an exponential distribution.

Example 6.2:  Surgical Mortality Rates at Hospitals The below table gives mortality levels for cardiac surgery on babies at 12 hospitals. r/m means r deaths in m operations.

A  0/47 B  18/148 C  8/119 D  46/810 E  8/211 F  13/196
G  9/148 H  31/215 I  14/207 J  8/97 K  29/256 L  24/360

Let θs be the mortality rate for hospital s. We would like to know whether mortality rates vary between hospitals. This can be expressed as the hypothesis test

H0: θs=θ* for all s=A,,L.
H1: Each θs is allowed to be different.

Clearly the model suggested by H0 is nested within the model for H1.

Letting rs denote the number of deaths in hospital s and ms the total number of operations carried out at hospital s, the likelihood is

L(θ)=s=AL(msrs)θsrs(1-θs)ms-rs.

Under H1 this breaks down into separate Binomial distributions for each hospital, so we have

θ^s=rsms

for s=A,,L.

Under H0 this collapses into a single Binomial distribution where operations over all hospitals are added together, yielding

θ^*=rsms.

We use R to evaluate the likelihood under each scenario as follows:

####hospital mortality example
r<-c(0,18,8,46,8,13,9,31,14,8,29,24)
m<-c(47,148,119,810,211,196,148,215,207,97,256,360)

hosploglhd<-function(theta,r,m){
    return(log(prod(dbinom(r,m,theta))))
}

#h0:
hosploglhd(theta=sum(r)/sum(m),r,m)
#h1:
hosploglhd(theta=r/m,r,m)

This gives a log-likelihood of -44.15 for H0 and -24.88 for H1. The likelihood ratio statistic is then

2(-24.88--44.15)=38.54.

The model under H1 has 11 parameters extra to the model under H0; this gives a critical χ2 value of qchisq(0.95,df=11)=19.68. Since 38.54>19.68 we reject H0 and conclude that mortality rates vary between hospitals.

What is the interpretation of this?

  • Different case mix encountered by hospitals (e.g. specialised hospital taking more complex cases).

  • Some hospitals not as good as others.

Monitoring surgical mortality between hospitals (and between surgeons) is important to ensure that those performing below par are identified quickly.