The likelihood ratio test can only be applied when two models are being compared and one of them is nested inside the other. Often we wish to compare a wider range of models quickly, without requiring any particular relationship (such as nesting) between them.
We cannot use the maximised likelihood directly, since a model with more parameters will always fit at least as well (as discussed at the beginning of this chapter).
Information criteria give scores that indicate how well the chosen model fits, with an appropriate correction made for the complexity of the model.
The AIC (Akaike’s Information Criterion) is an information criterion based on the idea that a ‘good’ model is one that has good predictive properties. In other words, suppose I have data that I use to estimate a model, obtaining the MLE $\hat{\theta}$. I believe my model to be ‘good’ if, when new data come along, the model with parameter value $\hat{\theta}$ fits the new data well. This is a neat idea as it sidesteps the overfitting problems we discussed earlier.
The AIC scores a model with parameter vector $\theta$ as
\[
\mathrm{AIC} = -2\,\ell(\hat{\theta}) + 2p,
\]
where $\ell(\hat{\theta})$ is the maximised log-likelihood and $p$ is the number of parameters in $\theta$. If we are comparing a collection of models we should prefer the one with the smallest AIC.
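To make the scoring rule concrete, here is a minimal sketch of the calculation in R; the function aic and the two log-likelihood values are purely illustrative placeholders and are not taken from the example later in this section.

aic <- function(loglik_hat, p){
  # AIC from a maximised log-likelihood and a parameter count
  return(-2 * loglik_hat + 2 * p)
}

aic(loglik_hat = -70.3, p = 1)   # 142.6 (hypothetical one-parameter model)
aic(loglik_hat = -66.1, p = 2)   # 136.2 (hypothetical two-parameter model; smaller, so preferred)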
IMPORTANT: When we use AIC we are comparing models of completely different forms using the values of their maximised log-likelihoods directly. When writing down the likelihoods, we are therefore not allowed to drop constants with respect to $\theta$ as we usually do.
Remark: The derivation of the AIC is a little tricky and we omit it here. See Section 13.5-6 of Pawitan (2001) if interested.
A corollary is that models with the same number of parameters can be compared directly via their maximised log-likelihoods. For example, the log-normal, Weibull and gamma distributions are all two-parameter models for non-negative data.
Example 6.3: Motion Sickness
Experiments were performed as part of a research programme investigating motion sickness at sea. Human subjects were placed in a cabin mounted to a hydraulic piston and subjected to vertical motion for two hours. The length of time until each subject first vomited was recorded. Of the 28 subjects in the study, 14 vomited within the two hours. The times in minutes were
5 11 11 13 24 63 65 69 69 79 82 82 102 115
What is a suitable model for these data?
As always a first step is to plot the data.
sick <- c(5, 11, 11, 13, 24, 63, 65, 69, 69, 79, 82, 82, 102, 115)
stripchart(sick, pch = 4, xlab = 'vomit times (mins)', method = 'jitter')
This leads to the strip chart given in Figure 6.2. Here the jitter is useful because we have a number of tied times.
These data look interesting: for example, are there two ‘groups’ of times at which vomiting occurred?
We will now postulate some models for these data. The simplest distribution for strictly positive data is the exponential, so we will start there. Other possibilities are the gamma and Weibull distributions. We have already calculated log-likelihood functions for the exponential and gamma distributions, and we wrote R functions for them in Example 6.6 that we can recycle.
We will calculate the log-likelihood function for the Weibull distribution, then use R to find the MLEs and evaluate the log-likelihoods using the mle function in the stats4 package.
The Weibull pdf is
\[
f(x; \lambda, k) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}
\]
for $x \ge 0$; $\lambda > 0$; $k > 0$, where $\lambda$ is the scale parameter and $k$ is the shape parameter.
So for independent data $x_1, \dots, x_n$ from a Weibull distribution, the log-likelihood is
\[
\ell(\lambda, k) = \sum_{i=1}^{n} \log\!\left( \frac{k \, x_i^{\,k-1}}{\lambda^{k}} \, e^{-(x_i/\lambda)^k} \right).
\]
We could try and simplify further but we might as well just put this into R as it is. So our three functions are:
weiloglhd <- function(lambda, k, x){
  # Weibull log-likelihood: scale lambda, shape k
  return(sum(log(k * x^(k-1) / lambda^k * exp(-(x/lambda)^k))))
}

exploglhd <- function(theta, x){
  # exponential log-likelihood: rate theta
  n <- length(x)
  return(n * log(theta) - theta * n * mean(x))
}

gamloglhd <- function(alpha, beta, x){
  # gamma log-likelihood: shape alpha, rate beta
  n <- length(x)
  return(n * alpha * log(beta) + (alpha - 1) * sum(log(x))
         - beta * sum(x) - n * log(gamma(alpha)))
}
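As an optional sanity check (not part of the original analysis), each hand-coded log-likelihood should agree with the corresponding sum of R's built-in log-densities; the data points and parameter values below are arbitrary illustrative choices:

xcheck <- c(5, 11, 24, 63, 82)                                # a few illustrative points
weiloglhd(lambda = 60, k = 1.5, x = xcheck)
sum(dweibull(xcheck, shape = 1.5, scale = 60, log = TRUE))    # should match the line above
exploglhd(theta = 0.02, x = xcheck)
sum(dexp(xcheck, rate = 0.02, log = TRUE))                    # should match
gamloglhd(alpha = 1.2, beta = 0.02, x = xcheck)
sum(dgamma(xcheck, shape = 1.2, rate = 0.02, log = TRUE))     # should match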
We now evaluate each of them for our motion sickness data. The mle function takes the negative log-likelihood and requires the data to be specified already, so we write ‘helper functions’ first:
nel <- function(theta){ return(-exploglhd(theta, x = sick)) }
ngl <- function(alpha, beta){ return(-gamloglhd(alpha, beta, x = sick)) }
nwl <- function(lambda, k){ return(-weiloglhd(lambda, k, x = sick)) }
Now we can maximise each of these using the mle function. Note that we need to specify starting values for the parameters; in the simple examples we consider, starting each parameter at 1, say, should work.
library(stats4)
sickexp <- mle(nel, start = list(theta = 1))
sickgam <- mle(ngl, start = list(alpha = 1, beta = 1))
sickwei <- mle(nwl, start = list(lambda = 1, k = 1))
AIC(sickexp); AIC(sickgam); AIC(sickwei)
We prefer the model with the minimum AIC, which in this case is the Weibull model.
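Before assessing the fit, it is useful to pull the Weibull MLEs out of the fitted object; a quick sketch, assuming the sickwei object created above:

coef(sickwei)     # estimated scale (lambda) and shape (k)
logLik(sickwei)   # maximised log-likelihood
summary(sickwei)  # estimates with approximate standard errors

These estimated scale and shape values are what we plug into the QQ plot below.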
We then check the quality of the model fit, using, for example, a QQ plot.
n <- length(sick)
y <- seq(from = 1/(n+1), to = n/(n+1), length = n)
z <- qweibull(y, scale = 61.7466, shape = 1.4565)
qqplot(z, sick, xlab = "Theoretical under Weibull", ylab = "Actual")
abline(a = 0, b = 1)
The resulting QQ plot shows that a Weibull fit seems reasonable, but it has not captured the potential bimodality in the data.
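One optional way to see this (an illustrative extra, not in the original analysis) is to overlay the fitted Weibull density on a histogram of the data, using the scale and shape estimates from above:

hist(sick, freq = FALSE, breaks = 8, xlab = 'vomit times (mins)', main = '')
curve(dweibull(x, shape = 1.4565, scale = 61.7466), add = TRUE)  # fitted density

A single unimodal Weibull curve cannot reproduce two separate clusters of vomiting times.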
In reality, what we might try next is a mixture model to cope with the bimodality; this is a topic for Masters-level statistics courses. We should also think about the assumptions: are the data really independent?