
12.3 Likelihood Examples: continuous parameters

We now explore examples of likelihood inference for some common models.

Example 12.3.1 Accident and Emergency

Accident and emergency departments are hard to manage because patients arrive at random (they are unscheduled). Some patients may need to be seen urgently.

Excess staff (doctors and nurses) must be avoided because this wastes NHS money; however, A&E departments also have to adhere to performance targets (e.g. patients dealt with within four hours). So staff levels need to be ‘balanced’: sufficient staff to meet the targets, but not so many that money is wasted.

A first step in achieving this is to study data on patient arrival times. It is proposed that we model the time between patient arrivals as iid realisations from an Exponential distribution.

Why is this a suitable model?

What assumptions are being made?

Are these assumptions reasonable?

Suppose I stand outside Lancaster Royal Infirmary A&E and record the following inter-arrival times of patients (in minutes):

18.39, 2.70, 5.42, 0.99, 5.42, 31.97, 2.96, 5.28, 8.51, 10.90.

As usual, the first thing we do is look at the data!

arrive<-c(18.39,2.70,5.42,0.99,5.42,31.97,2.96,5.28,8.51,10.90)
stripchart(arrive,pch=4,xlab="inter-arrival time (mins)")

The exponential pdf is given by

\[
f(x) = \lambda\exp(-\lambda x),
\]

for $x \geq 0$, $\lambda > 0$. Assuming that the data are iid, the definition of the likelihood function for $\lambda$ gives us, for general data $x_1, \dots, x_n$,

\[
L(\lambda) = \prod_{i=1}^{n} \lambda\exp(-\lambda x_i).
\]

Note: we usually drop the ‘|x’ from L(λ|x) whenever possible.
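
Before maximising anything, it can help to see what this likelihood looks like for our data. The short R sketch below evaluates $L(\lambda)$ on a grid and plots it; the function name explik and the grid range are our own choices, and the arrive vector is assumed to be defined as above.

#illustrative sketch: likelihood of lambda for the observed inter-arrival times
explik<-function(lambda){
  sapply(lambda,function(l) prod(l*exp(-l*arrive)))
}
#evaluate on a grid of lambda values and plot
lambda<-seq(from=0.01,to=0.4,length=500)
plot(lambda,explik(lambda),type="l",xlab="lambda",ylab="likelihood")

The curve peaks at a single value of $\lambda$, which is the MLE we now find analytically.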

Usually, when we have products and the parameter is continuous, the best way to find the MLE is to find the log-likelihood and differentiate.

So the log-likelihood is

\begin{align*}
\ell(\lambda) &= \sum_{i=1}^{n} \log\{\lambda\exp(-\lambda x_i)\} \\
&= \sum_{i=1}^{n} \{\log(\lambda) - \lambda x_i\} \\
&= n\log(\lambda) - \lambda\sum_{i=1}^{n} x_i.
\end{align*}

Now we differentiate:

\[
\frac{\partial}{\partial\lambda}\ell(\lambda) = \ell'(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i.
\]

Now solutions to $\ell'(\lambda) = 0$ are potential MLEs.

\[
\frac{n}{\hat\lambda} - \sum_{i=1}^{n} x_i = 0,
\]

giving

\[
\hat\lambda = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}.
\]

To ensure this is a maximum we check the second derivative is negative:

\[
\frac{\partial^2}{\partial\lambda^2}\ell(\lambda) = \ell''(\lambda) = -\frac{n}{\lambda^2} < 0.
\]

So the solution we have found is the MLE, and plugging in our data we find (via 1/mean(arrive))

\[
\hat\lambda = 0.108.
\]
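
As a quick numerical sanity check (an illustrative sketch only, reusing the arrive data; the function name expllik is our own), we can maximise the log-likelihood directly with optimize() and compare with 1/mean(arrive):

#sketch: log-likelihood l(lambda) = n*log(lambda) - lambda*sum(x_i)
expllik<-function(lambda){
  length(arrive)*log(lambda)-lambda*sum(arrive)
}
#numerical maximisation over a plausible range; should return roughly 0.108
optimize(expllik,interval=c(0.001,1),maximum=TRUE)$maximum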

Now that we have our MLE, we should check that the assumed model seems reasonable. Here, we will use a QQ-plot.

#MLE of lambda (rate)
lam<-1/mean(arrive)
#1/(n+1), 2/(n+1),..., n/(n+1).
quant<-seq(from=1/11,to=10/11,length=10)
#produce QQ-plot
qqplot(qexp(quant,rate=lam),arrive,xlab="Theoretical quantiles",ylab="Actual")
#add line of equality
abline(0,1)

Given the small dataset, this seems ok – there is no obvious evidence of deviation from the exponential model.

Knowing that the exponential distribution is reasonable, and having an estimate for its rate, is useful for calculating staff scheduling requirements in the A&E.

Extensions of the idea consider flows of patients through the various services (take Math332 Stochastic Processes and/or the STOR-i MRes for more on this).

Example 12.3.2 Is human body temperature really 98.6 degrees Fahrenheit?

In an article by Mackowiak et al. (Mackowiak, P.A., Wasserman, S.S. and Levine, M.M. (1992) A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold August Wunderlich), the authors measure the body temperatures of a number of individuals to assess whether the true mean body temperature is 98.6 degrees Fahrenheit or not. A dataset of 130 individuals is available in the normtemp dataset. The data are assumed to be normally distributed with standard deviation 0.73.

Why is this a suitable model?

What assumptions are being made?

Are these assumptions reasonable?

What do the data look like?

The plot can be produced using

> load("normtemp.Rdata")
> hist(normtemp$temperature)

The histogram looks reasonably consistent with normality, but there might be some skew in the data (a heavier right tail).

The normal pdf is given by

\[
f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\},
\]

where, in this case, $\sigma = 0.73$ is known.

The likelihood is then

\begin{align*}
L(\mu \mid x_1, \dots, x_n) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x_i-\mu)^2}{2\sigma^2}\right\} \\
&= (2\pi\sigma^2)^{-n/2} \exp\left\{-\sum_{i=1}^{n} \frac{(x_i-\mu)^2}{2\sigma^2}\right\}.
\end{align*}

Since the parameter of interest (in this case μ) is continuous, we can differentiate the log-likelihood to find the MLE:

\[
\ell(\mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2,
\]

and so

\[
\ell'(\mu) = -\frac{1}{2\sigma^2}\sum_{i=1}^{n}(-2)(x_i-\mu).
\]

For candidate MLEs we set this to zero and solve, i.e.

\begin{align*}
\frac{1}{\sigma^2}\left(\sum_{i=1}^{n} x_i - n\hat\mu\right) &= 0 \\
\sum_{i=1}^{n} x_i - n\hat\mu &= 0,
\end{align*}

and so the MLE is $\hat\mu = \bar{x}$.

This is also the “obvious” estimate (the sample mean). To check it is indeed an MLE, the second derivative of the log-likelihood is

\[
\ell''(\mu) = -\frac{n}{\sigma^2} < 0,
\]

which confirms this is the case.

Using the data, we find $\hat\mu = \bar{x} = 98.25$.
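
As before, we can sanity-check the analytic answer numerically (a sketch, assuming normtemp has been loaded as above; the function name normllik is our own):

#sketch: log-likelihood for mu, treating sigma = 0.73 as known
normllik<-function(mu){
  sum(dnorm(normtemp$temperature,mean=mu,sd=0.73,log=TRUE))
}
#numerical maximisation; should agree with mean(normtemp$temperature)
optimize(normllik,interval=c(90,110),maximum=TRUE)$maximum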

This might indicate evidence for the body temperature being different from the assumed 98.6 degrees Fahrenheit.

We now check the fit:

> temps <-normtemp$temperature   # for shorthand
> mean(temps)
[1] 98.24923
> stdtemp <- (temps - mean(temps))/0.73   # standardise using the known sd 0.73
> qqnorm(stdtemp)
> abline(0,1)   # add "ideal" fit line y = x

The fit looks good – although (as the histogram previously showed) there is possibly some mild right (positive) skew, indicated by the quantile points above the y=x line.

Why might the QQ-plot show the “stepped” behaviour of the points?

Example 12.3.3

Every day I cycle to Lancaster University, and have to pass through the traffic lights at the crossroads by Booths (heading south down Scotforth Road). I am either stopped or not stopped by the traffic lights. Over a period of a term, I cycle in 50 times. Suppose that the time I arrive at the traffic lights is independent of the traffic light sequence.

On 36 of the 50 days, the lights are on green and I can cycle straight through. Let θ be the probability that the lights are on green. Write down the likelihood and log-likelihood of θ, and hence calculate its MLE.

With the usual iid assumption we see that, if $R$ is the number of times the lights are on green, then $R \sim \text{Binomial}(50, \theta)$. So we have

\[
\Pr[R = 36] = \binom{50}{36}\theta^{36}(1-\theta)^{14}.
\]

We therefore have, for general r and n,

\[
L(\theta) = \binom{n}{r}\theta^{r}(1-\theta)^{n-r},
\]

and

\[
\ell(\theta) = K + r\log(\theta) + (n-r)\log(1-\theta),
\]

where $K = \log\binom{n}{r}$ does not depend on $\theta$.

Solutions to $\ell'(\theta) = 0$ are potential MLEs:

\[
\ell'(\theta) = \frac{r}{\theta} - \frac{n-r}{1-\theta},
\]

and if $\ell'(\hat\theta) = 0$ we have

\[
\frac{r}{\hat\theta} = \frac{n-r}{1-\hat\theta},
\quad \text{i.e.} \quad \hat\theta = \frac{r}{n}.
\]

For this to be an MLE, the second derivative of the log-likelihood must be negative, which it is:

\[
\ell''(\theta) = -\frac{r}{\theta^2} - \frac{n-r}{(1-\theta)^2} < 0.
\]

In particular we have $r = 36$ and $n = 50$, so $\hat\theta = 36/50 = 0.72$ is the MLE.
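
The same numerical check works here too (a sketch only; the function name binllik is our own), confirming that the log-likelihood is maximised at $\hat\theta = 36/50 = 0.72$:

#sketch: binomial log-likelihood for theta with r = 36 green lights in n = 50 trips
binllik<-function(theta){
  dbinom(36,size=50,prob=theta,log=TRUE)
}
#numerical maximisation; should return roughly 0.72
optimize(binllik,interval=c(0.01,0.99),maximum=TRUE)$maximum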

Now suppose that over a two week period, on the 14 occasions I get stopped by the traffic lights (they are on red) my waiting times are given by (in seconds)

4.2,6.9,13.7,2.8,19.3,10.4,1.0,19.4,18.6,0.6,4.5,12.9,0.5,16.0.

Assume that the traffic lights remain on red for a fixed amount of time $t_r$, regardless of the traffic conditions.

Given the above data, write down the likelihood of $t_r$, and sketch it. What is the MLE of $t_r$?

We are going to assume that these waiting times are drawn independently from Uniform$[0, t_r]$, where $t_r$ is the parameter we wish to estimate.

Why is this a suitable model?

What assumptions are being made?

Are these assumptions reasonable?

Constructing the likelihood for this example is slightly different from those we have seen before. The pdf of the Uniform$(0, t_r)$ distribution is

\[
f(x) = \begin{cases} t_r^{-1} & 0 \leq x \leq t_r, \\ 0 & \text{otherwise.} \end{cases}
\]

The unusual thing here is that the data enters only through the boundary conditions on the pdf. Another way to write the above pdf is

\[
f(x) = t_r^{-1}\,\mathbf{1}[0 \leq x \leq t_r],
\]

where $\mathbf{1}$ is the indicator function.

For data $x_1, \dots, x_n$, the likelihood function is then

\[
L(t_r) = \prod_{i=1}^{n} t_r^{-1}\,\mathbf{1}[0 \leq x_i \leq t_r].
\]

We can write this as

\begin{align*}
L(t_r) &= t_r^{-n}\,\mathbf{1}[\{0 \leq x_1 \leq t_r\} \cap \cdots \cap \{0 \leq x_n \leq t_r\}] \\
&= t_r^{-n}\,\mathbf{1}[\max(x_i) \leq t_r].
\end{align*}

For our case we have $n = 14$ and $\max(x_i) = 19.4$, so $L(t_r) = t_r^{-14}\,\mathbf{1}[19.4 \leq t_r]$.

We are next asked to sketch this likelihood. In R,

> maxx<-19.4
> #values of t_r to plot
> t<-seq(from=18,to=22,length=1000)
>
> #likelihood function
> uniflik<-function(t){
+   t^(-14)*(t>=maxx)
+ }
>
> plot(t,uniflik(t),type="l")

From the plot it is clear that $\hat{t}_r = 19.4 = \max(x_i)$, since this is the value that leads to the maximum likelihood. Notice that solving $\ell'(t_r) = 0$ would not work in this case, since the likelihood is not differentiable at the MLE.

However, on the feasible range of $t_r$, i.e. $\max(x_i) \leq t_r$, we have

\[
\ell(t_r) = -n\log(t_r),
\]

and so

\[
\ell'(t_r) = -\frac{n}{t_r}.
\]

Remember that derivatives express the rate of change of a function. Since this derivative of the log-likelihood is negative (as $t_r > 0$), the likelihood is decreasing over the whole feasible range of parameter values.

Since we are trying to maximise the likelihood, we should therefore take the smallest feasible value of $t_r$ as the MLE. The smallest value on the range $\max(x_i) \leq t_r$ is $\hat{t}_r = \max(x_i) = 19.4$.
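
As a final check (reusing the uniflik function and grid t defined above; a sketch rather than a formal argument), we can confirm numerically that the largest likelihood on the grid occurs at the smallest feasible value of $t_r$:

#grid point with the largest likelihood; approximately max(x_i) = 19.4
t[which.max(uniflik(t))]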