An engineer is interested in understanding the failure time of a particular engine component. He undertakes an experiment where 10 independent copies of a component are tested separately and their lifetime (in hours) measured, leading to the following data:
90 255 40 143 30 239 484 28 39 15
What conclusions can be made about the mean failure time?
Our first decision is what probability model to use for our data. This is an important part of the modelling process; formal model choice procedures will be described later in the module.
For now, our informal procedure, recalled from MATH235 is: plot the data first, and check the model fit after finding the MLE (e.g. using a QQ plot).
We first take a look at the data; for example we could use the R code
fail<-c(90,255,40,143,30,239,484,28,39,15) stripchart(fail,pch=4,xlab="failure times (in hours)")
This leads to the strip plot in Figure 2.1.
For lifetime data of this sort, for reasons that were
developed in the MATH230 course, a common model for such data (not
least because the failure times are bound to be non-negative) is the
exponential distribution, with rate parameter and mean
.
Thus, we will assume that our data (where , but we’ll work more generally for the moment) are independent realisations of independent random variables, , each having the distribution.
Assuming the exponential distribution, for each of the ,
Hence:
and
So
and for a stationary point . Hence,
which is verified as a maximum since
Thus, we obtain as the maximum likelihood estimator of : for our data this leads to .
The 95% confidence interval for based on the asymptotic normality of is given by
where and
So, replacing by the interval becomes
For comparison, we can now calculate an approximate 95% confidence interval based on the deviance.
The deviance is given by
From tables, for a 95% interval, we need to solve
. This is a non-linear equation in
; to solve it we plot and look for the
intersections with the line.
At this point, recall that inferences are required not about , but about the mean of the population, , i.e. . We could go right back to the start of the problem, replace with , and do the whole thing again working with instead of .
However, because of the invariance property of the likelihood that’s not
necessary: we get exactly the same results by simple
transformation of both the maximum likelihood
estimate and the deviance-based confidence interval.
In other words,
and the confidence interval becomes
For comparison, the confidence interval for the mean parameter based on the asymptotic normality of the MLE is .
Exercise 5: Verify that the confidence interval for based on asymptotic normality of the MLE is indeed .
xbar<-mean(fail) # function to calculate D(theta) dev<-function(theta){ 2*(10*log(1/xbar) - 10 - 10*log(theta) + theta*10*xbar) } # a sequence of x-values at which to evaluate D(theta) theta<-seq(from=0.001,to=0.02,length=1000) plot(theta,dev(theta),type="l",ylab="deviance",xlim=c(0,0.02)) # add horizontal line abline(h=3.84) # find exact solutions via... devroot<-function(theta){ dev(theta)-3.84 } # lower interval uniroot(devroot,c(0.002,0.005))$root # upper interval uniroot(devroot,c(0.010,0.015))$root