In the last lecture we showed that the MLE is asymptotically normally distributed, and we used this fact to construct an approximate 95% confidence interval.
In this lecture we will introduce the concept of deviance, and show that it leads to another way of calculating approximate confidence intervals, one with various advantages.
We will begin by showing, through an example, where things can go wrong with the confidence intervals we know (and love?).
On a fair (European) roulette wheel there is a probability of $1/37$ of each number coming up.
In the early 1990s, Gonzalo Garcia-Pelayo believed that casino roulette wheels were not perfectly random, and that by recording the results and analysing them with a computer, he could gain an edge on the house by predicting that certain numbers were more likely to occur next than the odds offered by the house suggested. This he did at the Casino de Madrid in Madrid, Spain, winning 600,000 euros in a single day, and one million euros in total.
Legal action against him by the casino was unsuccessful, it being ruled that the casino should fix its wheel.
Suppose I suspect that the number 17 comes up on a casino’s roulette wheel more frequently than other numbers. I track it for 30 spins, during which it comes up 2 times. I decide to carry out a likelihood analysis on $\theta$, the probability of the number 17 coming up, and its confidence interval.
We propose to model the situation as follows. Let $X$ be the number of times the number 17 comes up in 30 spins of the roulette wheel. We decide to model $X \sim \text{Bin}(30, \theta)$.
Why is this a suitable model?
What assumptions are being made?
Are these assumptions reasonable?
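The probability of the observed data is given by
$$P(X = 2 \mid \theta) = \binom{30}{2} \theta^{2} (1-\theta)^{28}.$$
The likelihood is simply the probability of the observed data, but we can ignore the multiplicative constants, so
$$L(\theta) = \theta^{2} (1-\theta)^{28}.$$
The log-likelihood is
$$\ell(\theta) = 2\log\theta + 28\log(1-\theta).$$
Differentiating,
$$\ell'(\theta) = \frac{2}{\theta} - \frac{28}{1-\theta}.$$
Now remember that solutions to $\ell'(\theta) = 0$ are potential MLEs:
$$\frac{2}{\hat{\theta}} = \frac{28}{1-\hat{\theta}} \quad\Longrightarrow\quad \hat{\theta} = \frac{2}{30} = \frac{1}{15} \approx 0.067.$$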
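The second derivative will both tell us whether this is a maximum and provide the observed information:
$$\ell''(\theta) = -\frac{2}{\theta^{2}} - \frac{28}{(1-\theta)^{2}}.$$
This is clearly negative for all $\theta \in (0, 1)$, so $\hat{\theta} = 1/15$ must be a maximum.
Moreover, the observed information is
$$-\ell''(\hat{\theta}) = \frac{2}{(1/15)^{2}} + \frac{28}{(14/15)^{2}} \approx 482.14.$$
A 95% confidence interval for $\theta$ is given by
$$\hat{\theta} \pm \frac{1.96}{\sqrt{-\ell''(\hat{\theta})}},$$
which, on substituting in $\hat{\theta}$ and the observed information, becomes
$$0.067 \pm \frac{1.96}{\sqrt{482.14}} = (-0.023,\ 0.156).$$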
The resulting confidence interval includes negative values, which are impossible for a probability parameter. What’s the problem?
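The calculation above can be reproduced numerically. The following is a minimal sketch, assuming the binomial model with 30 spins and 2 occurrences of the number 17; the variable names (`theta_hat`, `obs_info`, `wald_ci`) are chosen here for illustration only.

```python
import math

n, x = 30, 2  # spins observed, and times the number 17 came up

# MLE obtained by setting the first derivative of the log-likelihood to zero
theta_hat = x / n

# Observed information: minus the second derivative of the
# log-likelihood, evaluated at the MLE
obs_info = x / theta_hat**2 + (n - x) / (1 - theta_hat) ** 2

# Symmetric interval based on the asymptotic normality of the MLE
half_width = 1.96 / math.sqrt(obs_info)
wald_ci = (theta_hat - half_width, theta_hat + half_width)

print(f"MLE = {theta_hat:.4f}")
print(f"observed information = {obs_info:.2f}")
print(f"95% CI = ({wald_ci[0]:.3f}, {wald_ci[1]:.3f})")
```

The lower endpoint is negative because the half-width of the interval is larger than the MLE itself.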
Let’s look at a plot of the log-likelihood for the above situation.
We notice that the log-likelihood is quite asymmetric. This happens because the MLE is close to the edge of the feasible space (i.e. close to 0). The confidence interval defined above is forced to be symmetric, which seems inappropriate here.
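The asymmetry can also be checked without a plot. The sketch below, which assumes the same binomial model (2 occurrences in 30 spins), compares how far the log-likelihood falls when we move the same distance to the left and to the right of the MLE. The step size of 0.05 is an arbitrary choice made for illustration.

```python
import math


def log_lik(theta, x=2, n=30):
    """Binomial log-likelihood for x successes in n trials, constants dropped."""
    return x * math.log(theta) + (n - x) * math.log(1 - theta)


theta_hat = 2 / 30
step = 0.05  # arbitrary illustrative distance either side of the MLE

# Drop in log-likelihood relative to its maximum, on each side
drop_left = log_lik(theta_hat) - log_lik(theta_hat - step)
drop_right = log_lik(theta_hat) - log_lik(theta_hat + step)

print(f"drop moving left:  {drop_left:.3f}")
print(f"drop moving right: {drop_right:.3f}")
```

The log-likelihood falls much faster towards 0 than away from it, so values just below the MLE are less plausible than values the same distance above it. A symmetric interval cannot reflect this.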
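Suppose we have a log-likelihood function $\ell(\theta)$ with unknown parameter $\theta$ and MLE $\hat{\theta}$. Then the deviance function is given by
$$D(\theta) = 2\left(\ell(\hat{\theta}) - \ell(\theta)\right).$$
Notice that $D(\theta) \geq 0$ for all $\theta$, and $D(\hat{\theta}) = 0$.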
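As a small illustration, the deviance for the roulette data can be coded directly as twice the gap between the maximised log-likelihood and the log-likelihood at a candidate value. This is a sketch under the binomial model used earlier; the function names `log_lik` and `deviance` are chosen here for illustration.

```python
import math


def log_lik(theta, x=2, n=30):
    """Binomial log-likelihood for x successes in n trials, constants dropped."""
    return x * math.log(theta) + (n - x) * math.log(1 - theta)


def deviance(theta, x=2, n=30):
    """Twice the drop in log-likelihood from its maximum (at x / n) to theta."""
    theta_hat = x / n
    return 2 * (log_lik(theta_hat, x, n) - log_lik(theta, x, n))


# The deviance is zero at the MLE and positive everywhere else
for theta in (0.01, 2 / 30, 0.1, 0.2):
    print(f"D({theta:.4f}) = {deviance(theta):.3f}")
```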
What can we say about $D(\theta)$, the deviance evaluated at the true value of $\theta$?
This is a fixed (but unknown) value for fixed data $x$. However, in similar spirit to the last lecture, we can consider random data $X$. Now, the deviance function depends on $X$ (since different data lead to different likelihoods). So, $D(\theta)$ is a random variable.
Suppose we have an iid sample $X_1, \ldots, X_n$ from some distribution with unknown parameter $\theta$. Then (under certain regularity conditions) in the limit as $n \to \infty$,
$$D(\theta) \sim \chi^2_1,$$
i.e. the deviance of the true value of $\theta$ has a $\chi^2$ distribution with one degree of freedom.
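This result can be checked by simulation. The sketch below is an illustration only: the true parameter (0.3), the sample size (200), the number of repetitions (2000) and the seed are all hypothetical choices, not values from the roulette example. It simulates binomial data, computes the deviance at the true parameter each time, and records how often it falls below 3.84, the 95% point of a $\chi^2$ distribution with one degree of freedom.

```python
import math
import random


def log_lik(theta, x, n):
    """Binomial log-likelihood; a zero count contributes nothing (0 * log 0 = 0)."""
    out = 0.0
    if x > 0:
        out += x * math.log(theta)
    if x < n:
        out += (n - x) * math.log(1 - theta)
    return out


def deviance(theta, x, n):
    """Twice the drop in log-likelihood from the MLE x / n to theta."""
    return 2 * (log_lik(x / n, x, n) - log_lik(theta, x, n))


rng = random.Random(1)  # fixed seed so the run is repeatable
true_theta, n, reps = 0.3, 200, 2000  # hypothetical simulation settings

inside = 0
for _ in range(reps):
    # One binomial draw, built as a sum of n Bernoulli trials
    x = sum(rng.random() < true_theta for _ in range(n))
    if deviance(true_theta, x, n) < 3.84:
        inside += 1

coverage = inside / reps
print(f"proportion of deviances below 3.84: {coverage:.3f}")
```

If the approximation is working, the printed proportion should be close to 0.95, up to simulation noise and the discreteness of the binomial distribution.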
The practical upshot of this result is that we have another way to construct a confidence interval for $\theta$. A 95% confidence interval for $\theta$, for example, is given by $\{\theta : D(\theta) < 3.84\}$, i.e. all values of $\theta$ whose deviance is smaller than 3.84, the 95% point of the $\chi^2_1$ distribution.
This property of the deviance is best seen visually. Going back to the roulette data:
From the graph we can estimate the confidence interval based on the deviance. In fact the exact answer to three decimal places is $(0.011,\ 0.192)$. Notice that this is not symmetrical about the MLE, and that all values in the interval are feasible.
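The endpoints can also be found numerically rather than read off a graph. The following is a minimal sketch, assuming the binomial model for the roulette data (2 occurrences in 30 spins) and the threshold of 3.84. It uses simple bisection on each side of the MLE; the helper `bisect_root` is written here for illustration and is not a library function.

```python
import math


def log_lik(theta, x=2, n=30):
    """Binomial log-likelihood for x successes in n trials, constants dropped."""
    return x * math.log(theta) + (n - x) * math.log(1 - theta)


def deviance(theta, x=2, n=30):
    """Twice the drop in log-likelihood from the MLE x / n to theta."""
    return 2 * (log_lik(x / n, x, n) - log_lik(theta, x, n))


def bisect_root(f, lo, hi, tol=1e-10):
    """Find a root of f in [lo, hi], assuming f changes sign on the interval."""
    lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return (lo + hi) / 2


theta_hat = 2 / 30


def excess(theta):
    """Deviance minus the 95% threshold; zero at the interval endpoints."""
    return deviance(theta) - 3.84


# The deviance is zero at the MLE and grows towards 0 and 1,
# so there is one crossing of the threshold on each side.
lower = bisect_root(excess, 1e-9, theta_hat)
upper = bisect_root(excess, theta_hat, 1 - 1e-9)

print(f"95% deviance interval: ({lower:.3f}, {upper:.3f})")
```

Bisection is used here only because it needs nothing beyond the standard library; any one-dimensional root finder would do.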
The original motivation for all of this was that we were wondering whether the number 17 comes up more often than the probability of $1/37$ expected for a fair roulette wheel.
In fact $1/37 \approx 0.027$, which is within the 95% confidence interval calculated above. Hence there is insufficient evidence (so far) to support the claim that this number is coming up more often than it should.
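Equivalently, we can check the fair-wheel value directly by computing its deviance. As a worked calculation under the binomial model for these data (2 occurrences in 30 spins, with maximum likelihood estimate $1/15$), and with values rounded to three decimal places:

$$D\!\left(\tfrac{1}{37}\right) = 2\left[\left(2\log\tfrac{1}{15} + 28\log\tfrac{14}{15}\right) - \left(2\log\tfrac{1}{37} + 28\log\tfrac{36}{37}\right)\right] \approx 2\,(-7.348 + 7.989) \approx 1.28.$$

Since $1.28 < 3.84$, the value $1/37$ lies inside the 95% interval based on the deviance.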