13.2 Suppression of Information

Last time we introduced the score function (the derivative of the log-likelihood), and the observed information function (MINUS the second derivative of the log-likelihood). The score function is zero at the MLE. The observed information function evaluated at the MLE gives us a method to construct confidence intervals.

We will now study the concept of observed information in more detail.

Example 13.2.1 Human Genotyping

Humans are a diploid species, which means you have two copies of every gene (one from your father, one from your mother). Genes occur in different forms; this is what leads to some of the different traits you see in humans (e.g. eye colour). Mendelian traits are a special kind of trait that are determined by a single gene.

Having wet or dry earwax is a Mendelian trait. Earwax wetness is controlled by the gene ABCC11 (this gene lives about half way along chromosome 16). We will call the wet earwax version of ABCC11 W, and the dry version w. The wet version is dominant, which means you only need one copy of W to have wet earwax. Both copies of the gene need to be w to get dry earwax.

The Hardy-Weinberg law of genetics states that if W occurs in a (randomly mating) population with proportion $p$ (so w occurs with proportion $1-p$), then the potential combinations in humans occur in the proportions:

combination   WW      Ww           ww
proportion    $p^2$   $2p(1-p)$    $(1-p)^2$

Suppose I take a sample of 100 people and assess the wetness of their earwax. I observe that 87 of the people have wet earwax and 13 of them have dry earwax.

I am actually interested in p, the proportion of copies of W in my population.

Show that the probability of a person having wet earwax is $p(2-p)$, and that the probability of a person having dry earwax is $(1-p)^2$. Also show that these two probabilities sum to 1.
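One way to carry out this check, using only the Hardy-Weinberg proportions in the table above and the fact that W is dominant (so both WW and Ww give wet earwax), is sketched here:

\[
\begin{aligned}
\Pr[\text{wet}] &= p^2 + 2p(1-p) = 2p - p^2 = p(2-p),\\
\Pr[\text{dry}] &= (1-p)^2,\\
\Pr[\text{wet}] + \Pr[\text{dry}] &= (2p - p^2) + (1 - 2p + p^2) = 1.
\end{aligned}
\]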

The number of people with wet earwax in my sample is therefore Binomial(100,p(2-p)). So

\[
\Pr[\text{obs} \mid p] = \binom{100}{87}\{p(2-p)\}^{87}\{(1-p)^2\}^{13}.
\]

IMPORTANT FACT: when writing down the likelihood, we can always omit multiplicative constants, since they become additive in the log-likelihood and then disappear in the differentiation. A multiplicative constant is one that does not depend on the parameter of interest (here $p$).
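A quick numerical illustration of this fact, offered as a sketch rather than as part of the original notes. The function names and the grid of candidate values are illustrative choices; the counts are those of the earwax sample.

```python
import math

N, N_WET = 100, 87          # sample size and number with wet earwax
N_DRY = N - N_WET


def log_lik_kernel(p):
    """Log-likelihood with the binomial coefficient omitted."""
    return N_WET * math.log(p * (2 - p)) + 2 * N_DRY * math.log(1 - p)


def log_lik_full(p):
    """Log-likelihood including the constant log C(100, 87)."""
    return math.log(math.comb(N, N_WET)) + log_lik_kernel(p)


# Illustrative grid of candidate values strictly inside (0, 1).
grid = [i / 1000 for i in range(1, 1000)]

# The two versions differ by the same amount at every p ...
diffs = [log_lik_full(p) - log_lik_kernel(p) for p in grid]
print(f"spread of differences: {max(diffs) - min(diffs):.2e}")

# ... so they are maximised at the same grid point.
argmax_full = max(grid, key=log_lik_full)
argmax_kernel = max(grid, key=log_lik_kernel)
print(argmax_full, argmax_kernel)
```

Because the two curves differ only by a vertical shift, any method of locating the maximum gives the same answer for both.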

So we can write down the likelihood as

\[
\begin{aligned}
L(p) &\propto \{p(2-p)\}^{87}\{(1-p)^2\}^{13}\\
&= \{p(2-p)\}^{87}(1-p)^{26}.
\end{aligned}
\]

So the log likelihood is

\[
\begin{aligned}
l(p) &= 87\log\{p(2-p)\} + 26\log(1-p)\\
&= 87\log(p) + 87\log(2-p) + 26\log(1-p)
\end{aligned}
\]

(plus constant).

Now p is a continuous parameter so a suitable way to find a candidate MLE is to differentiate. The score function is

\[
S(p) = l'(p) = \frac{87}{p} - \frac{87}{2-p} - \frac{26}{1-p}.
\]

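We can solve $S(\hat{p})=0$; multiplying through by $\hat{p}(2-\hat{p})(1-\hat{p})$ gives a quadratic in $\hat{p}$:

  1. $87(2-\hat{p})(1-\hat{p}) - 87\hat{p}(1-\hat{p}) - 26\hat{p}(2-\hat{p}) = 0$,

  2. $200\hat{p}^2 - 400\hat{p} + 174 = 0$,

  3. $\hat{p} = \dfrac{400 \pm \sqrt{400^2 - 4 \cdot 200 \cdot 174}}{2 \cdot 200}$.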

This gives two solutions, but we need $\hat{p} \in [0,1]$ as it is a proportion (the other root is about 1.361), so we get $\hat{p} = 0.639$ as our potential MLE.
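The steps above can be checked numerically. The following is a minimal sketch; the helper names `score_phenotype` and `quadratic_roots` are made up for this illustration and do not come from the notes.

```python
import math

# Counts from the example: 87 wet-earwax and 13 dry-earwax people.
N_WET, N_DRY = 87, 13


def score_phenotype(p):
    """Score S(p) = 87/p - 87/(2-p) - 26/(1-p) for the earwax data."""
    return N_WET / p - N_WET / (2 - p) - 2 * N_DRY / (1 - p)


def quadratic_roots(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 (assumes a positive discriminant)."""
    disc = math.sqrt(b * b - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)


# Coefficients of the quadratic 200 p^2 - 400 p + 174 = 0 derived above.
roots = quadratic_roots(200, -400, 174)

# Keep only the root that is a valid proportion.
p_hat = next(r for r in roots if 0 <= r <= 1)

print(f"roots = {roots[0]:.3f}, {roots[1]:.3f}")   # roots = 0.639, 1.361
print(f"p_hat = {p_hat:.3f}")                      # p_hat = 0.639
print(f"score at p_hat = {score_phenotype(p_hat):.2e}")  # numerically close to zero
```

Evaluating the score at the chosen root is a cheap sanity check that the algebra leading to the quadratic was done correctly.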

The second derivative is

\[
l''(p) = -\frac{87}{p^2} - \frac{87}{(2-p)^2} - \frac{26}{(1-p)^2}.
\]

This is clearly $<0$ at $\hat{p}$, confirming that it is a maximum.

The observed information is obtained by substituting $\hat{p}$ into $-l''(p)$, giving

\[
I_O(\hat{p}) = \frac{87}{0.639^2} + \frac{87}{(2-0.639)^2} + \frac{26}{(1-0.639)^2} = 459.5.
\]

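Hence an approximate 95% confidence interval for $p_{\text{true}}$ is given by

\[
\begin{aligned}
(l,u) &= \left(\hat{p} - \frac{1.96}{\sqrt{I_O(\hat{p})}},\ \hat{p} + \frac{1.96}{\sqrt{I_O(\hat{p})}}\right)\\
&= \left(0.639 - \frac{1.96}{\sqrt{459.5}},\ 0.639 + \frac{1.96}{\sqrt{459.5}}\right)\\
&= (0.548, 0.730).
\end{aligned}
\]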
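The arithmetic for the observed information and the interval can be reproduced with a short script. This is a sketch using the rounded MLE from the text; the function names are illustrative and not part of the notes.

```python
import math


def observed_info_phenotype(p, n_wet=87, n_dry=13):
    """Minus the second derivative of the earwax log-likelihood at p."""
    return n_wet / p**2 + n_wet / (2 - p) ** 2 + 2 * n_dry / (1 - p) ** 2


def approx_interval(p_hat, info, z=1.96):
    """Approximate 95% interval p_hat +/- z / sqrt(info)."""
    half_width = z / math.sqrt(info)
    return p_hat - half_width, p_hat + half_width


p_hat = 0.639  # rounded MLE from the text
info = observed_info_phenotype(p_hat)
lower, upper = approx_interval(p_hat, info)

print(f"I_O = {info:.1f}")                  # I_O = 459.5
print(f"CI = ({lower:.3f}, {upper:.3f})")   # CI = (0.548, 0.730)
```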

After all that derivation, don’t forget the context. This is a 95% confidence interval for the proportion of copies of the ABCC11 gene in the population of interest that are the W variant.

Suppose that, instead of looking in people’s ears to see whether their wax is wet or dry, we decide to genotype them, thereby knowing whether they are WW, Ww or ww.

This is a considerably more expensive option (although perhaps a little less disgusting) so a natural question is: what do we gain by doing this?

We take the same 100 people and find that 42 are WW, 45 are Ww and 13 are ww. Think about how this relates back to the earwax wetness. Did we need to genotype everyone?

The likelihood function for p given our new information is

\[
\begin{aligned}
L(p) &\propto (p^2)^{42}\{2p(1-p)\}^{45}\{(1-p)^2\}^{13}\\
&= p^{84}\{2p(1-p)\}^{45}(1-p)^{26}.
\end{aligned}
\]

The log-likelihood is

\[
\begin{aligned}
l(p) &= 84\log(p) + 45\log\{2p(1-p)\} + 26\log(1-p)\\
&= 84\log(p) + 45\log(2) + 45\log(p) + 45\log(1-p) + 26\log(1-p)\\
&= 129\log(p) + 71\log(1-p) + c,
\end{aligned}
\]

where $c$ is a constant.

As before, p is continuous so we can find candidates for the MLE by differentiating:

\[
S(p) = l'(p) = \frac{129}{p} - \frac{71}{1-p}.
\]

Now solving $S(\hat{p})=0$ gives a candidate MLE

  1. $\dfrac{129}{\hat{p}} = \dfrac{71}{1-\hat{p}}$,

  2. $129(1-\hat{p}) = 71\hat{p}$,

i.e.

\[
\hat{p} = \frac{129}{200} = 0.645.
\]

This is our potential MLE. Checking the second derivative

\[
l''(p) = -\frac{129}{p^2} - \frac{71}{(1-p)^2},
\]

which is $<0$ at $\hat{p}$, confirming that it is a maximum.

The observed information is obtained by substituting $\hat{p}$ into $-l''(p)$, giving

\[
I_O(\hat{p}) = \frac{129}{0.645^2} + \frac{71}{(1-0.645)^2} = 873.5.
\]

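Hence an approximate 95% confidence interval for $p_{\text{true}}$ is given by

\[
\begin{aligned}
(l,u) &= \left(\hat{p} - \frac{1.96}{\sqrt{I_O(\hat{p})}},\ \hat{p} + \frac{1.96}{\sqrt{I_O(\hat{p})}}\right)\\
&= \left(0.645 - \frac{1.96}{\sqrt{873.5}},\ 0.645 + \frac{1.96}{\sqrt{873.5}}\right)\\
&= (0.579, 0.711).
\end{aligned}
\]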
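The genotype calculation can be reproduced in the same way. This is a sketch; the variable names are illustrative. The exponents 129 and 71 are built from the genotype counts exactly as in the log-likelihood above.

```python
import math

# Genotype counts from the example.
N_WW, N_Ww, N_ww = 42, 45, 13

# Exponents in L(p) proportional to p^a (1-p)^b:
# each WW contributes p^2, each Ww contributes p(1-p), each ww contributes (1-p)^2.
a = 2 * N_WW + N_Ww   # 129
b = N_Ww + 2 * N_ww   # 71

# Root of the score 129/p - 71/(1-p) = 0.
p_hat = a / (a + b)

# Observed information: minus the second derivative at p_hat.
info = a / p_hat**2 + b / (1 - p_hat) ** 2

half_width = 1.96 / math.sqrt(info)
lower, upper = p_hat - half_width, p_hat + half_width

print(f"p_hat = {p_hat:.3f}")               # p_hat = 0.645
print(f"I_O = {info:.1f}")                  # I_O = 873.5
print(f"CI = ({lower:.3f}, {upper:.3f})")   # CI = (0.579, 0.711)
```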

Now, compare the confidence intervals and the observed informations from the two separate calculations. What do you conclude?
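As a starting point for this comparison, the two observed informations quoted above can be put side by side. The sketch below uses only the rounded figures from the text; the variable names are illustrative.

```python
import math

# Observed informations reported in the two calculations above.
info_phenotype = 459.5   # earwax (wet/dry) data only
info_genotype = 873.5    # full genotype data

# How much more information the genotype data carry about p.
ratio = info_genotype / info_phenotype

# Full width of each approximate 95% interval: 2 * 1.96 / sqrt(information).
width_phenotype = 2 * 1.96 / math.sqrt(info_phenotype)
width_genotype = 2 * 1.96 / math.sqrt(info_genotype)

print(f"information ratio = {ratio:.2f}")        # information ratio = 1.90
print(f"phenotype width = {width_phenotype:.3f}")  # phenotype width = 0.183
print(f"genotype width = {width_genotype:.3f}")    # genotype width = 0.133
```

Since the interval width scales with one over the square root of the observed information, a larger information translates directly into a narrower interval.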

Of course, genotyping the participants of the study is expensive, so may not be worthwhile. If this was a real problem, the statistician could communicate the figures above to the geneticist investigating gene ABCC11, who would then be able to make an evidence-based decision about how to conduct the experiment.