14 Distribution of the MLE

14.1 Recalling randomness

We have noted that an asymptotic 95% confidence interval for a true parameter, θ, is given by

\[
\left(\hat{\theta} - \frac{1.96}{\sqrt{I_O(\hat{\theta})}},\;\; \hat{\theta} + \frac{1.96}{\sqrt{I_O(\hat{\theta})}}\right),
\]

where $\hat{\theta}$ is the MLE and

\[
I_O(\theta \mid \mathbf{x}) = -\,l''(\theta \mid \mathbf{x}) = -\frac{\partial^2}{\partial\theta^2}\, l(\theta \mid \mathbf{x}),
\]

is the observed information.
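As a concrete illustration of how the observed information can be obtained in practice, the following sketch approximates $I_O(\hat{\theta}) = -l''(\hat{\theta})$ by a central finite difference. The Poisson log-likelihood and the data vector are made up purely for illustration (the Poisson model reappears in Example 14.1.1 below); they are not part of the notes.

```python
import numpy as np

def observed_information(loglik, theta_hat, h=1e-4):
    """Approximate I_O(theta_hat) = -l''(theta_hat) by a central finite difference."""
    second_deriv = (loglik(theta_hat + h) - 2 * loglik(theta_hat) + loglik(theta_hat - h)) / h**2
    return -second_deriv

# Illustration with a Poisson log-likelihood and made-up counts.
x = np.array([3, 5, 2, 4, 6])
loglik = lambda theta: np.sum(x * np.log(theta) - theta)   # additive constants dropped
theta_hat = x.mean()                                       # Poisson MLE is the sample mean
print(observed_information(loglik, theta_hat))             # close to sum(x)/theta_hat**2 = 1.25
```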

In this lecture we will sketch the derivation of the distribution of the MLE, and show why the above really is an asymptotic 95% confidence interval for θ.

Recall the distinction between an estimate and an estimator.

Given a sample $X_1,\dots,X_n$, an estimator is any function $W(X_1,\dots,X_n)$ of that sample. An estimate is a particular numerical value produced by the estimator for given data $x_1,\dots,x_n$.

The maximum likelihood estimator is a random variable; therefore it has a distribution. A maximum likelihood estimate is just a number, based on fixed data.
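A minimal sketch of this distinction in code (the Poisson model, sample size and seed are arbitrary choices for illustration): the same function plays the role of an estimator when applied to random samples, and produces a single numerical estimate when applied to fixed, observed data.

```python
import numpy as np

rng = np.random.default_rng(1)

# The estimator W(X_1, ..., X_n): here the sample mean, written as a function.
def W(sample):
    return np.mean(sample)

# Applied to fresh random samples, W gives a different value each time:
# the estimator W(X_1, ..., X_n) is a random variable with a distribution.
print(W(rng.poisson(lam=3.0, size=10)))
print(W(rng.poisson(lam=3.0, size=10)))

# Applied to one fixed data set x_1, ..., x_n, W gives a single number: an estimate.
x = np.array([2, 4, 3, 3, 5])
print(W(x))
```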

For the rest of this lecture we consider an iid sample $X_1,\dots,X_n$, from some distribution with unknown parameter $\theta$, and the MLE (maximum likelihood estimator) $\hat{\theta}(\mathbf{X})$.

Definition.

The Fisher information of a random sample $X_1,\dots,X_n$ is the expected value of minus the second derivative of the log-likelihood, evaluated at the true value of the parameter:

\[
I_E(\theta) = \mathbb{E}\!\left[-\frac{\partial^2}{\partial\theta^2}\, l(\theta \mid \mathbf{X})\right].
\]

This is related to, but different from, the observed information.

  1. The observed information is calculated from the observed data; the Fisher information is calculated by taking expectations over random data.

  2. The observed information is evaluated at $\hat{\theta}$; the Fisher information is evaluated at $\theta_{\mathrm{true}}$.

  3. The observed information can be written down numerically; the Fisher information usually cannot be, since it depends on $\theta_{\mathrm{true}}$, which is unknown (a numerical sketch of this contrast follows the list).
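Here is a small simulation sketch of that contrast, assuming a Poisson model with a chosen $\theta_{\mathrm{true}}$ (both the model and the numbers are illustrative only): the Fisher information is a Monte Carlo average of $-\partial^2 l/\partial\theta^2$ over many hypothetical samples, evaluated at $\theta_{\mathrm{true}}$, whereas the observed information uses the one sample we actually have, evaluated at $\hat{\theta}$.

```python
import numpy as np

rng = np.random.default_rng(42)
theta_true, n = 3.0, 50

def neg_second_deriv(x, theta):
    """-d^2/dtheta^2 of the Poisson log-likelihood, which works out as sum(x) / theta^2."""
    return np.sum(x) / theta**2

# Fisher information: an expectation over random samples, evaluated at theta_true.
fisher = np.mean([neg_second_deriv(rng.poisson(theta_true, n), theta_true)
                  for _ in range(10_000)])

# Observed information: the one observed sample, evaluated at the MLE theta_hat.
x = rng.poisson(theta_true, n)
theta_hat = x.mean()
observed = neg_second_deriv(x, theta_hat)

print(fisher)     # close to n / theta_true = 50 / 3, which needs theta_true to compute
print(observed)   # computable from the data alone
```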

Example 14.1.1 (Fisher Information for a Poisson parameter).

Suppose $\mathbf{x}$ is a random sample from $X \sim \mathrm{Poisson}(\theta_{\mathrm{true}})$. Find the Fisher information. Remember that $\mathbb{E}[X] = \theta_{\mathrm{true}}$. For $\theta > 0$,

\[
L(\theta) = f(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} \frac{e^{-\theta}\,\theta^{x_i}}{x_i!} = e^{-n\theta}\,\theta^{\sum_i x_i} \times c,
\]

where $c$ is a constant that does not depend on $\theta$.

  1. $\log f(\mathbf{x} \mid \theta) = -n\theta + \sum_i x_i \log\theta + \log c$,

  2. $\dfrac{\partial}{\partial\theta} \log f(\mathbf{x} \mid \theta) = \dfrac{\sum_i x_i}{\theta} - n$,

  3. $\dfrac{\partial^2}{\partial\theta^2} \log f(\mathbf{x} \mid \theta) = -\dfrac{\sum_i x_i}{\theta^2}$,

  4. $\dfrac{\partial^2}{\partial\theta^2} \log f(\mathbf{X} \mid \theta) = -\dfrac{\sum_i X_i}{\theta^2}$.

Hence

\[
I_E(\theta_{\mathrm{true}}) = \mathbb{E}\!\left(\frac{\sum_i X_i}{\theta_{\mathrm{true}}^{2}}\right) = \frac{n\,\theta_{\mathrm{true}}}{\theta_{\mathrm{true}}^{2}} = \frac{n}{\theta_{\mathrm{true}}}.
\]

We see that our answer is in terms of $\theta_{\mathrm{true}}$, which is unknown (and not in terms of the data!). The Fisher information is useful for many things in likelihood inference; to see more, take MATH330 Likelihood Inference.
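If you want to check the algebra in steps 1 to 4, the following symbolic sketch (using sympy, purely as a sanity check and not part of the notes) reproduces the calculation for a single observation; because $-\partial^2 l/\partial\theta^2$ is linear in $x$, taking expectations simply replaces $x$ by $\mathbb{E}[X] = \theta_{\mathrm{true}}$, giving $1/\theta_{\mathrm{true}}$ per observation and hence $n/\theta_{\mathrm{true}}$ for the whole sample.

```python
import sympy as sp

theta, x = sp.symbols('theta x', positive=True)

# Log of the Poisson pmf for a single observation.
log_f = -theta + x * sp.log(theta) - sp.log(sp.factorial(x))

# Minus the second derivative with respect to theta: x / theta**2.
neg_second = -sp.diff(log_f, theta, 2)

# Take expectations by replacing x with E[X] = theta (valid because neg_second is linear in x).
info_per_obs = neg_second.subs(x, theta)
print(sp.simplify(info_per_obs))   # 1/theta, so I_E(theta) = n/theta for n iid observations
```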

Here, it features in the most important theorem in the course.

Theorem (Asymptotic distribution of the maximum likelihood estimator).

Suppose we have an iid sample $\mathbf{X} = X_1,\dots,X_n$ from some distribution with unknown parameter $\theta$, with maximum likelihood estimator $\hat{\theta}(\mathbf{X})$. Then (under certain regularity conditions), in the limit as $n \to \infty$,

\[
\hat{\theta}(\mathbf{X}) \sim N\!\left(\theta,\; I_E^{-1}(\theta)\right).
\]

This says that, for n large, the distribution of the MLE is approximately normal with mean equal to the true value of the parameter, and variance equal to the reciprocal of the Fisher information.

We will not prove the result in this course, but the proof relies on the central limit theorem (from MATH230).
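The theorem can be illustrated by simulation; the sketch below uses the Poisson example with arbitrary choices of $\theta_{\mathrm{true}}$ and $n$. Repeatedly drawing samples and recording the MLE (the sample mean in the Poisson case) produces values whose mean is close to $\theta_{\mathrm{true}}$ and whose variance is close to $I_E^{-1}(\theta_{\mathrm{true}}) = \theta_{\mathrm{true}}/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true, n, reps = 3.0, 100, 20_000

# For the Poisson model the MLE is the sample mean, so simulate it many times.
mles = rng.poisson(theta_true, size=(reps, n)).mean(axis=1)

print(mles.mean())   # close to theta_true = 3.0
print(mles.var())    # close to theta_true / n = 0.03, i.e. 1 / I_E(theta_true)
```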

Turning this around, this means that, for large n,

\[
\Pr\!\left[\theta \in \left(\hat{\theta}(\mathbf{X}) - 1.96\sqrt{I_E^{-1}(\theta)},\;\; \hat{\theta}(\mathbf{X}) + 1.96\sqrt{I_E^{-1}(\theta)}\right)\right] \approx 0.95.
\]

This result is useless as it stands, because we can only calculate $I_E(\theta)$ when we know $\theta$, and if we know it, why are we constructing a confidence interval for it?!

Luckily, the result also works asymptotically if we replace $I_E(\theta)$ by $I_O(\hat{\theta})$, giving that

\[
\left(\hat{\theta}(\mathbf{x}) - \frac{1.96}{\sqrt{I_O(\hat{\theta}(\mathbf{x}))}},\;\; \hat{\theta}(\mathbf{x}) + \frac{1.96}{\sqrt{I_O(\hat{\theta}(\mathbf{x}))}}\right)
\]

is an approximate 95% confidence interval for θ (as claimed earlier).
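For the Poisson example this interval can be written down explicitly: $I_O(\hat{\theta}) = \sum_i x_i / \hat{\theta}^2 = n/\bar{x}$, so the interval is $\bar{x} \pm 1.96\sqrt{\bar{x}/n}$. The sketch below computes it for some made-up counts; the data are illustrative only.

```python
import numpy as np

# Hypothetical Poisson counts; any vector of observed counts would do.
x = np.array([4, 2, 6, 3, 5, 4, 3, 7, 2, 4])
n = len(x)

theta_hat = x.mean()                      # Poisson MLE
obs_info = x.sum() / theta_hat**2         # I_O(theta_hat) = n / xbar
half_width = 1.96 / np.sqrt(obs_info)

print((theta_hat - half_width, theta_hat + half_width))   # approximate 95% CI for theta
```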