14 Distribution of the MLE

14.1 Recalling randomness

We have noted that an asymptotic 95% confidence interval for a true parameter, θ, is given by

\[
\left(\hat{\theta} - \frac{1.96}{\sqrt{I_O(\hat{\theta})}},\;\; \hat{\theta} + \frac{1.96}{\sqrt{I_O(\hat{\theta})}}\right),
\]

where $\hat{\theta}$ is the MLE and

\[
I_O(\theta \mid \mathbf{x}) = -\,l''(\theta \mid \mathbf{x}) = -\frac{\partial^2}{\partial\theta^2}\, l(\theta \mid \mathbf{x}),
\]

is the observed information.
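As a concrete illustration of how the observed information can be obtained in practice, the following sketch approximates $I_O(\hat{\theta}) = -l''(\hat{\theta})$ by a central finite difference. The Poisson log-likelihood and the data vector are made up purely for illustration (the Poisson model reappears in Example 14.1.1 below); they are not part of the notes.

```python
import numpy as np

def observed_information(loglik, theta_hat, h=1e-4):
    """Approximate I_O(theta_hat) = -l''(theta_hat) by a central finite difference."""
    second_deriv = (loglik(theta_hat + h) - 2 * loglik(theta_hat) + loglik(theta_hat - h)) / h**2
    return -second_deriv

# Illustration with a Poisson log-likelihood and made-up counts.
x = np.array([3, 5, 2, 4, 6])
loglik = lambda theta: np.sum(x * np.log(theta) - theta)   # additive constants dropped
theta_hat = x.mean()                                       # Poisson MLE is the sample mean
print(observed_information(loglik, theta_hat))             # close to sum(x)/theta_hat**2 = 1.25
```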

In this lecture we will sketch the derivation of the distribution of the MLE, and show why the above really is an asymptotic 95% confidence interval for θ.

Recall the distinction between an estimate and an estimator.

Given a sample $X_1,\dots,X_n$, an estimator is any function $W(X_1,\dots,X_n)$ of that sample. An estimate is a particular numerical value produced by the estimator for given data $x_1,\dots,x_n$.

The maximum likelihood estimator is a random variable; therefore it has a distribution. A maximum likelihood estimate is just a number, based on fixed data.
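A minimal sketch of this distinction in code (the Poisson model, sample size and seed are arbitrary choices for illustration): the same function plays the role of an estimator when applied to random samples, and produces a single numerical estimate when applied to fixed, observed data.

```python
import numpy as np

rng = np.random.default_rng(1)

# The estimator W(X_1, ..., X_n): here the sample mean, written as a function.
def W(sample):
    return np.mean(sample)

# Applied to fresh random samples, W gives a different value each time:
# the estimator W(X_1, ..., X_n) is a random variable with a distribution.
print(W(rng.poisson(lam=3.0, size=10)))
print(W(rng.poisson(lam=3.0, size=10)))

# Applied to one fixed data set x_1, ..., x_n, W gives a single number: an estimate.
x = np.array([2, 4, 3, 3, 5])
print(W(x))
```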

For the rest of this lecture we consider an iid sample $X_1,\dots,X_n$, from some distribution with unknown parameter $\theta$, and the MLE (maximum likelihood estimator) $\hat{\theta}(\mathbf{X})$.

Definition.

The Fisher information of a random sample $X_1,\dots,X_n$ is the expected value of minus the second derivative of the log-likelihood, evaluated at the true value of the parameter:

\[
I_E(\theta) = \mathbb{E}\!\left[-\frac{\partial^2}{\partial\theta^2}\, l(\theta \mid \mathbf{X})\right].
\]

This is related to, but different from, the observed information.

  1. The observed information is calculated from the observed data; the Fisher information is calculated by taking expectations over random data.

  2. The observed information is evaluated at $\hat{\theta}$; the Fisher information is evaluated at $\theta_{\mathrm{true}}$.

  3. The observed information can be written down numerically; the Fisher information usually cannot be, since it depends on $\theta_{\mathrm{true}}$, which is unknown (a numerical sketch of this contrast follows the list).
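Here is a small simulation sketch of that contrast, assuming a Poisson model with a chosen $\theta_{\mathrm{true}}$ (both the model and the numbers are illustrative only): the Fisher information is a Monte Carlo average of $-\partial^2 l/\partial\theta^2$ over many hypothetical samples, evaluated at $\theta_{\mathrm{true}}$, whereas the observed information uses the one sample we actually have, evaluated at $\hat{\theta}$.

```python
import numpy as np

rng = np.random.default_rng(42)
theta_true, n = 3.0, 50

def neg_second_deriv(x, theta):
    """-d^2/dtheta^2 of the Poisson log-likelihood, which works out as sum(x) / theta^2."""
    return np.sum(x) / theta**2

# Fisher information: an expectation over random samples, evaluated at theta_true.
fisher = np.mean([neg_second_deriv(rng.poisson(theta_true, n), theta_true)
                  for _ in range(10_000)])

# Observed information: the one observed sample, evaluated at the MLE theta_hat.
x = rng.poisson(theta_true, n)
theta_hat = x.mean()
observed = neg_second_deriv(x, theta_hat)

print(fisher)     # close to n / theta_true = 50 / 3, which needs theta_true to compute
print(observed)   # computable from the data alone
```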

Example 14.1.1 (Fisher Information for a Poisson parameter).

Suppose $\mathbf{x}$ is a random sample from $X \sim \mathrm{Poisson}(\theta_{\mathrm{true}})$. Find the Fisher information. Remember that $\mathbb{E}[X] = \theta_{\mathrm{true}}$. For $\theta > 0$,

\[
L(\theta) = f(\mathbf{x} \mid \theta) = \prod_{i=1}^{n} \frac{e^{-\theta}\,\theta^{x_i}}{x_i!} = e^{-n\theta}\,\theta^{\sum_i x_i} \times c,
\]

where $c$ is a constant that does not depend on $\theta$.

  1. $\log f(\mathbf{x} \mid \theta) = -n\theta + \sum_i x_i \log\theta + \log c$,

  2. $\dfrac{\partial}{\partial\theta} \log f(\mathbf{x} \mid \theta) = \dfrac{\sum_i x_i}{\theta} - n$,

  3. $\dfrac{\partial^2}{\partial\theta^2} \log f(\mathbf{x} \mid \theta) = -\dfrac{\sum_i x_i}{\theta^2}$,

  4. $\dfrac{\partial^2}{\partial\theta^2} \log f(\mathbf{X} \mid \theta) = -\dfrac{\sum_i X_i}{\theta^2}$.

Hence

\[
I_E(\theta_{\mathrm{true}}) = \mathbb{E}\!\left(\frac{\sum_i X_i}{\theta_{\mathrm{true}}^{2}}\right) = \frac{n\,\theta_{\mathrm{true}}}{\theta_{\mathrm{true}}^{2}} = \frac{n}{\theta_{\mathrm{true}}}.
\]

We see that our answer is in terms of $\theta_{\mathrm{true}}$, which is unknown (and not in terms of the data!). The Fisher information is useful for many things in likelihood inference; to see more, take MATH330 Likelihood Inference.
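If you want to check the algebra in steps 1 to 4, the following symbolic sketch (using sympy, purely as a sanity check and not part of the notes) reproduces the calculation for a single observation; because $-\partial^2 l/\partial\theta^2$ is linear in $x$, taking expectations simply replaces $x$ by $\mathbb{E}[X] = \theta_{\mathrm{true}}$, giving $1/\theta_{\mathrm{true}}$ per observation and hence $n/\theta_{\mathrm{true}}$ for the whole sample.

```python
import sympy as sp

theta, x = sp.symbols('theta x', positive=True)

# Log of the Poisson pmf for a single observation.
log_f = -theta + x * sp.log(theta) - sp.log(sp.factorial(x))

# Minus the second derivative with respect to theta: x / theta**2.
neg_second = -sp.diff(log_f, theta, 2)

# Take expectations by replacing x with E[X] = theta (valid because neg_second is linear in x).
info_per_obs = neg_second.subs(x, theta)
print(sp.simplify(info_per_obs))   # 1/theta, so I_E(theta) = n/theta for n iid observations
```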

Here, it features in the most important theorem in the course.

Theorem (Asymptotic distribution of the maximum likelihood estimator).

Suppose we have an iid sample $\mathbf{X} = X_1,\dots,X_n$ from some distribution with unknown parameter $\theta$, with maximum likelihood estimator $\hat{\theta}(\mathbf{X})$. Then (under certain regularity conditions), in the limit as $n \to \infty$,

\[
\hat{\theta}(\mathbf{X}) \sim N\!\left(\theta,\; I_E^{-1}(\theta)\right).
\]

This says that, for n large, the distribution of the MLE is approximately normal with mean equal to the true value of the parameter, and variance equal to the reciprocal of the Fisher information.

We will not prove the result in this course, but the proof relies on the central limit theorem (from MATH230).
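The theorem can be illustrated by simulation; the sketch below uses the Poisson example with arbitrary choices of $\theta_{\mathrm{true}}$ and $n$. Repeatedly drawing samples and recording the MLE (the sample mean in the Poisson case) produces values whose mean is close to $\theta_{\mathrm{true}}$ and whose variance is close to $I_E^{-1}(\theta_{\mathrm{true}}) = \theta_{\mathrm{true}}/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true, n, reps = 3.0, 100, 20_000

# For the Poisson model the MLE is the sample mean, so simulate it many times.
mles = rng.poisson(theta_true, size=(reps, n)).mean(axis=1)

print(mles.mean())   # close to theta_true = 3.0
print(mles.var())    # close to theta_true / n = 0.03, i.e. 1 / I_E(theta_true)
```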

Turning this around, this means that, for large n,

\[
\Pr\!\left[\theta \in \left(\hat{\theta}(\mathbf{X}) - 1.96\sqrt{I_E^{-1}(\theta)},\;\; \hat{\theta}(\mathbf{X}) + 1.96\sqrt{I_E^{-1}(\theta)}\right)\right] \approx 0.95.
\]

This result is useless as it stands, because we can only calculate $I_E(\theta)$ when we know $\theta$, and if we know it, why are we constructing a confidence interval for it?!

Luckily, the result also works asymptotically if we replace $I_E(\theta)$ by $I_O(\hat{\theta})$, giving that

\[
\left(\hat{\theta}(\mathbf{x}) - \frac{1.96}{\sqrt{I_O(\hat{\theta}(\mathbf{x}))}},\;\; \hat{\theta}(\mathbf{x}) + \frac{1.96}{\sqrt{I_O(\hat{\theta}(\mathbf{x}))}}\right)
\]

is an approximate 95% confidence interval for θ (as claimed earlier).
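For the Poisson example this interval can be written down explicitly: $I_O(\hat{\theta}) = \sum_i x_i / \hat{\theta}^2 = n/\bar{x}$, so the interval is $\bar{x} \pm 1.96\sqrt{\bar{x}/n}$. The sketch below computes it for some made-up counts; the data are illustrative only.

```python
import numpy as np

# Hypothetical Poisson counts; any vector of observed counts would do.
x = np.array([4, 2, 6, 3, 5, 4, 3, 7, 2, 4])
n = len(x)

theta_hat = x.mean()                      # Poisson MLE
obs_info = x.sum() / theta_hat**2         # I_O(theta_hat) = n / xbar
half_width = 1.96 / np.sqrt(obs_info)

print((theta_hat - half_width, theta_hat + half_width))   # approximate 95% CI for theta
```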