We have discussed that $\hat{\theta}$ has a distribution (called the sampling distribution). Theorem 1 in this section establishes that this sampling distribution is asymptotically normal. We can also calculate the asymptotic (sampling) distribution of any function of $\hat{\theta}$, and one that is particularly useful is the deviance of the true parameter, $D(\theta_0)$, which Theorem 2 establishes has a $\chi^2_1$ distribution.
The asymptotic results stated below hold subject to some weak regularity conditions. These are:
R1: $\theta_0$ is an interior point of the parameter space $\Theta$, with the support of $f(x; \theta)$ not depending on $\theta$.
R2: No two different values of $\theta$ give the same $f(x; \theta)$ for all $x$.
R3: The first three derivatives of the log-likelihood $\ell(\theta)$ exist in a neighbourhood of $\theta_0$.
The first of these conditions ensures that the boundary of the sampling space does not depend on the unknown parameter (excluding the Uniform$(0, \theta)$ model, for example).
The second ensures that different values of $\theta$ give rise to different probability models.
The third ensures that the likelihood is sufficiently smooth close to $\theta_0$ to enable Taylor series approximations to work well.
Conditions R2 and R3 hold for all the models we will see in this course, and we will assume they are true without checking their validity. We will, however, consider cases where R1 does not hold.
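As a brief illustration of how R1 can fail (our own example, using the Uniform$(0, \theta)$ model mentioned above): the density
\[
f(x; \theta) = \frac{1}{\theta}, \qquad 0 \le x \le \theta,
\]
has support $[0, \theta]$, which depends on $\theta$, so the boundary of the sampling space moves with the unknown parameter and the Taylor-series arguments underlying the results below do not apply.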
Theorem 1: Asymptotic distribution of the MLE.
Under the regularity conditions, in the limit as $n \to \infty$,
\[
\sqrt{I_E(\theta_0)}\,\bigl(\hat{\theta} - \theta_0\bigr) \xrightarrow{d} N(0, 1),
\]
where $I_E(\theta)$ denotes the expected (Fisher) information.
(A sketch proof for the multi-parameter case is given later).
Note 1: Theorem 1 gives the asymptotic distribution of $\hat{\theta}$. It is usually interpreted (and remembered) as
\[
\hat{\theta} \approx N\!\bigl(\theta_0, \, I_E(\theta_0)^{-1}\bigr)
\]
for large sample sizes, $n$, with $\operatorname{Var}(\hat{\theta}) \approx I_E(\theta_0)^{-1}$ and standard error $\operatorname{s.e.}(\hat{\theta}) \approx \sqrt{I_E(\theta_0)^{-1}}$.
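As a concrete illustration (our own, using the IID Poisson model of Example 1.6): if $X_1, \dots, X_n$ are IID Poisson$(\lambda)$ then $\hat{\lambda} = \bar{X}$ and $I_E(\lambda) = n/\lambda$, so Note 1 gives
\[
\hat{\lambda} \approx N\!\left(\lambda_0, \, \frac{\lambda_0}{n}\right), \qquad \operatorname{s.e.}(\hat{\lambda}) \approx \sqrt{\lambda_0 / n},
\]
for large $n$.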
Note 2: As $I_E(\theta_0)$ increases with $n$, this result tells us that $\hat{\theta} \to \theta_0$ as $n \to \infty$, and it describes how $\hat{\theta}$ varies about $\theta_0$ for large $n$.
Note 3: Any term asymptotically (as $n \to \infty$) equivalent to $I_E(\theta_0)$ can be used in Theorem 1 instead, so as $n \to \infty$ we also have
\[
\sqrt{I_O(\theta_0)}\,\bigl(\hat{\theta} - \theta_0\bigr) \xrightarrow{d} N(0, 1), \qquad
\sqrt{I_E(\hat{\theta})}\,\bigl(\hat{\theta} - \theta_0\bigr) \xrightarrow{d} N(0, 1), \qquad
\sqrt{I_O(\hat{\theta})}\,\bigl(\hat{\theta} - \theta_0\bigr) \xrightarrow{d} N(0, 1),
\]
where $I_O(\theta)$ denotes the observed information.
Though the asymptotic results are unaffected by such changes, the speed of convergence to the asymptotic limit differs between the choices, so the accuracy of the approximations for finite values of $n$ will also differ.
Note 4: As $\theta_0$ is unknown, we tend only to use $I_E(\hat{\theta})$ and $I_O(\hat{\theta})$ in practice. Thus:
The approximation to the asymptotic distribution using the expected information is
\[
\hat{\theta} \approx N\!\bigl(\theta_0, \, I_E(\hat{\theta})^{-1}\bigr),
\]
and the approximation using the observed information is
\[
\hat{\theta} \approx N\!\bigl(\theta_0, \, I_O(\hat{\theta})^{-1}\bigr).
\]
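The short simulation below (our own sketch, not part of the notes) illustrates Theorem 1 and Note 4 for the IID Poisson$(\lambda)$ model: the empirical sampling distribution of $\hat{\lambda} = \bar{X}$ is compared with the normal approximation whose variance is $I_E(\hat{\lambda})^{-1}$. For the Poisson model $I_O(\hat{\lambda}) = I_E(\hat{\lambda}) = n/\hat{\lambda}$, so the two practical approximations coincide here.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam0, reps = 50, 3.0, 10_000          # sample size, true rate, replications

# Sampling distribution of the MLE over many simulated datasets
lam_hat = rng.poisson(lam0, size=(reps, n)).mean(axis=1)   # Poisson MLE = sample mean

# Information evaluated at the MLE (Note 4); for the Poisson model
# I_E(lam_hat) = I_O(lam_hat) = n / lam_hat
se_approx = np.sqrt(lam_hat / n)

print("empirical sd of lam_hat:      ", lam_hat.std(ddof=1))
print("approximation sqrt(lam0 / n): ", np.sqrt(lam0 / n))

# Coverage of the approximate 95% interval lam_hat +/- 1.96 * s.e.
covered = np.abs(lam_hat - lam0) <= 1.96 * se_approx
print("coverage of 95% Wald interval:", covered.mean())
```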
Theorem 2: Asymptotic distribution of the deviance.
Under the regularity conditions, in the limit as $n \to \infty$,
\[
D(\theta_0) = 2\bigl\{\ell(\hat{\theta}) - \ell(\theta_0)\bigr\} \xrightarrow{d} \chi^2_1,
\]
and this holds for any true value $\theta_0$ in the interior of $\Theta$.
(A sketch proof for the multi-parameter case is given later).
Note 5: The deviance function is random in the sense that it changes over different realisations of the random variables $X_1, \dots, X_n$, since it is a function of the log-likelihood $\ell(\cdot)$ and the MLE $\hat{\theta}$, both of which themselves change over different realisations of $X_1, \dots, X_n$.
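A quick simulation (again our own, for the IID Poisson$(\lambda)$ model) can be used to check Theorem 2: over repeated datasets the deviance $D(\lambda_0) = 2\{\ell(\hat{\lambda}) - \ell(\lambda_0)\}$ should behave like a $\chi^2_1$ variable, so roughly 95% of its realised values should fall below the $\chi^2_1$ 95% point, $3.84$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam0, reps = 50, 3.0, 10_000

x = rng.poisson(lam0, size=(reps, n))
lam_hat = x.mean(axis=1)                  # MLE for each simulated dataset

# Poisson deviance of the true value; the log(x_i!) terms cancel in the difference:
# D(lam0) = 2 * [ sum_i x_i * log(lam_hat / lam0) - n * (lam_hat - lam0) ]
deviance = 2 * (x.sum(axis=1) * np.log(lam_hat / lam0) - n * (lam_hat - lam0))

print("proportion of D(lam0) below 3.84:", (deviance < 3.84).mean())   # roughly 0.95
print("mean of D(lam0):                 ", deviance.mean())            # chi^2_1 has mean 1
```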
Suppose that we are interested in inference for a parameter $\phi$, where $\phi = g(\theta)$ for some function $g$ of the unknown parameter $\theta$.
Example 1.6: IID Poisson data, ctd. We wish to estimate $\phi = g(\lambda)$ when each $X_i$ follows a Poisson$(\lambda)$ distribution. Here $\hat{\lambda} = \bar{X}$.
The invariance property of maximum likelihood estimators gives us that $\hat{\phi} = g(\hat{\lambda})$.
We let $\phi_0$ denote the true value of $\phi$, so $\phi_0 = g(\theta_0)$.
Theorem 3: Asymptotic distribution of a function of the MLE.
Under the regularity conditions, and letting $\phi = g(\theta)$ where $g$ is differentiable, in the limit as $n \to \infty$,
\[
\frac{\sqrt{I_E(\theta_0)}}{\lvert g'(\theta_0) \rvert}\,\bigl(\hat{\phi} - \phi_0\bigr) \xrightarrow{d} N(0, 1).
\]
A consequence of Theorem 3 is that, for large $n$,
\[
\operatorname{Var}(\hat{\phi}) \approx g'(\theta_0)^2\, I_E(\theta_0)^{-1} = g'(\theta_0)^2 \operatorname{Var}(\hat{\theta}),
\]
so the variability of $\hat{\phi}$ depends both on the variability of $\hat{\theta}$ and on how sensitive $g$ is to changes in $\theta$ (which determines $g'(\theta_0)$).
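For a concrete illustration in the Poisson setting of Example 1.6, take $g(\lambda) = e^{-\lambda}$, so that $\phi = P(X = 0)$; this particular choice of $g$ is our own assumption, not necessarily the one used in the example. Then $g'(\lambda) = -e^{-\lambda}$ and, since $I_E(\lambda) = n/\lambda$, Theorem 3 gives
\[
\hat{\phi} = e^{-\hat{\lambda}} \approx N\!\left(e^{-\lambda_0}, \; e^{-2\lambda_0}\,\frac{\lambda_0}{n}\right), \qquad \operatorname{s.e.}(\hat{\phi}) \approx e^{-\hat{\lambda}}\sqrt{\hat{\lambda}/n},
\]
for large $n$.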
Theorems 1 and 3 lead to the following observations:
The maximum likelihood estimator of $\theta$ is asymptotically unbiased: $E(\hat{\theta}) \to \theta_0$ as $n \to \infty$.
$\hat{\theta}$ asymptotically achieves the Cramér-Rao bound: $\operatorname{Var}(\hat{\theta}) \to I_E(\theta_0)^{-1}$ as $n \to \infty$.
The maximum likelihood estimator of $\phi = g(\theta)$ is asymptotically unbiased: $E(\hat{\phi}) \to \phi_0$ as $n \to \infty$.
$\hat{\phi}$ asymptotically achieves the Cramér-Rao bound: $\operatorname{Var}(\hat{\phi}) \to g'(\theta_0)^2\, I_E(\theta_0)^{-1}$ as $n \to \infty$.
These are really good properties for an estimator. They tell us that, for large $n$, there is no better estimator than maximum likelihood for $\theta$ or functions of it.