1 Modelling and Statistical Inference

Asymptotic Theory

We have discussed that $\hat\theta$ has a distribution (called the sampling distribution). Theorem 1 in this section establishes that the sampling distribution is asymptotically normal. We can also calculate the asymptotic (sampling) distribution of any function of $\hat\theta$; one that is particularly useful is the deviance at the true parameter, $D(\theta_0)$, which Theorem 2 establishes has a $\chi^2_1$ distribution.

The asymptotic results stated below hold subject to some weak regularity conditions. These are:

R1

$\theta_0$ is an interior point of $\Omega$, with the support of $f(x \mid \theta)$ not depending on $\theta$.

R2

No two different values of θ give the same f(x|θ) for all x.

R3

The first three derivatives of the log-likelihood $\ell(\theta)$ exist in a neighbourhood of $\theta_0$.

The first of these conditions ensures that the boundary of the sampling space does not depend on the unknown parameter (excluding the Uniform[0,θ] model, for example).

The second ensures that different values of θ give rise to different probability models.

The third ensures that the likelihood is sufficiently smooth close to θ0 to enable Taylor series approximations to work well.

Conditions R2 and R3 hold for all the models we will see in this course and we will assume they are true without checking for their validity. We will consider cases where R1 does not hold.

Results for θ

Theorem 1: Asymptotic distribution of the MLE.
Under the regularity conditions, in the limit as $n \to \infty$,

\[
\sqrt{I_E(\theta_0)}\,(\hat\theta - \theta_0) \xrightarrow{d} N(0,1).
\]

(A sketch proof for the multi-parameter case is given later).

Note 1: Theorem 1 gives the asymptotic distribution of $\hat\theta$. It is usually interpreted (and remembered) as

\[
\hat\theta \sim N\!\left(\theta_0, [I_E(\theta_0)]^{-1}\right)
\]

for large sample sizes $n$, with $\operatorname{Var}(\hat\theta) = [I_E(\theta_0)]^{-1}$ and standard error $\operatorname{se}(\hat\theta) = [I_E(\theta_0)]^{-1/2}$.

Note 2: As $I_E(\theta_0)$ increases with $n$, this result tells us that $\hat\theta \to \theta_0$ as $n \to \infty$, and it describes how $\hat\theta$ varies about $\theta_0$ for large $n$.
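
For instance, for an IID sample from the Poisson($\theta$) model of Example 1.6, the MLE is $\hat\theta = \bar{x}$ and the expected information is $I_E(\theta) = n/\theta$, so Theorem 1 gives the large-$n$ approximation

\[
\hat\theta \sim N\!\left(\theta_0, \frac{\theta_0}{n}\right).
\]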

Note 3: Any term asymptotically (as $n \to \infty$) equivalent to $I_E(\theta_0)$ can be used in Theorem 1 instead, so as $n \to \infty$,

\[
\begin{aligned}
\sqrt{I_E(\hat\theta)}\,(\hat\theta - \theta_0) &\xrightarrow{d} N(0,1), \\
\sqrt{I_O(\theta_0)}\,(\hat\theta - \theta_0) &\xrightarrow{d} N(0,1), \\
\sqrt{I_O(\hat\theta)}\,(\hat\theta - \theta_0) &\xrightarrow{d} N(0,1).
\end{aligned}
\]

Though the asymptotic results are unaffected by such changes, the speeds of convergence to the asymptotic limits differ, so the accuracy of the approximations for finite values of $n$ will also differ.
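
For instance, in the Poisson model of Example 1.6 the observed information $I_O(\theta) = \sum_i x_i/\theta^2$ depends on the data while $I_E(\theta) = n/\theta$ does not; the two coincide at the MLE, where $I_O(\hat\theta) = I_E(\hat\theta) = n/\bar{x}$, but they differ elsewhere, and so do the finite-$n$ approximations based on them.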

Note 4: As $\theta_0$ is unknown, we tend only to use $I_E(\hat\theta)$ and $I_O(\hat\theta)$ in practice. Thus:

The approximation to the asymptotic distribution based on the expected information is

\[
\hat\theta \sim N\!\left(\theta_0, [I_E(\hat\theta)]^{-1}\right)
\]

and the approximation based on the observed information is

\[
\hat\theta \sim N\!\left(\theta_0, [I_O(\hat\theta)]^{-1}\right).
\]
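
As a quick numerical check of Notes 1-4, the following sketch (not from the notes; it assumes the IID Poisson model of Example 1.6 with the hypothetical values $\theta_0 = 2$ and $n = 200$, and uses NumPy) simulates repeated samples, computes the MLE $\hat\theta = \bar{x}$ for each, and compares the plug-in standard error $[I_E(\hat\theta)]^{-1/2} = \sqrt{\hat\theta/n}$ with the empirical spread of $\hat\theta$ across replications.

```python
# Sketch: check the normal approximation theta_hat ~ N(theta_0, theta_0/n)
# for IID Poisson data, where the MLE is the sample mean and I_E(theta) = n/theta.
# The values of theta_0, n and the number of replications are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
theta0, n, reps = 2.0, 200, 5000

samples = rng.poisson(theta0, size=(reps, n))
theta_hat = samples.mean(axis=1)         # MLE for each simulated data set
se_plugin = np.sqrt(theta_hat / n)       # se from [I_E(theta_hat)]^(-1/2)

print("empirical sd of theta_hat :", theta_hat.std(ddof=1))
print("theoretical sd sqrt(t0/n) :", np.sqrt(theta0 / n))
print("mean plug-in standard err :", se_plugin.mean())
```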

Theorem 2: Asymptotic distribution of the deviance.
Under the regularity conditions, in the limit as $n \to \infty$,

\[
D(\theta_0) = 2\left[\ell(\hat\theta) - \ell(\theta_0)\right] \xrightarrow{d} \chi^2_1,
\]

and for any $\theta \neq \theta_0$, $D(\theta) \to \infty$.

(A sketch proof for the multi-parameter case is given later).

Note 5: The deviance function is random in the sense that it changes over different realisations of the random variables $X = X_1, \ldots, X_n$, since it is a function of $\ell(\theta)$ and the MLE $\hat\theta$, both of which themselves change over different realisations of $X$.
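
A similar sketch (again assuming the Poisson model with the same hypothetical $\theta_0 = 2$ and $n = 200$; it uses NumPy and SciPy) checks Theorem 2 by simulating $D(\theta_0)$, which for Poisson data simplifies to $D(\theta_0) = 2n\left[\bar{x}\log(\bar{x}/\theta_0) - (\bar{x} - \theta_0)\right]$, and comparing its quantiles with those of the $\chi^2_1$ distribution.

```python
# Sketch: simulate the deviance at the true parameter for IID Poisson data and
# compare its quantiles with chi-squared(1) quantiles, as Theorem 2 predicts.
# For Poisson data, D(theta_0) = 2n * [xbar*log(xbar/theta_0) - (xbar - theta_0)].
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
theta0, n, reps = 2.0, 200, 5000

xbar = rng.poisson(theta0, size=(reps, n)).mean(axis=1)
deviance = 2 * n * (xbar * np.log(xbar / theta0) - (xbar - theta0))

probs = [0.50, 0.90, 0.95, 0.99]
print("simulated quantiles:", np.round(np.quantile(deviance, probs), 3))
print("chi^2_1 quantiles:  ", np.round(stats.chi2.ppf(probs, df=1), 3))
```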

Results for functions of θ

Suppose that we are interested in inference for a parameter $\phi$, where $\phi = g(\theta)$ for some function $g$ of the unknown parameter $\theta$.

Example 1.6: IID Poisson data, ctd. We wish to estimate $P(X = 0)$ when $X$ follows a Poisson($\theta$) distribution. Here

\[
\phi = P(X = 0 \mid \theta) = e^{-\theta} = g(\theta).
\]

The invariance property of maximum likelihood estimators gives us that

\[
\hat\phi = g(\hat\theta).
\]

We let $\phi_0$ denote the true value of $\phi$, so $\phi_0 = g(\theta_0)$.
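
In Example 1.6, invariance gives $\hat\phi = e^{-\hat\theta} = e^{-\bar{x}}$ and $\phi_0 = e^{-\theta_0}$.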

Theorem 3: Asymptotic distribution of a function of the MLE.
Under the regularity conditions, and letting $\phi = g(\theta)$ where $g$ is differentiable, in the limit as $n \to \infty$,

\[
\hat\phi \sim N\!\left(\phi_0,\, [g'(\theta_0)]^2\,[I_E(\theta_0)]^{-1}\right).
\]

A consequence of Theorem 3 is that

\[
\operatorname{Var}(\hat\phi) \approx [g'(\hat\theta)]^2\,\operatorname{Var}(\hat\theta),
\]

so the variability of $\hat\phi$ depends on both the variability of $\hat\theta$ and on how sensitive $\phi$ is to changes in $\theta$ (as measured by $g'$).
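
Continuing Example 1.6, $g(\theta) = e^{-\theta}$ so $g'(\theta) = -e^{-\theta}$, and $I_E(\theta) = n/\theta$ for the Poisson model, so Theorem 3 gives the large-$n$ approximation

\[
\hat\phi = e^{-\hat\theta} \sim N\!\left(e^{-\theta_0},\; \frac{\theta_0\, e^{-2\theta_0}}{n}\right).
\]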

Theorems 1 and 3 lead to the following observations:

  • The maximum likelihood estimator $\hat\theta$ of $\theta$ is asymptotically unbiased: $E(\hat\theta) \to \theta_0$.

  • $\hat\theta$ asymptotically achieves the Cramér-Rao bound:

    $\operatorname{Var}(\hat\theta) \to [I_E(\theta_0)]^{-1}$

    as $n \to \infty$.

  • The maximum likelihood estimator $\hat\phi$ of $\phi = g(\theta)$ is asymptotically unbiased: $E\{g(\hat\theta)\} \to \phi_0 = g(\theta_0)$.

  • $\hat\phi = g(\hat\theta)$ asymptotically achieves the Cramér-Rao bound:

    $\operatorname{Var}(g(\hat\theta)) \to [g'(\theta_0)]^2\,[I_E(\theta_0)]^{-1}$

    as $n \to \infty$.

These are very good properties for an estimator. They tell us that, for large $n$, there is no better estimator than maximum likelihood for $\theta$ or for functions of it.