1 Modelling and Statistical Inference

Asymptotic Theory

We have discussed that $\hat\theta$ has a distribution (called the sampling distribution). Theorem 1 in this section establishes that the sampling distribution is asymptotically normal. We can also calculate the asymptotic (sampling) distribution of any function of $\hat\theta$; one that is particularly useful is the deviance at the true parameter, $D(\theta_0)$, which Theorem 2 establishes has a $\chi^2_1$ distribution.

The asymptotic results stated below hold subject to some weak regularity conditions. These are:

R1

$\theta_0$ is an interior point of $\Omega$, with the support of $f(x \mid \theta)$ not depending on $\theta$.

R2

No two different values of θ give the same f(x|θ) for all x.

R3

The first three derivatives of the log-likelihood $\ell(\theta)$ exist in a neighbourhood of $\theta_0$.

The first of these conditions ensures that the boundary of the sampling space does not depend on the unknown parameter (excluding the Uniform[0,θ] model, for example).

The second ensures that different values of θ give rise to different probability models.

The third ensures that the likelihood is sufficiently smooth close to θ0 to enable Taylor series approximations to work well.

Conditions R2 and R3 hold for all the models we will see in this course and we will assume they are true without checking for their validity. We will consider cases where R1 does not hold.

Results for θ

Theorem 1: Asymptotic distribution of the MLE.
Under the regularity conditions, in the limit as $n \to \infty$,

\[
\sqrt{I_E(\theta_0)}\,(\hat\theta - \theta_0) \xrightarrow{d} N(0,1).
\]

(A sketch proof for the multi-parameter case is given later).

Note 1: Theorem 1 gives the asymptotic distribution of $\hat\theta$. It is usually interpreted (and remembered) as

\[
\hat\theta \sim N\!\left(\theta_0, [I_E(\theta_0)]^{-1}\right)
\]

for large sample sizes $n$, with $\operatorname{Var}(\hat\theta) = [I_E(\theta_0)]^{-1}$ and standard error $\operatorname{se}(\hat\theta) = [I_E(\theta_0)]^{-1/2}$.

Note 2: As $I_E(\theta_0)$ increases with $n$, this result tells us that $\hat\theta \to \theta_0$ as $n \to \infty$, and it describes how $\hat\theta$ varies about $\theta_0$ for large $n$.
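
For instance, for an IID sample from the Poisson($\theta$) model of Example 1.6, the MLE is $\hat\theta = \bar{x}$ and the expected information is $I_E(\theta) = n/\theta$, so Theorem 1 gives the large-$n$ approximation

\[
\hat\theta \sim N\!\left(\theta_0, \frac{\theta_0}{n}\right).
\]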

Note 3: Any term asymptotically (as $n \to \infty$) equivalent to $I_E(\theta_0)$ can be used in Theorem 1 instead, so as $n \to \infty$,

\[
\begin{aligned}
\sqrt{I_E(\hat\theta)}\,(\hat\theta - \theta_0) &\xrightarrow{d} N(0,1), \\
\sqrt{I_O(\theta_0)}\,(\hat\theta - \theta_0) &\xrightarrow{d} N(0,1), \\
\sqrt{I_O(\hat\theta)}\,(\hat\theta - \theta_0) &\xrightarrow{d} N(0,1).
\end{aligned}
\]

Though the asymptotic results are unaffected by such changes, the speeds of convergence to the asymptotic limits differ, so the accuracy of the approximations for finite values of $n$ will also differ.
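
For instance, in the Poisson model of Example 1.6 the observed information $I_O(\theta) = \sum_i x_i/\theta^2$ depends on the data while $I_E(\theta) = n/\theta$ does not; the two coincide at the MLE, where $I_O(\hat\theta) = I_E(\hat\theta) = n/\bar{x}$, but they differ elsewhere, and so do the finite-$n$ approximations based on them.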

Note 4: As $\theta_0$ is unknown, we tend only to use $I_E(\hat\theta)$ and $I_O(\hat\theta)$ in practice. Thus:

The approximation to the asymptotic distribution based on the expected information is

\[
\hat\theta \sim N\!\left(\theta_0, [I_E(\hat\theta)]^{-1}\right)
\]

and the approximation based on the observed information is

\[
\hat\theta \sim N\!\left(\theta_0, [I_O(\hat\theta)]^{-1}\right).
\]
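
As a quick numerical check of Notes 1-4, the following sketch (not from the notes; it assumes the IID Poisson model of Example 1.6 with the hypothetical values $\theta_0 = 2$ and $n = 200$, and uses NumPy) simulates repeated samples, computes the MLE $\hat\theta = \bar{x}$ for each, and compares the plug-in standard error $[I_E(\hat\theta)]^{-1/2} = \sqrt{\hat\theta/n}$ with the empirical spread of $\hat\theta$ across replications.

```python
# Sketch: check the normal approximation theta_hat ~ N(theta_0, theta_0/n)
# for IID Poisson data, where the MLE is the sample mean and I_E(theta) = n/theta.
# The values of theta_0, n and the number of replications are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
theta0, n, reps = 2.0, 200, 5000

samples = rng.poisson(theta0, size=(reps, n))
theta_hat = samples.mean(axis=1)         # MLE for each simulated data set
se_plugin = np.sqrt(theta_hat / n)       # se from [I_E(theta_hat)]^(-1/2)

print("empirical sd of theta_hat :", theta_hat.std(ddof=1))
print("theoretical sd sqrt(t0/n) :", np.sqrt(theta0 / n))
print("mean plug-in standard err :", se_plugin.mean())
```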

Theorem 2: Asymptotic distribution of the deviance.
Under the regularity conditions, in the limit as $n \to \infty$,

\[
D(\theta_0) = 2\left[\ell(\hat\theta) - \ell(\theta_0)\right] \xrightarrow{d} \chi^2_1,
\]

and for any $\theta \neq \theta_0$, $D(\theta) \to \infty$.

(A sketch proof for the multi-parameter case is given later).

Note 5: The deviance function is random in the sense that it changes over different realisations of the random variables $X = X_1, \ldots, X_n$, since it is a function of $\ell(\theta)$ and the MLE $\hat\theta$, both of which themselves change over different realisations of $X$.
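
A similar sketch (again assuming the Poisson model with the same hypothetical $\theta_0 = 2$ and $n = 200$; it uses NumPy and SciPy) checks Theorem 2 by simulating $D(\theta_0)$, which for Poisson data simplifies to $D(\theta_0) = 2n\left[\bar{x}\log(\bar{x}/\theta_0) - (\bar{x} - \theta_0)\right]$, and comparing its quantiles with those of the $\chi^2_1$ distribution.

```python
# Sketch: simulate the deviance at the true parameter for IID Poisson data and
# compare its quantiles with chi-squared(1) quantiles, as Theorem 2 predicts.
# For Poisson data, D(theta_0) = 2n * [xbar*log(xbar/theta_0) - (xbar - theta_0)].
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
theta0, n, reps = 2.0, 200, 5000

xbar = rng.poisson(theta0, size=(reps, n)).mean(axis=1)
deviance = 2 * n * (xbar * np.log(xbar / theta0) - (xbar - theta0))

probs = [0.50, 0.90, 0.95, 0.99]
print("simulated quantiles:", np.round(np.quantile(deviance, probs), 3))
print("chi^2_1 quantiles:  ", np.round(stats.chi2.ppf(probs, df=1), 3))
```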

Results for functions of θ

Suppose that we are interested in inference for a parameter $\phi$, where $\phi = g(\theta)$ for some function $g$ of the unknown parameter $\theta$.

Example 1.6: IID Poisson data, ctd. We wish to estimate $P(X = 0)$ when $X$ follows a Poisson($\theta$) distribution. Here

\[
\phi = P(X = 0 \mid \theta) = e^{-\theta} = g(\theta).
\]

The invariance property of maximum likelihood estimators gives us that

\[
\hat\phi = g(\hat\theta).
\]

We let $\phi_0$ denote the true value of $\phi$, so $\phi_0 = g(\theta_0)$.
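
In Example 1.6, invariance gives $\hat\phi = e^{-\hat\theta} = e^{-\bar{x}}$ and $\phi_0 = e^{-\theta_0}$.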

Theorem 3: Asymptotic distribution of a function of the MLE.
Under the regularity conditions, and letting $\phi = g(\theta)$ where $g$ is differentiable, in the limit as $n \to \infty$,

\[
\hat\phi \sim N\!\left(\phi_0,\, [g'(\theta_0)]^2\,[I_E(\theta_0)]^{-1}\right).
\]

A consequence of Theorem 3 is that

\[
\operatorname{Var}(\hat\phi) \approx [g'(\hat\theta)]^2\,\operatorname{Var}(\hat\theta),
\]

so the variability of $\hat\phi$ depends on both the variability of $\hat\theta$ and on how sensitive $\phi$ is to changes in $\theta$ (as measured by $g'$).
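
Continuing Example 1.6, $g(\theta) = e^{-\theta}$ so $g'(\theta) = -e^{-\theta}$, and $I_E(\theta) = n/\theta$ for the Poisson model, so Theorem 3 gives the large-$n$ approximation

\[
\hat\phi = e^{-\hat\theta} \sim N\!\left(e^{-\theta_0},\; \frac{\theta_0\, e^{-2\theta_0}}{n}\right).
\]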

Theorems 1 and 3 lead to the following observations:

  • The maximum likelihood estimator $\hat\theta$ of $\theta$ is asymptotically unbiased: $E(\hat\theta) \to \theta_0$.

  • $\hat\theta$ asymptotically achieves the Cramér-Rao bound:

    $\operatorname{Var}(\hat\theta) \to [I_E(\theta_0)]^{-1}$

    as $n \to \infty$.

  • The maximum likelihood estimator $\hat\phi$ of $\phi = g(\theta)$ is asymptotically unbiased: $E\{g(\hat\theta)\} \to \phi_0 = g(\theta_0)$.

  • $\hat\phi = g(\hat\theta)$ asymptotically achieves the Cramér-Rao bound:

    $\operatorname{Var}(g(\hat\theta)) \to [g'(\theta_0)]^2\,[I_E(\theta_0)]^{-1}$

    as $n \to \infty$.

These are very good properties for an estimator. They tell us that, for large $n$, there is no better estimator than maximum likelihood for $\theta$ or for functions of it.