5 Parameter Functions

Asymptotic Results

Obtaining a point estimate of ϕ is straightforward thanks to the invariance property of maximum likelihood estimation: if θ^ is the maximum likelihood estimator of θ, then ϕ^ = g(θ^) is the maximum likelihood estimator of ϕ = g(θ).

For finding confidence intervals, as in previous chapters, there are two basic techniques, each based on asymptotic approximation: the first uses the asymptotic normality of the maximum likelihood estimator; the second is based on the asymptotic χ2 distribution of the deviance function.

Theorem 11: Asymptotic distribution of a function of the MLE, multiparameter case.
For a regular estimation problem, if ϕ0 = g(θ0) is the true parameter value, then as n → ∞:

\hat{\phi} \sim N\left( \phi_0,\ \nabla g(\theta_0)^T I_E(\theta_0)^{-1} \nabla g(\theta_0) \right),

where

\nabla g(\theta_0) = \left\{ \frac{\partial}{\partial\theta_1} g(\theta_0), \ldots, \frac{\partial}{\partial\theta_d} g(\theta_0) \right\}^T.

As a consequence we have

Var(\hat{\phi}) = Var(g(\hat{\theta})) \approx \nabla g(\theta_0)^T I_E(\theta_0)^{-1} \nabla g(\theta_0).

For practical problems we use alternatives to IE(θ0) and ∇g(θ0) which are asymptotically equivalent:

\hat{\phi} \sim N\left( \phi_0,\ \nabla g(\hat{\theta})^T I_O(\hat{\theta})^{-1} \nabla g(\hat{\theta}) \right)
\hat{\phi} \sim N\left( \phi_0,\ \nabla g(\hat{\theta})^T I_E(\hat{\theta})^{-1} \nabla g(\hat{\theta}) \right).
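As an illustration, the following sketch applies these formulae to a simulated N(μ, σ²) sample with ϕ = g(μ, σ²) = μ + 1.645σ (roughly the 95th percentile of the fitted normal). The model, the simulated data, and the use of the known expected information for the normal are illustrative assumptions, not part of the text; the gradient ∇g is approximated by finite differences.

```python
import numpy as np

# Illustrative delta-method sketch (assumptions: N(mu, sigma2) model,
# simulated data, phi = mu + 1.645*sqrt(sigma2)).

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=200)
n = len(x)

# MLEs for the normal model
mu_hat = x.mean()
s2_hat = ((x - mu_hat) ** 2).mean()
theta_hat = np.array([mu_hat, s2_hat])

def g(theta):
    mu, s2 = theta
    return mu + 1.645 * np.sqrt(s2)

# Expected information for N(mu, sigma2): diag(n/sigma2, n/(2*sigma2^2)),
# evaluated at the MLE
I_E = np.diag([n / s2_hat, n / (2 * s2_hat ** 2)])

# Central finite-difference approximation to the gradient of g at theta_hat
eps = 1e-6
grad = np.array([
    (g(theta_hat + eps * np.eye(2)[j]) - g(theta_hat - eps * np.eye(2)[j])) / (2 * eps)
    for j in range(2)
])

# Delta-method variance: grad^T I_E^{-1} grad
var_phi = grad @ np.linalg.solve(I_E, grad)
print("phi_hat =", g(theta_hat), " Var(phi_hat) =", var_phi)
```

For this particular g the variance has the closed form (σ̂²/n)(1 + 1.645²/2), which the finite-difference computation reproduces; in general no closed form is available and the numerical route above is the practical one.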

When d = 1, ∇g(θ0) = g′(θ0) and IE(θ0)⁻¹ is a scalar (a 1×1 matrix), so that

Var(\hat{\phi}) \approx g'(\theta_0)\, I_E(\theta_0)^{-1} g'(\theta_0) = [g'(\theta_0)]^2 I_E(\theta_0)^{-1}.

Thus, Theorem 11 is a multi-parameter generalisation of the result in Theorem 3.

We omit the proof (which follows similar lines to the proof of Theorem 6).
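To make the d = 1 reduction concrete, here is a minimal sketch (an illustrative example, not from the text) for an Exponential(λ) sample, where λ^ = 1/x̄, IE(λ) = n/λ², and ϕ = g(λ) = 1/λ is the mean; the delta method then gives Var(ϕ^) ≈ [g′(λ)]²/IE(λ) = 1/(nλ²), which equals x̄²/n at the MLE.

```python
import numpy as np

# Illustrative d = 1 delta-method sketch (assumption: Exponential(lambda)
# model with simulated data, true lambda = 0.5).

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=500)
n = len(x)

lam_hat = 1.0 / x.mean()            # MLE of lambda
g_prime = -1.0 / lam_hat ** 2       # derivative of g(lambda) = 1/lambda
I_E = n / lam_hat ** 2              # expected information at the MLE

# [g'(lambda)]^2 / I_E(lambda), evaluated at lam_hat
var_phi = g_prime ** 2 / I_E
print("Var(phi_hat) =", var_phi)    # algebraically equal to xbar^2 / n
```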

Theorem 11 leads to a very simple method of constructing confidence intervals: an approximate (1-α)×100% confidence interval for ϕ is

\left( \hat{\phi} - z_{\alpha/2}\, se(\hat{\phi}),\ \hat{\phi} + z_{\alpha/2}\, se(\hat{\phi}) \right),

where se(ϕ^) = [Var(ϕ^)]^{1/2}.
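A minimal sketch of the interval construction, using hypothetical values of ϕ^ and se(ϕ^) (in practice these would come from a delta-method calculation as above):

```python
from scipy.stats import norm

# Hypothetical illustrative numbers: a point estimate and its standard error
phi_hat = 2.0
se_phi = 0.5
alpha = 0.05

z = norm.ppf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for alpha = 0.05
ci = (phi_hat - z * se_phi, phi_hat + z * se_phi)
print("approx. 95% CI:", ci)
```

Note that by construction the interval is centred at ϕ^, which is precisely the symmetry discussed next.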

This method of finding confidence intervals is relatively easy, but it suffers from the same drawback as the corresponding method in previous chapters: the interval is symmetric about ϕ^, so some values included in the interval may have lower likelihood than some of the values excluded.

An alternative approach is based on the deviance function. In this case, however, the treatment of the nuisance parameters is rather different.

Theorem 12: For a regular estimation problem, if ϕ0 = g(θ0) is the true parameter value, then as n → ∞:

D^*(\phi_0) = 2\left[ \ell(\hat{\theta}) - \ell_P(\phi_0) \right] \xrightarrow{d} \chi^2_1,

where D* is the profile deviance function.

ℓP(ϕ), termed the profile log-likelihood, is defined by

\ell_P(\phi) = \max_{\lambda} \ell(\phi, \lambda).

If ϕ ≠ ϕ0 then D*(ϕ) → ∞ as n → ∞.

Thus, to evaluate the profile (log-)likelihood, the value of ϕ is first fixed, and the likelihood is then maximised over all possible values of λ. In this sense, ℓP(ϕ) is the skyline of the log-likelihood surface viewed from the ϕ axis.

Theorem 12 then tells us that the corresponding profile deviance function has an approximate χ²₁ distribution, and this, in turn, can form the basis of confidence interval construction. To evaluate the profile deviance, two maximisations of the likelihood are required: one over all of θ, and one over λ with ϕ held fixed.
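The two maximisations can be sketched as follows for an N(μ, σ²) sample with ϕ = μ and nuisance parameter λ = σ². The model and the simulated data are illustrative assumptions, and the inner maximisation over σ² is done numerically even though it has a closed form in this particular case:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

# Illustrative profile-deviance sketch (assumptions: N(mu, sigma2) model,
# simulated data; phi = mu, nuisance lambda = sigma2).

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=1.5, size=100)
n = len(x)

def loglik(mu, s2):
    # log-likelihood of an i.i.d. N(mu, s2) sample
    return -0.5 * n * np.log(2 * np.pi * s2) - ((x - mu) ** 2).sum() / (2 * s2)

def profile_loglik(mu):
    # inner maximisation: maximise over the nuisance parameter sigma2
    # with mu held fixed
    res = minimize_scalar(lambda s2: -loglik(mu, s2),
                          bounds=(1e-6, 100.0), method="bounded")
    return -res.fun

# Outer maximisation: for this model the unrestricted MLE of mu is xbar,
# so the overall maximum is the profile log-likelihood evaluated there
mu_hat = x.mean()
l_max = profile_loglik(mu_hat)

def deviance(mu):
    # profile deviance D*(mu) = 2 [ l(theta_hat) - l_P(mu) ]
    return 2 * (l_max - profile_loglik(mu))

crit = chi2.ppf(0.95, df=1)   # approx 3.84; cutoff for a 95% interval
print(deviance(mu_hat), deviance(mu_hat + 1.0), crit)
```

An approximate 95% confidence interval is then the set {μ : D*(μ) ≤ 3.84}; the deviance is zero at μ^ and grows as μ moves away, so the endpoints can be found by root-finding on D*(μ) − 3.84.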