
Examples

Example 5.3: Normal data, $\phi=\mu$.

$X_{i}\sim N(\mu,\sigma^{2})$. We saw previously that $\hat{\mu}=\bar{x}$ and $\mathrm{se}(\hat{\mu})=\sigma/n^{1/2}$. Find the two forms of confidence interval for $\mu$.

The asymptotic normality approach gives a $(1-\alpha)\times 100\%$ confidence interval for $\mu$ as follows.

We have already obtained that

\vec{I}_{E}(\vec{\theta})^{-1}=\left[\begin{array}{cc}\frac{\sigma^{2}}{n}&0\\ 0&\frac{\sigma^{2}}{2n}\end{array}\right].

Here $\phi=g(\vec{\theta})=\mu$. Then $\hat{\phi}=\hat{\mu}$ and $\nabla g(\vec{\theta})^{T}=[1,0]$, so

\mbox{Var}(\hat{\phi})=[1,0]\left[\begin{array}{cc}\frac{\sigma^{2}}{n}&0\\ 0&\frac{\sigma^{2}}{2n}\end{array}\right]\left[\begin{array}{c}1\\ 0\end{array}\right]=\frac{\sigma^{2}}{n}.

Therefore the confidence interval is

\left(\bar{x}-z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}},\ \bar{x}+z_{\alpha/2}\frac{\hat{\sigma}}{\sqrt{n}}\right),

where

\hat{\sigma}=\left(\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right)^{1/2}

is the maximum likelihood estimate of $\sigma$.
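
The interval above can be computed directly from a sample. The following is a minimal sketch using only the Python standard library; the function name `wald_interval_mean`, the seed and the simulated sample are illustrative assumptions and are not the data used in these notes.

```python
import math
import random
from statistics import NormalDist


def wald_interval_mean(x, alpha=0.05):
    """Interval x_bar -/+ z_{alpha/2} * sigma_hat / sqrt(n) for the normal mean."""
    n = len(x)
    x_bar = sum(x) / n
    # Maximum likelihood estimate of sigma: divisor n, not n - 1.
    sigma_hat = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    # Upper alpha/2 point of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma_hat / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


random.seed(1)  # arbitrary seed, chosen only for reproducibility
sample = [random.gauss(0.0, 1.0) for _ in range(25)]
lower, upper = wald_interval_mean(sample)
print(f"95% interval for mu: ({lower:.2f}, {upper:.2f})")
```

The interval is symmetric about $\bar{x}$ by construction, which is one way in which it can differ from the profile deviance interval derived next.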

Alternatively, we can derive confidence intervals using the profile deviance. The first step is to evaluate the profile log-likelihood for $\mu$. Recall that

\ell(\mu,\sigma)=-\frac{n}{2}\log(2\pi)-n\log\sigma-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2}.

Fixing $\mu$ and maximising with respect to $\sigma$, the maximum occurs at $\hat{\sigma}_{\mu}$, where

\frac{\partial\ell(\mu,\hat{\sigma}_{\mu})}{\partial\sigma}=0.

Thus,

0=-\frac{n}{\hat{\sigma}_{\mu}}+\frac{1}{\hat{\sigma}_{\mu}^{3}}\sum_{i=1}^{n}(x_{i}-\mu)^{2},

and so

\hat{\sigma}_{\mu}=\left(\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\mu)^{2}\right)^{1/2}.

Hence, since $D^{*}(\mu)=2\{\ell(\hat{\mu},\hat{\sigma})-\ell(\mu,\hat{\sigma}_{\mu})\}$,

\begin{aligned}
D^{*}(\mu) &= -2n\log(\hat{\sigma}/\hat{\sigma}_{\mu})+\frac{1}{\hat{\sigma}_{\mu}^{2}}\sum_{i=1}^{n}(x_{i}-\mu)^{2}-\frac{1}{\hat{\sigma}^{2}}\sum_{i=1}^{n}(x_{i}-\hat{\mu})^{2}\\
&= -2n\log(\hat{\sigma}/\hat{\sigma}_{\mu})+\frac{n\hat{\sigma}_{\mu}^{2}}{\hat{\sigma}_{\mu}^{2}}-\frac{n\hat{\sigma}^{2}}{\hat{\sigma}^{2}}\\
&= -2n\log(\hat{\sigma}/\hat{\sigma}_{\mu}).
\end{aligned}

For a simulated dataset of size $n=25$ with $\mu=0,\ \sigma=1$, the profile deviance is plotted in the left panel of Figure 5.1, leading to a 95% confidence interval of $(-0.12,0.48)$ for $\mu$.
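
A profile deviance interval of this kind collects the values of $\mu$ with $D^{*}(\mu)\leq 3.84$, the upper 5% point of $\chi^{2}_{1}$. The sketch below finds the two end points by bisection. The function names, the seed and the simulated sample are illustrative assumptions, so the output will not reproduce the interval quoted above. Because $\hat{\sigma}_{\mu}^{2}=\hat{\sigma}^{2}+(\bar{x}-\mu)^{2}$, the end points in this particular example also have the closed form $\bar{x}\pm\hat{\sigma}\sqrt{e^{3.84/n}-1}$, which the code uses as a check.

```python
import math
import random

CHI2_1_95 = 3.841459  # upper 5% point of the chi-squared distribution, 1 d.f.


def profile_deviance_mu(mu, x):
    """D*(mu) = -2 n log(sigma_hat / sigma_hat_mu) = n log(sigma_hat_mu^2 / sigma_hat^2)."""
    n = len(x)
    x_bar = sum(x) / n
    var_hat = sum((xi - x_bar) ** 2 for xi in x) / n   # sigma_hat^2
    var_hat_mu = sum((xi - mu) ** 2 for xi in x) / n   # sigma_hat_mu^2
    return n * math.log(var_hat_mu / var_hat)


def solve_deviance(x, inside, outside, cutoff, iters=100):
    """Bisection for D*(mu) = cutoff between a point inside and a point outside the interval."""
    for _ in range(iters):
        mid = 0.5 * (inside + outside)
        if profile_deviance_mu(mid, x) <= cutoff:
            inside = mid
        else:
            outside = mid
    return 0.5 * (inside + outside)


def profile_interval_mu(x, cutoff=CHI2_1_95):
    n = len(x)
    x_bar = sum(x) / n
    sigma_hat = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    ends = []
    for sign in (-1.0, 1.0):
        # Step away from x_bar until the deviance exceeds the cutoff.
        step = sigma_hat
        while profile_deviance_mu(x_bar + sign * step, x) <= cutoff:
            step *= 2.0
        ends.append(solve_deviance(x, x_bar, x_bar + sign * step, cutoff))
    return ends[0], ends[1]


random.seed(1)  # arbitrary seed, chosen only for reproducibility
sample = [random.gauss(0.0, 1.0) for _ in range(25)]
lower, upper = profile_interval_mu(sample)

# Closed-form half-width for comparison with the numerical end points.
n = len(sample)
x_bar = sum(sample) / n
sigma_hat = math.sqrt(sum((xi - x_bar) ** 2 for xi in sample) / n)
half_width = sigma_hat * math.sqrt(math.exp(CHI2_1_95 / n) - 1.0)
print(f"bisection:   ({lower:.4f}, {upper:.4f})")
print(f"closed form: ({x_bar - half_width:.4f}, {x_bar + half_width:.4f})")
```

Bisection is used here only because it carries over unchanged to models where the profile deviance has no closed-form inverse.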

Figure 5.1: Left: Profile deviance for $\mu$ for simulated normal data. Right: Contour plot of log-likelihood surface for simulated normal data, with the path $(\mu,\hat{\sigma}_{\mu})$ of the profile likelihood for $\mu$ superimposed.

The connection between profile likelihood and likelihood is shown in the right panel of the figure, where contours of the two-dimensional likelihood surface are plotted together with the path $(\mu,\hat{\sigma}_{\mu})$ of the profile likelihood for $\mu$.

Example 5.4: Normal data, other $\phi$s.

$X_{i}\sim N(\mu,\sigma^{2})$. We have already obtained that

\vec{I}_{E}(\vec{\theta})^{-1}=\left[\begin{array}{cc}\frac{\sigma^{2}}{n}&0\\ 0&\frac{\sigma^{2}}{2n}\end{array}\right].

1. If $\phi=g(\vec{\theta})=\sigma^{2}$, then $\hat{\phi}=\hat{\sigma}^{2}$ and $\nabla g(\vec{\theta})^{T}=[0,2\sigma]$, so

\mbox{Var}(\hat{\phi})=[0,2\sigma]\left[\begin{array}{cc}\frac{\sigma^{2}}{n}&0\\ 0&\frac{\sigma^{2}}{2n}\end{array}\right]\left[\begin{array}{c}0\\ 2\sigma\end{array}\right]=\frac{2\sigma^{4}}{n}.

2. If $\phi=g(\vec{\theta})=\mu+\sigma\Phi^{-1}(1-p)$, with $p$ known, then $\hat{\phi}=\hat{\mu}+\hat{\sigma}\Phi^{-1}(1-p)$ and $\nabla g(\vec{\theta})^{T}=[1,\Phi^{-1}(1-p)]$, so

\begin{aligned}
\mbox{Var}(\hat{\phi}) &= [1,\Phi^{-1}(1-p)]\left[\begin{array}{cc}\frac{\sigma^{2}}{n}&0\\ 0&\frac{\sigma^{2}}{2n}\end{array}\right]\left[\begin{array}{c}1\\ \Phi^{-1}(1-p)\end{array}\right]\\
&= \frac{\sigma^{2}}{2n}\left(2+[\Phi^{-1}(1-p)]^{2}\right).
\end{aligned}

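In each case an approximate $(1-\alpha)\times 100\%$ confidence interval is obtained as

\hat{\phi}\pm z_{\alpha/2}\sqrt{\mbox{Var}(\hat{\phi})},

with the unknown parameters in $\mbox{Var}(\hat{\phi})$ replaced by their maximum likelihood estimates, as in Example 5.3.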
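
Both cases of this example use the same quadratic form $\nabla g^{T}\,\vec{I}_{E}^{-1}\,\nabla g$, so they can be handled by one small routine. The sketch below is a minimal illustration with the standard library only; the helper names and the numerical values of $n$, $\hat{\mu}$, $\hat{\sigma}$ and $p$ are hypothetical and are chosen purely for demonstration.

```python
import math
from statistics import NormalDist


def delta_variance(grad, cov):
    """Quadratic form grad^T cov grad for a 2x2 matrix cov."""
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(2) for j in range(2))


def normal_inverse_information(sigma, n):
    """Inverse expected information for theta = (mu, sigma) in the normal model."""
    return [[sigma ** 2 / n, 0.0], [0.0, sigma ** 2 / (2 * n)]]


def delta_interval(phi_hat, var_phi, alpha=0.05):
    """Interval phi_hat -/+ z_{alpha/2} * sqrt(Var(phi_hat))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(var_phi)
    return phi_hat - half_width, phi_hat + half_width


# Hypothetical estimates, for illustration only.
n, mu_hat, sigma_hat, p = 25, 0.2, 0.9, 0.1
cov = normal_inverse_information(sigma_hat, n)  # evaluated at the MLE

# Case 1: phi = sigma^2, gradient [0, 2 sigma].
var_sigma2 = delta_variance([0.0, 2 * sigma_hat], cov)
ci_sigma2 = delta_interval(sigma_hat ** 2, var_sigma2)

# Case 2: phi = mu + sigma * Phi^{-1}(1 - p), gradient [1, Phi^{-1}(1 - p)].
q = NormalDist().inv_cdf(1 - p)
var_quantile = delta_variance([1.0, q], cov)
ci_quantile = delta_interval(mu_hat + sigma_hat * q, var_quantile)

print("sigma^2 interval:", ci_sigma2)
print("quantile interval:", ci_quantile)
```

The two computed variances agree with the closed forms $2\sigma^{4}/n$ and $\frac{\sigma^{2}}{2n}(2+[\Phi^{-1}(1-p)]^{2})$ evaluated at $\hat{\sigma}$.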

Example 5.5: Gamma Distribution, ctd.

$X_{i}\sim\mbox{Gamma}(\alpha,\beta)$ and $\phi=\alpha/\beta$, the population mean. Thus, $\hat{\phi}=\hat{\alpha}/\hat{\beta}$ and $\nabla g(\vec{\theta})^{T}=[1/\beta,-\alpha/\beta^{2}]$. Hence,

\mbox{Var}(\hat{\phi})\approx[1/\beta,-\alpha/\beta^{2}]\,\vec{I}_{E}(\vec{\theta})^{-1}\left[\begin{array}{c}1/\beta\\ -\alpha/\beta^{2}\end{array}\right],

where

\vec{I}_{E}(\vec{\theta})^{-1}=\Delta^{-1}\left[\begin{array}{cc}\frac{n\alpha}{\beta^{2}}&\frac{n}{\beta}\\ \frac{n}{\beta}&n\gamma^{\prime}(\alpha)\end{array}\right]

and

\Delta=\left(\frac{n}{\beta}\right)^{2}\left(\alpha\gamma^{\prime}(\alpha)-1\right).
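
Multiplying out the quadratic form, the factor $\alpha\gamma^{\prime}(\alpha)-1$ cancels and the variance reduces to $\alpha/(n\beta^{2})$. The sketch below checks this numerically. It assumes that $\gamma^{\prime}$ denotes the trigamma function (the derivative of the digamma function), which is approximated here by a standard recurrence and asymptotic series; the function names and parameter values are hypothetical.

```python
import math


def trigamma(x):
    """Approximate trigamma function for x > 0 (recurrence plus asymptotic series)."""
    total = 0.0
    # Recurrence psi'(x) = psi'(x + 1) + 1/x^2, applied until x is large enough.
    while x < 6.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi'(x) ~ 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7)
    series = inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))
    return total + series


def gamma_mean_variance(alpha, beta, n):
    """Delta-method variance of alpha_hat / beta_hat via the quadratic form."""
    g1 = trigamma(alpha)  # assumed meaning of gamma'(alpha)
    delta = (n / beta) ** 2 * (alpha * g1 - 1.0)
    inv_info = [
        [n * alpha / beta ** 2 / delta, n / beta / delta],
        [n / beta / delta, n * g1 / delta],
    ]
    grad = [1.0 / beta, -alpha / beta ** 2]
    return sum(grad[i] * inv_info[i][j] * grad[j] for i in range(2) for j in range(2))


# Hypothetical parameter values, for illustration only.
alpha, beta, n = 2.0, 3.0, 50
print(gamma_mean_variance(alpha, beta, n), alpha / (n * beta ** 2))
```

The value $\alpha/(n\beta^{2})$ equals $\mbox{Var}(X_{i})/n$, the variance of a sample mean of $n$ observations with mean $\alpha/\beta$ and variance $\alpha/\beta^{2}$.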

Example 5.6: Simple Linear Regression, ctd.

$X_{i}\sim N(\alpha+\beta z_{i},\sigma^{2})$ with (known) explanatory variables $(z_{1},\ldots,z_{n})$ and $\sigma=1$ also known. We obtained

\ell(\vec{\theta})=-\frac{n}{2}\log(2\pi)-n\log\sigma-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\alpha-\beta z_{i})^{2}.

Now, suppose we wish to obtain a confidence interval for $\beta$ based on the profile-likelihood function. We have, for fixed $\beta$ and $\sigma=1$,

\frac{\partial\ell}{\partial\alpha}=\sum(x_{i}-\alpha-\beta z_{i})=0

at the maximum, and so

\hat{\alpha}_{\beta}=\bar{x}-\beta\bar{z}.

Recall also that

\hat{\alpha}=\bar{x}-\hat{\beta}\bar{z}.

Hence, we obtain

D^{*}(\beta)=2\left\{-\frac{1}{2}\sum(x_{i}-\hat{\alpha}-\hat{\beta}z_{i})^{2}+\frac{1}{2}\sum(x_{i}-\hat{\alpha}_{\beta}-\beta z_{i})^{2}\right\}.
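
With $\sigma=1$ the profile deviance is a difference of two residual sums of squares, and substituting $\hat{\alpha}_{\beta}$ and $\hat{\alpha}$ shows that it simplifies to $(\beta-\hat{\beta})^{2}\sum(z_{i}-\bar{z})^{2}$. The sketch below evaluates both forms. It assumes the usual least-squares expression $\hat{\beta}=\sum(z_{i}-\bar{z})(x_{i}-\bar{x})/\sum(z_{i}-\bar{z})^{2}$ for the maximum likelihood estimate; the function names and the simulated data are hypothetical and are not the data of Figure 3.7.

```python
import math
import random

CHI2_1_95 = 3.841459  # upper 5% point of the chi-squared distribution, 1 d.f.


def fit_line(x, z):
    """Maximum likelihood (least-squares) estimates of alpha and beta, plus S_zz."""
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    beta_hat = s_zx / s_zz
    alpha_hat = x_bar - beta_hat * z_bar
    return alpha_hat, beta_hat, s_zz


def rss(alpha, beta, x, z):
    """Residual sum of squares for the line alpha + beta * z."""
    return sum((xi - alpha - beta * zi) ** 2 for xi, zi in zip(x, z))


def profile_deviance_beta(beta, x, z):
    """D*(beta) with sigma = 1: RSS at (alpha_hat_beta, beta) minus RSS at the MLE."""
    n = len(x)
    alpha_hat, beta_hat, _ = fit_line(x, z)
    alpha_beta = sum(x) / n - beta * sum(z) / n  # alpha_hat_beta = x_bar - beta * z_bar
    return rss(alpha_beta, beta, x, z) - rss(alpha_hat, beta_hat, x, z)


random.seed(2)  # arbitrary seed, chosen only for reproducibility
z = [float(i) for i in range(1, 21)]
x = [1.0 + 0.15 * zi + random.gauss(0.0, 1.0) for zi in z]

alpha_hat, beta_hat, s_zz = fit_line(x, z)
# D*(beta) <= 3.84 is equivalent to |beta - beta_hat| <= sqrt(3.84 / S_zz).
half_width = math.sqrt(CHI2_1_95 / s_zz)
print(f"beta_hat = {beta_hat:.3f}")
print(f"95% interval: ({beta_hat - half_width:.3f}, {beta_hat + half_width:.3f})")
```

Because the deviance is exactly quadratic in $\beta$ here, the profile deviance interval coincides with the interval $\hat{\beta}\pm 1.96/\sqrt{\sum(z_{i}-\bar{z})^{2}}$ from the asymptotic normality approach.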

For the data simulated in Figure 3.7, the corresponding profile deviance for $\beta$ is plotted in Figure 5.2, leading to a 95% confidence interval for $\beta$ equal to $(0.10,0.21)$.

Figure 5.2: Profile deviance for $\beta$ in regression example.

This latter example illustrates an important use of the likelihood function for model discrimination. Suppose we are interested in assessing whether there is a ‘significant’ linear relationship between the observations $x_{i}$ and the explanatory variables $z_{i}$.

If there were no such relationship then the true value of $\beta$ would be zero.

Hence, a test of the linear regression model, versus the simpler assumption that all of the responses have a common mean, is equivalent to a test of $\beta=0$.

Since 0 falls outside our 95% confidence interval for $\beta$, this gives reasonably strong evidence that $\beta\neq 0$; we say that the hypothesis $\beta=0$ is rejected at the 5% significance level. But could we have reached this conclusion without having to go through the entire procedure of producing the profile deviance function?

In fact, yes. All that is necessary is to evaluate the profile log-likelihood at $\beta=0$ (equivalent to fixing $\beta=0$ and maximising the log-likelihood with respect to the other parameters), and compare this value with the maximised log-likelihood under the complete model where $\beta$ is unconstrained. Doubling the difference gives the deviance, whose value can be compared with the $\chi^{2}_{1}$ distribution to check for significance: at the 5% level, $\beta=0$ is rejected if the deviance exceeds 3.84.
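
This single comparison can be sketched as follows for the regression model with $\sigma=1$. With $\beta$ fixed at zero the only free parameter is $\alpha$, which is maximised at $\bar{x}$, so the deviance is again a difference of residual sums of squares. The function names and the small data set are hypothetical; the tail probability uses the identity $P(\chi^{2}_{1}>d)=\mbox{erfc}(\sqrt{d/2})$.

```python
import math


def deviance_beta_zero(x, z):
    """Twice the drop in maximised log-likelihood (sigma = 1) when beta is fixed at 0."""
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    beta_hat = s_zx / s_zz
    alpha_hat = x_bar - beta_hat * z_bar
    # Null model (beta = 0): common mean, maximised at alpha = x_bar.
    rss_null = sum((xi - x_bar) ** 2 for xi in x)
    # Full model: alpha and beta both unconstrained.
    rss_full = sum((xi - alpha_hat - beta_hat * zi) ** 2 for xi, zi in zip(x, z))
    return max(rss_null - rss_full, 0.0)  # guard against rounding below zero


def chi2_1_upper_tail(d):
    """P(chi^2_1 > d), using chi^2_1 = Z^2 for a standard normal Z."""
    return math.erfc(math.sqrt(d / 2.0))


# Hypothetical data, for illustration only.
z = [0.0, 1.0, 2.0, 3.0]
x = [1.0, 3.0, 2.0, 5.0]
d = deviance_beta_zero(x, z)
print(f"deviance = {d:.3f}, p-value = {chi2_1_upper_tail(d):.4f}")
print("reject beta = 0 at the 5% level:", d > 3.841459)
```

Only two model fits are needed, one with $\beta$ free and one with $\beta=0$, rather than a profile over a whole grid of $\beta$ values.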
