4 Information and Asymptotics

Examples

Example 4.3:  Normal Data, ctd.

Following on from Example 3.3, we have $X_i \sim N(\mu, \sigma^2)$. Recall that the log-likelihood is
\[
\ell(\theta) = \sum_{i=1}^{n}\left\{-\tfrac{1}{2}\log(2\pi) - \log\sigma - \frac{1}{2\sigma^2}(x_i-\mu)^2\right\}
= -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2.
\]

From Theorem 6, $\hat\theta = (\hat\mu, \hat\sigma)^T$ is asymptotically
\[
\hat\theta \sim \mathrm{MVN}_2\!\left(\theta_0,\
\begin{bmatrix} \sigma^2/n & 0 \\ 0 & \sigma^2/(2n) \end{bmatrix}\right),
\]

since

\[
I_E(\theta) = \begin{bmatrix} n/\sigma^2 & 0 \\ 0 & 2n/\sigma^2 \end{bmatrix}
\quad\text{and}\quad
\mathrm{Var}(\hat\theta) = I_E(\theta)^{-1} = \begin{bmatrix} \sigma^2/n & 0 \\ 0 & \sigma^2/(2n) \end{bmatrix}.
\]

It follows that the standard error for $\hat\mu$ is $\sigma/n^{1/2}$ and that the standard error for $\hat\sigma$ is $\sigma/(2n)^{1/2}$.

Since the covariance terms in the variance matrix are zero, the parameters of the model are orthogonal. Thus, the standard error of each of $\hat\mu$ and $\hat\sigma$ is unaffected by the uncertainty in the value of the other parameter, i.e.

Inference on one parameter is the same whether the other parameter is known or not.
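As a quick numerical check of these standard errors (a sketch only, with illustrative values of $\mu$, $\sigma$ and $n$), we can simulate repeated samples and compare the empirical spread of the MLEs with $\sigma/n^{1/2}$ and $\sigma/(2n)^{1/2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 3.0, 200, 5000   # illustrative values

mu_hats = np.empty(reps)
sigma_hats = np.empty(reps)
for r in range(reps):
    x = rng.normal(mu, sigma, n)
    mu_hats[r] = x.mean()                                  # MLE of mu
    sigma_hats[r] = np.sqrt(((x - x.mean()) ** 2).mean())  # MLE of sigma

# Empirical standard deviations versus the asymptotic standard errors
print(mu_hats.std(), sigma / n ** 0.5)           # both approx 0.21
print(sigma_hats.std(), sigma / (2 * n) ** 0.5)  # both approx 0.15
```

With $n = 200$ and 5000 replications the agreement is already close, as the asymptotic theory suggests.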

To think about this more explicitly, suppose that $\sigma$ is known. Then the (one-dimensional) likelihood for $\mu$ is
\[
L(\mu) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{1/2}\sigma}\exp\left\{-\frac{1}{2\sigma^2}(x_i-\mu)^2\right\}
\propto \prod_{i=1}^{n}\exp\left\{-\frac{1}{2\sigma^2}(x_i-\mu)^2\right\},
\]

and so

\[
\ell(\mu) = -\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2.
\]

Thus
\[
\ell'(\mu) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu)
\quad\text{and}\quad
\ell''(\mu) = -\frac{n}{\sigma^2},
\]

giving

\[
I_E(\mu) = \frac{n}{\sigma^2}
\quad\text{and}\quad
\mathrm{Var}(\hat\mu) = I_E(\mu)^{-1} = \frac{\sigma^2}{n}.
\]

This is what we expected, since the parameters in the bivariate model are orthogonal (the off-diagonal terms in the variance matrix are zero).

Looking again at the Fisher information matrix for the bivariate model,

\[
I_E(\theta) = \begin{bmatrix} n/\sigma^2 & 0 \\ 0 & 2n/\sigma^2 \end{bmatrix},
\]

note that this variance is the reciprocal of the top-left entry. Similarly, if $\mu$ were known, then the asymptotic variance of $\hat\sigma$ would be the reciprocal of the bottom-right entry of the Fisher information matrix.

This gives us a quicker way of computing the variance (or standard error) if we assume other parameters are known.
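This shortcut is valid precisely because the information matrix is diagonal. A minimal numerical sketch (with made-up values of $n$ and $\sigma$):

```python
import numpy as np

n, sigma = 100, 2.0  # made-up values
# Fisher information for the normal model: diagonal, so the parameters
# are orthogonal.
I = np.array([[n / sigma**2, 0.0],
              [0.0, 2 * n / sigma**2]])
V = np.linalg.inv(I)
# For a diagonal matrix, the diagonal of the inverse equals the
# reciprocals of the diagonal entries, so each "other parameter known"
# variance agrees with the full-model variance.
print(np.allclose(np.diag(V), 1 / np.diag(I)))  # True
```

For a non-diagonal information matrix this equality fails, which is exactly the situation in the next example.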

Example 4.4:  Gamma distribution.
Following Example 3.3 we have $X_i \sim \mathrm{Gamma}(\alpha, \beta)$. From Theorem 3, $\hat\theta = (\hat\alpha, \hat\beta)^T$ is asymptotically

\[
\hat\theta \sim \mathrm{MVN}_2\!\left(\theta_0,\
\Delta^{-1}\begin{bmatrix} n\alpha/\beta^2 & n/\beta \\ n/\beta & n\gamma(\alpha) \end{bmatrix}\right),
\]

where

\[
\Delta = \left(\frac{n}{\beta}\right)^{2}\left(\alpha\gamma(\alpha) - 1\right)
\]

is the determinant of $I_E(\theta_0)$.

It follows that the standard error of $\hat\alpha$ is

\[
\left(\frac{\alpha}{\alpha\gamma(\alpha)-1}\right)^{1/2} n^{-1/2},
\]

and the standard error of $\hat\beta$ is

\[
\left(\frac{\beta^2\gamma(\alpha)}{\alpha\gamma(\alpha)-1}\right)^{1/2} n^{-1/2}.
\]

Because the model is non-orthogonal, the standard error for each parameter is different from that which would have been obtained had the other parameter been known.

For example, if $\alpha$ were known, then the standard error for $\hat\beta$ would have been

\[
\left(\frac{\beta^2}{n\alpha}\right)^{1/2}
= \left\{\left[E\!\left(-\frac{\partial^2 \ell(\beta)}{\partial\beta^2}\right)\right]^{-1}\right\}^{1/2}.
\]

Thus, in this example, we see that in non-orthogonal models, inferences for each parameter are affected by the uncertainty in the value of other parameters.

Finally, as

\[
\mathrm{corr}(\hat\alpha, \hat\beta)
= \frac{\mathrm{cov}(\hat\alpha, \hat\beta)}{\sqrt{\mathrm{Var}(\hat\alpha)\,\mathrm{Var}(\hat\beta)}}
= \frac{1}{(\alpha\gamma(\alpha))^{1/2}},
\]

the correlation between $\hat\alpha$ and $\hat\beta$ is positive, which is consistent with the likelihood contours seen in Figure 7.
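These claims can be checked numerically. In the sketch below, $\gamma(\alpha)$ is taken to be the trigamma function $\psi'(\alpha)$ (an assumption consistent with the formulas above), and the values of $\alpha$, $\beta$ and $n$ are illustrative:

```python
import numpy as np
from scipy.special import polygamma

a, b, n = 3.0, 1.5, 50   # illustrative alpha, beta, sample size
g = polygamma(1, a)      # gamma(alpha), taken here as the trigamma function

# Expected Fisher information for the Gamma(alpha, beta) model
I = n * np.array([[g, -1 / b],
                  [-1 / b, a / b**2]])
V = np.linalg.inv(I)

# Standard error of beta-hat from the full model vs. alpha known
se_full = np.sqrt(V[1, 1])
se_known = b / np.sqrt(n * a)
print(se_full > se_known)  # True: uncertainty in alpha inflates the SE

# Correlation is positive and matches 1 / sqrt(alpha * gamma(alpha))
corr = V[0, 1] / np.sqrt(V[0, 0] * V[1, 1])
print(np.isclose(corr, 1 / np.sqrt(a * g)))  # True
```

Note how much larger `se_full` is than `se_known` here: non-orthogonality can be far from negligible.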

Example 4.5:  Linear Regression with unknown variance.

Now consider the simple linear regression model with $\sigma$ unknown. Let $X_i \sim N(\alpha + \beta z_i, \sigma^2)$ with (known) explanatory variables $(z_1, \ldots, z_n)$, i.e. $\theta = (\alpha, \beta, \sigma)$.

Similar to Example 3.9, we have
\[
\ell(\theta) = \sum_{i=1}^{n}\left\{-\tfrac{1}{2}\log(2\pi) - \log\sigma - \frac{1}{2\sigma^2}(x_i - \alpha - \beta z_i)^2\right\}
= -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \alpha - \beta z_i)^2.
\]

By looking at the pairwise contour plots of parameters, the following observations were made:

  • The $(\alpha, \beta)$ likelihood has near-elliptical contours, with principal axes not parallel to the coordinate axes.

  • The likelihoods for $(\alpha, \sigma)$ and $(\beta, \sigma)$ are also near-elliptical, but with axes parallel to the coordinate axes.

The score function is made up of the components:

\begin{align*}
\frac{\partial \ell}{\partial \alpha}(\theta) &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \alpha - \beta z_i) \\
\frac{\partial \ell}{\partial \beta}(\theta) &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \alpha - \beta z_i)z_i \\
\frac{\partial \ell}{\partial \sigma}(\theta) &= -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i - \alpha - \beta z_i)^2.
\end{align*}
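As a sanity check on these components, the score should vanish at the MLE, where $(\hat\alpha, \hat\beta)$ are the least-squares estimates and $\hat\sigma^2$ is the mean squared residual. A sketch with simulated data (all parameter values made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
z = rng.uniform(-1, 1, n)
x = 0.5 + 2.0 * z + rng.normal(0, 0.8, n)  # data from the model, made-up values

# MLEs: least squares for (alpha, beta), then sigma-hat from the residuals
b_hat, a_hat = np.polyfit(z, x, 1)         # polyfit returns (slope, intercept)
r = x - a_hat - b_hat * z
s_hat = np.sqrt((r**2).mean())

# Evaluate the three score components at the MLE
score = np.array([r.sum() / s_hat**2,
                  (r * z).sum() / s_hat**2,
                  -n / s_hat + (r**2).sum() / s_hat**3])
print(np.allclose(score, 0))  # True: the score vanishes at the MLE
```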

The second partial derivatives are:

\begin{align*}
\frac{\partial^2 \ell}{\partial \alpha^2}(\theta) &= -\frac{n}{\sigma^2} \\
\frac{\partial^2 \ell}{\partial \beta^2}(\theta) &= -\frac{\sum_{i=1}^{n} z_i^2}{\sigma^2} \\
\frac{\partial^2 \ell}{\partial \sigma^2}(\theta) &= \frac{n}{\sigma^2} - \frac{3}{\sigma^4}\sum_{i=1}^{n}(x_i - \alpha - \beta z_i)^2 \\
\frac{\partial^2 \ell}{\partial \alpha\,\partial \beta}(\theta) &= \frac{\partial^2 \ell}{\partial \beta\,\partial \alpha}(\theta) = -\frac{\sum_{i=1}^{n} z_i}{\sigma^2} \\
\frac{\partial^2 \ell}{\partial \alpha\,\partial \sigma}(\theta) &= \frac{\partial^2 \ell}{\partial \sigma\,\partial \alpha}(\theta) = -\frac{2}{\sigma^3}\sum_{i=1}^{n}(x_i - \alpha - \beta z_i) \\
\frac{\partial^2 \ell}{\partial \beta\,\partial \sigma}(\theta) &= \frac{\partial^2 \ell}{\partial \sigma\,\partial \beta}(\theta) = -\frac{2}{\sigma^3}\sum_{i=1}^{n}(x_i - \alpha - \beta z_i)z_i.
\end{align*}

Thus the Fisher information matrix is

\[
I_E(\theta) = \begin{bmatrix}
n/\sigma^2 & \sum_{i=1}^{n} z_i/\sigma^2 & 0 \\
\sum_{i=1}^{n} z_i/\sigma^2 & \sum_{i=1}^{n} z_i^2/\sigma^2 & 0 \\
0 & 0 & 2n/\sigma^2
\end{bmatrix}.
\]

Taking the inverse, we have the variance matrix as

\[
\left[\frac{2n}{\sigma^6}\left(n\sum_{i=1}^{n} z_i^2 - \Big(\sum_{i=1}^{n} z_i\Big)^2\right)\right]^{-1}
\begin{bmatrix}
2n\sum_{i=1}^{n} z_i^2/\sigma^4 & -2n\sum_{i=1}^{n} z_i/\sigma^4 & 0 \\
-2n\sum_{i=1}^{n} z_i/\sigma^4 & 2n^2/\sigma^4 & 0 \\
0 & 0 & \left(n\sum_{i=1}^{n} z_i^2 - \big(\sum_{i=1}^{n} z_i\big)^2\right)/\sigma^4
\end{bmatrix},
\]

which simplifies to

\[
\frac{\sigma^2}{2n\Delta}
\begin{bmatrix}
2n\sum_{i=1}^{n} z_i^2 & -2n\sum_{i=1}^{n} z_i & 0 \\
-2n\sum_{i=1}^{n} z_i & 2n^2 & 0 \\
0 & 0 & \Delta
\end{bmatrix},
\]

where $\Delta = n\sum_{i=1}^{n} z_i^2 - \left(\sum_{i=1}^{n} z_i\right)^2$.

The asymptotic variance of the third estimator, $\hat\sigma$, is $\sigma^2/(2n)$.

This variance is the same as in Example 4.3. This should not be a surprise: $\hat\sigma$ is not correlated with the other two estimators, and thus inference using this estimator is not influenced by inference on the others.
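A numerical sketch (arbitrary $z_i$ and made-up values of $n$ and $\sigma$) confirming the block structure of the inverse:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 40, 1.7
z = rng.uniform(0, 10, n)          # arbitrary known explanatory variables

Sz, Szz = z.sum(), (z**2).sum()
# Fisher information matrix for theta = (alpha, beta, sigma)
I = np.array([[n,   Sz,  0.0],
              [Sz,  Szz, 0.0],
              [0.0, 0.0, 2 * n]]) / sigma**2
V = np.linalg.inv(I)

# Variance of sigma-hat is sigma^2 / (2n), exactly as in the normal model
print(np.isclose(V[2, 2], sigma**2 / (2 * n)))  # True
# sigma-hat is uncorrelated with alpha-hat and beta-hat
print(np.allclose(V[:2, 2], 0))                 # True
```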

Exercise 6: What are the asymptotic variances of $\hat\alpha$ and $\hat\beta$?

Exercise 7: What is the asymptotic correlation between $\hat\alpha$ and $\hat\beta$, $\mathrm{corr}(\hat\alpha, \hat\beta)$?

Exercise 8: If the other parameters were known, what would the standard error of $\hat\beta$ be?

Exercise 9: If the other parameters were known, what would the standard error of $\hat\alpha$ be?