4 Information and Asymptotics

Sketch Proofs of Results

We now prove Theorem 6 and Theorem 7. It is important to follow the steps involved, as they will deepen your understanding of likelihood concepts.

Lemma 8: Multivariate Central Limit Theorem.
Suppose that $Y$ is a $d$-dimensional random vector with mean vector $\mu$ and variance–covariance matrix $\Sigma$ with finite values on the diagonal. If $Y_1, \ldots, Y_n$ is an IID sequence of random vectors having the same distribution as $Y$, and if

$$S_n = \sum_{i=1}^n Y_i$$

(meaning a componentwise vector sum), then

$$n^{-1/2}[S_n - E(S_n)] = n^{-1/2}[S_n - n\mu] \xrightarrow{d} \mathrm{MVN}_d(\mathbf{0}, \Sigma)$$

as $n \to \infty$.
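To see the lemma in action, here is a minimal simulation sketch (our own illustration, not part of the notes; the construction of $Y$ from centred exponentials via a matrix $A$, and all variable names, are arbitrary choices): the sample covariance of the standardised sums should approach $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, reps = 2, 200, 10_000

# Non-normal, dependent components: Y = A Z with Z iid Exp(1) - 1,
# so E(Y) = 0 and Var(Y) = A A^T.
A = np.array([[1.0, 0.0], [0.5, 1.0]])
Sigma = A @ A.T
mu = np.zeros(d)

Z = rng.exponential(1.0, size=(reps, n, d)) - 1.0
Y = Z @ A.T                                  # row i holds Y_i = A Z_i
W = (Y.sum(axis=1) - n * mu) / np.sqrt(n)    # n^{-1/2}(S_n - n mu)

print("sample covariance of W:\n", np.cov(W.T))
print("Sigma:\n", Sigma)                     # close for large n
```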

Lemma 9: If $Y \sim \mathrm{MVN}_d(\mathbf{0}, \Sigma)$ then $Y^T \Sigma^{-1} Y \sim \chi^2_d$.

Proof.

First recall that if RVs $X_1, \ldots, X_d$ are iid $N(0,1)$, then $X_1^2 + \cdots + X_d^2 \sim \chi^2_d$ by definition.

Letting $X = \Sigma^{-1/2} Y$, where $\Sigma^{-1/2}$ denotes the symmetric square root of $\Sigma^{-1}$, we have

  • $E(X) = E(\Sigma^{-1/2} Y) = \Sigma^{-1/2} E(Y) = \mathbf{0}$

    (vector linear combination of a zero-mean r.v. – using the result at the start of the chapter)

  • $\mathrm{Var}(X) = \mathrm{Var}(\Sigma^{-1/2} Y) = \Sigma^{-1/2} \mathrm{Var}(Y) (\Sigma^{-1/2})^T = \Sigma^{-1/2} \Sigma (\Sigma^{-1/2})^T = I_d$

    (using the variance result at the start of the chapter)

  • a (vector) linear combination of normal r.v.s is also normal.

So $X \sim \mathrm{MVN}_d(\mathbf{0}, I_d)$.

Hence $Y^T \Sigma^{-1} Y = X^T X = X_1^2 + \cdots + X_d^2 \sim \chi^2_d$. ∎
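The lemma is also easy to check numerically. The sketch below (illustrative; the particular $\Sigma$ and the Kolmogorov–Smirnov comparison are our choices, assuming NumPy and SciPy) draws from $\mathrm{MVN}_d(\mathbf{0}, \Sigma)$ and compares the quadratic form with a $\chi^2_d$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
d, reps = 3, 50_000
A = rng.normal(size=(d, d))
Sigma = A @ A.T + d * np.eye(d)          # an arbitrary positive definite matrix

Y = rng.multivariate_normal(np.zeros(d), Sigma, size=reps)
Q = np.einsum("ri,ij,rj->r", Y, np.linalg.inv(Sigma), Y)   # Y^T Sigma^{-1} Y

print("mean of Q:", Q.mean(), "(chi^2_d mean is d =", d, ")")
print("KS test vs chi^2_3 p-value:", stats.kstest(Q, stats.chi2(df=d).cdf).pvalue)
```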

Lemma 10: Asymptotic distribution of the true score.
Under the regularity conditions,

  • $E\{U(\theta_0)\} = \mathbf{0}$

  • $\mathrm{Var}\{U(\theta_0)\} = E\{I_O(\theta_0)\} = I_E(\theta_0)$.

  • Asymptotically as $n \to \infty$, $U(\theta_0) \sim N(\mathbf{0}, I_E(\theta_0))$ (see the simulation sketch below).
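These three properties can be illustrated by simulation. The sketch below assumes an $\mathrm{Exp}(\theta)$ model with rate $\theta$, an illustrative choice not taken from the notes; for that model $U(\theta_0) = n/\theta_0 - \sum_i x_i$ and $I_E(\theta_0) = n/\theta_0^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n, reps = 2.0, 200, 20_000

# numpy's exponential takes a scale, which is 1/rate
X = rng.exponential(1.0 / theta0, size=(reps, n))
U = n / theta0 - X.sum(axis=1)           # true score, one value per replicate

print("mean of U:", U.mean(), "(should be near 0)")
print("var of U:", U.var(), "vs I_E(theta0) =", n / theta0**2)
```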

Example 4.1:  Normal Data, ctd.
$X_i \sim N(\mu, \sigma^2)$, so $\theta = (\mu, \sigma)$. In this case,

$$U(\theta) = \left\{ \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu),\; -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^n (x_i - \mu)^2 \right\}^T$$

and

$$I_O(\theta) = \begin{bmatrix} \dfrac{n}{\sigma^2} & \dfrac{2}{\sigma^3}\sum (x_i - \mu) \\[1ex] \dfrac{2}{\sigma^3}\sum (x_i - \mu) & \dfrac{3}{\sigma^4}\sum (x_i - \mu)^2 - \dfrac{n}{\sigma^2} \end{bmatrix},$$

so, since $\sum (x_i - \hat\mu) = 0$ and $\sum (x_i - \hat\mu)^2 = n\hat\sigma^2$ at the MLE,

$$I_O(\hat\theta) = \begin{bmatrix} n/\hat\sigma^2 & 0 \\ 0 & 2n/\hat\sigma^2 \end{bmatrix} \quad \text{and} \quad I_O(\hat\theta)^{-1} = \begin{bmatrix} \hat\sigma^2/n & 0 \\ 0 & \hat\sigma^2/(2n) \end{bmatrix}.$$

Since

$$E\left\{\sum_{i=1}^n (X_i - \mu)\right\} = \sum_{i=1}^n \{E(X_i) - \mu\} = 0$$

and

$$E\left\{\sum_{i=1}^n (X_i - \mu)^2\right\} = \sum_{i=1}^n E\{(X_i - \mu)^2\} = n\sigma^2,$$

it follows that

$$I_E(\theta) = \begin{bmatrix} n/\sigma^2 & 0 \\ 0 & 2n/\sigma^2 \end{bmatrix} \quad \text{and} \quad I_E(\theta)^{-1} = \begin{bmatrix} \sigma^2/n & 0 \\ 0 & \sigma^2/(2n) \end{bmatrix}.$$
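As a hedged numerical companion to this example (the simulated data and all variable names are our own), the following sketch checks that $I_O(\hat\theta)$ computed from the general formula collapses to $\mathrm{diag}(n/\hat\sigma^2,\; 2n/\hat\sigma^2)$ at the MLE.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n = 1.0, 2.0, 1_000
x = rng.normal(mu, sigma, size=n)

mu_hat = x.mean()
sig_hat = np.sqrt(((x - mu_hat) ** 2).mean())   # MLE of sigma (divisor n)

s1 = (x - mu_hat).sum()                         # zero at the MLE
s2 = ((x - mu_hat) ** 2).sum()                  # n * sig_hat^2 at the MLE
I_O = np.array([
    [n / sig_hat**2,      2 * s1 / sig_hat**3],
    [2 * s1 / sig_hat**3, 3 * s2 / sig_hat**4 - n / sig_hat**2],
])
I_E_hat = np.diag([n / sig_hat**2, 2 * n / sig_hat**2])

print(I_O)
print(I_E_hat)                                  # the two matrices agree
```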

Example 4.2:  Gamma Distribution, ctd.
$X_i \sim \mathrm{Gamma}(\alpha, \beta)$. In this case

$$U(\theta) = \left\{ n\log\beta + \sum_{i=1}^n \log x_i - n\gamma(\alpha),\; \frac{n\alpha}{\beta} - \sum_{i=1}^n x_i \right\}^T,$$

where as before $\gamma(\alpha) := \Gamma'(\alpha)/\Gamma(\alpha)$, and

$$I_O(\theta) = I_E(\theta) = \begin{bmatrix} n\gamma'(\alpha) & -n/\beta \\ -n/\beta & n\alpha/\beta^2 \end{bmatrix}.$$

It follows that

$$I_O(\theta)^{-1} = I_E(\theta)^{-1} = \Delta^{-1} \begin{bmatrix} n\alpha/\beta^2 & n/\beta \\ n/\beta & n\gamma'(\alpha) \end{bmatrix},$$

where the determinant $\Delta$ is

$$\Delta = \left(\frac{n}{\beta}\right)^2 \{\alpha\gamma'(\alpha) - 1\}.$$

Notice that in Example 4.1 $I_E(\hat\theta) = I_O(\hat\theta)$, and in Example 4.2 $I_E = I_O$, but this is the exception rather than the rule.
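A sketch of a numerical check for the Gamma case (illustrative; the finite-difference step, sample values, and names are assumptions): since the second derivatives of the Gamma log-likelihood involve no data terms, a numerical Hessian reproduces $I_E(\theta)$ regardless of the sample, confirming $I_O = I_E$ here.

```python
import numpy as np
from scipy.special import gammaln, polygamma

rng = np.random.default_rng(5)
alpha, beta, n = 2.5, 1.5, 50
x = rng.gamma(alpha, 1.0 / beta, size=n)        # numpy's scale is 1/beta

def loglik(a, b):
    return (n * a * np.log(b) + (a - 1) * np.log(x).sum()
            - b * x.sum() - n * gammaln(a))

h = 1e-4                                        # finite-difference step
def neg_hessian(a, b):
    faa = (loglik(a + h, b) - 2 * loglik(a, b) + loglik(a - h, b)) / h**2
    fbb = (loglik(a, b + h) - 2 * loglik(a, b) + loglik(a, b - h)) / h**2
    fab = (loglik(a + h, b + h) - loglik(a + h, b - h)
           - loglik(a - h, b + h) + loglik(a - h, b - h)) / (4 * h**2)
    return -np.array([[faa, fab], [fab, fbb]])

I_O = neg_hessian(alpha, beta)                  # no dependence on x at all
I_E = np.array([[n * polygamma(1, alpha), -n / beta],
                [-n / beta, n * alpha / beta**2]])
print(I_O)
print(I_E)                                      # agree up to O(h^2) error
```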