4 Information and Asymptotics

Examples

Example 4.3:  Normal Data, ctd.

Following on from Example 3.3, we have $X_i \sim N(\mu, \sigma^2)$. Recall that the log-likelihood is
\[
\ell(\theta) = \sum_{i=1}^{n}\left\{-\tfrac{1}{2}\log(2\pi) - \log\sigma - \frac{1}{2\sigma^2}(x_i-\mu)^2\right\}
= -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2.
\]

From Theorem 6, $\hat\theta = (\hat\mu, \hat\sigma)^T$ is asymptotically
\[
\hat\theta \sim \mathrm{MVN}_2\!\left(\theta_0,\
\begin{bmatrix} \sigma^2/n & 0 \\ 0 & \sigma^2/(2n) \end{bmatrix}\right),
\]

since

\[
I_E(\theta) = \begin{bmatrix} n/\sigma^2 & 0 \\ 0 & 2n/\sigma^2 \end{bmatrix}
\quad\text{and}\quad
\mathrm{Var}(\hat\theta) = I_E(\theta)^{-1} = \begin{bmatrix} \sigma^2/n & 0 \\ 0 & \sigma^2/(2n) \end{bmatrix}.
\]

It follows that the standard error for $\hat\mu$ is $\sigma/n^{1/2}$ and that the standard error for $\hat\sigma$ is $\sigma/(2n)^{1/2}$.

Since the covariance terms in the variance matrix are zero, the parameters of the model are orthogonal. Thus, the standard error of each of $\hat\mu$ and $\hat\sigma$ is unaffected by the uncertainty in the value of the other parameter, i.e.

Inference on one parameter is the same whether the other parameter is known or not.
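As a quick numerical check of these standard errors (a sketch only, with illustrative values of $\mu$, $\sigma$ and $n$), we can simulate repeated samples and compare the empirical spread of the MLEs with $\sigma/n^{1/2}$ and $\sigma/(2n)^{1/2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 3.0, 200, 5000   # illustrative values

mu_hats = np.empty(reps)
sigma_hats = np.empty(reps)
for r in range(reps):
    x = rng.normal(mu, sigma, n)
    mu_hats[r] = x.mean()                                  # MLE of mu
    sigma_hats[r] = np.sqrt(((x - x.mean()) ** 2).mean())  # MLE of sigma

# Empirical standard deviations versus the asymptotic standard errors
print(mu_hats.std(), sigma / n ** 0.5)           # both approx 0.21
print(sigma_hats.std(), sigma / (2 * n) ** 0.5)  # both approx 0.15
```

With $n = 200$ and 5000 replications the agreement is already close, as the asymptotic theory suggests.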

To think about this more explicitly, suppose that $\sigma$ is known. Then the (one-dimensional) likelihood for $\mu$ is
\[
L(\mu) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{1/2}\sigma}\exp\left\{-\frac{1}{2\sigma^2}(x_i-\mu)^2\right\}
\propto \prod_{i=1}^{n}\exp\left\{-\frac{1}{2\sigma^2}(x_i-\mu)^2\right\},
\]

and so

\[
\ell(\mu) = -\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2.
\]

Thus
\[
\ell'(\mu) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu)
\quad\text{and}\quad
\ell''(\mu) = -\frac{n}{\sigma^2},
\]

giving

\[
I_E(\mu) = \frac{n}{\sigma^2}
\quad\text{and}\quad
\mathrm{Var}(\hat\mu) = I_E(\mu)^{-1} = \frac{\sigma^2}{n}.
\]

This is what we expected, since the parameters in the bivariate model are orthogonal (the off-diagonal terms in the variance matrix are zero).

Looking again at the Fisher information matrix for the bivariate model,

\[
I_E(\theta) = \begin{bmatrix} n/\sigma^2 & 0 \\ 0 & 2n/\sigma^2 \end{bmatrix},
\]

note that this variance is the reciprocal of the top-left entry. Similarly, if $\mu$ were known, then the asymptotic variance of $\hat\sigma$ would be the reciprocal of the bottom-right entry of the Fisher information matrix.

This gives us a quicker way of computing the variance (or standard error) if we assume other parameters are known.
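This shortcut is valid precisely because the information matrix is diagonal. A minimal numerical sketch (with made-up values of $n$ and $\sigma$):

```python
import numpy as np

n, sigma = 100, 2.0  # made-up values
# Fisher information for the normal model: diagonal, so the parameters
# are orthogonal.
I = np.array([[n / sigma**2, 0.0],
              [0.0, 2 * n / sigma**2]])
V = np.linalg.inv(I)
# For a diagonal matrix, the diagonal of the inverse equals the
# reciprocals of the diagonal entries, so each "other parameter known"
# variance agrees with the full-model variance.
print(np.allclose(np.diag(V), 1 / np.diag(I)))  # True
```

For a non-diagonal information matrix this equality fails, which is exactly the situation in the next example.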

Example 4.4:  Gamma distribution.
Following Example 3.3 we have $X_i \sim \mathrm{Gamma}(\alpha, \beta)$. From Theorem 3, $\hat\theta = (\hat\alpha, \hat\beta)^T$ is asymptotically

\[
\hat\theta \sim \mathrm{MVN}_2\!\left(\theta_0,\
\Delta^{-1}\begin{bmatrix} n\alpha/\beta^2 & n/\beta \\ n/\beta & n\gamma(\alpha) \end{bmatrix}\right),
\]

where

\[
\Delta = \left(\frac{n}{\beta}\right)^{2}\left(\alpha\gamma(\alpha) - 1\right)
\]

is the determinant of $I_E(\theta_0)$.

It follows that the standard error of $\hat\alpha$ is

\[
\left(\frac{\alpha}{\alpha\gamma(\alpha)-1}\right)^{1/2} n^{-1/2},
\]

and the standard error of $\hat\beta$ is

\[
\left(\frac{\beta^2\gamma(\alpha)}{\alpha\gamma(\alpha)-1}\right)^{1/2} n^{-1/2}.
\]

Because the model is non-orthogonal, the standard error for each parameter is different from that which would have been obtained had the other parameter been known.

For example, if $\alpha$ were known, then the standard error for $\hat\beta$ would have been

\[
\left(\frac{\beta^2}{n\alpha}\right)^{1/2}
= \left\{\left[E\!\left(-\frac{\partial^2 \ell(\beta)}{\partial\beta^2}\right)\right]^{-1}\right\}^{1/2}.
\]

Thus, in this example, we see that in non-orthogonal models, inferences for each parameter are affected by the uncertainty in the value of other parameters.

Finally, as

\[
\mathrm{corr}(\hat\alpha, \hat\beta)
= \frac{\mathrm{cov}(\hat\alpha, \hat\beta)}{\sqrt{\mathrm{Var}(\hat\alpha)\,\mathrm{Var}(\hat\beta)}}
= \frac{1}{(\alpha\gamma(\alpha))^{1/2}},
\]

the correlation between $\hat\alpha$ and $\hat\beta$ is positive, which is consistent with the likelihood contours seen in Figure 7.
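These claims can be checked numerically. In the sketch below, $\gamma(\alpha)$ is taken to be the trigamma function $\psi'(\alpha)$ (an assumption consistent with the formulas above), and the values of $\alpha$, $\beta$ and $n$ are illustrative:

```python
import numpy as np
from scipy.special import polygamma

a, b, n = 3.0, 1.5, 50   # illustrative alpha, beta, sample size
g = polygamma(1, a)      # gamma(alpha), taken here as the trigamma function

# Expected Fisher information for the Gamma(alpha, beta) model
I = n * np.array([[g, -1 / b],
                  [-1 / b, a / b**2]])
V = np.linalg.inv(I)

# Standard error of beta-hat from the full model vs. alpha known
se_full = np.sqrt(V[1, 1])
se_known = b / np.sqrt(n * a)
print(se_full > se_known)  # True: uncertainty in alpha inflates the SE

# Correlation is positive and matches 1 / sqrt(alpha * gamma(alpha))
corr = V[0, 1] / np.sqrt(V[0, 0] * V[1, 1])
print(np.isclose(corr, 1 / np.sqrt(a * g)))  # True
```

Note how much larger `se_full` is than `se_known` here: non-orthogonality can be far from negligible.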

Example 4.5:  Linear Regression with unknown variance.

Now consider the simple linear regression model with $\sigma$ unknown. Let $X_i \sim N(\alpha + \beta z_i, \sigma^2)$ with (known) explanatory variables $(z_1, \ldots, z_n)$, i.e. $\theta = (\alpha, \beta, \sigma)$.

Similar to Example 3.9, we have
\[
\ell(\theta) = \sum_{i=1}^{n}\left\{-\tfrac{1}{2}\log(2\pi) - \log\sigma - \frac{1}{2\sigma^2}(x_i - \alpha - \beta z_i)^2\right\}
= -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \alpha - \beta z_i)^2.
\]

By looking at the pairwise contour plots of parameters, the following observations were made:

  • The $(\alpha, \beta)$ likelihood has near-elliptical contours, with principal axes not parallel to the coordinate axes.

  • The likelihoods for $(\alpha, \sigma)$ and $(\beta, \sigma)$ are also near-elliptical, but with axes parallel to the coordinate axes.

The score function is made up of the components:

\begin{align*}
\frac{\partial \ell}{\partial \alpha}(\theta) &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \alpha - \beta z_i) \\
\frac{\partial \ell}{\partial \beta}(\theta) &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \alpha - \beta z_i)z_i \\
\frac{\partial \ell}{\partial \sigma}(\theta) &= -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i - \alpha - \beta z_i)^2.
\end{align*}
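As a sanity check on these components, the score should vanish at the MLE, where $(\hat\alpha, \hat\beta)$ are the least-squares estimates and $\hat\sigma^2$ is the mean squared residual. A sketch with simulated data (all parameter values made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
z = rng.uniform(-1, 1, n)
x = 0.5 + 2.0 * z + rng.normal(0, 0.8, n)  # data from the model, made-up values

# MLEs: least squares for (alpha, beta), then sigma-hat from the residuals
b_hat, a_hat = np.polyfit(z, x, 1)         # polyfit returns (slope, intercept)
r = x - a_hat - b_hat * z
s_hat = np.sqrt((r**2).mean())

# Evaluate the three score components at the MLE
score = np.array([r.sum() / s_hat**2,
                  (r * z).sum() / s_hat**2,
                  -n / s_hat + (r**2).sum() / s_hat**3])
print(np.allclose(score, 0))  # True: the score vanishes at the MLE
```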

The second partial derivatives are:

\begin{align*}
\frac{\partial^2 \ell}{\partial \alpha^2}(\theta) &= -\frac{n}{\sigma^2} \\
\frac{\partial^2 \ell}{\partial \beta^2}(\theta) &= -\frac{\sum_{i=1}^{n} z_i^2}{\sigma^2} \\
\frac{\partial^2 \ell}{\partial \sigma^2}(\theta) &= \frac{n}{\sigma^2} - \frac{3}{\sigma^4}\sum_{i=1}^{n}(x_i - \alpha - \beta z_i)^2 \\
\frac{\partial^2 \ell}{\partial \alpha\,\partial \beta}(\theta) &= \frac{\partial^2 \ell}{\partial \beta\,\partial \alpha}(\theta) = -\frac{\sum_{i=1}^{n} z_i}{\sigma^2} \\
\frac{\partial^2 \ell}{\partial \alpha\,\partial \sigma}(\theta) &= \frac{\partial^2 \ell}{\partial \sigma\,\partial \alpha}(\theta) = -\frac{2}{\sigma^3}\sum_{i=1}^{n}(x_i - \alpha - \beta z_i) \\
\frac{\partial^2 \ell}{\partial \beta\,\partial \sigma}(\theta) &= \frac{\partial^2 \ell}{\partial \sigma\,\partial \beta}(\theta) = -\frac{2}{\sigma^3}\sum_{i=1}^{n}(x_i - \alpha - \beta z_i)z_i.
\end{align*}

Thus the Fisher information matrix is

\[
I_E(\theta) = \begin{bmatrix}
n/\sigma^2 & \sum_{i=1}^{n} z_i/\sigma^2 & 0 \\
\sum_{i=1}^{n} z_i/\sigma^2 & \sum_{i=1}^{n} z_i^2/\sigma^2 & 0 \\
0 & 0 & 2n/\sigma^2
\end{bmatrix}.
\]

Taking the inverse, we have the variance matrix as

\[
\left[\frac{2n}{\sigma^6}\left(n\sum_{i=1}^{n} z_i^2 - \Big(\sum_{i=1}^{n} z_i\Big)^2\right)\right]^{-1}
\begin{bmatrix}
2n\sum_{i=1}^{n} z_i^2/\sigma^4 & -2n\sum_{i=1}^{n} z_i/\sigma^4 & 0 \\
-2n\sum_{i=1}^{n} z_i/\sigma^4 & 2n^2/\sigma^4 & 0 \\
0 & 0 & \left(n\sum_{i=1}^{n} z_i^2 - \big(\sum_{i=1}^{n} z_i\big)^2\right)/\sigma^4
\end{bmatrix},
\]

which simplifies to

\[
\frac{\sigma^2}{2n\Delta}
\begin{bmatrix}
2n\sum_{i=1}^{n} z_i^2 & -2n\sum_{i=1}^{n} z_i & 0 \\
-2n\sum_{i=1}^{n} z_i & 2n^2 & 0 \\
0 & 0 & \Delta
\end{bmatrix},
\]

where $\Delta = n\sum_{i=1}^{n} z_i^2 - \left(\sum_{i=1}^{n} z_i\right)^2$.

The asymptotic variance of the third estimator, $\hat\sigma$, is $\sigma^2/(2n)$.

This variance is the same as in Example 4.3. This should not be a surprise: $\hat\sigma$ is not correlated with the other two estimators, and thus inference using this estimator is not influenced by inference on the others.
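A numerical sketch (arbitrary $z_i$ and made-up values of $n$ and $\sigma$) confirming the block structure of the inverse:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 40, 1.7
z = rng.uniform(0, 10, n)          # arbitrary known explanatory variables

Sz, Szz = z.sum(), (z**2).sum()
# Fisher information matrix for theta = (alpha, beta, sigma)
I = np.array([[n,   Sz,  0.0],
              [Sz,  Szz, 0.0],
              [0.0, 0.0, 2 * n]]) / sigma**2
V = np.linalg.inv(I)

# Variance of sigma-hat is sigma^2 / (2n), exactly as in the normal model
print(np.isclose(V[2, 2], sigma**2 / (2 * n)))  # True
# sigma-hat is uncorrelated with alpha-hat and beta-hat
print(np.allclose(V[:2, 2], 0))                 # True
```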

Exercise 6: What are the asymptotic variances of $\hat\alpha$ and $\hat\beta$?

Exercise 7: What is the asymptotic correlation between $\hat\alpha$ and $\hat\beta$, $\mathrm{corr}(\hat\alpha, \hat\beta)$?

Exercise 8: If the other parameters were known, what would the standard error of $\hat\beta$ be?

Exercise 9: If the other parameters were known, what would the standard error of $\hat\alpha$ be?