4 Information and Asymptotics

Information Matrices

This section introduces the notion of information for a multivariate likelihood. We first recall some results from MATH230.

Consider a vector $Y = (Y_1, \ldots, Y_d)^T$. Then $\operatorname{Var}(Y)$ is a $d \times d$ matrix:

\[
\operatorname{Var}(Y) = \begin{bmatrix}
\operatorname{Var}(Y_1) & \cdots & \operatorname{Cov}(Y_1, Y_d) \\
\vdots & \operatorname{Cov}(Y_i, Y_j) & \vdots \\
\operatorname{Cov}(Y_d, Y_1) & \cdots & \operatorname{Var}(Y_d)
\end{bmatrix}.
\]

The diagonal of the variance matrix consists of the variances of the individual variables and the off-diagonal entries are covariances. The matrix is symmetric and positive semi-definite (positive definite unless some linear combination of the $Y_i$ has zero variance).
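For example, with $d = 2$ the variance matrix is

\[
\operatorname{Var}(Y) = \begin{bmatrix}
\operatorname{Var}(Y_1) & \operatorname{Cov}(Y_1, Y_2) \\
\operatorname{Cov}(Y_2, Y_1) & \operatorname{Var}(Y_2)
\end{bmatrix};
\]

with the purely illustrative values $\operatorname{Var}(Y_1) = 4$, $\operatorname{Var}(Y_2) = 9$ and $\operatorname{Cov}(Y_1, Y_2) = 3$ this is

\[
\operatorname{Var}(Y) = \begin{bmatrix} 4 & 3 \\ 3 & 9 \end{bmatrix}.
\]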

The correlation between variables $Y_i$ and $Y_j$ is given by:

\[
\operatorname{corr}(Y_i, Y_j) = \frac{\operatorname{Cov}(Y_i, Y_j)}{\sqrt{\operatorname{Var}(Y_i)\operatorname{Var}(Y_j)}}.
\]
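Continuing the illustrative values above, $\operatorname{corr}(Y_1, Y_2) = 3/\sqrt{4 \times 9} = 0.5$.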

Let $Y$ be a $d$-dimensional random vector with $E(Y) = 0$ and variance-covariance matrix $\Sigma$, and let $A$ be a $d \times d$ matrix; then

\[
E(AY) = 0
\]

and

\[
\operatorname{Var}(AY) = A \Sigma A^T.
\]
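As an illustration of this result (with an arbitrary choice of $A$ and $\Sigma$), take $d = 2$ and let $A$ map $Y$ to its sum and difference:

\[
A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad
\Sigma = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}.
\]

Then $AY = (Y_1 + Y_2, \; Y_1 - Y_2)^T$ and

\[
\operatorname{Var}(AY) = A \Sigma A^T =
\begin{bmatrix}
\sigma_1^2 + 2\rho\sigma_1\sigma_2 + \sigma_2^2 & \sigma_1^2 - \sigma_2^2 \\
\sigma_1^2 - \sigma_2^2 & \sigma_1^2 - 2\rho\sigma_1\sigma_2 + \sigma_2^2
\end{bmatrix},
\]

recovering the familiar formulae for $\operatorname{Var}(Y_1 + Y_2)$ and $\operatorname{Var}(Y_1 - Y_2)$.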

Notation and Results

As in the one-parameter case, the approximation of log-likelihood or deviance surfaces in the vicinity of $\hat{\theta}$ by quadratics (an approximation which improves with increasing sample size) forms the basis of an asymptotic distribution theory which can be used to obtain approximate confidence intervals.

The results are similar to those of the one-parameter case. Now, the score function is a vector function, U(θ), defined by

\[
\begin{aligned}
U(\theta) &= (U_1(\theta), \ldots, U_d(\theta))^T \\
&= \left( \frac{\partial \ell}{\partial \theta_1}(\theta), \ldots, \frac{\partial \ell}{\partial \theta_d}(\theta) \right)^T;
\end{aligned}
\]

that is, the gradient vector for the log-likelihood. Thus at the MLE we have $U(\hat{\theta}) = 0$.
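As a worked illustration (one possible choice of model, not the only one), suppose $Y_1, \ldots, Y_n$ are independent $N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma^2)^T$, so that

\[
\ell(\theta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \mu)^2.
\]

The score vector is

\[
U(\theta) = \left( \frac{1}{\sigma^2}\sum_{i=1}^n (y_i - \mu), \; -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (y_i - \mu)^2 \right)^T,
\]

and solving $U(\hat{\theta}) = 0$ gives $\hat{\mu} = \bar{y}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2$.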

The information measures now become matrices:

\[
I_O(\theta) = \begin{bmatrix}
-\dfrac{\partial^2 \ell}{\partial \theta_1^2}(\theta) & \cdots & -\dfrac{\partial^2 \ell}{\partial \theta_1 \partial \theta_d}(\theta) \\
\vdots & -\dfrac{\partial^2 \ell}{\partial \theta_i \partial \theta_j}(\theta) & \vdots \\
-\dfrac{\partial^2 \ell}{\partial \theta_d \partial \theta_1}(\theta) & \cdots & -\dfrac{\partial^2 \ell}{\partial \theta_d^2}(\theta)
\end{bmatrix}
\]

and

\[
I_E(\theta) = \begin{bmatrix}
E\!\left\{-\dfrac{\partial^2 \ell}{\partial \theta_1^2}(\theta)\right\} & \cdots & E\!\left\{-\dfrac{\partial^2 \ell}{\partial \theta_1 \partial \theta_d}(\theta)\right\} \\
\vdots & E\!\left\{-\dfrac{\partial^2 \ell}{\partial \theta_i \partial \theta_j}(\theta)\right\} & \vdots \\
E\!\left\{-\dfrac{\partial^2 \ell}{\partial \theta_d \partial \theta_1}(\theta)\right\} & \cdots & E\!\left\{-\dfrac{\partial^2 \ell}{\partial \theta_d^2}(\theta)\right\}
\end{bmatrix}.
\]

These are, respectively, the negative Hessian matrix of the log-likelihood function and its expectation. A general property of $I_O(\hat{\theta})$ and $I_E(\theta)$ (for any $\theta$) is that they are positive definite matrices, measuring respectively the observed curvature at $\hat{\theta}$ and the expected curvature, at $\theta$, of the log-likelihood surface.
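Continuing the Normal illustration above, differentiating the score components once more gives

\[
I_O(\theta) = \begin{bmatrix}
\dfrac{n}{\sigma^2} & \dfrac{1}{\sigma^4}\sum_{i=1}^n (y_i - \mu) \\[1.5ex]
\dfrac{1}{\sigma^4}\sum_{i=1}^n (y_i - \mu) & -\dfrac{n}{2\sigma^4} + \dfrac{1}{\sigma^6}\sum_{i=1}^n (y_i - \mu)^2
\end{bmatrix},
\]

and since $E(Y_i - \mu) = 0$ and $E\{(Y_i - \mu)^2\} = \sigma^2$,

\[
I_E(\theta) = \begin{bmatrix} \dfrac{n}{\sigma^2} & 0 \\[1ex] 0 & \dfrac{n}{2\sigma^4} \end{bmatrix}.
\]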

For the one-parameter case, we had an asymptotic result that said the variance of the MLE was the reciprocal of the Fisher information. The corresponding result for the multi-parameter case is that the variance matrix of the MLE vector is the matrix inverse of the Fisher information matrix.
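In the Normal illustration, for instance, inverting the expected information gives

\[
I_E(\theta)^{-1} = \begin{bmatrix} \sigma^2/n & 0 \\ 0 & 2\sigma^4/n \end{bmatrix},
\]

so the asymptotic variances of $\hat{\mu}$ and $\hat{\sigma}^2$ are $\sigma^2/n$ and $2\sigma^4/n$ respectively.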

In the results that follow, let $\theta$ be a $d$-dimensional unknown parameter and let its true value be $\theta_0$.

Proposition: Asymptotic consistency of the MLE.
For regular estimation problems, in the limit as $n \to \infty$, if $\theta_0$ is the true parameter vector, then

\[
\hat{\theta}_n \xrightarrow{\,p\,} \theta_0.
\]

Theorem 6: Multivariate asymptotic distribution of the MLE vector.
For regular estimation problems, in the limit as $n \to \infty$, if $\theta_0$ is the true parameter vector, then

\[
\hat{\theta} \sim \text{MVN}_d\!\left(\theta_0, I_E(\theta_0)^{-1}\right).
\]

Thus, the asymptotic distribution of $\hat{\theta}$ is multivariate normal, with variance-covariance matrix given by the inverse of the expected information matrix.

For practical problems we use alternatives to $I_E(\theta_0)$ which are asymptotically equivalent:

\[
\begin{aligned}
\hat{\theta} &\sim \text{MVN}_d\!\left(\theta_0, I_E(\hat{\theta})^{-1}\right), \\
\hat{\theta} &\sim \text{MVN}_d\!\left(\theta_0, I_O(\hat{\theta})^{-1}\right).
\end{aligned}
\]
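In practice, therefore, the standard error of the $j$th component of $\hat{\theta}$ is taken as the square root of the $(j,j)$ entry of the inverse information matrix evaluated at $\hat{\theta}$, giving approximate 95% confidence intervals of the form

\[
\hat{\theta}_j \pm 1.96 \sqrt{\left[ I_O(\hat{\theta})^{-1} \right]_{jj}}.
\]

In the Normal illustration this yields, for example, $\hat{\mu} \pm 1.96\,\hat{\sigma}/\sqrt{n}$.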

Theorem 7: Asymptotic distribution of the deviance with a d-dimensional MLE vector.
For a regular estimation problem, the deviance

\[
D(\theta) = 2\left[\ell(\hat{\theta}) - \ell(\theta)\right]
\]

satisfies, in the limit as $n \to \infty$, $D(\theta_0) \xrightarrow{\,d\,} \chi^2_d$, while for $\theta \neq \theta_0$, $D(\theta) \to \infty$.
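This suggests an alternative route to confidence statements: an approximate $100(1-\alpha)\%$ confidence region for $\theta$ is the set of parameter values whose deviance does not exceed the corresponding chi-squared quantile,

\[
\left\{ \theta : D(\theta) \le \chi^2_d(1-\alpha) \right\},
\]

where $\chi^2_d(1-\alpha)$ is the $(1-\alpha)$ quantile of the $\chi^2_d$ distribution; for example, with $d = 2$ the 95% cut-off is approximately $5.99$.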