4 Markov chains

4.9 N-step formula for transition matrices - intro.

Example 4.9.1.

Define

\[ P := \frac{1}{60}\begin{pmatrix} 29 & 8 & 23 \\ 8 & 44 & 8 \\ 23 & 8 & 29 \end{pmatrix} \]

and set

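\[ A = \frac{1}{\sqrt{6}}\begin{pmatrix} \sqrt{2} & 1 & \sqrt{3} \\ \sqrt{2} & -2 & 0 \\ \sqrt{2} & 1 & -\sqrt{3} \end{pmatrix}, \quad B = \frac{1}{\sqrt{6}}\begin{pmatrix} \sqrt{2} & \sqrt{2} & \sqrt{2} \\ 1 & -2 & 1 \\ \sqrt{3} & 0 & -\sqrt{3} \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3/5 & 0 \\ 0 & 0 & 1/10 \end{pmatrix}. \]

Show that $P = ADB$ and that $AB = I$, and hence find $P^n$.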

Firstly

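\begin{align*}
ADB &= \frac{1}{6}\begin{pmatrix} \sqrt{2} & 1 & \sqrt{3} \\ \sqrt{2} & -2 & 0 \\ \sqrt{2} & 1 & -\sqrt{3} \end{pmatrix}\begin{pmatrix} \sqrt{2} & \sqrt{2} & \sqrt{2} \\ 3/5 & -6/5 & 3/5 \\ \sqrt{3}/10 & 0 & -\sqrt{3}/10 \end{pmatrix} \\
&= \frac{1}{6}\begin{pmatrix} 2+3/5+3/10 & 2-6/5 & 2+3/5-3/10 \\ 2-6/5 & 2+12/5 & 2-6/5 \\ 2+3/5-3/10 & 2-6/5 & 2+3/5+3/10 \end{pmatrix} \\
&= \frac{1}{6}\begin{pmatrix} 29/10 & 4/5 & 23/10 \\ 4/5 & 22/5 & 4/5 \\ 23/10 & 4/5 & 29/10 \end{pmatrix} = P.
\end{align*}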

Next AB is

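\[ \frac{1}{6}\begin{pmatrix} \sqrt{2} & 1 & \sqrt{3} \\ \sqrt{2} & -2 & 0 \\ \sqrt{2} & 1 & -\sqrt{3} \end{pmatrix}\begin{pmatrix} \sqrt{2} & \sqrt{2} & \sqrt{2} \\ 1 & -2 & 1 \\ \sqrt{3} & 0 & -\sqrt{3} \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 6 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 6 \end{pmatrix} = I. \]

We have therefore shown that $B = A^{-1}$, and hence $P = ADA^{-1}$. Hence

\[ P^n = (ADA^{-1})^n = ADA^{-1}\,ADA^{-1}\cdots ADA^{-1} = AD^nA^{-1}. \]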

Therefore

Pn = 16(2132-2021-3)(10003/50001/10)n16(2221-2130-3)
= 16(2132-2021-3)(1000(3/5)n000(1/10)n)(2221-2130-3)
= 16(2132-2021-3)(222(3/5)n-2(3/5)n(3/5)n3(1/10)n0-3(1/10)n)
= 16(2+(3/5)n+3(1/10)n2-2(3/5)n2+(3/5)n-3(1/10)n2-2(3/5)n2+4(3/5)n2-2(3/5)n2+(3/5)n-3(1/10)n2-2(3/5)n2+(3/5)n+3(1/10)n).
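The algebra in Example 4.9.1 can be checked numerically. The following is a minimal sketch using only the Python standard library; it assumes that A and B carry the factor 1/√6 and the entries √2 and √3 (the only reading for which AB = I holds), and the helper names `matmul`, `power`, `closed_form` and `max_abs_diff` are illustrative rather than part of the notes.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def max_abs_diff(X, Y):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
P = [[v / 60 for v in row] for row in [[29, 8, 23], [8, 44, 8], [23, 8, 29]]]
A = [[v / s6 for v in row] for row in [[s2, 1, s3], [s2, -2, 0], [s2, 1, -s3]]]
B = [[v / s6 for v in row] for row in [[s2, s2, s2], [1, -2, 1], [s3, 0, -s3]]]
D = [[1, 0, 0], [0, 3 / 5, 0], [0, 0, 1 / 10]]

def power(M, n):
    """n-th power of a 3x3 matrix by repeated multiplication."""
    R = I3
    for _ in range(n):
        R = matmul(R, M)
    return R

def closed_form(n):
    """The n-step formula derived in the example."""
    a, b = (3 / 5) ** n, (1 / 10) ** n
    return [[(2 + a + 3 * b) / 6, (2 - 2 * a) / 6, (2 + a - 3 * b) / 6],
            [(2 - 2 * a) / 6, (2 + 4 * a) / 6, (2 - 2 * a) / 6],
            [(2 + a - 3 * b) / 6, (2 - 2 * a) / 6, (2 + a + 3 * b) / 6]]

err_ADB = max_abs_diff(matmul(matmul(A, D), B), P)   # checks P = ADB
err_AB = max_abs_diff(matmul(A, B), I3)              # checks AB = I
err_Pn = max(max_abs_diff(power(P, n), closed_form(n)) for n in range(11))
print(err_ADB, err_AB, err_Pn)  # all three should be at rounding-error level
```

Floating-point arithmetic is used because of the square roots, so the three errors are expected to be tiny rather than exactly zero.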
Remark.
  (a) As $n \to \infty$, $(3/5)^n \to 0$ and $(1/10)^n \to 0$, so

    \[ P^n \to \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix} = \begin{pmatrix} \pi \\ \pi \\ \pi \end{pmatrix}. \]

    We could, however, have worked this out simply by showing that the stationary distribution is $\pi = (1/3, 1/3, 1/3)$.

  (b) The formula tells us how quickly the Markov chain converges to its stationary distribution, i.e. the rate of convergence. Very quickly $(1/10)^n \ll (3/5)^n$, and so the main discrepancy from the stationary distribution is due to terms in $(3/5)^n$.
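Remark (b) can be illustrated by tracking how far $P^n$ is from the limiting matrix with every entry 1/3. The sketch below is an illustration only, using exact rational arithmetic from the standard library; the names `matmul`, `max_deviation`, `deviations` and `ratios` are hypothetical helpers, not notation from the notes.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

P = [[F(v, 60) for v in row] for row in [[29, 8, 23], [8, 44, 8], [23, 8, 29]]]
third = F(1, 3)

def max_deviation(M):
    """Largest absolute difference between an entry of M and 1/3."""
    return max(abs(x - third) for row in M for x in row)

# deviations[k] is the largest deviation of P^(k+1) from the limit.
deviations = []
Pn = P
for n in range(1, 9):
    deviations.append(max_deviation(Pn))
    Pn = matmul(Pn, P)

# Ratio of successive deviations: the per-step shrinkage factor.
ratios = [deviations[k + 1] / deviations[k] for k in range(len(deviations) - 1)]

for n, d in enumerate(deviations, start=1):
    print(n, float(d))
print([float(r) for r in ratios])
```

For this particular chain the largest deviation sits in the middle entry, which equals $(2 + 4(3/5)^n)/6$ and has no $(1/10)^n$ term, so each ratio is exactly 3/5.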

4.9.1 MATH103 revision: eigenvectors and eigenvalues

Definition 4.9.2.

A matrix $M$ is defined to have a left eigenvector $e$ (a non-zero row vector) with eigenvalue $\lambda$ when $eM = \lambda e$.

If $e$ is a left eigenvector of $M$ with eigenvalue $\lambda$ then so is $ce$ for any $c \neq 0$.

If $P$ is a TPM with invariant distribution $\pi$ then $\pi$ is a left eigenvector with eigenvalue 1, since $\pi P = \pi$.

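Suppose that $M = ADA^{-1}$ with

\[ D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_m \end{pmatrix}. \]

Set

\[ e_i := \begin{pmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{pmatrix} \]

(i.e. $e_i$ has a single 1 in the $i$th column with the other entries zero) and note that

\[ e_i D = \begin{pmatrix} 0 & \cdots & 0 & \lambda_i & 0 & \cdots & 0 \end{pmatrix} = \lambda_i e_i. \]

Now $e_i A^{-1}$ is a left eigenvector of $M$ and its eigenvalue is $\lambda_i$ since

\[ e_i A^{-1} M = e_i A^{-1} A D A^{-1} = e_i D A^{-1} = \lambda_i e_i A^{-1}. \]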

We will not be explicitly interested in the eigenvectors of $P$ (except, of course, for $\pi$), but we will use the eigenvalues of $P$, $\lambda_1, \dots, \lambda_m$, through the decomposition $P = ADA^{-1}$.
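As a concrete check on the matrix $P$ of Example 4.9.1, the rows of $A^{-1}$ should be left eigenvectors of $P$. The sketch below rescales those rows to the integer vectors (1,1,1), (1,-2,1) and (1,0,-1), which is permitted because any non-zero multiple of a left eigenvector is again a left eigenvector; the function name `vecmat` and the variable names are illustrative assumptions.

```python
from fractions import Fraction as F

P = [[F(v, 60) for v in row] for row in [[29, 8, 23], [8, 44, 8], [23, 8, 29]]]

def vecmat(v, M):
    """Row vector times matrix: returns v M as a list."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

# (rescaled row of A^{-1}, claimed eigenvalue)
candidates = [
    ([F(1), F(1), F(1)], F(1)),
    ([F(1), F(-2), F(1)], F(3, 5)),
    ([F(1), F(0), F(-1)], F(1, 10)),
]

# e P == lambda e for each pair, checked exactly with rationals.
checks = [vecmat(e, P) == [lam * x for x in e] for e, lam in candidates]

# Scaling (1,1,1) by 1/3 gives the invariant distribution pi, with pi P = pi.
pi = [F(1, 3)] * 3
pi_is_invariant = vecmat(pi, P) == pi

print(checks, pi_is_invariant)
```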

4.9.2 Remarks on eigenvalues (based on the Perron-Frobenius Theorem)

  (i) It can be shown that because the entries in any row of a TPM add up to 1, $|\lambda_i| \le 1$ for all $i$.

  (ii) Further, if $P$ is such that the Markov chain has an asymptotic distribution then $\pi$ is the only eigenvector (up to scalar multiples) for which the eigenvalue has modulus 1. For all other eigenvectors, $|\lambda_i| < 1$.

  (iii) Eigenvalues can be complex (i.e. have real and imaginary parts); we will not be dealing with such cases, but extension is straightforward.

  (iv) It is usual to set $\lambda_1 = 1$ and to arrange the other eigenvalues in order of decreasing magnitude; i.e. $1 = \lambda_1 \ge |\lambda_2| \ge |\lambda_3| \ge \dots \ge |\lambda_m|$.
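
For remark (i), one standard argument (a sketch, not taken from these notes) runs as follows. It uses the fact that $P$ and its transpose have the same eigenvalues, so any eigenvalue $\lambda$ of $P$ also has a non-zero column vector $v$ with $Pv = \lambda v$; the symbols $v$, $k$ and $p_{kj}$ (the entries of $P$) are introduced here only for this argument. Choose $k$ so that $|v_k| = \max_j |v_j| > 0$. Then

\[ |\lambda|\,|v_k| = \Big|\sum_{j} p_{kj} v_j\Big| \le \sum_{j} p_{kj}\,|v_j| \le |v_k| \sum_{j} p_{kj} = |v_k|, \]

and dividing by $|v_k|$ gives $|\lambda| \le 1$.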

4.9.3 The rate of convergence to the asymptotic distribution

In the example we showed that

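\[ P := \frac{1}{60}\begin{pmatrix} 29 & 8 & 23 \\ 8 & 44 & 8 \\ 23 & 8 & 29 \end{pmatrix} = ADA^{-1} \quad\text{with}\quad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 3/5 & 0 \\ 0 & 0 & 1/10 \end{pmatrix} \]

so the eigenvalues of $P$ are $1$, $3/5$ and $1/10$.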

We then found that

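\[ P^n = (ADA^{-1})^n = AD^nA^{-1}. \]

As $n \to \infty$,

\[ D^n = \begin{pmatrix} 1 & 0 & 0 \\ 0 & (3/5)^n & 0 \\ 0 & 0 & (1/10)^n \end{pmatrix} \to \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]

For any Markov chain with an asymptotic distribution, all eigenvalues except the first have $|\lambda_i| < 1$ and so $\lambda_i^n \to 0$ as $n \to \infty$ ($i \ge 2$). The analogous limit for $D^n$ (a single 1 in the top-left corner and zeros elsewhere) therefore holds for any such Markov chain.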

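Very quickly $(1/10)^n \ll (3/5)^n$, and so the main discrepancy between

\[ P^n = A\begin{pmatrix} 1 & 0 & 0 \\ 0 & (3/5)^n & 0 \\ 0 & 0 & (1/10)^n \end{pmatrix}A^{-1} \quad\text{and}\quad \begin{pmatrix} \pi \\ \pi \\ \pi \end{pmatrix} = A\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}A^{-1} \]

is due to terms in $(3/5)^n$ (check back to the formula for the n-step transition matrix). In general the biggest discrepancy from the asymptotic distribution, $\pi$, is due to the second largest (in modulus) eigenvalue, $\lambda_2$.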

$|\lambda_2|$ is called the geometric rate of convergence of the Markov chain. The larger $|\lambda_2|$ is, the slower the Markov chain converges.

NB: If the eigenvalues were $1, -3/5, 1/10$ then the geometric rate of convergence would still be $3/5$, since $|-3/5| = 3/5$.
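
To make the definition concrete, the sketch below picks out the geometric rate from a list of eigenvalues and gives a rough step count for the factor $|\lambda_2|^n$ to fall below a tolerance. It assumes the chain has an asymptotic distribution (exactly one eigenvalue of modulus 1), and the step count ignores the constant multiplying $|\lambda_2|^n$, so it is only an order-of-magnitude guide. The function names `geometric_rate` and `steps_to_tolerance` are hypothetical.

```python
import math

def geometric_rate(eigenvalues):
    """Second-largest modulus among the eigenvalues (real or complex)."""
    moduli = sorted((abs(lam) for lam in eigenvalues), reverse=True)
    return moduli[1]

def steps_to_tolerance(rate, tol):
    """Smallest n with rate**n <= tol, assuming 0 < rate < 1 and 0 < tol < 1."""
    return math.ceil(math.log(tol) / math.log(rate))

rate_example = geometric_rate([1, 3 / 5, 1 / 10])    # eigenvalues from the example
rate_negative = geometric_rate([1, -3 / 5, 1 / 10])  # the NB case: sign is ignored
n_needed = steps_to_tolerance(rate_example, 1e-6)

print(rate_example, rate_negative, n_needed)
```

Comparing `steps_to_tolerance(0.6, 1e-6)` with `steps_to_tolerance(0.9, 1e-6)` shows how sharply the step count grows as $|\lambda_2|$ approaches 1.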