
9.2 The Weak Law of Large Numbers

We would like to ascertain when averages of a sequence of IID random variables converge in probability to the expectation, μ, of each random variable in the sequence. We will need Chebyshev’s inequality, which we derive from a simpler but more general result, Markov’s inequality.

Proposition 9.2.1.

(Markov’s Inequality) If $V$ is a non-negative random variable then for any $a > 0$,

\[ \mathsf{P}(V \geq a) \leq \frac{\mathsf{E}[V]}{a}. \]
Proof.

We will assume that V is a continuous random variable; an almost identical proof holds for a discrete rv, with sums instead of integrals.

Let $V$ have density function $f(v)$; then for any $a > 0$,

\[
\begin{aligned}
\mathsf{E}[V] &= \int_0^\infty t f(t)\,\mathrm{d}t \\
&\geq \int_a^\infty t f(t)\,\mathrm{d}t \\
&\geq \int_a^\infty a f(t)\,\mathrm{d}t \\
&= a \int_a^\infty f(t)\,\mathrm{d}t \\
&= a\,\mathsf{P}(V \geq a).
\end{aligned}
\]
∎

Quiz: Why do the inequalities hold? For the first, $t f(t) \geq 0$ and $a > 0$; for the second, $t \geq a$ over the range of integration.
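Markov’s inequality is easy to check empirically. Below is a minimal sketch (not part of the original notes; it assumes NumPy is available) comparing the empirical tail probability with the bound $\mathsf{E}[V]/a$ for an Exponential random variable:

```python
import numpy as np

# A quick empirical check of Markov's inequality for a non-negative rv.
rng = np.random.default_rng(seed=1)
v = rng.exponential(scale=2.0, size=100_000)  # non-negative, E[V] = 2

for a in [1.0, 2.0, 5.0, 10.0]:
    tail = np.mean(v >= a)    # empirical P(V >= a)
    bound = v.mean() / a      # Markov's bound E[V]/a (E[V] estimated)
    print(f"a = {a:5.1f}   P(V >= a) ~ {tail:.4f}   E[V]/a ~ {bound:.4f}")
```

The bound is typically far from tight; that is the price paid for its generality.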

For a random variable $Y$ with $\mathsf{E}[Y] = \mu$ and $\mathsf{Var}[Y] = \sigma^2$, setting $V = (Y-\mu)^2$ and $a = \epsilon^2$ gives $\mathsf{P}((Y-\mu)^2 \geq \epsilon^2) \leq \mathsf{E}[(Y-\mu)^2]/\epsilon^2 = \sigma^2/\epsilon^2$; since $|Y-\mu| > \epsilon$ implies $(Y-\mu)^2 \geq \epsilon^2$, this yields

Corollary 9.2.2.

(Chebyshev’s Inequality) If $Y$ is a random variable with expectation $\mu$ and variance $\sigma^2 < \infty$ then for any $\epsilon > 0$,

\[ \mathsf{P}(|Y-\mu| > \epsilon) \leq \frac{\sigma^2}{\epsilon^2}. \]
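A similar sketch (again our own illustration, assuming NumPy) shows how conservative Chebyshev’s bound is for a standard normal $Y$, where $\mu = 0$ and $\sigma^2 = 1$:

```python
import numpy as np

# Empirical check of Chebyshev's inequality for Y ~ N(0, 1).
rng = np.random.default_rng(seed=2)
y = rng.standard_normal(1_000_000)

for eps in [1.0, 2.0, 3.0]:
    tail = np.mean(np.abs(y) > eps)  # empirical P(|Y - mu| > eps)
    print(f"eps = {eps}   P(|Y-mu| > eps) ~ {tail:.4f}   bound = {1 / eps**2:.4f}")
```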
Theorem 9.2.3.

The Weak Law of Large Numbers. Suppose $X_1, X_2, \ldots$ is a sequence of i.i.d. random variables with expectation $\mu$ and finite variance $\sigma^2$, and let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then for any $\epsilon > 0$,

\[ \mathsf{P}(|\bar{X}_n - \mu| > \epsilon) \to 0, \]

as $n \to \infty$.

Proof.

Using Chebyshev’s inequality (noting that $\mathsf{E}[\bar{X}_n] = \mu$) we have, for any $\epsilon > 0$,

\[ \mathsf{P}(|\bar{X}_n - \mu| > \epsilon) \leq \frac{1}{\epsilon^2}\,\mathsf{Var}[\bar{X}_n] = \frac{1}{\epsilon^2}\,\frac{\sigma^2}{n} \to 0, \]

as $n \to \infty$. ∎

The theorem formalises what we observed in Figure 9.1, namely that the distribution of $\bar{X}_n$ concentrates more and more around $\mu$: no matter how small an interval $[\mu - \epsilon, \mu + \epsilon]$ we take around $\mu$, the probability of $\bar{X}_n$ falling in this interval tends to 1.
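This concentration can be seen directly by simulation. The sketch below (our illustration; the Exponential(1) distribution is an arbitrary choice, so $\mu = 1$) estimates $\mathsf{P}(|\bar{X}_n - \mu| > \epsilon)$ for increasing $n$:

```python
import numpy as np

# Estimate P(|Xbar_n - mu| > eps) for increasing n, using Exp(1) variables (mu = 1).
rng = np.random.default_rng(seed=3)
mu, eps, reps = 1.0, 0.1, 1_000

for n in [10, 100, 1_000, 10_000]:
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    print(f"n = {n:6d}   P(|Xbar_n - mu| > {eps}) ~ {np.mean(np.abs(xbar - mu) > eps):.4f}")
```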

Note: The requirement that $\sigma^2 < \infty$ is not necessary; it is sufficient that $\mathsf{E}[|X_i|] < \infty$ (but the proof is harder).

Example 9.2.1.

How large a random sample should be taken from a distribution in order for the probability to be at least 0.99 that the sample mean will be within one standard deviation of the expectation of the distribution?

Solution.  As in the proof of the WLLN, by Chebyshev’s inequality,

\[ \mathsf{P}(|\bar{X}_n - \mu| > \sigma) \leq \frac{1}{\sigma^2}\,\frac{\sigma^2}{n} = \frac{1}{n}. \]

We need $\frac{1}{n} \leq 1 - 0.99 = 0.01$, so $n \geq 100$ is sufficient, whatever the distribution.
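As a sanity check on this distribution-free bound, the sketch below (our own, using the deliberately skewed Exponential(1) distribution, for which $\mu = \sigma = 1$) estimates the probability that $\bar{X}_{100}$ lies within one standard deviation of $\mu$:

```python
import numpy as np

# With n = 100 draws from the skewed Exp(1) distribution, mu = sigma = 1.
rng = np.random.default_rng(seed=4)
n, reps = 100, 100_000
mu = sigma = 1.0

xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
print("P(|Xbar_n - mu| <= sigma) ~", np.mean(np.abs(xbar - mu) <= sigma))
# Chebyshev guarantees at least 0.99; here the probability is essentially 1.
```

Chebyshev’s guarantee is conservative: for most distributions the actual probability is far higher.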

More general expectations:

Setting $V_i = g(X_i)$ and applying the WLLN to $V_1, V_2, \ldots$ shows us that

\[ \frac{1}{n} \sum_{i=1}^n g(X_i) \to \mathsf{E}[g(X_i)], \]

in probability, at least provided $\mathsf{Var}[g(X_i)] < \infty$.
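For instance (a sketch with an arbitrary choice of $g$, not from the notes): taking $g(x) = x^2$ with $X_i \sim \mathsf{Uniform}(0,1)$, the average of the $g(X_i)$ should approach $\mathsf{E}[X_i^2] = 1/3$:

```python
import numpy as np

# Average of g(X_i) with g(x) = x^2 and X_i ~ Uniform(0, 1); E[g(X_i)] = 1/3.
rng = np.random.default_rng(seed=5)
x = rng.uniform(0.0, 1.0, size=1_000_000)
print("average of g(X_i) ~", np.mean(x**2), "   E[g(X_i)] = 1/3 ~", 1 / 3)
```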

Proportions converge to probabilities:

The WLLN states that averages, $\bar{X}_n$, converge to the expectation, $\mu$. However, it also tells us that proportions converge to probabilities. Let $A \subseteq \mathbb{R}$ and let $I(X_i \in A)$ indicate whether or not $X_i \in A$; $I$ is the indicator function – i.e. $I(X_i \in A) = 1$ if $X_i \in A$ and is 0 otherwise.

The proportion of the $n$ rvs $X_1, \ldots, X_n$ that are in $A$ is

\[ P_n = \frac{1}{n} \sum_{i=1}^n I(X_i \in A), \]

so $\mathsf{E}[P_n] = \mathsf{E}[I(X_1 \in A)]$. But $I(X_1 \in A) \sim \mathsf{Bern}(p)$ where $p = \mathsf{P}(X_1 \in A)$, so $\mathsf{E}[I(X_1 \in A)] = p = \mathsf{P}(X_1 \in A)$.

Now $\mathsf{Var}[I(X_i \in A)] < \infty$ as $0 \leq I(X_i \in A) \leq 1$, so set $\tau^2 = \mathsf{Var}[I(X_i \in A)]$. Then

\[ \mathsf{P}(|P_n - \mathsf{P}(X_1 \in A)| > \epsilon) \leq \frac{\tau^2}{n \epsilon^2} \to 0, \]

as $n \to \infty$.

In particular, with $A = (-\infty, x]$ we see that the proportion of realisations of $X$ that are $\leq x$ tends to $F_X(x)$. As a function of $x$ this proportion is known as the empirical cdf.

With $A = (x, x + \delta]$ the proportion of realisations between $x$ and $x + \delta$ tends to $F_X(x + \delta) - F_X(x) \approx f_X(x)\,\delta$. As a function of $x$ this proportion is known as the histogram.
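A short sketch (our illustration, assuming NumPy) of the empirical cdf converging: for standard normal data and $A = (-\infty, 1]$, the proportion $P_n$ approaches $F_X(1) = \Phi(1)$:

```python
import numpy as np
from math import erf, sqrt

# Empirical cdf at x = 1 for standard normal data; F_X(1) = Phi(1).
rng = np.random.default_rng(seed=6)
x = 1.0
F_x = 0.5 * (1 + erf(x / sqrt(2)))  # standard normal cdf at x

for n in [100, 10_000, 1_000_000]:
    P_n = np.mean(rng.standard_normal(n) <= x)  # proportion of realisations <= x
    print(f"n = {n:8d}   P_n ~ {P_n:.4f}   F_X({x}) = {F_x:.4f}")
```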

Aside:

There is also a Strong Law of Large Numbers, proved by the outstanding Russian mathematician and probabilist Andrei Kolmogorov (1903-1987). It uses a stronger form of convergence (‘almost sure’). Informally, the Strong Law of Large Numbers says that for a sequence $X_1, X_2, \ldots$ of iid random variables, $\mathsf{E}[|X_1|] < \infty$ and $\mathsf{E}[X_1] = \mu$ if and only if

\[ \bar{X}_n \to \mu \quad \text{as } n \to \infty, \]

for (almost) all realisations $x_1, x_2, \ldots$ of the sequence $X_1, X_2, \ldots$ (the probability of a ‘bad’ sequence is 0). This is stronger than the WLLN since it concerns the joint distribution of the whole sequence of $\bar{X}_n$s rather than the marginal distributions of each individual $\bar{X}_n$. This is the result that justifies the Monte Carlo simulation approximations to calculate expectations by averages and probabilities by proportions.
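As a concrete, if standard, illustration of such a Monte Carlo approximation (our own example, not from the notes): $\pi$ can be estimated by the proportion of uniform points falling inside the quarter unit disc, since that proportion converges to $\mathsf{P}(U^2 + V^2 \leq 1) = \pi/4$:

```python
import numpy as np

# Estimate pi via the proportion of uniform points in the quarter unit disc:
# P(U^2 + V^2 <= 1) = pi/4 for independent U, V ~ Uniform(0, 1).
rng = np.random.default_rng(seed=7)
n = 1_000_000
u = rng.uniform(size=n)
v = rng.uniform(size=n)
print("Monte Carlo estimate of pi:", 4 * np.mean(u**2 + v**2 <= 1.0))
```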

It can be shown that if $X_1, X_2, \ldots$ is a sequence of iid random variables with $\mathsf{E}[|X_1|] = +\infty$ then $\bar{X}_n$ will not converge to a fixed value for almost any realisation $x_1, x_2, \ldots$ of the sequence $X_1, X_2, \ldots$.

Example 9.2.2.

If $X_1, X_2, \ldots$ are independent Cauchy random variables, it can be shown that the sample mean $\bar{X}_n$ has the same Cauchy distribution as each $X_i$. Hence the distribution of $\bar{X}_n$ is the same for all $n$ and thus $\bar{X}_n$ does not converge to a fixed value.

While the assumption of finite variance can be removed, the assumption that the $X_i$ have a finite expectation is necessary for the Law of Large Numbers to hold.
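This failure is easy to observe by simulation. The sketch below (ours, using NumPy’s standard Cauchy generator) prints running means along a single realisation; they keep jumping however large $n$ becomes:

```python
import numpy as np

# Running means of one realisation of iid standard Cauchy variables.
rng = np.random.default_rng(seed=8)
x = rng.standard_cauchy(1_000_000)
running_mean = np.cumsum(x) / np.arange(1, len(x) + 1)

for n in [100, 10_000, 1_000_000]:
    print(f"n = {n:8d}   Xbar_n = {running_mean[n - 1]:+.3f}")
# Unlike in the earlier sketches, these running means never settle down.
```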