
8.3 Weak law of large numbers

Recall from Exercise 5.13 that if an experiment is repeated $n$ times then, as $n$ gets large, the proportion of times an event $A$ occurs converges to $P(A)$. We will now prove a similar result: the average of several realisations of a random variable converges to its expected value. We start with a lemma which is proved in MATH230.

Lemma 8.9.

Let $X_1, X_2, \ldots, X_n$ be jointly distributed random variables with finite expectation and variance. Then

  • $E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n)$, and

  • if $X_1, X_2, \ldots, X_n$ are independent then

    $\operatorname{Var}(X_1 + X_2 + \cdots + X_n) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \cdots + \operatorname{Var}(X_n)$.
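
For example (a quick illustration of the lemma, using the standard values for a fair die), if $X_1$ and $X_2$ are the scores from two independent rolls of a fair die, so that $E(X_i) = 7/2$ and $\operatorname{Var}(X_i) = 35/12$, then

\[ E(X_1 + X_2) = \frac{7}{2} + \frac{7}{2} = 7 \qquad \text{and} \qquad \operatorname{Var}(X_1 + X_2) = \frac{35}{12} + \frac{35}{12} = \frac{35}{6}. \]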

Now suppose that $X_1, X_2, \ldots, X_n$ are independent copies of a random variable $X$. For example, suppose we repeated an experiment $n$ times, and $X_i$ is the measured outcome on the $i$th experiment. This setup means that for each $i$ we have

\[ E[X_i] = E[X] \qquad \text{and} \qquad \operatorname{Var}(X_i) = \operatorname{Var}(X). \]

When scientists want to report a value, they will usually measure it $n$ times and report the average measured value. Let $X_i$ be the measured value on the $i$th experiment. The average measured value is

\[ \bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n). \]
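
As a concrete (purely illustrative) case: if three measurements of a quantity come out as $9.8$, $10.1$ and $10.1$, then

\[ \bar{X} = \frac{1}{3}(9.8 + 10.1 + 10.1) = 10.0. \]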

Why do we do this?

Let’s consider the properties of $\bar{X}$. For simplicity, write $\mu$ for $E[X]$ and $\sigma^2$ for $\operatorname{Var}(X)$.

\begin{align*}
E[\bar{X}] &= E\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right] \\
&= \frac{1}{n}\,E[X_1 + X_2 + \cdots + X_n] && \text{by linearity of } E \\
&= \frac{1}{n}\left\{E[X_1] + E[X_2] + \cdots + E[X_n]\right\} && \text{by Lemma 8.9} \\
&= \frac{1}{n}\left\{E[X] + E[X] + \cdots + E[X]\right\} && \text{since } E[X_i] = E[X] \\
&= \frac{1}{n}\{n\mu\} \\
&= \mu.
\end{align*}

So $\bar{X}$ has as its expectation exactly the quantity we wish to report: the true expected value of $X$. Of course, simply reporting the first measurement $X_1$ would also have this expected value.

Consider now the variance of $\bar{X}$:

\begin{align*}
\operatorname{Var}(\bar{X}) &= \operatorname{Var}\left(\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right) \\
&= \frac{1}{n^2}\operatorname{Var}(X_1 + X_2 + \cdots + X_n) && \text{by the calculation on p4.5} \\
&= \frac{1}{n^2}\left\{\operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \cdots + \operatorname{Var}(X_n)\right\} && \text{by Lemma 8.9} \\
&= \frac{1}{n^2}\{n\sigma^2\} \\
&= \frac{\sigma^2}{n}.
\end{align*}

The variance of our reported quantity, $\bar{X}$, decreases as the number of measurements $n$ increases.
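
We can check both of these results numerically with a small simulation. The following is a minimal Python sketch (not part of the notes); the choice of a Normal distribution, the values $\mu = 10$ and $\sigma = 2$, and the 2000 repetitions are all illustrative assumptions.

import random

random.seed(1)
mu, sigma = 10.0, 2.0   # assumed true mean and standard deviation of one measurement

def sample_mean(n):
    """Average of n independent Normal(mu, sigma^2) measurements."""
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n

for n in [1, 10, 100, 1000]:
    # Repeat the whole n-measurement experiment 2000 times and look at
    # the spread of the resulting averages.
    means = [sample_mean(n) for _ in range(2000)]
    avg = sum(means) / len(means)
    var = sum((m - avg) ** 2 for m in means) / len(means)
    print(f"n={n:4d}  mean of Xbar ~ {avg:6.3f}  Var(Xbar) ~ {var:.4f}  sigma^2/n = {sigma**2 / n:.4f}")

The empirical variance of $\bar{X}$ should track $\sigma^2/n$, shrinking by a factor of ten each time $n$ grows tenfold, while the mean of the averages stays near $\mu$.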

We can use Chebychev’s inequality (Section 4.6) to be more precise about this. Recall that for any random variable $R$ with expected value $\mu$ and standard deviation $s$,

\[ P(|R - \mu| > cs) \le \frac{1}{c^2}, \]

for any $c > 0$.

[I am using $s$ for the standard deviation here, instead of $\sigma$, to avoid confusion with the $\sigma^2$ already used for the variance of $X$.]
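
For instance, taking $c = 2$ shows that, whatever the distribution of $R$,

\[ P(|R - \mu| > 2s) \le \frac{1}{4}, \]

so any random variable is more than two standard deviations from its mean at most a quarter of the time.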

Hence for the random variable $\bar{X}$ with expected value $\mu$, variance $\sigma^2/n$ and hence standard deviation $\sigma/\sqrt{n}$, we have

\[ P\left(|\bar{X} - \mu| > c\,\frac{\sigma}{\sqrt{n}}\right) \le \frac{1}{c^2}. \]

By taking $k = c/\sqrt{n}$, we can rearrange this expression to

\[ P(|\bar{X} - \mu| > k\sigma) \le \frac{1}{k^2 n}. \]

We see that as $n$ gets large, the probability that the sample average $\bar{X}$ is more than a distance $k\sigma$ away from the expected value of the original random quantity $X$ decreases to $0$.
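
To put numbers on this (the choices of $k$ and $n$ are illustrative): with $k = 1$ and $n = 100$ measurements,

\[ P(|\bar{X} - \mu| > \sigma) \le \frac{1}{1^2 \times 100} = 0.01, \]

so there is at most a $1\%$ chance that the average of $100$ measurements lies more than one standard deviation of $X$ away from $\mu$.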

Since $k$ is arbitrary, in some sense we can say that $\bar{X}$ converges to $\mu$. This is called the weak law of large numbers. You will see various other forms of convergence of random variables in later courses.

One final thing to note: the standard deviation $\sigma$ is exactly the right quantity for determining the appropriate scale for measuring distance here: the events are of the type “random variable is more than $k$ standard deviations away from the mean”.