
9.3 The Central Limit Theorem

The Central Limit Theorem is one of the most important results in probability theory and statistics, and is the reason the Normal distribution plays such a prominent role. It asserts that the sum (or the mean) of many independent identically distributed random variables is approximately Normally distributed. This remarkable fact is true whatever the common distribution of the random variables, as long as it has finite expectation and variance.

Theorem 9.3.1.

The Central Limit Theorem. Suppose $X_1, X_2, \ldots$ is a sequence of iid random variables with expectation $\mu$ and finite variance $\sigma^2$. Then for any number $-\infty < x < \infty$,

$$\Pr\!\left(\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \le x\right) \to \Phi(x), \qquad n \to \infty,$$

where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\Phi(x)$ is the cumulative distribution function of the standard Normal distribution, $\mathsf{N}(0,1)$, evaluated at $x$.

Whereas the WLLN only tells us that $\bar{X}_n$ converges to $\mu$, the CLT gives us the stronger information that the deviations of $\bar{X}_n$ from $\mu$, scaled by $\sqrt{n}$, follow a $\mathsf{N}(0,\sigma^2)$ distribution in the limit. The practical use of this is that for reasonably large $n$ we can assume that

  1. $\bar{X}_n \sim \mathsf{N}(\mu, \sigma^2/n)$,

  2. $S_n = \sum_{i=1}^{n} X_i \sim \mathsf{N}(n\mu, n\sigma^2)$,

approximately.
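These two approximations can be checked by simulation. The sketch below (in Python, rather than the R used elsewhere in these notes) uses Exponential(1) variables as an arbitrary non-Normal choice, for which $\mu = 1$ and $\sigma^2 = 1$:

```python
import random
import statistics

# Monte Carlo check of approximation 1: the mean of n iid Exponential(1)
# variables should be approximately N(mu, sigma^2/n) = N(1, 1/100) for n = 100.
random.seed(1)
n = 100        # sample size (arbitrary choice)
reps = 20_000  # number of simulated sample means (arbitrary choice)

means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

# Empirical mean and standard deviation of the simulated X-bar values;
# the CLT predicts mu = 1 and sigma/sqrt(n) = 0.1 respectively.
print(round(statistics.fmean(means), 2))  # close to 1.0
print(round(statistics.stdev(means), 2))  # close to 0.1
```

A histogram of `means` would also look close to the Normal bell curve, even though the individual Exponential variables are heavily skewed.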

Example 9.3.1.

A large company claims to pay an average wage of 4 pounds an hour with a standard deviation of 0.50 pounds. A sample of 64 workers was found to have an average wage of 3.90 pounds. Find the probability of observing a sample mean as low as this, or worse, by random chance alone if the company’s claim is true.

Solution.  Let $X_1,\ldots,X_{64}$ be the wages in pounds of the 64 workers. If the company’s claim is true these should have expectation 4 and standard deviation 0.50. By the CLT the average $\bar{X}_{64} = \frac{1}{64}\sum_{i=1}^{64} X_i$ satisfies

$$\bar{X}_{64} \sim \mathsf{N}\!\left(4, \frac{(0.50)^2}{64}\right)$$

approximately. The probability of getting a value of 3.90 or lower in this Normal distribution is

$$\mathsf{P}(\bar{X}_{64} \le 3.90) = \mathsf{P}\!\left(\frac{\bar{X}_{64} - 4}{\sqrt{(0.50)^2/64}} \le \frac{3.90 - 4}{\sqrt{(0.50)^2/64}}\right) = \Phi(-1.6) = 0.0548,$$

which is pnorm(-1.6) in R. There is only around a 5% chance of observing such a low average wage for 64 randomly selected workers.
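The same calculation can be reproduced without R; for instance, Python's standard library has provided the Normal CDF via `statistics.NormalDist` since version 3.8:

```python
from statistics import NormalDist

mu, sigma, n = 4.0, 0.50, 64
se = sigma / n ** 0.5  # standard error of the mean: 0.0625
z = (3.90 - mu) / se   # standardised value: -1.6
p = NormalDist().cdf(z)  # Phi(-1.6), i.e. R's pnorm(-1.6)
print(round(p, 4))  # 0.0548
```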

Example 9.3.2.

How large an iid sample should be taken from a normal distribution in order for the probability to be at least 0.99 that the sample mean will be within one standard deviation of the expectation of the distribution? (cf. Example 9.2.1)

Solution.  By symmetry $\Phi(-a) = 1 - \Phi(a)$, so

$$\mathsf{P}(|\bar{X}_n - \mu| < \sigma) = \mathsf{P}\!\left(\left|\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}\right| < \sqrt{n}\right) = \Phi(\sqrt{n}) - \Phi(-\sqrt{n}) = 2\Phi(\sqrt{n}) - 1.$$

Now $2\Phi(\sqrt{n}) - 1 \ge 0.99$ if and only if $\Phi(\sqrt{n}) \ge 0.995$, and $\Phi^{-1}(0.995) = 2.575829 =$ qnorm(0.995). So $\sqrt{n} > 2.575829$, i.e. $n \ge 7$, is sufficient.
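The sample-size calculation can likewise be done in Python, where `NormalDist().inv_cdf` plays the role of R's qnorm:

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.995)  # Phi^{-1}(0.995), i.e. R's qnorm(0.995)
n = math.ceil(z ** 2)            # smallest integer n with sqrt(n) > z
print(round(z, 6), n)            # 2.575829 7
```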

How large does $n$ have to be for the normal approximation to be valid? This depends on how close the original distribution of the $X$'s is to normal in the first place: the closer it is, the more quickly the approximation becomes accurate. Almost always $n > 30$ will be enough to justify the approximation, and sometimes much smaller $n$ will do.

Example 9.3.3.

(Exam 2016) A clumsy robot has been programmed to use a 5 litre bucket to fill a 60 litre tub with water. It fills the bucket at a tap, carries it to the tub and then empties it into the tub. During each trip from the tap to the tub it spills $S \sim \mathsf{Unif}(0,2)$ litres of water from the bucket.

  (a) Write down the exact probability that the tub is full to the brim after the robot has made 12 trips.

    Solution.  0 (since the robot would have to spill no water on any trip, and even spilling no water on a single trip has probability 0).

  (b) Let $W_n$ be the total amount of water in the tub after $n$ trips. Find $\mathsf{E}[W_n]$ and $\mathsf{Var}[W_n]$ and hence write down an approximate distribution for $W_n$.

    Solution.

    1. $\mathsf{E}[W_n] = (5 - 1) \times n = 4n$.

    2. $\mathsf{Var}[W_n] = \frac{2^2}{12} \times n = \frac{n}{3}$.

    So $W_n \sim \mathsf{N}(4n, n/3)$, approximately.

  (c) Use the approximation in (b) to estimate the probability that the tub is full after 12 trips. Write your answer in terms of $\Phi$, the cdf of the standard normal distribution.

    Solution.  $\mathsf{E}[W_{12}] = 48$ and $\mathsf{Var}[W_{12}] = 4$, so

    $$\mathsf{P}(W_{12} > 60) = \mathsf{P}\!\left(\frac{W_{12} - 48}{2} > \frac{60 - 48}{2}\right) = 1 - \Phi(6) = \Phi(-6).$$
  (d) Use the following approximate values to comment on the accuracy of the approximation that you used in (c):

    x       1      2      3       4       5       6       7        8
    Φ(-x)   0.159  0.023  1×10⁻³  3×10⁻⁵  3×10⁻⁷  1×10⁻⁹  1×10⁻¹²  6×10⁻¹⁶

    Solution.  The approximation gives $\Phi(-6) \approx 1\times10^{-9}$, which is close to the truth, zero. So, with an individual uniform distribution, even with $n$ as low as 12 the CLT seems to be pretty accurate.
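The conclusions of parts (a), (c) and (d) can be checked numerically. The sketch below computes the Normal approximation to $\mathsf{P}(W_{12} > 60)$ and runs a small simulation of the robot; the seed and number of repetitions are arbitrary choices:

```python
import random
from statistics import NormalDist

# Normal approximation from part (c): W12 ~ N(48, 4), so sd = 2.
tail = 1 - NormalDist(mu=48, sigma=2).cdf(60)  # approximates Phi(-6)
print(tail)  # about 1e-9

# Simulation: each trip delivers 5 - S litres with S ~ Unif(0, 2),
# so the 60 litre tub is full only if no water is ever spilled.
random.seed(0)
reps = 100_000
full = sum(
    sum(5 - random.uniform(0, 2) for _ in range(12)) >= 60
    for _ in range(reps)
)
print(full / reps)  # 0.0, matching the exact answer from part (a)
```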

Proof.

The proof of the CLT is not examinable, but we provide a sketch below. For completeness, a formal proof (subject to conditions on the existence of $M_X(t)$) appears in Appendix C. The key simplification below is that we ignore all of the remainder terms from the (two) Taylor expansions; we also ignore the possible non-existence of $M_X(t)$ for some $t$, and we assume that the random variables in the sequence have all been standardised: $\mathsf{E}[X_i] = 0$ and $\mathsf{Var}[X_i] = 1$.

We will prove the CLT in terms of $S_n$, i.e. that

$$\mathsf{P}\!\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right) \to \Phi(x)$$

for all $x$. Part 1 of Theorem 6.4.1 (the MGF theorem) says that the distribution (CDF) of a random variable $X$ is uniquely determined by its moment generating function (MGF) $M_X(t) = \mathsf{E}[e^{tX}]$. That is, if two random variables have the same MGF then they have the same CDF.

MGF of $S_n/\sqrt{n}$:

Let $S_n = \sum_{i=1}^{n} X_i$; then, since the $X_i$ are independent and identically distributed,

$$M_{S_n}(t) = \mathsf{E}\!\left[\exp\!\left(t\sum_{i=1}^{n} X_i\right)\right] = \prod_{i=1}^{n} \mathsf{E}[\exp(tX_i)] = \prod_{i=1}^{n} M_{X_i}(t) = M_X(t)^n.$$

Hence

$$M_{S_n/\sqrt{n}}(t) = \mathsf{E}[\exp(tS_n/\sqrt{n})] = \mathsf{E}[\exp((t/\sqrt{n})S_n)] = M_{S_n}(t/\sqrt{n}) = M_X(t/\sqrt{n})^n,$$

or

$$\log M_{S_n/\sqrt{n}}(t) = n \log M_X(t/\sqrt{n}). \qquad (9.2)$$

Taylor expansion of $\log M_X(t)$:

Since $X$ has been standardised,

  1. $M_X(0) = \mathsf{E}[e^{0X}] = 1$,

  2. $M_X'(0) = \mathsf{E}[Xe^{0X}] = \mathsf{E}[X] = 0$,

  3. $M_X''(0) = \mathsf{E}[X^2 e^{0X}] = \mathsf{E}[X^2] = 1$.

Hence, by Taylor expansion,

$$M_X(t) \approx M_X(0) + tM_X'(0) + \tfrac{1}{2}t^2 M_X''(0) = 1 + \frac{t^2}{2}.$$

But $\log(1 + y) \approx y$ for small $y$, so

$$\log M_X(t) \approx \log\!\left(1 + \frac{t^2}{2}\right) \approx \frac{t^2}{2}.$$

Limit of $\log M_{S_n/\sqrt{n}}(t)$ as $n \to \infty$:

$$\log M_X(t/\sqrt{n}) \approx \frac{1}{2}\frac{t^2}{n},$$

so, using (9.2),

$$\log M_{S_n/\sqrt{n}}(t) = n \log M_X(t/\sqrt{n}) \approx n\left(\frac{1}{2}\frac{t^2}{n}\right) = \frac{1}{2}t^2.$$

As $n \to \infty$, the approximations ($\approx$) become exact, as detailed in the appendix. Thus $M_{S_n/\sqrt{n}}(t) \to e^{t^2/2}$, the MGF of a $\mathsf{N}(0,1)$ random variable, as required. ∎
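The convergence in the final step can also be seen numerically. The sketch below uses a standardised Uniform$(-\sqrt{3}, \sqrt{3})$ variable (mean 0, variance 1), whose MGF has the closed form $\sinh(\sqrt{3}t)/(\sqrt{3}t)$, and evaluates $n \log M_X(t/\sqrt{n})$ for increasing $n$; the choice of distribution and of $t = 1$ are illustrative only:

```python
import math

def log_mgf(t):
    # log M_X(t) for X ~ Uniform(-sqrt(3), sqrt(3)), which has mean 0 and
    # variance 1; here M_X(t) = sinh(sqrt(3) t) / (sqrt(3) t).
    if t == 0:
        return 0.0
    a = math.sqrt(3) * t
    return math.log(math.sinh(a) / a)

t = 1.0
for n in (1, 10, 100, 1000):
    print(n, round(n * log_mgf(t / math.sqrt(n)), 5))
# The printed values approach t^2 / 2 = 0.5, illustrating the limit above.
```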