The Central Limit Theorem is one of the most important results in probability theory and statistics and is the reason the Normal distribution plays such a prominent role. It asserts that the sum (or the mean) of many independent identically distributed random variables is approximately Normally distributed. Remarkably, this is true whatever the common distribution of the random variables, as long as it has finite expectation and variance.
The Central Limit Theorem. Suppose $X_1, X_2, \dots$ is a sequence of iid random variables with expectation $\mu$ and finite variance $\sigma^2$. Then for any number $x$,
$$\lim_{n \to \infty} P\!\left(\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le x\right) = \Phi(x),$$
where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ and $\Phi(x)$ is the cumulative distribution function for the standard Normal distribution, $N(0,1)$, evaluated at $x$.
Whereas the WLLN only tells us that $\bar{X}_n$ converges to $\mu$, the CLT gives us the stronger information that the deviations of $\bar{X}_n$ from $\mu$, scaled by $\sigma/\sqrt{n}$, follow a $N(0,1)$ distribution in the limit. The practical use of this is that for reasonably large $n$ we can assume that
$$\bar{X}_n \sim N\!\left(\mu, \frac{\sigma^2}{n}\right),$$
approximately.
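This approximation is easy to check by simulation. Below is a minimal Python sketch (standard library only); the Exponential(1) distribution, the sample size $n = 50$, and the number of replications are illustrative choices, not from the notes:

```python
import math
import random

random.seed(1)

# X_i ~ Exponential(1): mu = 1, sigma = 1, a clearly non-Normal distribution.
mu, sigma, n = 1.0, 1.0, 50

def phi(x):
    """Standard Normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Estimate P(Xbar_n <= mu + sigma/sqrt(n)) by simulation, then compare
# with the CLT approximation Phi(1).
reps = 20000
threshold = mu + sigma / math.sqrt(n)
hits = 0
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    if xbar <= threshold:
        hits += 1

print(round(hits / reps, 3))  # simulated probability, close to Phi(1)
print(round(phi(1.0), 3))     # CLT approximation: Phi(1) ≈ 0.841
```

Even for a heavily skewed distribution such as the exponential, the simulated probability lands close to the Normal approximation at this sample size.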
A large company claims to pay an average wage of $\mu$ pounds an hour with a standard deviation of $\sigma$ pounds. A sample of $n$ workers was found to have an average wage of $\bar{x}$ pounds. Find the probability of observing a sample mean as low as this, or lower, by random chance alone if the company's claim is true.
Solution. Let $X_1, \dots, X_n$ be the wages in pounds of the $n$ workers. If the company's claim is true these should have expectation $\mu$ and standard deviation $\sigma$. By the CLT the average satisfies
$$\bar{X}_n \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$
approximately. The probability of getting a value of $\bar{x}$ or lower in this Normal distribution is
$$P(\bar{X}_n \le \bar{x}) = \Phi\!\left(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right) = \Phi(-1.6),$$
which is pnorm(-1.6) $\approx 0.055$. There is only around a 5.5% chance of observing such a low average wage for randomly selected workers.
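The value of pnorm(-1.6) can be reproduced without R; a short Python equivalent of pnorm, built on the standard library's error function, is:

```python
import math

def pnorm(x):
    """CDF of the standard Normal distribution, the analogue of R's pnorm."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(round(pnorm(-1.6), 4))  # 0.0548
```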
How large an iid sample should be taken from a normal distribution in order for the probability to be at least 0.99 that the sample mean will be within one standard deviation of the expectation of the distribution? (cf. Example 9.2.1)
Solution. Since the sample is from a normal distribution, $\bar{X}_n \sim N(\mu, \sigma^2/n)$ exactly, so by symmetry
$$P(|\bar{X}_n - \mu| \le \sigma) = P\!\left(|Z| \le \sqrt{n}\right) = 2\Phi(\sqrt{n}) - 1,$$
which is at least $0.99$ if and only if $\Phi(\sqrt{n}) \ge 0.995$. Now qnorm(0.995) $\approx 2.576$, so we need $\sqrt{n} \ge 2.576$, i.e. $n \ge 2.576^2 \approx 6.64$. So $n = 7$ is sufficient.
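R's qnorm can likewise be imitated in Python; below is a simple bisection sketch (the search interval and iteration count are arbitrary choices), together with the resulting sample size:

```python
import math

def pnorm(x):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def qnorm(p, lo=-10.0, hi=10.0):
    """Standard Normal quantile by bisection (analogue of R's qnorm)."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if pnorm(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

z = qnorm(0.995)
print(round(z, 3))        # 2.576
print(math.ceil(z ** 2))  # smallest n with sqrt(n) >= z: n = 7
```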
How large does $n$ have to be for the normal approximation to be valid? This depends on how close the original distribution of the $X_i$'s is to normal in the first place: the closer it is, the quicker the approximation becomes accurate. Almost always $n \ge 30$ will be enough to justify the approximation, and sometimes a much smaller $n$ will do.
(Exam 2016.) A clumsy robot has been programmed to use a litre bucket to fill a litre tub with water. It fills the bucket at a tap, carries it to the tub, and then empties it into the tub. During each trip from the tap to the tub it spills litres of water from the bucket.
Write down the exact probability that the tub is full to the brim after the robot has made trips.
Solution. 0 (since the robot would have to spill no water on any trip, and even spilling no water on one trip has a probability of 0).
Let $T_n$ be the total amount of water in the tub after $n$ trips. Find $E(T_n)$ and $\mathrm{Var}(T_n)$ and hence write down an approximate distribution for $T_n$.
Solution. Writing $b$ for the bucket's volume in litres and $U_i$ for the amount spilled on trip $i$, we have $T_n = \sum_{i=1}^n (b - U_i)$, so $E(T_n) = n\left(b - E(U_1)\right)$ and, by independence, $\mathrm{Var}(T_n) = n\,\mathrm{Var}(U_1)$.
So by the CLT, $T_n \sim N\!\left(n(b - E(U_1)),\, n\,\mathrm{Var}(U_1)\right)$, approximately.
Use the approximation in (b) to estimate the probability that the tub is full after trips. Write your answer in terms of , the cdf of the standard normal distribution.
Solution. Writing $V$ for the tub's capacity in litres, the tub is full when $T_n \ge V$, so, standardising the approximate Normal distribution from (b),
$$P(T_n \ge V) \approx 1 - \Phi\!\left(\frac{V - E(T_n)}{\sqrt{\mathrm{Var}(T_n)}}\right).$$
Use the following approximate values to comment on the accuracy of the approximation that you used in (c):
$x$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
---|---|---|---|---|---|---|---|---
$\Phi(-x)$ | 0.159 | 0.023 | 0.0013 | $3.2\times 10^{-5}$ | $2.9\times 10^{-7}$ | $9.9\times 10^{-10}$ | $1.3\times 10^{-12}$ | $6.2\times 10^{-16}$
Solution. The approximation gives a probability very close to the truth, zero, so when the individual distributions are uniform the CLT seems to be pretty accurate even for quite small $n$.
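The figures in this exam question did not survive, but the pattern of the calculation — a Normal approximation to a sum of iid uniforms — can be sketched in Python with purely hypothetical numbers: a 1-litre bucket, a 10-litre tub, spills $U_i \sim U(0, 0.2)$ and $n = 10$ trips are all invented for illustration.

```python
import math

# Hypothetical numbers (the original exam figures are not in the notes):
# 1-litre bucket, 10-litre tub, spill U_i ~ Uniform(0, 0.2), n = 10 trips.
bucket, tub, spill_max, n = 1.0, 10.0, 0.2, 10

# Water delivered after n trips: T_n = n*bucket - sum(U_i).
mean_T = n * (bucket - spill_max / 2.0)  # E(T_n)
var_T = n * spill_max ** 2 / 12.0        # Var(T_n); Var(U(0, b)) = b^2 / 12

# CLT: T_n is approximately N(mean_T, var_T); P(tub full) = P(T_n >= tub).
z = (tub - mean_T) / math.sqrt(var_T)
p_full = 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))  # 1 - Phi(z)
print(round(z, 3))  # standardised value, about 5.477
print(p_full)       # tiny, consistent with the exact answer of 0
```

With these invented numbers the approximation gives a probability of order $10^{-8}$, echoing part (d): the exact probability is zero and the CLT answer is very close to it.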
The proof of the CLT is not examinable, but we provide a sketch below. For completeness, a formal proof (subject to conditions on the existence of moment generating functions) appears in Appendix C. The key simplification below is that we ignore all of the remainder terms from the (two) Taylor expansions; we also ignore the possible non-existence of $M_X(t)$ for some $t$, and we assume that the random variables in the sequence have all been standardised: $E(X_i) = 0$ and $\mathrm{Var}(X_i) = 1$.
We will prove the CLT in terms of $S_n = X_1 + \dots + X_n$, i.e. that
$$P\!\left(\frac{S_n}{\sqrt{n}} \le x\right) \to \Phi(x)$$
for all $x$. Part 1 of Theorem 6.4.1 (the MGF theorem) says that the distribution (CDF) of a random variable, $X$, is uniquely determined by its moment generating function (MGF) $M_X(t) = E(e^{tX})$. That is, if two random variables have the same MGF then they have the same CDF.
Let $Z_n = S_n/\sqrt{n}$; then, since the $X_i$ are independent and identically distributed,
$$M_{Z_n}(t) = E\!\left(e^{tS_n/\sqrt{n}}\right) = \prod_{i=1}^n E\!\left(e^{tX_i/\sqrt{n}}\right) = \left[M_X\!\left(\frac{t}{\sqrt{n}}\right)\right]^n.$$
Hence, taking logs,
$$\log M_{Z_n}(t) = n \log M_X\!\left(\frac{t}{\sqrt{n}}\right). \tag{9.2}$$
Since $X$ has been standardised,
$$M_X(0) = E(e^0) = 1,$$
$$M_X'(0) = E(X) = 0,$$
$$M_X''(0) = E(X^2) = \mathrm{Var}(X) = 1.$$
Hence, by Taylor expansion about $0$,
$$M_X(s) \approx M_X(0) + s M_X'(0) + \frac{s^2}{2} M_X''(0) = 1 + \frac{s^2}{2}.$$
But $\log(1 + u) \approx u$ for small $u$, so, setting $s = t/\sqrt{n}$,
$$\log M_X\!\left(\frac{t}{\sqrt{n}}\right) \approx \frac{t^2}{2n}.$$
Thus, using (9.2), $\log M_{Z_n}(t)$ is approximately $n \times \frac{t^2}{2n} = \frac{t^2}{2}$.
As $n \to \infty$ the approximations become exact, as detailed in the appendix. Thus $M_{Z_n}(t) \to e^{t^2/2}$, the mgf of a $N(0,1)$ random variable, as required. ∎
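The limit can also be checked numerically. For a standardised Uniform$(-\sqrt{3}, \sqrt{3})$ variable (mean $0$, variance $1$) the MGF is $M_X(t) = \sinh(\sqrt{3}\,t)/(\sqrt{3}\,t)$, and $[M_X(t/\sqrt{n})]^n$ should approach $e^{t^2/2}$ as $n$ grows; a small Python check (the choices of $t$ and $n$ are arbitrary):

```python
import math

a = math.sqrt(3.0)  # Uniform(-a, a) has mean 0 and variance a^2 / 3 = 1

def mgf_uniform(t):
    """MGF of the standardised uniform: sinh(a t) / (a t)."""
    return 1.0 if t == 0 else math.sinh(a * t) / (a * t)

t = 1.0
target = math.exp(t * t / 2.0)  # e^{t^2/2}, the N(0,1) mgf at t
for n in (1, 10, 100, 10000):
    approx = mgf_uniform(t / math.sqrt(n)) ** n
    print(n, round(approx, 5), round(target, 5))
```

The printed values converge towards $e^{1/2} \approx 1.6487$, illustrating the convergence of MGFs that drives the proof.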