
9.2 The Weak Law of Large Numbers

We would like to ascertain when averages of a sequence of IID random variables converge in probability to the expectation, μ, of each random variable in the sequence. We will need Chebyshev’s inequality, which we derive from a simpler but more general result, Markov’s inequality.

Proposition 9.2.1.

(Markov’s Inequality) If $V$ is a non-negative random variable then for any $a > 0$,

\[ \mathsf{P}(V \geq a) \leq \frac{\mathsf{E}[V]}{a}. \]
Proof.

We will assume that V is a continuous random variable; an almost identical proof holds for a discrete rv, with sums instead of integrals.

Let $V$ have density function $f(v)$; then for any $a > 0$,

\[
\begin{aligned}
\mathsf{E}[V] &= \int_0^\infty t f(t)\,\mathrm{d}t \\
&\geq \int_a^\infty t f(t)\,\mathrm{d}t \\
&\geq \int_a^\infty a f(t)\,\mathrm{d}t \\
&= a \int_a^\infty f(t)\,\mathrm{d}t \\
&= a\,\mathsf{P}(V \geq a).
\end{aligned}
\]
∎

Quiz: Why do the inequalities hold? For the first, $t f(t) \geq 0$ and $a > 0$; for the second, $t \geq a$ over the range of integration.
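Markov’s inequality is easy to check empirically. Below is a minimal sketch (not part of the original notes; it assumes NumPy is available) comparing the empirical tail probability with the bound $\mathsf{E}[V]/a$ for an Exponential random variable:

```python
import numpy as np

# A quick empirical check of Markov's inequality for a non-negative rv.
rng = np.random.default_rng(seed=1)
v = rng.exponential(scale=2.0, size=100_000)  # non-negative, E[V] = 2

for a in [1.0, 2.0, 5.0, 10.0]:
    tail = np.mean(v >= a)    # empirical P(V >= a)
    bound = v.mean() / a      # Markov's bound E[V]/a (E[V] estimated)
    print(f"a = {a:5.1f}   P(V >= a) ~ {tail:.4f}   E[V]/a ~ {bound:.4f}")
```

The bound is typically far from tight; that is the price paid for its generality.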

For a random variable $Y$ with $\mathsf{E}[Y] = \mu$ and $\mathsf{Var}[Y] = \sigma^2$, setting $V = (Y-\mu)^2$ and $a = \epsilon^2$ gives $\mathsf{P}((Y-\mu)^2 \geq \epsilon^2) \leq \mathsf{E}[(Y-\mu)^2]/\epsilon^2 = \sigma^2/\epsilon^2$; since $|Y-\mu| > \epsilon$ implies $(Y-\mu)^2 \geq \epsilon^2$, this yields

Corollary 9.2.2.

(Chebyshev’s Inequality) If $Y$ is a random variable with expectation $\mu$ and variance $\sigma^2 < \infty$ then for any $\epsilon > 0$,

\[ \mathsf{P}(|Y-\mu| > \epsilon) \leq \frac{\sigma^2}{\epsilon^2}. \]
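A similar sketch (again our own illustration, assuming NumPy) shows how conservative Chebyshev’s bound is for a standard normal $Y$, where $\mu = 0$ and $\sigma^2 = 1$:

```python
import numpy as np

# Empirical check of Chebyshev's inequality for Y ~ N(0, 1).
rng = np.random.default_rng(seed=2)
y = rng.standard_normal(1_000_000)

for eps in [1.0, 2.0, 3.0]:
    tail = np.mean(np.abs(y) > eps)  # empirical P(|Y - mu| > eps)
    print(f"eps = {eps}   P(|Y-mu| > eps) ~ {tail:.4f}   bound = {1 / eps**2:.4f}")
```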
Theorem 9.2.3.

The Weak Law of Large Numbers. Suppose $X_1, X_2, \ldots$ is a sequence of i.i.d. random variables with expectation $\mu$ and finite variance $\sigma^2$, and let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then for any $\epsilon > 0$,

\[ \mathsf{P}(|\bar{X}_n - \mu| > \epsilon) \to 0, \]

as $n \to \infty$.

Proof.

Using Chebyshev’s inequality (noting that $\mathsf{E}[\bar{X}_n] = \mu$) we have, for any $\epsilon > 0$,

\[ \mathsf{P}(|\bar{X}_n - \mu| > \epsilon) \leq \frac{1}{\epsilon^2}\,\mathsf{Var}[\bar{X}_n] = \frac{1}{\epsilon^2}\,\frac{\sigma^2}{n} \to 0, \]

as $n \to \infty$. ∎

The theorem formalises what we observed in Figure 9.1, namely that the distribution of $\bar{X}_n$ concentrates more and more around $\mu$: no matter how small an interval $[\mu - \epsilon, \mu + \epsilon]$ we take around $\mu$, the probability of $\bar{X}_n$ falling in this interval tends to 1.
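This concentration can be seen directly by simulation. The sketch below (our illustration; the Exponential(1) distribution is an arbitrary choice, so $\mu = 1$) estimates $\mathsf{P}(|\bar{X}_n - \mu| > \epsilon)$ for increasing $n$:

```python
import numpy as np

# Estimate P(|Xbar_n - mu| > eps) for increasing n, using Exp(1) variables (mu = 1).
rng = np.random.default_rng(seed=3)
mu, eps, reps = 1.0, 0.1, 1_000

for n in [10, 100, 1_000, 10_000]:
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    print(f"n = {n:6d}   P(|Xbar_n - mu| > {eps}) ~ {np.mean(np.abs(xbar - mu) > eps):.4f}")
```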

Note: The requirement that $\sigma^2 < \infty$ is not necessary; it is sufficient that $\mathsf{E}[|X_i|] < \infty$ (but the proof is harder).

Example 9.2.1.

How large a random sample should be taken from a distribution in order for the probability to be at least 0.99 that the sample mean will be within one standard deviation of the expectation of the distribution?

Solution.  As in the proof of the WLLN, by Chebyshev’s inequality,

\[ \mathsf{P}(|\bar{X}_n - \mu| > \sigma) \leq \frac{1}{\sigma^2}\,\frac{\sigma^2}{n} = \frac{1}{n}. \]

We need $\frac{1}{n} \leq 1 - 0.99 = 0.01$, so $n \geq 100$ is sufficient, whatever the distribution.
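As a sanity check on this distribution-free bound, the sketch below (our own, using the deliberately skewed Exponential(1) distribution, for which $\mu = \sigma = 1$) estimates the probability that $\bar{X}_{100}$ lies within one standard deviation of $\mu$:

```python
import numpy as np

# With n = 100 draws from the skewed Exp(1) distribution, mu = sigma = 1.
rng = np.random.default_rng(seed=4)
n, reps = 100, 100_000
mu = sigma = 1.0

xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
print("P(|Xbar_n - mu| <= sigma) ~", np.mean(np.abs(xbar - mu) <= sigma))
# Chebyshev guarantees at least 0.99; here the probability is essentially 1.
```

Chebyshev’s guarantee is conservative: for most distributions the actual probability is far higher.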

More general expectations:

Setting $V_i = g(X_i)$ and applying the WLLN to $V_1, V_2, \ldots$ shows us that

\[ \frac{1}{n} \sum_{i=1}^n g(X_i) \to \mathsf{E}[g(X_i)], \]

in probability, at least provided $\mathsf{Var}[g(X_i)] < \infty$.
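For instance (a sketch with an arbitrary choice of $g$, not from the notes): taking $g(x) = x^2$ with $X_i \sim \mathsf{Uniform}(0,1)$, the average of the $g(X_i)$ should approach $\mathsf{E}[X_i^2] = 1/3$:

```python
import numpy as np

# Average of g(X_i) with g(x) = x^2 and X_i ~ Uniform(0, 1); E[g(X_i)] = 1/3.
rng = np.random.default_rng(seed=5)
x = rng.uniform(0.0, 1.0, size=1_000_000)
print("average of g(X_i) ~", np.mean(x**2), "   E[g(X_i)] = 1/3 ~", 1 / 3)
```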

Proportions converge to probabilities:

The WLLN states that averages, $\bar{X}_n$, converge to the expectation, $\mu$. However, it also tells us that proportions converge to probabilities. Let $A \subseteq \mathbb{R}$ and let $I(X_i \in A)$ indicate whether or not $X_i \in A$; $I$ is the indicator function – i.e. $I(X_i \in A) = 1$ if $X_i \in A$ and is 0 otherwise.

The proportion of the $n$ rvs $X_1, \ldots, X_n$ that are in $A$ is

\[ P_n = \frac{1}{n} \sum_{i=1}^n I(X_i \in A), \]

so $\mathsf{E}[P_n] = \mathsf{E}[I(X_1 \in A)]$. But $I(X_1 \in A) \sim \mathsf{Bern}(p)$ where $p = \mathsf{P}(X_1 \in A)$, so $\mathsf{E}[I(X_1 \in A)] = p = \mathsf{P}(X_1 \in A)$.

Now $\mathsf{Var}[I(X_i \in A)] < \infty$ as $0 \leq I(X_i \in A) \leq 1$, so set $\tau^2 = \mathsf{Var}[I(X_i \in A)]$. Then

\[ \mathsf{P}(|P_n - \mathsf{P}(X_1 \in A)| > \epsilon) \leq \frac{\tau^2}{n \epsilon^2} \to 0, \]

as $n \to \infty$.

In particular, with $A = (-\infty, x]$ we see that the proportion of realisations of $X$ that are $\leq x$ tends to $F_X(x)$. As a function of $x$ this proportion is known as the empirical cdf.

With $A = (x, x + \delta]$ the proportion of realisations between $x$ and $x + \delta$ tends to $F_X(x + \delta) - F_X(x) \approx f_X(x)\,\delta$. As a function of $x$ this proportion is known as the histogram.
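A short sketch (our illustration, assuming NumPy) of the empirical cdf converging: for standard normal data and $A = (-\infty, 1]$, the proportion $P_n$ approaches $F_X(1) = \Phi(1)$:

```python
import numpy as np
from math import erf, sqrt

# Empirical cdf at x = 1 for standard normal data; F_X(1) = Phi(1).
rng = np.random.default_rng(seed=6)
x = 1.0
F_x = 0.5 * (1 + erf(x / sqrt(2)))  # standard normal cdf at x

for n in [100, 10_000, 1_000_000]:
    P_n = np.mean(rng.standard_normal(n) <= x)  # proportion of realisations <= x
    print(f"n = {n:8d}   P_n ~ {P_n:.4f}   F_X({x}) = {F_x:.4f}")
```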

Aside:

There is also a Strong Law of Large Numbers, proved by the outstanding Russian mathematician and probabilist Andrei Kolmogorov (1903-1987). It uses a stronger form of convergence (‘almost sure’). Informally, the Strong Law of Large Numbers says that for a sequence $X_1, X_2, \ldots$ of iid random variables, $\mathsf{E}[|X_1|] < \infty$ and $\mathsf{E}[X_1] = \mu$ if and only if

\[ \bar{X}_n \to \mu \quad \text{as } n \to \infty, \]

for (almost) all realisations $x_1, x_2, \ldots$ of the sequence $X_1, X_2, \ldots$ (the probability of a ‘bad’ sequence is 0). This is stronger than the WLLN since it concerns the joint distribution of the whole sequence of $\bar{X}_n$s rather than the marginal distributions of each individual $\bar{X}_n$. This is the result that justifies the Monte Carlo simulation approximations to calculate expectations by averages and probabilities by proportions.
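As a concrete, if standard, illustration of such a Monte Carlo approximation (our own example, not from the notes): $\pi$ can be estimated by the proportion of uniform points falling inside the quarter unit disc, since that proportion converges to $\mathsf{P}(U^2 + V^2 \leq 1) = \pi/4$:

```python
import numpy as np

# Estimate pi via the proportion of uniform points in the quarter unit disc:
# P(U^2 + V^2 <= 1) = pi/4 for independent U, V ~ Uniform(0, 1).
rng = np.random.default_rng(seed=7)
n = 1_000_000
u = rng.uniform(size=n)
v = rng.uniform(size=n)
print("Monte Carlo estimate of pi:", 4 * np.mean(u**2 + v**2 <= 1.0))
```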

It can be shown that if $X_1, X_2, \ldots$ is a sequence of iid random variables with $\mathsf{E}[|X_1|] = +\infty$ then $\bar{X}_n$ will not converge to a fixed value for almost any realisation $x_1, x_2, \ldots$ of the sequence $X_1, X_2, \ldots$.

Example 9.2.2.

If $X_1, X_2, \ldots$ are independent Cauchy random variables, it can be shown that the sample mean $\bar{X}_n$ has the same Cauchy distribution as each $X_i$. Hence the distribution of $\bar{X}_n$ is the same for all $n$ and thus $\bar{X}_n$ does not converge to a fixed value.

While the assumption of finite variance can be removed, the assumption that the $X_i$ have a finite expectation is necessary for the Law of Large Numbers to hold.
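This failure is easy to observe by simulation. The sketch below (ours, using NumPy’s standard Cauchy generator) prints running means along a single realisation; they keep jumping however large $n$ becomes:

```python
import numpy as np

# Running means of one realisation of iid standard Cauchy variables.
rng = np.random.default_rng(seed=8)
x = rng.standard_cauchy(1_000_000)
running_mean = np.cumsum(x) / np.arange(1, len(x) + 1)

for n in [100, 10_000, 1_000_000]:
    print(f"n = {n:8d}   Xbar_n = {running_mean[n - 1]:+.3f}")
# Unlike in the earlier sketches, these running means never settle down.
```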