
8.3 Weak law of large numbers

Recall from Exercise 5.13 that if an experiment is repeated $n$ times then, as $n$ gets large, the proportion of times an event $A$ occurs converges to $P(A)$. We will now prove a similar result: the average of several realisations of a random variable converges to its expected value. We start with a lemma which is proved in MATH230.

Lemma 8.9.

Let $X_1, X_2, \ldots, X_n$ be jointly distributed random variables with finite expectation and variance. Then

  • $E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n)$, and

  • if $X_1, X_2, \ldots, X_n$ are independent then

    $\operatorname{Var}(X_1 + X_2 + \cdots + X_n) = \operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \cdots + \operatorname{Var}(X_n)$.
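
For example (a quick illustration of the lemma, using the standard values for a fair die), if $X_1$ and $X_2$ are the scores from two independent rolls of a fair die, so that $E(X_i) = 7/2$ and $\operatorname{Var}(X_i) = 35/12$, then

\[ E(X_1 + X_2) = \frac{7}{2} + \frac{7}{2} = 7 \qquad \text{and} \qquad \operatorname{Var}(X_1 + X_2) = \frac{35}{12} + \frac{35}{12} = \frac{35}{6}. \]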

Now suppose that $X_1, X_2, \ldots, X_n$ are independent copies of a random variable $X$. For example, suppose we repeated an experiment $n$ times, and $X_i$ is the measured outcome on the $i$th experiment. This setup means that for each $i$ we have

\[ E[X_i] = E[X] \qquad \text{and} \qquad \operatorname{Var}(X_i) = \operatorname{Var}(X). \]

When scientists want to report a value, they will usually measure it $n$ times and report the average measured value. Let $X_i$ be the measured value on the $i$th experiment. The average measured value is

\[ \bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n). \]
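
As a concrete (purely illustrative) case: if three measurements of a quantity come out as $9.8$, $10.1$ and $10.1$, then

\[ \bar{X} = \frac{1}{3}(9.8 + 10.1 + 10.1) = 10.0. \]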

Why do we do this?

Let’s consider the properties of $\bar{X}$. For simplicity, write $\mu$ for $E[X]$ and $\sigma^2$ for $\operatorname{Var}(X)$.

\begin{align*}
E[\bar{X}] &= E\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right] \\
&= \frac{1}{n}\,E[X_1 + X_2 + \cdots + X_n] && \text{by linearity of } E \\
&= \frac{1}{n}\left\{E[X_1] + E[X_2] + \cdots + E[X_n]\right\} && \text{by Lemma 8.9} \\
&= \frac{1}{n}\left\{E[X] + E[X] + \cdots + E[X]\right\} && \text{since } E[X_i] = E[X] \\
&= \frac{1}{n}\{n\mu\} \\
&= \mu.
\end{align*}

So $\bar{X}$ has as its expectation exactly the quantity we wish to report: the true expected value of $X$. Of course, simply reporting the first measurement $X_1$ would also have this expected value.

Consider now the variance of $\bar{X}$:

\begin{align*}
\operatorname{Var}(\bar{X}) &= \operatorname{Var}\left(\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right) \\
&= \frac{1}{n^2}\operatorname{Var}(X_1 + X_2 + \cdots + X_n) && \text{by the calculation on p4.5} \\
&= \frac{1}{n^2}\left\{\operatorname{Var}(X_1) + \operatorname{Var}(X_2) + \cdots + \operatorname{Var}(X_n)\right\} && \text{by Lemma 8.9} \\
&= \frac{1}{n^2}\{n\sigma^2\} \\
&= \frac{\sigma^2}{n}.
\end{align*}

The variance of our reported quantity, $\bar{X}$, decreases as the number of measurements $n$ increases.
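
We can check both of these results numerically with a small simulation. The following is a minimal Python sketch (not part of the notes); the choice of a Normal distribution, the values $\mu = 10$ and $\sigma = 2$, and the 2000 repetitions are all illustrative assumptions.

import random

random.seed(1)
mu, sigma = 10.0, 2.0   # assumed true mean and standard deviation of one measurement

def sample_mean(n):
    """Average of n independent Normal(mu, sigma^2) measurements."""
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n

for n in [1, 10, 100, 1000]:
    # Repeat the whole n-measurement experiment 2000 times and look at
    # the spread of the resulting averages.
    means = [sample_mean(n) for _ in range(2000)]
    avg = sum(means) / len(means)
    var = sum((m - avg) ** 2 for m in means) / len(means)
    print(f"n={n:4d}  mean of Xbar ~ {avg:6.3f}  Var(Xbar) ~ {var:.4f}  sigma^2/n = {sigma**2 / n:.4f}")

The empirical variance of $\bar{X}$ should track $\sigma^2/n$, shrinking by a factor of ten each time $n$ grows tenfold, while the mean of the averages stays near $\mu$.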

We can use Chebychev’s inequality (Section 4.6) to be more precise about this. Recall that for any random variable $R$ with expected value $\mu$ and standard deviation $s$,

\[ P(|R - \mu| > cs) \le \frac{1}{c^2}, \]

for any $c > 0$.

[I am using $s$ for the standard deviation here, instead of $\sigma$, to avoid confusion with the $\sigma^2$ already used for the variance of $X$.]
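
For instance, taking $c = 2$ shows that, whatever the distribution of $R$,

\[ P(|R - \mu| > 2s) \le \frac{1}{4}, \]

so any random variable is more than two standard deviations from its mean at most a quarter of the time.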

Hence for the random variable $\bar{X}$ with expected value $\mu$, variance $\sigma^2/n$ and hence standard deviation $\sigma/\sqrt{n}$, we have

\[ P\left(|\bar{X} - \mu| > c\,\frac{\sigma}{\sqrt{n}}\right) \le \frac{1}{c^2}. \]

By taking $k = c/\sqrt{n}$, we can rearrange this expression to

\[ P(|\bar{X} - \mu| > k\sigma) \le \frac{1}{k^2 n}. \]

We see that as $n$ gets large, the probability that the sample average $\bar{X}$ is more than a distance $k\sigma$ away from the expected value of the original random quantity $X$ decreases to $0$.
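
To put numbers on this (the choices of $k$ and $n$ are illustrative): with $k = 1$ and $n = 100$ measurements,

\[ P(|\bar{X} - \mu| > \sigma) \le \frac{1}{1^2 \times 100} = 0.01, \]

so there is at most a $1\%$ chance that the average of $100$ measurements lies more than one standard deviation of $X$ away from $\mu$.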

Since $k$ is arbitrary, in some sense we can say that $\bar{X}$ converges to $\mu$. This is called the weak law of large numbers. You will see various other forms of convergence of random variables in later courses.

One final thing to note: the standard deviation $\sigma$ is exactly the right quantity for determining the appropriate scale for measuring distance here: the events are of the type “random variable is more than $k$ standard deviations away from the mean”.