We would like to ascertain when averages of a sequence of IID random variables converge in probability to the expectation, $\mu$, of each random variable in the sequence. We will need Chebyshev's inequality, which we derive from a simpler but more general result, Markov's inequality.
(Markov's Inequality) If $X$ is a non-negative random variable then for any $a > 0$,
$$\mathbb{P}(X \geq a) \leq \frac{\mathbb{E}[X]}{a}.$$
We will assume that $X$ is a continuous random variable; an almost identical proof holds for a discrete rv, with sums instead of integrals.
Let $X$ have density function $f$, then for any $a > 0$,
$$\mathbb{E}[X] = \int_0^\infty x f(x)\,\mathrm{d}x \geq \int_a^\infty x f(x)\,\mathrm{d}x \geq \int_a^\infty a f(x)\,\mathrm{d}x = a\,\mathbb{P}(X \geq a),$$
and dividing both sides by $a$ gives the result.
∎
Quiz: Why do the inequalities hold? (Hint: $x f(x) \geq 0$ on $[0, a)$; $x \geq a$ on $[a, \infty)$.)
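As a quick numerical sanity check (not part of the original notes; the Exp(2) sample, the seed, and the values of $a$ are arbitrary choices of ours), the Markov bound $\mathbb{E}[X]/a$ can be compared with a simulated $\mathbb{P}(X \geq a)$:

```python
import numpy as np

# Sketch: compare a Monte Carlo estimate of P(X >= a) with the Markov
# bound E[X]/a for a non-negative rv. X ~ Exp with E[X] = 2 is arbitrary.
rng = np.random.default_rng(seed=1)
x = rng.exponential(scale=2.0, size=100_000)

for a in [1.0, 2.0, 5.0, 10.0]:
    estimate = np.mean(x >= a)   # Monte Carlo estimate of P(X >= a)
    bound = x.mean() / a         # Markov bound, with E[X] itself estimated
    print(f"a={a:5.1f}  P(X>=a) ~ {estimate:.4f}  E[X]/a ~ {bound:.4f}")
```

The estimate always sits below the bound, though the bound is typically far from tight.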
For a random variable $Y$ with $\mathbb{E}[Y] = \mu$ and $\mathrm{Var}(Y) = \sigma^2$, setting $X = (Y - \mu)^2$ and $a = \epsilon^2$ gives $\mathbb{P}((Y - \mu)^2 \geq \epsilon^2) \leq \mathbb{E}[(Y - \mu)^2]/\epsilon^2 = \sigma^2/\epsilon^2$, i.e.
(Chebyshev's Inequality) If $Y$ is a random variable with expectation $\mu$ and variance $\sigma^2$ then for any $\epsilon > 0$,
$$\mathbb{P}(|Y - \mu| \geq \epsilon) \leq \frac{\sigma^2}{\epsilon^2}.$$
The Weak Law of Large Numbers. Suppose $X_1, X_2, \ldots$ is a sequence of i.i.d. random variables with expectation $\mu$ and finite variance $\sigma^2$; then for any $\epsilon > 0$, the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ satisfies
$$\mathbb{P}\left(|\bar{X}_n - \mu| \geq \epsilon\right) \to 0$$
as $n \to \infty$.
Using Chebyshev's inequality (noting that $\mathbb{E}[\bar{X}_n] = \mu$ and $\mathrm{Var}(\bar{X}_n) = \sigma^2/n$), we have for any $\epsilon > 0$
$$\mathbb{P}\left(|\bar{X}_n - \mu| \geq \epsilon\right) \leq \frac{\mathrm{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \to 0$$
as $n \to \infty$. ∎
The theorem says what we could observe in Figure 9.1, namely that the distribution of $\bar{X}_n$ concentrates more and more around $\mu$, in the sense that no matter how small an interval we take around $\mu$, the probability of $\bar{X}_n$ falling in this interval tends to $1$.
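This concentration is easy to reproduce by simulation. A minimal sketch, assuming Exp(1) samples so that $\mu = \sigma^2 = 1$ (the distribution, the tolerance $\epsilon = 0.1$, and the seed are our own illustration, not part of the notes):

```python
import numpy as np

# Estimate P(|mean - mu| >= eps) for increasing n by repeated sampling.
rng = np.random.default_rng(seed=2)
mu, eps, reps = 1.0, 0.1, 10_000

for n in [10, 100, 1_000]:
    # reps independent sample means, each based on n Exp(1) draws
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    prob = np.mean(np.abs(means - mu) >= eps)
    print(f"n={n:5d}  P(|mean - mu| >= {eps}) ~ {prob:.4f}")
```

The estimated probability shrinks towards $0$ as $n$ grows, as the WLLN predicts.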
Note: The requirement that $\sigma^2 < \infty$ is not necessary. For example, it is sufficient that $\mathbb{E}|X_1| < \infty$ (but the proof is harder).
How large a random sample should be taken from a distribution in order for the probability to be at least $0.99$ that the sample mean will be within one standard deviation, $\sigma$, of the expectation, $\mu$, of the distribution?
Solution. As in the proof of the WLLN, by Chebyshev's inequality applied to $\bar{X}_n$ with $\epsilon = \sigma$,
$$\mathbb{P}\left(|\bar{X}_n - \mu| \geq \sigma\right) \leq \frac{\sigma^2/n}{\sigma^2} = \frac{1}{n}.$$
We need $1/n \leq 0.01$. So $n \geq 100$ is sufficient, whatever the distribution.
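A quick empirical check of this answer (the Uniform(0,1) test case and the seed are our own assumptions; the $n = 100$ guarantee itself is distribution-free):

```python
import numpy as np

# For Uniform(0,1): mu = 1/2, sigma = sqrt(1/12). Chebyshev guarantees
# P(|mean_100 - mu| >= sigma) <= 1/100; the truth is usually far smaller.
rng = np.random.default_rng(seed=3)
mu, sigma = 0.5, np.sqrt(1 / 12)
means = rng.uniform(size=(100_000, 100)).mean(axis=1)
print(np.mean(np.abs(means - mu) >= sigma))  # ~ 0, well below 0.01
```

This illustrates that Chebyshev's bound is safe but conservative: it holds for every distribution with finite variance, at the price of being loose for most of them.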
Setting $Y_i = g(X_i)$ and applying the WLLN to $Y_1, Y_2, \ldots$ shows us that $\frac{1}{n}\sum_{i=1}^n g(X_i) \to \mathbb{E}[g(X_1)]$ in probability, at least provided $\mathrm{Var}(g(X_1)) < \infty$.
The WLLN states that averages, $\bar{X}_n$, converge to the expectation, $\mu$. However, it also tells us that proportions converge to probabilities. Let $A \subseteq \mathbb{R}$ and let $\mathbb{1}_A(X_i)$ indicate whether or not $X_i \in A$; $\mathbb{1}_A$ is the indicator function, i.e. $\mathbb{1}_A(x) = 1$ if $x \in A$ and is $0$ otherwise.
The proportion of the rvs $X_1, \ldots, X_n$ that are in $A$ is
$$\frac{1}{n}\sum_{i=1}^n \mathbb{1}_A(X_i),$$
so the proportion is the average of the rvs $Y_i = \mathbb{1}_A(X_i)$. But $Y_i \sim \mathrm{Bernoulli}(p)$ where $p = \mathbb{P}(X_1 \in A)$, so $\mathbb{E}[Y_i] = p$.
Now $\mathrm{Var}(Y_i) = p(1-p) \leq 1/4 < \infty$, so set $g = \mathbb{1}_A$. Then
$$\frac{1}{n}\sum_{i=1}^n \mathbb{1}_A(X_i) \to \mathbb{P}(X_1 \in A)$$
in probability as $n \to \infty$.
In particular, with $A = (-\infty, x]$, we see that the proportion of realisations of $X_1, \ldots, X_n$ that are $\leq x$ tends to $\mathbb{P}(X_1 \leq x) = F(x)$. As a function of $x$, this proportion is known as the empirical cdf.
With $A = (a, b]$, the proportion of realisations between $a$ and $b$ tends to $\mathbb{P}(a < X_1 \leq b) = F(b) - F(a)$. As a function of such intervals, this proportion is known as the histogram.
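Both limits are easy to watch by simulation. A sketch, assuming a N(0,1) sample with the arbitrary evaluation point $x = 1$ and bin $(0, 1]$ (none of these choices come from the notes):

```python
import numpy as np
from math import erf, sqrt

# Empirical cdf at x = 1 and the proportion in the bin (0, 1], compared
# with the exact N(0,1) probabilities F(1) and F(1) - F(0).
Phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))  # standard normal cdf
rng = np.random.default_rng(seed=4)

for n in [100, 10_000, 1_000_000]:
    x = rng.normal(size=n)
    ecdf_at_1 = np.mean(x <= 1.0)              # proportion with X_i <= 1
    bin_01 = np.mean((x > 0.0) & (x <= 1.0))   # proportion in (0, 1]
    print(f"n={n:8d}  ecdf(1) ~ {ecdf_at_1:.4f} (F(1)={Phi(1.0):.4f})  "
          f"prop(0,1] ~ {bin_01:.4f} (true {Phi(1.0) - Phi(0.0):.4f})")
```

As $n$ grows, both proportions settle on the corresponding probabilities.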
There is also a Strong Law of Large Numbers, proved by the outstanding Russian mathematician and probabilist Andrei Kolmogorov (1903-1987). It uses a stronger form of convergence ('almost sure'). Informally, the Strong Law of Large Numbers says that for a sequence of iid random variables $X_1, X_2, \ldots$, we have $\bar{X}_n \to \mu$ as $n \to \infty$ if and only if $\mathbb{E}|X_1| < \infty$ and $\mu = \mathbb{E}[X_1]$,
for (almost) all realisations of the sequence (the probability of a 'bad' sequence is $0$). This is stronger than the WLLN since it concerns the joint distribution of the whole sequence of $\bar{X}_n$'s rather than the marginal distributions of each individual $\bar{X}_n$. This is the result that justifies the Monte Carlo simulation approximations to calculate expectations by averages and probabilities by proportions.
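For instance, a minimal Monte Carlo sketch in this spirit (the Uniform(0,1) example and the seed are our own choices):

```python
import numpy as np

# Monte Carlo via the SLLN: approximate an expectation by an average
# and a probability by a proportion.
rng = np.random.default_rng(seed=5)
u = rng.uniform(size=1_000_000)   # U ~ Uniform(0, 1)
print(np.mean(u ** 2))            # ~ E[U^2] = 1/3
print(np.mean(u > 0.9))           # ~ P(U > 0.9) = 0.1
```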
It can be shown that if $X_1, X_2, \ldots$ is a sequence of iid random variables with $\mathbb{E}|X_1| = \infty$ then $\bar{X}_n$ will not converge to a fixed value for almost any realisation of the sequence $X_1, X_2, \ldots$.
If $X_1, \ldots, X_n$ are independent Cauchy random variables (for which $\mathbb{E}|X_1| = \infty$), it can be shown that the mean $\bar{X}_n$ also has the same Cauchy distribution. Hence the distribution of $\bar{X}_n$ is the same for all $n$, and thus $\bar{X}_n$ does not converge to a fixed value.
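This failure is easy to see by simulation: a sketch of the running means of standard Cauchy draws (the sample size, checkpoints, and seed are our own illustration):

```python
import numpy as np

# Running means of iid standard Cauchy rvs: occasional huge observations
# keep knocking the mean around, so it never settles on a fixed value.
rng = np.random.default_rng(seed=6)
x = rng.standard_cauchy(size=100_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in [10, 100, 1_000, 10_000, 100_000]:
    print(f"n={n:6d}  running mean = {running_mean[n - 1]:10.3f}")
```

Rerunning with different seeds gives wildly different trajectories, in contrast with, say, Uniform(0,1) draws, whose running mean locks onto $1/2$.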
While the assumption of finite variance can be removed, the assumption that the $X_i$'s have a finite expectation is necessary for the Law of Large Numbers to hold.