Recall from Exercise 5.13 that if an experiment is repeated $n$ times then, as $n$ gets large, the proportion of times an event $A$ occurs converges to $P(A)$. We will now prove a similar result concerning the average of several realisations of a random variable converging to the expected value. We start with a lemma which is proved in MATH230.
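Let $X_1, \dots, X_n$ be jointly distributed random variables with finite expectation and variance. Then
$E[X_1 + \dots + X_n] = E[X_1] + \dots + E[X_n]$, and
if $X_1, \dots, X_n$ are independent then $\mathrm{Var}(X_1 + \dots + X_n) = \mathrm{Var}(X_1) + \dots + \mathrm{Var}(X_n)$.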
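As a sanity check of the lemma (not a proof), the sketch below computes both sides exactly for a pair of fair dice. The dice example and the helper names `expectation`, `variance`, `first`, `second` and `total` are assumptions made for illustration; they do not come from the notes. It also includes a fully dependent pair, where the second value always copies the first, to show why independence is needed for the variance statement but not for the expectation statement.

```python
from fractions import Fraction
from itertools import product


def expectation(dist, f):
    """E[f(outcome)] for a finite distribution given as (outcome, probability) pairs."""
    return sum(p * f(x) for x, p in dist)


def variance(dist, f):
    """Var(f(outcome)), computed as E[(f - E[f])^2]."""
    m = expectation(dist, f)
    return expectation(dist, lambda x: (f(x) - m) ** 2)


FACES = range(1, 7)

# Two independent fair dice: all 36 ordered pairs are equally likely.
INDEPENDENT = [((a, b), Fraction(1, 36)) for a, b in product(FACES, FACES)]
# Fully dependent pair: the second value always copies the first.
DEPENDENT = [((a, a), Fraction(1, 6)) for a in FACES]


def first(x):
    return x[0]


def second(x):
    return x[1]


def total(x):
    return x[0] + x[1]


if __name__ == "__main__":
    for name, dist in [("independent", INDEPENDENT), ("dependent", DEPENDENT)]:
        print(name)
        print("  E[X1 + X2]        =", expectation(dist, total))
        print("  E[X1] + E[X2]     =", expectation(dist, first) + expectation(dist, second))
        print("  Var(X1 + X2)      =", variance(dist, total))
        print("  Var(X1) + Var(X2) =", variance(dist, first) + variance(dist, second))
```

In both cases the expectations add to 7. The variances add only in the independent case: there $\mathrm{Var}(X_1 + X_2) = 35/6$, whereas for the dependent pair $\mathrm{Var}(X_1 + X_2) = 35/3$ although $\mathrm{Var}(X_1) + \mathrm{Var}(X_2)$ is still $35/6$.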
Now suppose that $X_1, \dots, X_n$ are independent copies of a random variable $X$. For example, suppose we repeated an experiment $n$ times, and $X_i$ is the measured outcome on the $i$th experiment. This setup means that for each $i$ we have $E[X_i] = E[X]$ and $\mathrm{Var}(X_i) = \mathrm{Var}(X)$.
If we want to report a value, scientists will usually measure it $n$ times and report the average measured value. Let $X_i$ be the measured value on the $i$th experiment. The average measured value is
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
Why do we do this?
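Let’s consider the properties of $\bar{X}_n$. For simplicity, write $\mu$ for $E[X]$ and $\sigma^2$ for $\mathrm{Var}(X)$. By the first part of the lemma,
$$E[\bar{X}_n] = \frac{1}{n} E[X_1 + \dots + X_n] = \frac{1}{n} \sum_{i=1}^{n} E[X_i] = \frac{n\mu}{n} = \mu.$$
So $\bar{X}_n$ has expectation $\mu$, the quantity we wish to report, the true expected value of $X$. Of course, simply reporting the first measurement $X_1$ would also have this expected value.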
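Consider now the variance of $\bar{X}_n$. Since the $X_i$ are independent, the second part of the lemma gives
$$\mathrm{Var}(\bar{X}_n) = \frac{1}{n^2} \mathrm{Var}(X_1 + \dots + X_n) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$
The variance of our reported quantity, $\bar{X}_n$, decreases as the number of measurements $n$ increases.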
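The decrease in variance can be seen in a small simulation. The sketch below is an illustration under assumed choices, not part of the notes: each measurement is a fair die roll (so $\sigma^2 = 35/12$), and the function names, the number of repeats and the seed are arbitrary. It estimates the variance of the sample average for several values of $n$ and prints it next to $\sigma^2/n$.

```python
import random
import statistics

SIGMA2 = 35 / 12  # exact variance of one fair die roll


def sample_mean(rng, n):
    """Average of n independent fair die rolls drawn from rng."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


def variance_of_sample_mean(n, repeats=5_000, seed=0):
    """Estimate the variance of the sample average from `repeats` simulated averages."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = [sample_mean(rng, n) for _ in range(repeats)]
    return statistics.pvariance(means)


if __name__ == "__main__":
    for n in (1, 10, 100):
        est = variance_of_sample_mean(n)
        print(f"n={n:>3}  simulated={est:.4f}  sigma^2/n={SIGMA2 / n:.4f}")
```

The simulated values are only estimates, so they should be close to $\sigma^2/n$ rather than equal to it.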
We can use Chebychev’s inequality (Section 4.6) to be more precise about this. Recall that for any random variable $Y$ with expected value $E[Y]$ and standard deviation $s$
$$P(|Y - E[Y]| > ks) \leq \frac{1}{k^2}$$
for any $k > 0$.
[I am using $s$ for the standard deviation here, instead of $\sigma$, to avoid confusion with the $\sigma^2$ already used for the variance of $X$.]
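Hence for the random variable $\bar{X}_n$ with expected value $\mu$, variance $\sigma^2/n$ and hence standard deviation $\sigma/\sqrt{n}$, we have
$$P\left(|\bar{X}_n - \mu| > \frac{k\sigma}{\sqrt{n}}\right) \leq \frac{1}{k^2}.$$
By taking $k = \epsilon\sqrt{n}/\sigma$, we can rearrange this expression to
$$P(|\bar{X}_n - \mu| > \epsilon) \leq \frac{\sigma^2}{n\epsilon^2}.$$
We see that as $n$ gets large, the probability that the sample average is more than distance $\epsilon$ away from the expected value of the original random quantity decreases to 0.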
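To see how the bound behaves in practice, the following sketch compares a simulated tail probability with the Chebychev bound $\sigma^2/(n\epsilon^2)$. As before, the fair die (with $\mu = 3.5$ and $\sigma^2 = 35/12$), the function names, the seed and the number of repeats are illustrative assumptions rather than anything taken from the notes.

```python
import random

MU = 3.5          # expected value of one fair die roll
SIGMA2 = 35 / 12  # variance of one fair die roll


def chebyshev_bound(n, eps, sigma2=SIGMA2):
    """Upper bound sigma^2 / (n * eps^2) on P(|sample mean - mu| > eps)."""
    return sigma2 / (n * eps ** 2)


def empirical_tail(n, eps, repeats=2_000, seed=1):
    """Fraction of simulated sample means lying more than eps away from MU."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(repeats):
        mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(mean - MU) > eps:
            hits += 1
    return hits / repeats


if __name__ == "__main__":
    eps = 0.5
    for n in (10, 25, 100, 400):
        print(f"n={n:>3}  simulated={empirical_tail(n, eps):.4f}  "
              f"bound={chebyshev_bound(n, eps):.4f}")
```

For small $n$ the bound can exceed 1, in which case it says nothing. For larger $n$ the simulated probability typically sits well below the bound, which is expected: the inequality holds for every distribution with the given variance, so it is usually far from tight for any particular one.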
Since $\epsilon > 0$ is arbitrary, in some sense we can say that $\bar{X}_n$ converges to $\mu$. This is called the weak law of large numbers. You will see various other forms of convergence of random variables in later courses.
One final thing to note: the standard deviation is exactly the right quantity for determining the appropriate scale for measuring distance here: the events are of the type “random variable is more than $k$ standard deviations away from the mean”.