A serious limitation of classical probability is that it only applies in situations where the sample space is finite and all outcomes are equiprobable. While this might be useful for drawing cards, rolling dice, or pulling balls from urns, it offers no method for dealing with outcomes with unequal probabilities, or where the sample space may be infinite.
The frequency or empirical approach to probability is based on the idea that the underlying probability of an event can be measured by repeated trials. Suppose that $A$ is an event for some experiment. If we repeat the experiment $n$ times, we might hope that the proportion of trials in which $A$ occurs tends to stabilise as $n \to \infty$. We would like to call this limit $\mathrm{Prob}(A)$. More precisely,
$$\mathrm{Prob}(A) = \lim_{n \to \infty} \frac{n_A}{n},$$
where $n_A$ is the number of times event $A$ occurs in $n$ trials.
If you toss a coin 1000 times and get heads 200 times, that suggests the coin is biased and $\mathrm{Prob}(\text{heads}) \approx 200/1000 = 1/5$.
Suppose a survey asks 500 people how they will vote in the next election and 150 say they support Labour. If $L$ is the event that a given person supports Labour, then $\mathrm{Prob}(L) \approx 150/500 = 3/10$.
In both cases, we would expect increasing the number of trials to improve the approximation.
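The stabilising of the relative frequency can be illustrated with a small simulation. The following is a minimal sketch, not a proof: the function name `relative_frequency`, the bias `p_heads = 0.2` (chosen to echo the coin example above) and the fixed seed are all assumptions made for illustration.

```python
import random


def relative_frequency(n_trials, p_heads=0.2, seed=0):
    """Return the proportion of heads in n_trials simulated tosses.

    p_heads is an assumed bias for the simulated coin; in a real
    experiment this underlying value would be unknown.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(1 for _ in range(n_trials) if rng.random() < p_heads)
    return heads / n_trials


# Print the relative frequency for increasing numbers of trials.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: relative frequency of heads = {relative_frequency(n):.4f}")
```

Because the seed is fixed, each shorter run is a prefix of the longer ones, which mirrors the idea of counting occurrences in the first so many trials of a single long experiment.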
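What can we say about:
$\mathrm{Prob}(A)$ in general?
$0 \le n_A \le n$, so $0 \le \mathrm{Prob}(A) \le 1$.
$\mathrm{Prob}(\emptyset)$?
$n_\emptyset = 0$, so $\mathrm{Prob}(\emptyset) = 0$.
$\mathrm{Prob}(\Omega)$, where $\Omega$ is the whole sample space?
$n_\Omega = n$, so $\mathrm{Prob}(\Omega) = 1$.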
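Furthermore, if $A$ and $B$ are exclusive events, and $C = A \cup B$, set
$n_A$ to be the number of times $A$ occurs in the first $n$ trials
$n_B$ to be the number of times $B$ occurs in the first $n$ trials
$n_C$ to be the number of times $C$ occurs in the first $n$ trials
Then $n_C = n_A + n_B$ since $A$ and $B$ are exclusive. Therefore
$$\frac{n_C}{n} = \frac{n_A}{n} + \frac{n_B}{n}.$$
Taking the limit as $n \to \infty$ we see that
$$\mathrm{Prob}(C) = \mathrm{Prob}(A) + \mathrm{Prob}(B).$$
But since $C = A \cup B$, we have
$$\mathrm{Prob}(A \cup B) = \mathrm{Prob}(A) + \mathrm{Prob}(B)$$
for the exclusive events $A$ and $B$.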
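The counting step in this argument holds exactly for any finite number of trials, which a short simulation can confirm. This is a sketch under assumed choices: a fair six-sided die, with the two exclusive events taken to be "the roll is 1 or 2" and "the roll is 5". The function name `count_events` and these particular events are hypothetical and not part of the argument above.

```python
import random


def count_events(n_trials, seed=0):
    """Roll a simulated fair die n_trials times and count three events.

    Returns (n_a, n_b, n_c), where a = {1, 2}, b = {5} and c is their
    union. The events a and b are exclusive by construction.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n_a = n_b = n_c = 0
    for _ in range(n_trials):
        roll = rng.randint(1, 6)  # inclusive on both ends
        in_a = roll in (1, 2)
        in_b = roll == 5
        n_a += in_a
        n_b += in_b
        n_c += in_a or in_b
    return n_a, n_b, n_c


n = 6_000
n_a, n_b, n_c = count_events(n)
print("counts add up exactly:", n_c == n_a + n_b)
print("relative frequencies:", n_a / n, n_b / n, n_c / n)
```

The equality of the counts does not depend on the seed or on the number of trials. Only the step of passing to the limit relies on the relative frequencies converging.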
However, how can we know if this thinking is valid? It seems intuitively reasonable, but we can't be sure. In particular, it is impossible to conduct an infinite number of trials, and it is unclear how large $n$ must be to give a good approximation. More seriously, $n_A/n$ may not converge at all, and even if it does, repeating the whole sequence of trials may not give the same limit.
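The concern about repeated experiments can be made concrete by running the same simulated experiment several times and comparing the relative frequencies obtained. The sketch below assumes a fair simulated coin; the names `run_experiment` and `repeated_frequencies` and the seed are hypothetical. A simulation like this can only show finite runs, so it illustrates the worry rather than settling it.

```python
import random


def run_experiment(n_trials, rng):
    """Return the relative frequency of heads in n_trials fair tosses."""
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials


def repeated_frequencies(n_trials, n_repeats=5, seed=1):
    """Repeat the whole experiment n_repeats times with fresh tosses."""
    rng = random.Random(seed)  # one generator, so repeats use different tosses
    return [run_experiment(n_trials, rng) for _ in range(n_repeats)]


# For each trial count, show the frequencies from independent repeats.
for n in (10, 1_000, 100_000):
    freqs = repeated_frequencies(n)
    print(f"n = {n:>6}:", ", ".join(f"{f:.3f}" for f in freqs))
```

For any fixed number of trials the repeats generally disagree with one another, and nothing in the output says how many trials would be enough for a given accuracy.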
The modern theory of probability works the other way round: we assume that for each event $A$ there exists a number $\mathrm{Prob}(A)$, called the probability of $A$, and place axioms on the function $\mathrm{Prob}$. We will see that these axioms imply the convergence we hope to see.