Independence is the simplest form of joint behaviour of two (or more) random variables. Informally, two random variables $X$ and $Y$ are independent if knowing the value of one of them gives no information about the value of the other.
The outcomes of, say, rolls of two separate dice are independent in exactly this sense: knowing that the red die showed a 4 does not give us any information about the score of the blue die, and, conversely, knowing the score of the blue die does not give any information about the red die.
Two random variables $X$ and $Y$ are independent if the events $\{X \in A\}$ and $\{Y \in B\}$ are independent for all sets $A$ and $B$, i.e.
$$\mathbb{P}(X \in A,\, Y \in B) = \mathbb{P}(X \in A)\,\mathbb{P}(Y \in B) \quad \text{for all sets } A \text{ and } B.$$
Two discrete random variables $X$ and $Y$ are independent if and only if
$$p_{X,Y}(x,y) = p_X(x)\,p_Y(y)$$
for all $x$ and $y$.
Let $X$ and $Y$ be independent, and let $A = \{x\}$ and $B = \{y\}$. Then
$$p_{X,Y}(x,y) = \mathbb{P}(X = x,\, Y = y) = \mathbb{P}(X = x)\,\mathbb{P}(Y = y) = p_X(x)\,p_Y(y).$$
Conversely, if the joint pmf factorises, we get for arbitrary sets $A$ and $B$
$$\mathbb{P}(X \in A,\, Y \in B) = \sum_{x \in A}\sum_{y \in B} p_{X,Y}(x,y) = \sum_{x \in A} p_X(x) \sum_{y \in B} p_Y(y) = \mathbb{P}(X \in A)\,\mathbb{P}(Y \in B).$$
∎
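As a quick illustration (not part of the original notes), the Python sketch below builds the joint pmf of two independent fair dice and checks the factorisation criterion; all names are our own.

```python
# Illustrative check: for two independent fair dice, the joint pmf
# p_{X,Y}(x, y) equals the product of the marginals p_X(x) p_Y(y).
from fractions import Fraction

# Joint pmf of the scores (X, Y) of two independent fair dice.
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

# Marginal pmfs obtained by summing the joint pmf over the other variable.
p_X = {x: sum(p for (a, _), p in joint.items() if a == x) for x in range(1, 7)}
p_Y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in range(1, 7)}

# Factorisation criterion for independence of discrete random variables.
assert all(joint[(x, y)] == p_X[x] * p_Y[y] for x in range(1, 7) for y in range(1, 7))
print("joint pmf factorises: X and Y are independent")
```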
If $X$ and $Y$ are discrete random variables, the conditional pmfs are
$$p_{X\mid Y}(x\mid y) = \mathbb{P}(X = x \mid Y = y) = \frac{p_{X,Y}(x,y)}{p_Y(y)} \quad\text{and}\quad p_{Y\mid X}(y\mid x) = \mathbb{P}(Y = y \mid X = x) = \frac{p_{X,Y}(x,y)}{p_X(x)},$$
whenever $p_Y(y) > 0$ and $p_X(x) > 0$ respectively.
Thus
$$p_{X,Y}(x,y) = p_{X\mid Y}(x\mid y)\,p_Y(y) = p_{Y\mid X}(y\mid x)\,p_X(x).$$
Show that if the discrete variables $X$ and $Y$ are independent then for all $x$ and $y$:
$$p_{X\mid Y}(x\mid y) = p_X(x) \quad\text{and}\quad p_{Y\mid X}(y\mid x) = p_Y(y).$$
These results conform with intuition: when $X$ and $Y$ are independent, knowing the value of $Y$ should tell us nothing about $X$.
The converse is also true: if the conditional distribution of $X$ given $Y = y$ does not depend on $y$ or, equivalently, the conditional distribution of $Y$ given $X = x$ does not depend on $x$, then $X$ and $Y$ are independent.
A fair coin is tossed. If it shows heads a fair die is thrown, if tails a biased die. The bias makes even numbers twice as probable as odd numbers. Find the joint pmf of $X$, the outcome of the coin toss, and $Y$, the score on the die.
Code heads and tails as $1$ and $0$ so that $X$ is a random variable. Marginal: $p_X(x) = 1/2$ for $x = 0, 1$.
Conditional:
$x = 1$ (fair die): $p_{Y\mid X}(y\mid 1) = \tfrac{1}{6}$ for $y = 1, \dots, 6$.
$x = 0$ (biased die): $p_{Y\mid X}(y\mid 0) = c$ for $y = 1, 3, 5$
and $p_{Y\mid X}(y\mid 0) = 2c$ for $y = 2, 4, 6$.
Hence $3c + 3(2c) = 9c = 1$, so $c = \tfrac{1}{9}$.
Using $p_{X,Y}(x,y) = p_{Y\mid X}(y\mid x)\,p_X(x)$ delivers
| $x \backslash y$ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 0 | 1/18 | 2/18 | 1/18 | 2/18 | 1/18 | 2/18 |
| 1 | 1/12 | 1/12 | 1/12 | 1/12 | 1/12 | 1/12 |
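The table can also be reproduced numerically. The following sketch is our own illustration of the construction $p_{X,Y}(x,y) = p_{Y\mid X}(y\mid x)\,p_X(x)$, using hypothetical variable names.

```python
# A minimal sketch rebuilding the joint pmf of the coin toss X and die score Y
# via p_{X,Y}(x, y) = p_{Y|X}(y|x) p_X(x).
from fractions import Fraction

p_X = {0: Fraction(1, 2), 1: Fraction(1, 2)}          # coin: 0 = tails, 1 = heads

def p_Y_given_X(y, x):
    """Conditional pmf of the die score Y given the coin outcome X."""
    if x == 1:                                                      # fair die
        return Fraction(1, 6)
    return Fraction(2, 9) if y % 2 == 0 else Fraction(1, 9)        # biased die

joint = {(x, y): p_Y_given_X(y, x) * p_X[x] for x in (0, 1) for y in range(1, 7)}

assert sum(joint.values()) == 1                        # a valid joint pmf
print(joint[(0, 1)], joint[(0, 2)], joint[(1, 1)])     # 1/18, 1/9 (= 2/18), 1/12
```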
For the joint pmf in Example 8.3 obtain the conditional pmf of $X$ given the observed value $y$ of $Y$.
| $x$ | $p_{X,Y}(x, y)$ | $p_X(x)$ |
|---|---|---|
| 1 | 2/60 | 16/60 |
| 2 | 3/60 | 24/60 |
| 3 | 6/60 | 20/60 |
| | $p_Y(y) = 11/60$ | |
So
$$p_{X\mid Y}(x\mid y) = \frac{p_{X,Y}(x,y)}{p_Y(y)}, \quad\text{with } p_Y(y) = \tfrac{2}{60} + \tfrac{3}{60} + \tfrac{6}{60} = \tfrac{11}{60}.$$
Thus $X = 1$ w.p. $\tfrac{2}{11}$, $X = 2$ w.p. $\tfrac{3}{11}$ and $X = 3$ w.p. $\tfrac{6}{11}$.
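For illustration only (our own sketch, not part of the example), conditioning on the observed value of $Y$ amounts to normalising the relevant column of the joint pmf:

```python
# Normalise the column of the joint pmf for the observed value of Y to get
# the conditional pmf of X given that value.
from fractions import Fraction

column = {1: Fraction(2, 60), 2: Fraction(3, 60), 3: Fraction(6, 60)}  # p_{X,Y}(x, y)
p_Y = sum(column.values())                                             # 11/60
conditional = {x: p / p_Y for x, p in column.items()}
print(conditional)   # {1: Fraction(2, 11), 2: Fraction(3, 11), 3: Fraction(6, 11)}
```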
We have seen that when $X$ and $Y$ are both discrete, they are independent if and only if their joint pmf can be factorised as the product of the marginal pmfs.
Our definition of independence also holds for continuous random variables, but there is no joint pmf for continuous random variables. Although it is beyond the scope of this module, there can exist a joint probability density function $f_{X,Y}(x,y)$. As with univariate random variables, results that hold in the discrete case with probability mass functions often hold in the continuous case with joint density functions.
Two continuous random variables $X$ and $Y$ are independent if and only if
$$f_{X,Y}(x,y) = f_X(x)\,f_Y(y)$$
for all $x$ and $y$.
Not given here. ∎
This result is needed for constructing likelihood-based estimates in statistics: often it is assumed that repeated experiments result in independent observations of a random variable, so that the joint density function of the observations is the product of the marginal densities.
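As an illustrative sketch (not from the notes), assuming an i.i.d. Normal model and using hypothetical function names, independence lets the log of the joint density of a sample be written as a sum of log marginal densities:

```python
# Under independence the joint density factorises, so the log-likelihood of a
# sample is the sum of the log marginal densities (Normal(mu, sigma^2) assumed).
import math

def normal_log_pdf(x, mu, sigma):
    """Log of the Normal(mu, sigma^2) density at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def log_likelihood(sample, mu, sigma):
    """log f(x_1, ..., x_n) = sum_i log f(x_i) for independent observations."""
    return sum(normal_log_pdf(x, mu, sigma) for x in sample)

print(log_likelihood([1.2, 0.7, 1.9], mu=1.0, sigma=0.5))
```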
Recall from Exercise 5.13 that if an experiment is repeated $n$ times then, as $n$ gets large, the proportion of times an event $E$ occurs converges to $\mathbb{P}(E)$. We will now prove a similar result concerning the average of several realisations of a random variable converging to its expected value. We start with a lemma which is proved in MATH230.
Let $X_1, \dots, X_n$ be jointly distributed random variables with finite expectations and variances. Then
$$\mathbb{E}\!\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n \mathbb{E}[X_i],$$ and
if $X_1, \dots, X_n$ are independent then
$$\operatorname{Var}\!\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \operatorname{Var}(X_i).$$
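The lemma can be checked by simulation. The sketch below (our own illustration, with made-up parameters) compares the empirical mean and variance of a sum of five independent die scores with the values the lemma predicts:

```python
# Empirically check E[sum X_i] = sum E[X_i] and, for independent X_i,
# Var(sum X_i) = sum Var(X_i), using fair-die scores.
import random

random.seed(1)
n, reps = 5, 50_000

# X_1, ..., X_5 independent uniform{1,...,6}: E[X_i] = 3.5, Var(X_i) = 35/12.
totals = [sum(random.randint(1, 6) for _ in range(n)) for _ in range(reps)]

mean_total = sum(totals) / reps
var_total = sum((t - mean_total) ** 2 for t in totals) / reps

print(mean_total, n * 3.5)        # both close to 17.5
print(var_total, n * 35 / 12)     # both close to 14.58
```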
Now suppose that $X_1, \dots, X_n$ are independent copies of a random variable $X$. For example, suppose we repeated an experiment $n$ times, and $X_i$ is the measured outcome on the $i$th experiment. This setup means that for each $i$ we have
$$\mathbb{E}[X_i] = \mathbb{E}[X] \quad\text{and}\quad \operatorname{Var}(X_i) = \operatorname{Var}(X).$$
To report a value, scientists will usually measure it $n$ times and report the average measured value. Let $X_i$ be the measured value on the $i$th experiment. The average measured value is
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i.$$
Why do we do this?
Let’s consider the properties of $\bar{X}_n$. For simplicity, write $\mu$ for $\mathbb{E}[X]$ and $\sigma^2$ for $\operatorname{Var}(X)$. By the lemma,
$$\mathbb{E}[\bar{X}_n] = \mathbb{E}\!\left[\frac{1}{n}\sum_{i=1}^n X_i\right] = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i] = \frac{1}{n}\,n\mu = \mu.$$
So $\bar{X}_n$ has expectation $\mu$, the quantity we wish to report: the true expected value of $X$. Of course, simply reporting the first measurement $X_1$ would also have this expected value.
Consider now the variance of $\bar{X}_n$:
$$\operatorname{Var}(\bar{X}_n) = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^n X_i\right) = \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}(X_i) = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n}.$$
The variance of our reported quantity, $\operatorname{Var}(\bar{X}_n) = \sigma^2/n$, decreases as the number of measurements $n$ increases.
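A small simulation (our own illustration) shows the $\sigma^2/n$ behaviour for averages of fair-die scores:

```python
# The variance of the sample mean of n die scores behaves like sigma^2 / n,
# where sigma^2 = 35/12 for a fair die.
import random

random.seed(2)
reps = 20_000
sigma2 = 35 / 12

for n in (1, 10, 100):
    means = [sum(random.randint(1, 6) for _ in range(n)) / n for _ in range(reps)]
    m = sum(means) / reps
    v = sum((x - m) ** 2 for x in means) / reps
    print(n, round(v, 4), round(sigma2 / n, 4))   # empirical variance vs sigma^2 / n
```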
We can use Chebychev’s inequality (Section 4.6) to be more precise about this. Recall that for any random variable $W$ with expected value $\mathbb{E}[W]$ and standard deviation $s$,
$$\mathbb{P}\big(|W - \mathbb{E}[W]| \geq k s\big) \leq \frac{1}{k^2}$$
for any $k > 0$.
[I am using $s$ for the standard deviation here, instead of $\sigma$, to avoid confusion with the $\sigma^2$ already used for the variance of $X$.]
Hence for the random variable $\bar{X}_n$, with expected value $\mu$, variance $\sigma^2/n$ and hence standard deviation $\sigma/\sqrt{n}$, we have
$$\mathbb{P}\!\left(|\bar{X}_n - \mu| \geq k\,\frac{\sigma}{\sqrt{n}}\right) \leq \frac{1}{k^2}.$$
By taking $k = \varepsilon\sqrt{n}/\sigma$, we can rearrange this expression to
$$\mathbb{P}\big(|\bar{X}_n - \mu| \geq \varepsilon\big) \leq \frac{\sigma^2}{n\varepsilon^2}.$$
We see that as $n$ gets large, the probability that the sample average $\bar{X}_n$ is more than a distance $\varepsilon$ away from the expected value $\mu$ of the original random quantity decreases to 0.
Since $\varepsilon > 0$ is arbitrary, in some sense we can say that $\bar{X}_n$ converges to $\mu$. This is called the weak law of large numbers. You will see various other forms of convergence of random variables in later courses.
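Here is an illustrative simulation (our own, not part of the notes) of the weak law of large numbers for fair-die scores, with the Chebychev bound shown alongside the empirical probability:

```python
# The sample average of n fair-die scores settles down to mu = 3.5 as n grows;
# the probability of being more than epsilon away stays below the Chebychev
# bound sigma^2 / (n epsilon^2).
import random

random.seed(3)
mu, sigma2, eps, reps = 3.5, 35 / 12, 0.25, 5_000

for n in (10, 100, 1000):
    far = 0
    for _ in range(reps):
        xbar = sum(random.randint(1, 6) for _ in range(n)) / n
        far += abs(xbar - mu) >= eps
    print(n, far / reps, sigma2 / (n * eps * eps))   # empirical probability vs bound
```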
One final thing to note: the standard deviation is exactly the right quantity for determining the appropriate scale on which to measure distance here: the events in Chebychev’s inequality are of the type “the random variable is more than $k$ standard deviations away from its mean”.