Figure 7.1 and Figure 7.2 show samples from four different joint distributions. In all cases the variables have the same marginal distributions for $X$ and $Y$; however, the joint distributions have very different forms as they have different dependence structures. In this section we will characterise the dependence through a summary measure.
Throughout this section we use the notation
\[ \mu_X = E(X), \]
\[ \mu_Y = E(Y), \]
\[ \sigma_X^2 = \operatorname{Var}(X), \]
\[ \sigma_Y^2 = \operatorname{Var}(Y). \]
The most common way of describing the relationship between two random variables is through the covariance or correlation. The covariance between $X$ and $Y$ is
\[ \operatorname{Cov}(X, Y) = E\{(X - \mu_X)(Y - \mu_Y)\} = E(XY) - \mu_X \mu_Y. \]
So, just as $\operatorname{Var}(X) = E\{(X - \mu_X)^2\} = E(X^2) - \mu_X^2$, there are two equivalent forms for $\operatorname{Cov}(X, Y)$. Indeed, $\operatorname{Cov}(X, X) = \operatorname{Var}(X)$.
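As a quick numerical check, the two forms of the covariance can be compared on a small hypothetical discrete joint distribution (equally likely $(x, y)$ pairs; the particular values are illustrative only):

```python
# Hypothetical joint distribution: four equally likely (x, y) pairs.
pairs = [(1, 2), (2, 3), (3, 5), (4, 4)]
n = len(pairs)

mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n

# Form 1: E[(X - mu_X)(Y - mu_Y)]
cov1 = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
# Form 2: E(XY) - mu_X * mu_Y
cov2 = sum(x * y for x, y in pairs) / n - mean_x * mean_y

print(cov1, cov2)  # both equal 1.0 for this distribution
```

Both formulas give the same value, as the algebra guarantees.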
By symmetry, $\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X)$.
Moreover, we have the following bilinearity properties:

1. $\operatorname{Cov}(aX, Y) = a\operatorname{Cov}(X, Y)$,
since both are $a\{E(XY) - \mu_X \mu_Y\}$. By the symmetry property, $\operatorname{Cov}(X, bY) = b\operatorname{Cov}(X, Y)$.

2. $\operatorname{Cov}(X + Z, Y) = \operatorname{Cov}(X, Y) + \operatorname{Cov}(Z, Y)$,
since the left hand side is
\[ E\{(X + Z)Y\} - (\mu_X + \mu_Z)\mu_Y = \{E(XY) - \mu_X \mu_Y\} + \{E(ZY) - \mu_Z \mu_Y\}, \]
which is the right hand side. By the symmetry property, $\operatorname{Cov}(X, Y + W) = \operatorname{Cov}(X, Y) + \operatorname{Cov}(X, W)$.
For example, using property 1 twice, we have
\[ \operatorname{Cov}(aX, bY) = ab\operatorname{Cov}(X, Y). \]
Using property 2 twice and noticing that the covariance between a constant and a random variable is zero,
\[ \operatorname{Cov}(X + a, Y + b) = \operatorname{Cov}(X, Y). \]
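These two consequences combine to give $\operatorname{Cov}(aX + b, cY + d) = ac\operatorname{Cov}(X, Y)$, which can be sketched numerically on hypothetical data (the values and scale factors below are arbitrary):

```python
# Sample covariance over equally likely pairs of hypothetical values.
def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]
a, b, c, d = 3.0, 7.0, -2.0, 1.0  # arbitrary scale/shift constants

# The shifts b and d drop out; the scales a and c multiply through.
lhs = cov([a * x + b for x in xs], [c * y + d for y in ys])
rhs = a * c * cov(xs, ys)
print(lhs, rhs)  # both -6.0 here
```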
The covariance occurs in the variance of sums of random variables. Consider
\[ \operatorname{Var}(X + Y) = E[\{(X - \mu_X) + (Y - \mu_Y)\}^2] = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y). \]
In particular, this means that
\[ \operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) \quad \text{when } \operatorname{Cov}(X, Y) = 0. \tag{7.1} \]
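The variance-of-a-sum identity can be verified numerically on a small hypothetical sample treated as an equally likely distribution:

```python
# Population-style variance and covariance over equally likely values.
def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical values of X
ys = [2.0, 3.0, 5.0, 4.0]  # hypothetical values of Y

# Var(X + Y) versus Var(X) + Var(Y) + 2 Cov(X, Y)
lhs = var([x + y for x, y in zip(xs, ys)])
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)
print(lhs, rhs)  # both 4.5 here
```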
Covariance has units = (units of $X$)(units of $Y$), and it changes if we change the scale of either variable: replacing $X$ by $aX$ changes the covariance with $Y$ to $a\operatorname{Cov}(X, Y)$.
The correlation, often denoted by $\rho$, is
\[ \rho = \operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}. \]
Correlation has the benefit of being invariant to location and scale changes, which aids interpretation:
\[ \operatorname{Corr}(aX + b, cY + d) = \operatorname{Corr}(X, Y), \]
provided $a, c > 0$, or $a, c < 0$. Quiz: why not $a > 0$, $c < 0$?
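This invariance, and the sign flip when exactly one scale factor is negative, can be sketched on hypothetical data:

```python
import math

# Population-style moments over equally likely values.
def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corr(xs, ys):
    return cov(xs, ys) / math.sqrt(var(xs) * var(ys))

xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical values of X
ys = [2.0, 3.0, 5.0, 4.0]  # hypothetical values of Y

r = corr(xs, ys)
r_pos = corr([2 * x + 5 for x in xs], [3 * y - 1 for y in ys])   # a, c > 0
r_neg = corr([-2 * x + 5 for x in xs], [3 * y - 1 for y in ys])  # a < 0, c > 0
print(r, r_pos, r_neg)  # 0.8, 0.8, -0.8 here
```

The negative scale factor answers the quiz: it reverses the direction of the association, so the correlation changes sign rather than staying fixed.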
Indeed,
\[ -1 \le \rho \le 1. \]
Without loss of generality suppose $\sigma_X = \sigma_Y = 1$ (if not then just divide each variable by its standard deviation, since that does not change the correlation), so that $\operatorname{Cov}(X, Y) = \rho$.
As a variance must be non-negative,
\[ \operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y) = 2 + 2\rho \ge 0, \]
so $\rho \ge -1$.
Now consider $\operatorname{Var}(X - Y) = 2 - 2\rho \ge 0$ to find $\rho \le 1$. ∎
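The key step of this proof can be traced numerically: after standardising hypothetical data to unit variance, $\operatorname{Var}(X + Y) = 2 + 2\rho$ and $\operatorname{Var}(X - Y) = 2 - 2\rho$, and both quantities are non-negative.

```python
import math

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical values of X
ys = [2.0, 3.0, 5.0, 4.0]  # hypothetical values of Y

# Standardise each variable to unit variance (correlation is unchanged).
zx = [x / math.sqrt(var(xs)) for x in xs]
zy = [y / math.sqrt(var(ys)) for y in ys]
rho = cov(zx, zy)  # equals Corr(X, Y) after standardisation

v_plus = var([a + b for a, b in zip(zx, zy)])   # = 2 + 2*rho >= 0
v_minus = var([a - b for a, b in zip(zx, zy)])  # = 2 - 2*rho >= 0
print(v_plus, 2 + 2 * rho, v_minus, 2 - 2 * rho)
```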
The interpretation of the covariance/correlation between $X$ and $Y$ is that if one variable tends to increase when the other does then both the covariance and the correlation will be positive, and the stronger the association between $X$ and $Y$ the larger the value of the covariance and correlation, with $\rho = 1$ corresponding to perfect positive linear association. If one variable tends to decrease when the other increases then both the covariance and the correlation will be negative, with $\rho = -1$ corresponding to perfect negative linear association. Figure 7.1 and Figure 7.2 show four joint distributions with different correlations.
When $X$ and $Y$ are independent we have $E(XY) = E(X)E(Y)$, so the covariance and correlation are both $0$, i.e. independence implies $\rho = 0$.
The converse, however, is not true: $\rho = 0$ does not imply that $X$ and $Y$ are independent, as Example 6.1.1 showed. With $X$ and $Y$ as defined there, it was shown that $\operatorname{Cov}(X, Y) = 0$, i.e. $\rho = 0$, even though the variables are dependent.
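A standard illustration of this phenomenon (hypothetical here, not necessarily the construction in Example 6.1.1) takes $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$: then $Y$ is completely determined by $X$, yet $\operatorname{Cov}(X, Y) = E(X^3) - E(X)E(X^2) = 0$.

```python
def cov(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

xs = [-1.0, 0.0, 1.0]     # X uniform on {-1, 0, 1}
ys = [x * x for x in xs]  # Y = X^2: fully dependent on X

print(cov(xs, ys))  # 0.0 despite the deterministic relationship
```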
In general, we must be careful not to read too much into the value of these summary measures, as both covariance and correlation measure linear association only, not general association.