7 Linear transformations


7.1 Covariance and Correlation

Figure 7.1 and Figure 7.2 show samples from four different joint distributions. In all cases X and Y have the same marginal distributions; however, the joint distributions have very different forms because they have different dependence structures. In this section we will try to characterise the dependence through a summary measure.

Figure 7.1: Realisations from two different joint distributions of X and Y. The marginal distributions are the same in both cases.
Figure 7.2: Realisations from two more joint distributions of X and Y. The marginal distributions are the same as in Figure 7.1.

Throughout this section we use the notation

  1. 𝖤[X] = μ_X,

  2. 𝖤[Y] = μ_Y,

  3. 𝖲𝗍𝖽𝖣𝖾𝗏[X] = σ_X,

  4. 𝖲𝗍𝖽𝖣𝖾𝗏[Y] = σ_Y.

The most common way of describing the relationship between two random variables is through the covariance or correlation. The covariance between X and Y is

𝖢𝗈𝗏[X,Y] = 𝖤[(X - μ_X)(Y - μ_Y)]
         = 𝖤[XY - μ_X Y - X μ_Y + μ_X μ_Y]
         = 𝖤[XY] - μ_X 𝖤[Y] - 𝖤[X] μ_Y + μ_X μ_Y
         = 𝖤[XY] - 𝖤[X]𝖤[Y]
         = 𝖤[XY] - μ_X μ_Y.

So, just as 𝖵𝖺𝗋[X] = 𝖤[(X - μ_X)²] = 𝖤[X²] - 𝖤[X]², there are two equivalent forms for 𝖢𝗈𝗏[X,Y]. Indeed, 𝖢𝗈𝗏[X,X] = 𝖵𝖺𝗋[X].
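As a quick numerical sketch (not part of the notes; the sample size, seed, and the particular joint distribution are arbitrary choices), the two forms of the covariance can be seen to agree on simulated data:

```python
import numpy as np

# Illustrative check: the two equivalent forms of Cov[X, Y] computed
# on a large simulated sample (all names and sizes are arbitrary).
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # made dependent on x by construction

form1 = np.mean((x - x.mean()) * (y - y.mean()))  # E[(X - mu_X)(Y - mu_Y)]
form2 = np.mean(x * y) - x.mean() * y.mean()      # E[XY] - E[X] E[Y]

print(form1, form2)   # identical up to floating-point rounding
```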

By symmetry, 𝖢𝗈𝗏[X,Y]=𝖢𝗈𝗏[Y,X].

Moreover, we have the following bilinearity properties:

  1. 𝖢𝗈𝗏[aX,Y] = a𝖢𝗈𝗏[X,Y], since both sides equal 𝖤[aXY] - 𝖤[aX]𝖤[Y]. By the symmetry property, 𝖢𝗈𝗏[X,cY] = c𝖢𝗈𝗏[X,Y].

  2. 𝖢𝗈𝗏[W+X,Y] = 𝖢𝗈𝗏[W,Y] + 𝖢𝗈𝗏[X,Y], since the left-hand side is

     𝖤[(W+X)Y] - 𝖤[W+X]𝖤[Y] = 𝖤[WY] + 𝖤[XY] - 𝖤[W]𝖤[Y] - 𝖤[X]𝖤[Y],

     which is the right-hand side. By the symmetry property, 𝖢𝗈𝗏[X,Y+Z] = 𝖢𝗈𝗏[X,Y] + 𝖢𝗈𝗏[X,Z].

For example, using property 1 twice, we have

𝖢𝗈𝗏[aX,cY]=a𝖢𝗈𝗏[X,cY]=ac𝖢𝗈𝗏[X,Y].

Using property 2 twice, and noting that the covariance between a constant and a random variable is 0,

𝖢𝗈𝗏[X+b,Y+c]=𝖢𝗈𝗏[X,Y].
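These consequences of bilinearity can be checked numerically (an illustrative sketch; the constants and the simulated sample are arbitrary choices, not from the notes):

```python
import numpy as np

# Check Cov[aX, cY] = ac Cov[X, Y] and Cov[X + b, Y + d] = Cov[X, Y]
# on simulated data (the constants a, b, c, d are arbitrary).
rng = np.random.default_rng(1)
x = rng.normal(size=50_000)
y = x + rng.normal(size=50_000)
a, b, c, d = 3.0, -2.0, 0.5, 7.0

def cov(u, v):
    # covariance in the form E[UV] - E[U]E[V]
    return np.mean(u * v) - u.mean() * v.mean()

print(np.isclose(cov(a * x, c * y), a * c * cov(x, y)))  # scaling
print(np.isclose(cov(x + b, y + d), cov(x, y)))          # shifting
```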

The covariance occurs in the variance of sums of random variables. Consider

𝖵𝖺𝗋[X+Y] = 𝖤[(X+Y)²] - 𝖤[X+Y]²
         = 𝖤[X² + 2XY + Y²] - (𝖤[X] + 𝖤[Y])²
         = 𝖤[X²] + 𝖤[2XY] + 𝖤[Y²] - (𝖤[X]² + 2𝖤[X]𝖤[Y] + 𝖤[Y]²)
         = 𝖤[X²] - 𝖤[X]² + 𝖤[Y²] - 𝖤[Y]² + 2(𝖤[XY] - 𝖤[X]𝖤[Y])
         = 𝖵𝖺𝗋[X] + 𝖵𝖺𝗋[Y] + 2𝖢𝗈𝗏[X,Y].

In particular, this means that

𝖵𝖺𝗋[aX+bY] = 𝖵𝖺𝗋[aX] + 𝖵𝖺𝗋[bY] + 2𝖢𝗈𝗏[aX,bY] = a²𝖵𝖺𝗋[X] + b²𝖵𝖺𝗋[Y] + 2ab𝖢𝗈𝗏[X,Y]. (7.1)
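Equation (7.1) can also be illustrated on simulated data (a sketch; the seed, sample, and the constants a, b are arbitrary):

```python
import numpy as np

# Numerical illustration of (7.1):
#   Var[aX + bY] = a^2 Var[X] + b^2 Var[Y] + 2ab Cov[X, Y].
rng = np.random.default_rng(2)
x = rng.normal(size=50_000)
y = 0.8 * x + rng.normal(size=50_000)
a, b = 2.0, -1.5

cov_xy = np.mean(x * y) - x.mean() * y.mean()
lhs = np.var(a * x + b * y)
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * cov_xy

print(np.isclose(lhs, rhs))
```

Note that `np.var` uses the 1/n convention, matching the 𝖤[XY] - 𝖤[X]𝖤[Y] form of the covariance above, so the identity holds exactly up to rounding.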

Covariance has units (units of X) × (units of Y), so rescaling either X or Y changes the covariance between X and Y.

The correlation, often denoted by ρ, is

ρ = 𝖢𝗈𝗋𝗋[X,Y] = 𝖢𝗈𝗏[X,Y] / √(𝖵𝖺𝗋[X]𝖵𝖺𝗋[Y]) = 𝖢𝗈𝗏[X,Y] / (σ_X σ_Y).

Correlation has the benefit of being invariant to location and scale changes, which aids interpretation,

𝖢𝗈𝗋𝗋[aX+b,cY+d]=𝖢𝗈𝗋𝗋[X,Y],

provided a>0, c>0 or a<0,c<0. Quiz: Why not a>0,c<0?

𝖢𝗈𝗋𝗋[aX,cY] = 𝖢𝗈𝗏[aX,cY] / (σ_{aX} σ_{cY}) = ac𝖢𝗈𝗏[X,Y] / (|a||c| σ_X σ_Y) = sign(ac) ρ_{XY}.

Indeed,

𝖢𝗈𝗋𝗋[aX+b,cY+d]=sgn(ac)𝖢𝗈𝗋𝗋[X,Y].
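This invariance (and the sign flip) can be verified numerically; the following is an illustrative sketch with arbitrary constants, using numpy's sample correlation:

```python
import numpy as np

# Check Corr[aX + b, cY + d] = sgn(ac) Corr[X, Y] on simulated data.
rng = np.random.default_rng(3)
x = rng.normal(size=50_000)
y = x + rng.normal(size=50_000)

def corr(u, v):
    return np.corrcoef(u, v)[0, 1]

print(np.isclose(corr(3 * x + 1, 2 * y - 4), corr(x, y)))    # a, c > 0: unchanged
print(np.isclose(corr(3 * x + 1, -2 * y - 4), -corr(x, y)))  # a > 0, c < 0: sign flips
```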
Proposition 7.1.1.

-1 ≤ ρ ≤ 1.

Proof.

Without loss of generality suppose 𝖵𝖺𝗋[X]=1=𝖵𝖺𝗋[Y] (if not then just divide each by its standard deviation, since that does not change the correlation).

𝖵𝖺𝗋[X+Y]=𝖵𝖺𝗋[X]+𝖵𝖺𝗋[Y]+2𝖢𝗈𝗏[X,Y]=2+2𝖢𝗈𝗋𝗋[X,Y].

As a variance must be non-negative, -1 ≤ 𝖢𝗈𝗋𝗋[X,Y].

Now consider 𝖵𝖺𝗋[X-Y] = 2 - 2𝖢𝗈𝗋𝗋[X,Y] ≥ 0 to find 𝖢𝗈𝗋𝗋[X,Y] ≤ 1. ∎

The interpretation of the covariance and correlation between X and Y is as follows. If one variable tends to increase when the other does, both the covariance and the correlation will be positive, and the stronger the association between X and Y, the larger their values, with ρ = 1 corresponding to perfect positive linear association. If one variable tends to decrease when the other increases, both will be negative, with ρ = -1 corresponding to perfect negative linear association. Figure 7.1 and Figure 7.2 show four joint distributions with different correlations.

When X and Y are independent we have 𝖤[XY]=𝖤[X]𝖤[Y] so the covariance and correlation are both 0, i.e. ρ=0.

The converse, however, is not true: ρ = 0 does not imply that X and Y are independent, as Example 6.1.1 showed. With X ∼ N(0,1) and Y = X² - 1, it was shown that 𝖤[XY] - 𝖤[X]𝖤[Y] = 0, i.e. 𝖢𝗈𝗏[X,Y] = 0 and so ρ = 0.
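An empirical version of this counterexample (a sketch; the sample size and seed are arbitrary choices) makes the point vividly: the sample correlation is near zero even though Y is a deterministic function of X.

```python
import numpy as np

# X ~ N(0,1) and Y = X^2 - 1 are uncorrelated but strongly dependent.
rng = np.random.default_rng(4)
x = rng.normal(size=200_000)
y = x**2 - 1

rho = np.corrcoef(x, y)[0, 1]
print(rho)   # near 0, despite the exact functional dependence
```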

In general, we must be careful not to read too much into the value of these summary measures: both covariance and correlation measure only linear association, not association in general.