
10.1 The Bivariate Normal Distribution

Two continuous random variables $X$ and $Y$ are said to have a bivariate normal distribution with parameters $\boldsymbol{\theta}=(\mu_X,\mu_Y,\sigma_X^2,\sigma_Y^2,\rho)$, where $\sigma_X^2>0$, $\sigma_Y^2>0$ and $-1<\rho<1$, if their joint pdf is given for all $x$ and $y$ by

\[ f_{XY}(x,y;\boldsymbol{\theta}) = \frac{1}{2\pi\sqrt{\sigma_X^2\sigma_Y^2(1-\rho^2)}} \times \exp\left\{-\frac{1}{2}\cdot\frac{1}{1-\rho^2}\,Q(x,y)\right\}, \]

where

\[ Q(x,y) = \left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2. \]

It will be shown below that the marginal distributions of $X$ and $Y$ are both normal with $X \sim \mathsf{N}(\mu_X,\sigma_X^2)$, $Y \sim \mathsf{N}(\mu_Y,\sigma_Y^2)$ and $\mathsf{Corr}[X,Y]=\rho$. This explains the notation and the restrictions on the parameters.
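As a quick sanity check of the definition, the following sketch evaluates the joint pdf directly from the formula above and compares it with scipy.stats.multivariate_normal; the parameter values and the evaluation point are illustrative assumptions, not taken from the notes.

```python
# A minimal check of the bivariate normal pdf formula, assuming scipy is available;
# the parameter values below are illustrative only.
import numpy as np
from scipy.stats import multivariate_normal

def bvn_pdf(x, y, mu_x, mu_y, var_x, var_y, rho):
    """Evaluate f_XY(x, y; theta) directly from the formula above."""
    sx, sy = np.sqrt(var_x), np.sqrt(var_y)
    zx, zy = (x - mu_x) / sx, (y - mu_y) / sy
    Q = zx**2 - 2 * rho * zx * zy + zy**2          # the quadratic form Q(x, y)
    norm_const = 2 * np.pi * np.sqrt(var_x * var_y * (1 - rho**2))
    return np.exp(-0.5 * Q / (1 - rho**2)) / norm_const

mu_x, mu_y, var_x, var_y, rho = 1.0, 2.0, 1.0, 4.0, 0.8
cov = [[var_x, rho * np.sqrt(var_x * var_y)],
       [rho * np.sqrt(var_x * var_y), var_y]]

x, y = 0.3, 1.5
print(bvn_pdf(x, y, mu_x, mu_y, var_x, var_y, rho))
print(multivariate_normal.pdf([x, y], mean=[mu_x, mu_y], cov=cov))  # should agree
```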

In the special case $\rho=0$, i.e. no correlation, the joint pdf factorises into:

\[ f_{XY}(x,y;\boldsymbol{\theta}) = \frac{1}{\sqrt{2\pi\sigma_X^2}} \exp\left\{-\frac{1}{2}\left(\frac{x-\mu_X}{\sigma_X}\right)^2\right\} \times \frac{1}{\sqrt{2\pi\sigma_Y^2}} \exp\left\{-\frac{1}{2}\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right\}, \]

and so $X$ and $Y$ are independent. That is, for bivariate normal random variables, zero correlation implies independence. Recall from Section 7.1 that this is NOT true in general.
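The factorisation is easy to illustrate numerically; the sketch below, which assumes scipy is available and uses an arbitrary evaluation point, compares the joint pdf with $\rho=0$ to the product of the two marginal normal pdfs.

```python
# A quick numerical illustration that the joint pdf factorises when rho = 0,
# assuming scipy; the point (x, y) and the parameters are arbitrary choices.
import numpy as np
from scipy.stats import multivariate_normal, norm

mu_x, mu_y, var_x, var_y = 0.0, 0.0, 1.0, 2.0
x, y = 0.7, -1.2

joint = multivariate_normal.pdf([x, y], mean=[mu_x, mu_y],
                                cov=[[var_x, 0.0], [0.0, var_y]])
product = norm.pdf(x, mu_x, np.sqrt(var_x)) * norm.pdf(y, mu_y, np.sqrt(var_y))
print(joint, product)  # identical: the joint pdf is the product of the marginals
```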

Realisations from this bivariate distribution and contour plots of the corresponding pdfs are shown in Figures 10.1, 10.2 and 10.3 for $\mu_X=\mu_Y=0$, $\sigma_X^2=\sigma_Y^2=1$ and varying values of $\rho$. Note that the contours of the pdf are ellipses centred at the origin, with orientation determined by $\rho$. In general the contours are ellipses centred at $(\mu_X,\mu_Y)$.

Figure 10.1: Left: 1000 realisations of a bivariate normal distribution with $\rho=0$. The marginal distributions are standard normal. Right: Contour plot of the corresponding pdf.
Figure 10.2: Left: 1000 realisations of a bivariate normal distribution with $\rho=0.8$. The marginal distributions are standard normal. Right: Contour plot of the corresponding pdf.
Figure 10.3: Left: 1000 realisations of a bivariate normal distribution with $\rho=-0.6$. The marginal distributions are standard normal. Right: Contour plot of the corresponding pdf.
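Plots in the style of Figures 10.1–10.3 could be reproduced along the following lines; this is a sketch only, assuming numpy, scipy and matplotlib are available, and is not the code used to produce the figures.

```python
# A sketch of scatter-plus-contour plots like Figures 10.1-10.3: standard normal
# margins, 1000 realisations, and a contour plot of the joint pdf.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

rho = 0.8                                   # try 0, 0.8, -0.6 as in the figures
cov = [[1.0, rho], [rho, 1.0]]
rng = np.random.default_rng(1)
samples = rng.multivariate_normal([0.0, 0.0], cov, size=1000)

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))
ax_left.scatter(samples[:, 0], samples[:, 1], s=5)           # realisations
grid = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(grid, grid)
Z = multivariate_normal.pdf(np.dstack([X, Y]), mean=[0.0, 0.0], cov=cov)
ax_right.contour(X, Y, Z)                                    # elliptical contours
plt.show()
```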

The pdf is more conveniently expressed in matrix notation, which also makes the analogy with the univariate case clearer and the extension to higher dimensions easier. Set

\[ \Sigma = \begin{bmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix} \]

so that

\[ \Sigma^{-1} = \frac{1}{\sigma_X^2\sigma_Y^2(1-\rho^2)} \begin{bmatrix} \sigma_Y^2 & -\rho\sigma_X\sigma_Y \\ -\rho\sigma_X\sigma_Y & \sigma_X^2 \end{bmatrix} = \frac{1}{1-\rho^2} \begin{bmatrix} \frac{1}{\sigma_X^2} & \frac{-\rho}{\sigma_X\sigma_Y} \\ \frac{-\rho}{\sigma_X\sigma_Y} & \frac{1}{\sigma_Y^2} \end{bmatrix}. \]

Then

\[ f_{XY}(x,y;\boldsymbol{\theta}) = \frac{1}{2\pi\sqrt{\det\Sigma}} \exp\left\{-\frac{1}{2} \begin{bmatrix} x-\mu_X & y-\mu_Y \end{bmatrix} \Sigma^{-1} \begin{bmatrix} x-\mu_X \\ y-\mu_Y \end{bmatrix}\right\}. \]
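The matrix form can be checked against the earlier scalar form at any point; the sketch below does this with illustrative parameter values (assumptions, not from the notes), assuming numpy is available.

```python
# A small check that the matrix form of the pdf matches the scalar form with Q(x, y).
import numpy as np

mu = np.array([1.0, 2.0])
var_x, var_y, rho = 1.0, 4.0, -0.6
Sigma = np.array([[var_x, rho * np.sqrt(var_x * var_y)],
                  [rho * np.sqrt(var_x * var_y), var_y]])

z = np.array([0.5, 1.0]) - mu                 # (x - mu_X, y - mu_Y)

# Matrix form: quadratic form (x - mu)^T Sigma^{-1} (x - mu)
matrix_form = np.exp(-0.5 * z @ np.linalg.inv(Sigma) @ z) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

# Scalar form with Q(x, y) as defined at the start of the section
zx, zy = z[0] / np.sqrt(var_x), z[1] / np.sqrt(var_y)
Q = zx**2 - 2 * rho * zx * zy + zy**2
scalar_form = np.exp(-0.5 * Q / (1 - rho**2)) / (2 * np.pi * np.sqrt(var_x * var_y * (1 - rho**2)))

print(matrix_form, scalar_form)  # should agree
```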

Generating the bivariate normal

To derive properties of the bivariate normal distribution it is often a good idea to think of it as a transformation of two independent standard normal random variables $U$ and $V$, say.

Now consider the linear transformation

\[ \begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix} + A \begin{bmatrix} U \\ V \end{bmatrix}, \qquad (10.1) \]

where

\[ A = \begin{bmatrix} \sigma_X & 0 \\ \rho\sigma_Y & \sqrt{1-\rho^2}\,\sigma_Y \end{bmatrix}. \]

Or equivalently

\[ X = \mu_X + \sigma_X U, \qquad (10.2) \]
\[ Y = \mu_Y + \rho\sigma_Y U + \sqrt{1-\rho^2}\,\sigma_Y V. \qquad (10.3) \]

Clearly

  1. $\mathsf{E}[X] = \mu_X + \sigma_X\,\mathsf{E}[U] = \mu_X$,

  2. $\mathsf{E}[Y] = \mu_Y + \rho\sigma_Y\,\mathsf{E}[U] + \sqrt{1-\rho^2}\,\sigma_Y\,\mathsf{E}[V] = \mu_Y$.

Using the independence of U and V, the variances are

  1. $\mathsf{Var}[X] = \sigma_X^2\,\mathsf{Var}[U] = \sigma_X^2$,

  2. $\mathsf{Var}[Y] = \rho^2\sigma_Y^2\,\mathsf{Var}[U] + (1-\rho^2)\sigma_Y^2\,\mathsf{Var}[V] = \sigma_Y^2$.

Finally, the covariance and correlation are (again using independence of U and V)

\[ \mathsf{Cov}[X,Y] = \mathsf{Cov}\!\left[\sigma_X U,\ \rho\sigma_Y U\right] + \mathsf{Cov}\!\left[\sigma_X U,\ \sqrt{1-\rho^2}\,\sigma_Y V\right] = \rho\sigma_X\sigma_Y\,\mathsf{Var}[U] = \rho\sigma_X\sigma_Y \]

and

\[ \mathsf{Corr}[X,Y] = \rho. \]
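These calculations can be checked empirically by simulating from the transformation (10.1)–(10.3); the sketch below, assuming numpy and with illustrative parameter values, compares the sample moments with $(\mu_X,\mu_Y)$, $(\sigma_X^2,\sigma_Y^2)$ and $\rho$.

```python
# An empirical check of the moment calculations: generate many (X, Y) pairs via
# the transformation (10.1)-(10.3) and compare sample moments with the parameters.
import numpy as np

mu_x, mu_y, sigma_x, sigma_y, rho = 1.0, -2.0, 1.0, 3.0, -0.6   # illustrative values
A = np.array([[sigma_x, 0.0],
              [rho * sigma_y, np.sqrt(1 - rho**2) * sigma_y]])

rng = np.random.default_rng(2)
UV = rng.standard_normal((200_000, 2))         # iid N(0, 1) pairs (U, V)
XY = np.array([mu_x, mu_y]) + UV @ A.T         # apply (10.1) row by row

print(XY.mean(axis=0))                         # approx (mu_X, mu_Y)
print(XY.var(axis=0))                          # approx (sigma_X^2, sigma_Y^2)
print(np.corrcoef(XY[:, 0], XY[:, 1])[0, 1])   # approx rho
```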

We have shown that the parameters have their intuitive interpretations. We also need to show that $(X,Y)$ has the correct joint distribution. We do this for a general one-to-one linear transformation of a vector of $d$ iid $\mathsf{N}(0,1)$ random variables (i.e. not just for $d=2$).

Proposition 10.1.1.

Let $\boldsymbol{\mu}$ be a $d\times 1$ vector, $A$ an invertible $d\times d$ matrix and $U_1,\ldots,U_d$ iid $\mathsf{N}(0,1)$ random variables with $\boldsymbol{U}=(U_1,\ldots,U_d)^T$. Then $\boldsymbol{X}=\boldsymbol{\mu}+A\boldsymbol{U}$ has density

\[ f_{\boldsymbol{X}}(\boldsymbol{x}) = \frac{1}{(2\pi)^{d/2}(\det\Sigma)^{1/2}} \exp\!\left[-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\boldsymbol{x}-\boldsymbol{\mu})\right], \qquad (10.4) \]

where $\Sigma = AA^T$.

Proof.

By independence, the joint density of 𝑼 is

\[ f_{\boldsymbol{U}}(\boldsymbol{u}) = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}} \exp\!\left[-\frac{1}{2}u_i^2\right] = \frac{1}{(2\pi)^{d/2}} \exp\!\left[-\frac{1}{2}\sum_{i=1}^{d}u_i^2\right] = \frac{1}{(2\pi)^{d/2}} \exp\!\left[-\frac{1}{2}\boldsymbol{u}^T\boldsymbol{u}\right]. \]

Since $A$ is invertible, the transformation $\boldsymbol{U}\mapsto\boldsymbol{X}$ is one-to-one and we may use the density method. Now

\[ X_i = \mu_i + \sum_{j=1}^{d} A_{ij} U_j, \]

so

\[ \frac{\partial X_i}{\partial U_j} = A_{ij}. \]

Hence

\[ \left|\det\frac{\partial\boldsymbol{U}}{\partial\boldsymbol{X}}\right| = \left|\det\frac{\partial\boldsymbol{X}}{\partial\boldsymbol{U}}\right|^{-1} = |\det A|^{-1}. \]

Thus

\begin{align*}
f_{\boldsymbol{X}}(\boldsymbol{x}) &= \frac{1}{(2\pi)^{d/2}|\det A|} \exp\!\left[-\frac{1}{2}\left(A^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right)^T A^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right] \\
&= \frac{1}{(2\pi)^{d/2}|\det A|} \exp\!\left[-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T (A^{-1})^T A^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right] \\
&= \frac{1}{(2\pi)^{d/2}|\det A|} \exp\!\left[-\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^T (A A^T)^{-1}(\boldsymbol{x}-\boldsymbol{\mu})\right].
\end{align*}

The result follows since $\Sigma = AA^T$ and $\det\Sigma = \det A\,\det A^T = (\det A)^2$.

In our case with A as in (10.1),

\[ \Sigma = \begin{bmatrix} \sigma_X & 0 \\ \rho\sigma_Y & \sqrt{1-\rho^2}\,\sigma_Y \end{bmatrix} \begin{bmatrix} \sigma_X & \rho\sigma_Y \\ 0 & \sqrt{1-\rho^2}\,\sigma_Y \end{bmatrix} = \begin{bmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix}, \qquad (10.5) \]

as claimed. ∎
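Proposition 10.1.1 also suggests a way of simulating from a general multivariate normal: factor a target $\Sigma$ as $AA^T$ and transform iid standard normals. The sketch below, assuming numpy and with an illustrative $3\times 3$ choice of $\Sigma$, uses the Cholesky factorisation (which in the bivariate case yields exactly the matrix $A$ following (10.1)).

```python
# A sketch of Proposition 10.1.1 in use: factor Sigma = A A^T via Cholesky,
# then set X = mu + A U for iid N(0, 1) variables U. Numbers are illustrative.
import numpy as np

mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

A = np.linalg.cholesky(Sigma)                # lower triangular, with A @ A.T == Sigma
print(A @ A.T)                               # recovers Sigma

rng = np.random.default_rng(4)
U = rng.standard_normal((100_000, 3))        # rows are iid N(0, 1) vectors
X = mu + U @ A.T                             # X = mu + A U, applied row-wise
print(np.cov(X, rowvar=False))               # approx Sigma
print(X.mean(axis=0))                        # approx mu
```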

Marginal distributions

This shows that the marginal distributions of X and Y are both normal since X and Y are both linear combinations of the independent normal random variables U and V (the convolution property).
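The marginal claim can also be verified numerically: integrating the joint pdf over $y$ should recover the $\mathsf{N}(\mu_X,\sigma_X^2)$ density at any fixed $x$. The sketch below assumes scipy is available and uses illustrative parameter values.

```python
# A numerical sanity check of the marginal distribution of X: integrate the joint
# pdf over y and compare with the N(mu_X, sigma_X^2) density at a fixed x.
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.integrate import quad

mu_x, mu_y, var_x, var_y, rho = 1.0, -1.0, 2.0, 1.0, 0.7   # illustrative values
cov = [[var_x, rho * np.sqrt(var_x * var_y)],
       [rho * np.sqrt(var_x * var_y), var_y]]

x = 0.5
marginal, _ = quad(lambda y: multivariate_normal.pdf([x, y], mean=[mu_x, mu_y], cov=cov),
                   -np.inf, np.inf)
print(marginal, norm.pdf(x, mu_x, np.sqrt(var_x)))          # should agree
```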

Linear transformations

Suppose X and Y are bivariate Normal random variables. Consider the linear transformation

\[ \begin{bmatrix} S \\ T \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + B \begin{bmatrix} X \\ Y \end{bmatrix}, \]

and assume this is one-to-one, i.e. $\det(B)\neq 0$. Then

\[ \begin{bmatrix} S \\ T \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + B\left(\begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix} + A \begin{bmatrix} U \\ V \end{bmatrix}\right). \]

But this can be written as

\[ \begin{bmatrix} S \\ T \end{bmatrix} = \boldsymbol{\mu}^* + A^* \begin{bmatrix} U \\ V \end{bmatrix}, \]

where

\[ \boldsymbol{\mu}^* = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + B \begin{bmatrix} \mu_X \\ \mu_Y \end{bmatrix} \]

and $A^* = BA$.

Since $\det(A)\neq 0$ and $\det(B)\neq 0$, we have $\det(BA)\neq 0$, and so $(S,T)$ also has a bivariate normal distribution; i.e. one-to-one linear transformations of bivariate normal random variables are again bivariate normal. Since $U$ and $V$ are independent $\mathsf{N}(0,1)$ random variables, the expectation and variance of $(S,T)$ are $\boldsymbol{\mu}^*$ and $A^*(A^*)^T$ respectively.
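A short sketch of this closure property, assuming numpy and with a hypothetical shift $c$ and invertible matrix $B$: the mean of $(S,T)$ is $\boldsymbol{\mu}^* = c + B\boldsymbol{\mu}$ and its covariance matrix is $A^*(A^*)^T = B\Sigma B^T$.

```python
# Linear transformations of a bivariate normal: mean c + B mu, covariance B Sigma B^T.
# The values of mu, Sigma, c and B below are illustrative assumptions.
import numpy as np

mu = np.array([0.0, 0.0])
sigma_x, sigma_y, rho = 1.0, 2.0, 0.5
Sigma = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
                  [rho * sigma_x * sigma_y, sigma_y**2]])
A = np.linalg.cholesky(Sigma)                 # one choice of A with A A^T = Sigma

c = np.array([1.0, -1.0])
B = np.array([[2.0, 1.0],
              [0.0, 3.0]])                    # det(B) = 6 != 0, so the map is one-to-one

mu_star = c + B @ mu
A_star = B @ A
print(mu_star)                                # mean of (S, T)
print(A_star @ A_star.T)                      # covariance of (S, T)
print(B @ Sigma @ B.T)                        # the same matrix, computed directly
```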

Example 10.1.1.

The joint distribution of $(X,Y)$ is bivariate Normal with expectation $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and variance

\[ \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}. \]

Find the distribution of $T = aX + bY$.

Solution.  As linear combinations of MVN variables are normally distributed, we just have to find the expectation and variance.

\begin{align*}
\mathsf{E}[T] &= \begin{bmatrix} a & b \end{bmatrix} \mathsf{E}\!\left[\begin{bmatrix} X \\ Y \end{bmatrix}\right] = \begin{bmatrix} a & b \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = a + 2b, \\
\mathsf{Var}[T] &= \begin{bmatrix} a & b \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = a^2 + 4ab + 4b^2 = (a+2b)^2.
\end{align*}

Hence $T \sim \mathsf{N}\!\left(a+2b,\,(a+2b)^2\right)$.
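The example can be checked by simulation; the sketch below assumes numpy and uses arbitrary illustrative coefficients $a$ and $b$, comparing the sample mean and variance of $T$ with $a+2b$ and $(a+2b)^2$.

```python
# A numerical check of Example 10.1.1: simulate (X, Y) with the given mean and
# covariance, form T = aX + bY, and compare with N(a + 2b, (a + 2b)^2).
import numpy as np

a, b = 1.5, -0.5                              # arbitrary illustrative coefficients
mean = np.array([1.0, 2.0])
cov = np.array([[1.0, 2.0],
                [2.0, 4.0]])

rng = np.random.default_rng(3)
XY = rng.multivariate_normal(mean, cov, size=100_000)
T = a * XY[:, 0] + b * XY[:, 1]

print(T.mean(), a + 2 * b)                    # expectation a + 2b
print(T.var(), (a + 2 * b)**2)                # variance (a + 2b)^2
```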