
5.5 Marginal Distributions

Given the joint distribution of (X,Y) we may want to find the (marginal) distribution of X or Y alone. The marginal distribution tells us about the behaviour of one random variable alone, i.e. irrespective of the other. We have been studying such distributions in the earlier chapters on univariate variables.

If we have the cdf FXY the marginal cdfs are obtained as follows:

  1. FX(x) = 𝖯(X ≤ x) = 𝖯(X ≤ x, Y < ∞) = FXY(x, ∞),

  2. FY(y) = 𝖯(Y ≤ y) = 𝖯(X < ∞, Y ≤ y) = FXY(∞, y),

because the marginal event {X ≤ x} is the same as the joint event {X ≤ x, Y < ∞}, and the event {Y ≤ y} is the same as the event {X < ∞, Y ≤ y}, as illustrated in Figure 5.2.

Figure 5.2: Left: the event {X ≤ x}. Right: the event {Y ≤ y}.

Just as, for a discrete RV, the marginal pmfs are obtained by summing over the other variable, so, for a continuous RV, the marginal pdfs are obtained by integrating over the other variable.

Theorem 5.5.1.

If X and Y are continuous random variables their marginal pdfs are

fX(x) = ∫_{-∞}^{∞} fXY(x,t) dt,  fY(y) = ∫_{-∞}^{∞} fXY(s,y) ds.
Proof.

For continuous random variables X and Y we have

FX(x) = FXY(x, ∞) = ∫_{-∞}^{x} ∫_{-∞}^{∞} fXY(s,t) dt ds = ∫_{-∞}^{x} { ∫_{-∞}^{∞} fXY(s,t) dt } ds,

and by differentiating both sides with respect to x we get

fX(x) = ∫_{-∞}^{∞} fXY(x,t) dt.

Similarly for Y.

Example 5.5.1.

Find the marginal pdf of X for the joint distribution given in Example 5.5.2.

Solution.  pdf: The joint pdf is

fXY(x,y) = (x+y)/8 for 0 < x < 2, 0 < y < 2, and 0 otherwise.

Hence

fX(x) = (1/8) ∫_{t=0}^{2} (x+t) dt
      = (1/8) [xt + t²/2]_{t=0}^{2}
      = (1/8)(2x + 2) = (x+1)/4

for 0<x<2.
cdf: FX,Y(x,y) = (x²y + xy²)/16 for 0 < x < 2 and 0 < y < 2, so

FX(x) = FXY(x, ∞) = FX,Y(x, 2) = (x² + 2x)/8.

Differentiating gives fX(x) = (x+1)/4 (0 < x < 2).
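As a quick numerical sanity check (illustrative only, not part of the notes; the function names are ours), we can integrate the joint pdf of Example 5.5.1 over y with the trapezoidal rule and compare against the closed form fX(x) = (x+1)/4:

```python
def f_joint(x, y):
    # Joint pdf from Example 5.5.1: (x+y)/8 on 0 < x < 2, 0 < y < 2, else 0.
    return (x + y) / 8 if 0 < x < 2 and 0 < y < 2 else 0.0

def f_marginal_numeric(x, n=10_000):
    # Trapezoidal integration of the joint pdf over y in (0, 2);
    # exact here since the integrand is linear in y.
    h = 2 / n
    total = 0.5 * (f_joint(x, 1e-12) + f_joint(x, 2 - 1e-12))
    for i in range(1, n):
        total += f_joint(x, i * h)
    return total * h

# Compare with the closed form f_X(x) = (x + 1)/4 on 0 < x < 2.
for x in (0.3, 1.0, 1.7):
    assert abs(f_marginal_numeric(x) - (x + 1) / 4) < 1e-6
```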

Example 5.5.2.

The random variables (X,Y) have joint distribution function

FXY(x,y) = xy{1 + α(1−x)(1−y)}

for 0 < x < 1, 0 < y < 1, where −1 ≤ α ≤ 1. Find the marginal distributions of X and Y and identify their forms.

Solution.  Since Y < 1,

FX(x) = FXY(x, ∞) = FXY(x, 1) = x

for 0 < x < 1. By symmetry FY(y) = y for 0 < y < 1. Both marginal distributions are therefore 𝖴𝗇𝗂𝖿(0,1).

Example 5.5.3.

The random variables (X,Y) have joint pdf

fXY(x,y) = βϕ exp(−βx) exp(−ϕy) for x > 0, y > 0, and 0 otherwise,

for β>0 and ϕ>0. Find the marginal distributions of X and Y and identify their forms.

Solution. 

fX(x) = ∫_{t=-∞}^{∞} fXY(x,t) dt
      = ∫_{t=0}^{∞} βϕ exp(−βx) exp(−ϕt) dt
      = β exp(−βx) ∫_{t=0}^{∞} ϕ exp(−ϕt) dt
      = β exp(−βx)

for x > 0 (since ϕ exp(−ϕt) is a density on t > 0), giving X ~ 𝖤𝗑𝗉(β). By symmetry Y ~ 𝖤𝗑𝗉(ϕ).
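The same marginalisation can be checked numerically (an illustrative sketch, not part of the notes; the parameter values are arbitrary): integrating the joint pdf of Example 5.5.3 over t should recover β exp(−βx).

```python
import math

def f_joint(x, y, beta=2.0, phi=5.0):
    # Joint pdf of Example 5.5.3: beta*phi*exp(-beta*x)*exp(-phi*y) on x, y > 0.
    if x > 0 and y > 0:
        return beta * phi * math.exp(-beta * x) * math.exp(-phi * y)
    return 0.0

def f_X_numeric(x, beta=2.0, phi=5.0, upper=20.0, n=200_000):
    # Trapezoidal integration over t in (0, upper); the tail beyond
    # `upper` is negligible for phi = 5.
    h = upper / n
    total = 0.5 * (f_joint(x, 1e-12, beta, phi) + f_joint(x, upper, beta, phi))
    total += sum(f_joint(x, i * h, beta, phi) for i in range(1, n))
    return total * h

# Closed form from the notes: f_X(x) = beta * exp(-beta * x).
x = 0.4
assert abs(f_X_numeric(x) - 2.0 * math.exp(-2.0 * x)) < 1e-4
```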

5.6 Independence

Recall that, formally, we say that two random variables X and Y are independent if the events {X ∈ A} and {Y ∈ B} are independent for all sets A and B, i.e.

𝖯(X ∈ A, Y ∈ B) = 𝖯(X ∈ A) 𝖯(Y ∈ B)

for all sets A and B.

We have seen that when X and Y are both discrete, they are independent if and only if their joint pmf can be factorised as a product of the marginal pmfs:

pXY(x,y)=pX(x)pY(y).

Similarly, when X and Y are both continuous they are independent if and only if their joint pdf can be factorised as a product of the marginal pdfs.

Theorem 5.6.1.

Two continuous random variables X and Y are independent if and only if

fXY(x,y)=fX(x)fY(y).
Proof.

(⇒) If X and Y are independent then, whatever the values of x and y, take Ax = {s : s ≤ x} and By = {t : t ≤ y}. Then

FXY(x,y) = 𝖯(X ∈ Ax, Y ∈ By) = 𝖯(X ∈ Ax) 𝖯(Y ∈ By) = FX(x) FY(y).

This is true for all x, y, and so we may differentiate both sides with respect to x and y to obtain

fXY(x,y)=fX(x)fY(y).

(⇐) If the joint pdf factorises then, for arbitrary sets A and B,

𝖯(X ∈ A, Y ∈ B) = ∫_{s∈A} ∫_{t∈B} fXY(s,t) dt ds
                = ∫_A { ∫_B fX(s) fY(t) dt } ds
                = ∫_A fX(s) { ∫_B fY(t) dt } ds
                = 𝖯(X ∈ A) 𝖯(Y ∈ B). ∎

Factorisation. To check for independence: if we have the joint pdf (or pmf) it is enough to check that it can be factorised as a function of x times a function of y:

fXY(x,y)=g(x)h(y),

and that the range of X does not depend on Y (see CW question). We do not have to show that the functions g and h are themselves densities. Also if the range of X does not depend on Y then the range of Y does not depend on X, so we only need to check one of the two possibilities.

If the range of X does not depend on Y (and vice versa) we say that X and Y are variationally independent.

Example 5.6.1.

The figure below illustrates the joint density

fXY(x,y) = (1/|A|) 1A(x,y),

where the function 1A(x,y) is one when (x,y) ∈ A and zero otherwise, for four different regions A. In which cases are X and Y independent?

Unnumbered Figure: the four regions A (top-left, top-right, bottom-left, bottom-right).

Solution.  TL: independent; TR: not independent; BL: not independent; BR: independent.

Given a joint pdf, a standard way to prove independence is to show factorisation and variational independence. To disprove independence, a counterexample to either suffices. This is straightforward for variational independence, but disproving factorisation is less obvious. The following method is recommended. An alternative is to show that a conditional distribution is not the same as a marginal distribution, but that usually involves more work.

Two-point method: note that fXY can be factorised as a function of x times a function of y if and only if, for all x1, x2, y1, y2,

fXY(x1,y1) fXY(x2,y2) = fXY(x1,y2) fXY(x2,y1),

since, in the case of independence, both sides equal fX(x1) fY(y1) fX(x2) fY(y2).

This is particularly useful for proving that a given joint pdf fXY does not factorise as above. Simply find (x1,y1) and (x2,y2) such that the two sides above are different.
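The two-point check is easy to automate. The snippet below (illustrative only; the function name `factorises_at` is ours) tests the identity at one chosen pair of points, for a density that factorises and one that does not:

```python
def factorises_at(f, x1, y1, x2, y2, tol=1e-12):
    # Two-point identity: for a factorisable density,
    # f(x1,y1)*f(x2,y2) == f(x1,y2)*f(x2,y1).
    return abs(f(x1, y1) * f(x2, y2) - f(x1, y2) * f(x2, y1)) < tol

# Factorisable: 12*x*y*(1-y) on the unit square.
fa = lambda x, y: 12 * x * y * (1 - y)
# Not factorisable: x + y on the unit square.
fc = lambda x, y: x + y

assert factorises_at(fa, 1/3, 1/3, 1/2, 1/2)
assert not factorises_at(fc, 1/3, 1/3, 1/2, 1/2)  # 2/3 vs 25/36
```

A single failing pair of points disproves factorisation, but a passing pair proves nothing by itself: the identity must hold for all choices.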

Example 5.6.2.

Are the following pairs of random variables independent?

  1. (a)

fXY(x,y) = 12xy(1−y) for 0 < x < 1, 0 < y < 1,

  2. (b)

fXY(x,y) = 2 exp(−x−y) for 0 < x < y < ∞,

  3. (c)

fXY(x,y) = x + y for 0 < x < 1, 0 < y < 1.

Solution. 

  1. (a)

    Independent: variationally independent and fXY(x,y)=12x×y(1-y), so the joint density factorises.

  2. (b)

Not independent: fXY = 2e^{−x} × e^{−y} factorises, BUT the range of X depends on Y.

  3. (c)

    Not independent: variationally independent BUT with x1=y1=1/3 and x2=y2=1/2 we have

fXY(x1,y1) fXY(x2,y2) = 2/3 × 1 = 2/3 ≠ 25/36 = 5/6 × 5/6 = fXY(x1,y2) fXY(x2,y1).

    Note: given variational independence, we first try to factorise x+y; when we cannot, we look for a counter-example.

Fewer sets A and B: as in the proof of Theorem 5.6.1, setting Ax = {s : s ≤ x} and By = {t : t ≤ y} shows that if X and Y are independent then for all x, y, FX,Y(x,y) = FX(x) FY(y). It turns out (we will not prove this) that for any pair of random variables, whether discrete, continuous or more complicated, 'FX,Y(x,y) = FX(x) FY(y) for all x, y' is equivalent to X and Y being independent (i.e. one need only consider a subset of the possible sets A and B).

Setting Ax = {s : s > x} and By = {t : t > y} shows that the independence of X and Y implies SX,Y(x,y) = SX(x) SY(y) for all x, y; again, it can be shown that independence is equivalent to the factorisation of the survivor functions.

Example 5.6.3.

Let X and Y be independent exponential random variables with parameters β and ϕ respectively. Find 𝖯(X>x,Y>y).

Solution.  By independence, 𝖯(X > x, Y > y) = 𝖯(X > x) 𝖯(Y > y) for x > 0, y > 0. So

𝖯(X > x, Y > y) = [1 − FX(x)][1 − FY(y)]
                = exp(−βx) exp(−ϕy) = exp(−(βx + ϕy)).
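A short Monte Carlo check of this survivor-function factorisation (illustrative only; the parameter values and seed are arbitrary):

```python
import math
import random

# Independent X ~ Exp(beta), Y ~ Exp(phi): estimate P(X > x0, Y > y0)
# and compare with exp(-(beta*x0 + phi*y0)) from Example 5.6.3.
random.seed(1)
beta, phi, x0, y0 = 1.0, 2.0, 0.5, 0.3
n = 200_000
hits = sum(1 for _ in range(n)
           if random.expovariate(beta) > x0 and random.expovariate(phi) > y0)
estimate = hits / n
exact = math.exp(-(beta * x0 + phi * y0))
assert abs(estimate - exact) < 0.01
```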

5.7 Conditional Distributions

Suppose we know the joint distribution of (X,Y) but then we find out the value of one of the random variables. What can we say about the other random variable?

We consider the conditional distributions X|Y=y, i.e. the distribution of X given that Y = y, and Y|X=x, i.e. the distribution of Y given that X = x.

Recall that when X and Y were discrete random variables the conditional pmfs were:

pX|Y(x|y) = pXY(x,y)/pY(y),  pY|X(y|x) = pXY(x,y)/pX(x).

Similarly when X and Y are continuous random variables the conditional pdfs are

fX|Y(x|y) = fXY(x,y)/fY(y),  fY|X(y|x) = fXY(x,y)/fX(x).

Note that since we can only condition on possible values, we do not have to worry about zeros in the denominators: the marginal pmf/pdf has to be positive for the value to occur.

Also note that the conditional pdfs are themselves valid pdfs: they are non-negative and they integrate to 1. For instance,

∫_{s=-∞}^{∞} fX|Y(s|y) ds = ∫_{s=-∞}^{∞} fXY(s,y)/fY(y) ds
                          = (1/fY(y)) ∫_{s=-∞}^{∞} fXY(s,y) ds
                          = (1/fY(y)) fY(y) = 1.

Similarly, conditional pmfs sum to 1.

When the variables (X,Y) are independent discrete RVs then for all x, y, recall that

  1. pX|Y(x|y) = pX(x)pY(y)/pY(y) = pX(x),

  2. pY|X(y|x) = pX(x)pY(y)/pX(x) = pY(y).

Similarly if (X,Y) are independent continuous RVs then for all x, y,

  1. fX|Y(x|y) = fX(x)fY(y)/fY(y) = fX(x),

  2. fY|X(y|x) = fX(x)fY(y)/fX(x) = fY(y).

These results conform with intuition: when X and Y are independent, knowing the value of X should tell us nothing about Y, and vice versa.

The converse is also true: If the conditional distribution of X given Y=y is independent of y or, equivalently, the conditional distribution of Y given X=x is independent of x, then X and Y are independent.

Example 5.7.1.

A piece of string of unit length is tied at one end to a hook. The string is cut at a (uniform) random distance X from the hook. The piece remaining is then cut again at a (uniform) random distance Y from the hook. Given that the remaining length tied to the hook has length y, find the pdf of the position of the first cut.


Solution.  Model with X ~ 𝖴𝗇𝗂𝖿(0,1) and Y|X=x ~ 𝖴𝗇𝗂𝖿(0,x). We know fX(x) = 1 for 0 < x < 1, and fY|X(y|x) = 1/x for 0 < y < x < 1.

Now fX|Y(x|y)=fXY(x,y)/fY(y).

We know fXY(x,y) = fX(x) fY|X(y|x) = 1/x for 0 < y < x < 1, and 0 otherwise.

So we need fY(y).

fY(y) = ∫_{s=y}^{1} fXY(s,y) ds
      = ∫_{s=y}^{1} (1/s) ds
      = [log s]_{s=y}^{1}
      = −log(y)

for 0 < y < 1. Hence fX|Y(x|y) = −1/(x log(y)) for y < x < 1.
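The marginal of Y can be checked by simulation (illustrative only; the cut-off y0 is arbitrary): integrating fY(t) = −log(t) gives the cdf FY(y) = y − y log(y), which a direct sample of the two-cut experiment should reproduce.

```python
import math
import random

# X ~ Unif(0,1), then Y | X = x ~ Unif(0, x), as in Example 5.7.1.
random.seed(2)
n = 200_000
ys = []
for _ in range(n):
    x = random.random()
    ys.append(random.uniform(0, x))

# Empirical P(Y < y0) versus the cdf F_Y(y0) = y0 - y0*log(y0).
y0 = 0.3
estimate = sum(1 for y in ys if y < y0) / n
exact = y0 - y0 * math.log(y0)
assert abs(estimate - exact) < 0.01
```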

Example 5.7.2.

Continuous random variables X and Y have joint pdf

fXY(x,y) = exp(−x/y) exp(−y)/y for 0 < x < ∞, 0 < y < ∞, and 0 otherwise.

Find

  1. 1.

    the conditional pdf of X given Y=y,

  2. 2.

𝖯(X > 1 | Y = 1).

Solution. 

  1. 1.

    Since fX|Y(x|y)=fXY(x,y)/fY(y) we need the marginal pdf fY(y).

    fY(y) = ∫_{s=0}^{∞} exp(−s/y) exp(−y)/y ds
          = [−exp(−s/y − y)]_{s=0}^{∞}
          = exp(−y)

    for y > 0. Hence for x > 0

    fX|Y(x|y) = exp(−x/y) exp(−y)/y / exp(−y) = exp(−x/y)/y.
  2. 2.

    When Y=1 we have fX|Y(x|Y=1)=exp(-x), so

    𝖯(X > 1 | Y = 1) = ∫_{1}^{∞} exp(−s) ds = [−exp(−s)]_{1}^{∞} = e^{−1}.

5.8 Key definitions and Relationships

Let (X,Y) be a bivariate rv.

  1. 1.

    The joint cdf is FX,Y(x,y) = 𝖯(X ≤ x, Y ≤ y). The marginal cdf of X is FX(x) = FX,Y(x, ∞).

  2. 2.

    For a discrete rv, the joint pmf is pX,Y(x,y)=𝖯(X=x,Y=y).

  3. 3.

    For a continuous rv, the joint pdf is fX,Y(x,y) = ∂²FX,Y(x,y)/∂x∂y.

  4. 4.

    For discrete rvs X and Y, the marginal pmf of X is pX(x) = Σ_{j=-∞}^{∞} pX,Y(x,j), and the conditional pmf of X given Y=y is pX|Y(x|y) = pX,Y(x,y)/pY(y).

  5. 5.

    For continuous rvs X and Y, the marginal pdf of X is fX(x) = ∫_{t=-∞}^{∞} fX,Y(x,t) dt, and the conditional pdf of X given Y=y is fX|Y(x|y) = fX,Y(x,y)/fY(y).

  6. 6.

    X and Y are independent if and only if the events {X ∈ A} and {Y ∈ B} are independent for all sets A and B: 𝖯(X ∈ A, Y ∈ B) = 𝖯(X ∈ A) 𝖯(Y ∈ B) for all A, B.

  7. 7.

    An equivalent, but easier to check, condition for independence (of discrete or continuous rvs) is: FX,Y(x,y) = FX(x) FY(y) for all x, y. For discrete rvs, independence is also equivalent to pX,Y(x,y) = pX(x) pY(y), whereas for continuous rvs it is equivalent to fX,Y(x,y) = fX(x) fY(y). When checking factorisation only within the range where the pmf/pdf is non-zero, variational independence must also be verified.

  8. 8.

    Lack of independence can be shown using the two-point method: showing that fX,Y(x1,y1) fX,Y(x2,y2) ≠ fX,Y(x1,y2) fX,Y(x2,y1) for some x1, x2, y1, y2. Alternatively, show that fX|Y(x|y) ≠ fX(x) for some x, y.

Chapter 6 Expectation (II)

We have already encountered the expectation 𝖤[X] and variance 𝖵𝖺𝗋[X] of a univariate random variable, X. In this chapter we examine the corresponding measures for multivariate random variables. We also investigate a particularly useful expectation: the moment generating function.

6.1 Bivariate Expectations

We know how to obtain expectations for univariate random variables. The definition extends easily to bivariate random variables. The expectation of any function g(X,Y) is given by:

Discrete random variables:
𝖤[g(X,Y)] = Σ_{s=-∞}^{∞} Σ_{t=-∞}^{∞} g(s,t) pXY(s,t).
Continuous random variables:
𝖤[g(X,Y)] = ∫_{s=-∞}^{∞} ∫_{t=-∞}^{∞} g(s,t) fXY(s,t) dt ds.

In the rest of this section results are given for the continuous case only; however, these extend immediately to discrete random variables.

Moments of either variable alone can be obtained from the joint distribution or from the relevant marginal.

𝖤[X] = ∫_{s=-∞}^{∞} ∫_{t=-∞}^{∞} s fXY(s,t) dt ds = ∫_{-∞}^{∞} s { ∫_{-∞}^{∞} fXY(s,t) dt } ds = ∫_{-∞}^{∞} s fX(s) ds,

and, more generally, for a function g,

𝖤[g(X)] = ∫_{s=-∞}^{∞} ∫_{t=-∞}^{∞} g(s) fXY(s,t) dt ds = ∫_{-∞}^{∞} g(s) fX(s) ds.

Similarly for Y and any function h (including h(Y)=Y),

𝖤[h(Y)] = ∫_{t=-∞}^{∞} ∫_{s=-∞}^{∞} h(t) fXY(s,t) ds dt = ∫_{-∞}^{∞} h(t) fY(t) dt.

Using linearity of integrals we also have for any functions g and h

𝖤[g(X)+h(Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} [g(s) + h(t)] fXY(s,t) dt ds
             = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(s) fXY(s,t) dt ds + ∫_{-∞}^{∞} ∫_{-∞}^{∞} h(t) fXY(s,t) dt ds
             = 𝖤[g(X)] + 𝖤[h(Y)].

In particular

𝖤[X+Y]=𝖤[X]+𝖤[Y],

regardless of the joint distribution of (X,Y).

If X and Y are independent we also have for any functions g and h

𝖤[g(X)h(Y)] = ∫_{s=-∞}^{∞} ∫_{t=-∞}^{∞} g(s) h(t) fXY(s,t) dt ds
            = ∫_{s=-∞}^{∞} ∫_{t=-∞}^{∞} g(s) h(t) fX(s) fY(t) dt ds
            = ∫_{s=-∞}^{∞} g(s) fX(s) { ∫_{t=-∞}^{∞} h(t) fY(t) dt } ds
            = { ∫_{-∞}^{∞} g(s) fX(s) ds } { ∫_{-∞}^{∞} h(t) fY(t) dt }
            = 𝖤[g(X)] 𝖤[h(Y)].

In particular, if X and Y are independent, then

𝖤[XY]=𝖤[X]𝖤[Y].

Firstly we note that for dependent random variables 𝖤[XY] ≠ 𝖤[X]𝖤[Y], in general. For example, setting Y = X gives

𝖤[XY] = 𝖤[X²] ≥ 𝖤[X]² = 𝖤[X]𝖤[Y],

the difference between the two being 𝖵𝖺𝗋[X].

More subtly, even when 𝖤[XY]=𝖤[X]𝖤[Y], X and Y need not be independent.

Example 6.1.1.

Let X ~ N(0,1) and Y = X² − 1. Find 𝖤[XY] and 𝖤[X]𝖤[Y].

Solution.  𝖤[X]=0, so 𝖤[X]𝖤[Y]=0. Also

𝖤[XY] = 𝖤[X³ − X] = 𝖤[X³] − 𝖤[X] = 0 − 0 = 0,

since 𝖤[X^r] = 0 for r an odd integer. So 𝖤[XY] = 𝖤[X]𝖤[Y] = 0.

The joint distribution of (X,Y) is illustrated in Figure 6.1. Clearly the variables X and Y are strongly related, as given X we know Y exactly.

Figure 6.1: 1000 realisations of (X,Y), where X ~ N(0,1) and Y = X² − 1. X and Y are uncorrelated (ρ = 0) but not independent.
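A small simulation (illustrative, not part of the notes; the seed and cut-off are arbitrary) reproduces this: the sample mean of XY is near zero, yet conditioning on |X| > 1 shifts Y away from its mean, exposing the dependence.

```python
import random

# X ~ N(0,1), Y = X^2 - 1: uncorrelated but completely dependent.
random.seed(3)
n = 200_000
pairs = [(x, x * x - 1) for x in (random.gauss(0, 1) for _ in range(n))]

# Sample E[XY] should be near 0, matching E[X]E[Y] = 0.
mean_xy = sum(x * y for x, y in pairs) / n
assert abs(mean_xy) < 0.05

# Dependence: the mean of Y over the event |X| > 1 is well above E[Y] = 0.
tail = [y for x, y in pairs if abs(x) > 1]
assert sum(tail) / len(tail) > 0.5
```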
Example 6.1.2.

Find the expected value of X-Y if 𝖤[X]=𝖤[Y]. Does this result depend on other features of the joint distribution of (X,Y)?

Solution.  𝖤[X-Y]=𝖤[X]+𝖤[-Y]=𝖤[X]-𝖤[Y]=0. No other assumptions are needed.

Example 6.1.3.

The random variables (X,Y) have joint pdf

fXY(x,y) = 1/2 for 0 < x < y, 0 < y < 2, and 0 otherwise.

Find 𝖤[X], 𝖤[Y] and 𝖤[XY]. Does 𝖤[XY]=𝖤[X]𝖤[Y]?

Solution. 

Unnumbered Figure: the region 0 < x < y < 2.

𝖤[X] = ∫_{s=0}^{2} ∫_{t=s}^{2} s (1/2) dt ds
     = (1/2) ∫_{s=0}^{2} (2s − s²) ds
     = (1/2) [s² − s³/3]_{0}^{2}
     = 2/3,
𝖤[Y] = ∫_{s=0}^{2} ∫_{t=s}^{2} t (1/2) dt ds
     = (1/2) ∫_{s=0}^{2} (2² − s²)/2 ds
     = (1/2) [2s − s³/6]_{0}^{2}
     = 4/3,
𝖤[XY] = ∫_{s=0}^{2} ∫_{t=s}^{2} st (1/2) dt ds
      = ∫_{s=0}^{2} [st²/4]_{t=s}^{2} ds
      = ∫_{s=0}^{2} (s − s³/4) ds
      = [s²/2 − s⁴/16]_{s=0}^{2}
      = 1 ≠ 𝖤[X]𝖤[Y] = 8/9.
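These three integrals can be checked by Monte Carlo (an illustrative sketch, not part of the notes): the joint pdf is the uniform density 1/2 on the triangle 0 < x < y < 2, so we sample uniformly from the square and keep points with x < y.

```python
import random

# Rejection sampling from the uniform density on {0 < x < y < 2}.
random.seed(4)
n = 400_000
pts = []
while len(pts) < n:
    x, y = 2 * random.random(), 2 * random.random()
    if x < y:
        pts.append((x, y))

ex = sum(x for x, _ in pts) / n
ey = sum(y for _, y in pts) / n
exy = sum(x * y for x, y in pts) / n

assert abs(ex - 2 / 3) < 0.01    # E[X] = 2/3
assert abs(ey - 4 / 3) < 0.01    # E[Y] = 4/3
assert abs(exy - 1.0) < 0.01     # E[XY] = 1 != E[X]E[Y] = 8/9
```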

6.2 Conditional Expectations

Expectations for conditional random variables are defined in the obvious way. Conditional expectations are given by

  1. 𝖤[X | Y=y] = ∫_{-∞}^{∞} s fX|Y(s|y) ds,

  2. 𝖤[Y | X=x] = ∫_{-∞}^{∞} t fY|X(t|x) dt.

𝖤[Y | X=x] is a function, g(x) say, of x (a real number). If we have not yet seen x then this becomes a function g(X) of the random variable X; i.e. 𝖤[Y|X] is a random variable because it is a function of the random variable X.

Sometimes conditioning provides an easy way to obtain the expectations of the marginal variables. Consider the random variable 𝖤[h(Y)|X], which is a function of X. Just as 𝖤[g(X)] = ∫_{-∞}^{∞} g(s) fX(s) ds, so the expectation of 𝖤[h(Y)|X] is

𝖤[𝖤[h(Y)|X]] = ∫_{-∞}^{∞} 𝖤[h(Y) | X=s] fX(s) ds
             = ∫_{s=-∞}^{∞} { ∫_{t=-∞}^{∞} h(t) fY|X(t|s) dt } fX(s) ds
             = ∫_{-∞}^{∞} ∫_{-∞}^{∞} h(t) fXY(s,t) ds dt
             = 𝖤[h(Y)].

Now consider 𝖤[g(X)h(Y)|X], which is a random variable, since it is a function of the random variable X.

𝖤[g(X)h(Y)|X] = ∫_{t=-∞}^{∞} g(X) h(t) fY|X(t|X) dt
              = g(X) ∫_{t=-∞}^{∞} h(t) fY|X(t|X) dt
              = g(X) 𝖤[h(Y)|X].

Intuitively, by conditioning on the unknown X it becomes an unknown constant as far as the expectation is concerned and so it can be taken outside the expectation.

Example 6.2.1.

The rvs X and Y follow a distribution specified by X ~ 𝖭(0,1) and Y|X=x ~ 𝖭(αx, 1).

  1. (a)

    Write down 𝖤[Y|X=x] and 𝖵𝖺𝗋[Y|X=x].

  2. (b)

    Find 𝖤[X] and 𝖤[Y].

  3. (c)

    Find 𝖤[XY].

Solution. 

  1. (a)

    𝖤[Y|X=x]=αx and 𝖵𝖺𝗋[Y|X=x]=1.

  2. (b)

    𝖤[X]=0 and

    𝖤[Y] = 𝖤[𝖤[Y|X]]
         = 𝖤[αX] = α𝖤[X] = 0.
  3. (c)
    𝖤[XY] = 𝖤[𝖤[XY|X]]
          = 𝖤[X 𝖤[Y|X]]
          = 𝖤[αX²]
          = α.

    Note that 𝖤[XY] − 𝖤[X]𝖤[Y] = α.
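A simulation check of part (c) (illustrative only; α = 0.7 and the seed are arbitrary choices): since Y|X=x ~ 𝖭(αx, 1), we can generate Y as αX plus independent standard normal noise, and the sample mean of XY should be close to α.

```python
import random

# X ~ N(0,1), Y = alpha*X + eps with eps ~ N(0,1) independent of X,
# which matches Y | X = x ~ N(alpha*x, 1). Then E[XY] = alpha.
random.seed(5)
alpha, n = 0.7, 200_000
total = 0.0
for _ in range(n):
    x = random.gauss(0, 1)
    y = alpha * x + random.gauss(0, 1)
    total += x * y
mean_xy = total / n
assert abs(mean_xy - alpha) < 0.02
```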

The conditional variances are given by

𝖵𝖺𝗋[X | Y=y] = ∫_{-∞}^{∞} (s − 𝖤[X | Y=y])² fX|Y(s|y) ds
             = 𝖤[X² | Y=y] − 𝖤[X | Y=y]²,
𝖵𝖺𝗋[Y | X=x] = ∫_{-∞}^{∞} (t − 𝖤[Y | X=x])² fY|X(t|x) dt
             = 𝖤[Y² | X=x] − 𝖤[Y | X=x]².

If X and Y are independent the conditional distributions are the same as the marginal distributions (fX|Y(x|y)=fX(x) and fY|X(y|x)=fY(y)), so that in particular

  1. 𝖤[X | Y=y] = 𝖤[X],

  2. 𝖵𝖺𝗋[X | Y=y] = 𝖵𝖺𝗋[X],

  3. 𝖤[Y | X=x] = 𝖤[Y],

  4. 𝖵𝖺𝗋[Y | X=x] = 𝖵𝖺𝗋[Y].

6.3 Decomposition of the marginal variance

We have seen that the marginal expectations can be obtained from the conditional expectations. We can also obtain the marginal variances from the conditional expectations and variances by the following formula:

𝖤[𝖵𝖺𝗋[Y|X]] + 𝖵𝖺𝗋[𝖤[Y|X]]
= 𝖤[𝖤[Y²|X] − 𝖤[Y|X]²] + 𝖤[𝖤[Y|X]²] − 𝖤[𝖤[Y|X]]²
= 𝖤[Y²] − 𝖤[Y]²
= 𝖵𝖺𝗋[Y].

These formulae are particularly useful when a random variable Y is given as a mixture of distributions. This is most easily illustrated by an example.

Example 6.3.1.

Let X be a Poisson(λ) random variable and, given X takes the value x, let Y be Binomial(x,p)-distributed, i.e. Y|X=x ~ Binomial(x,p). Find the expectation and variance of Y.

Solution.  From properties of the Binomial distribution we have

  1. 𝖤[Y | X=x] = xp,

  2. 𝖵𝖺𝗋[Y | X=x] = xp(1−p).

Hence, using properties of the Poisson distribution we obtain

𝖤[Y] = 𝖤[𝖤[Y|X]] = 𝖤[Xp] = λp,
𝖵𝖺𝗋[Y] = 𝖤[𝖵𝖺𝗋[Y|X]] + 𝖵𝖺𝗋[𝖤[Y|X]]
       = 𝖤[Xp(1−p)] + 𝖵𝖺𝗋[Xp]
       = λp(1−p) + λp²
       = λp.

In fact, it can be shown that Y ~ Poisson(λp).
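This Poisson thinning calculation can be checked by simulation (illustrative only, not part of the notes; the sampler name and parameter values are ours): the sample mean and variance of Y should both be close to λp.

```python
import math
import random

def poisson_sample(lam, rng):
    # Knuth's multiplication method; adequate for moderate lam.
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

# X ~ Poisson(lam), then Y | X = x ~ Binomial(x, p), as in Example 6.3.1.
rng = random.Random(6)
lam, p, n = 6.0, 0.3, 100_000
ys = []
for _ in range(n):
    x = poisson_sample(lam, rng)
    ys.append(sum(1 for _ in range(x) if rng.random() < p))

mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n
assert abs(mean_y - lam * p) < 0.05   # E[Y] = lam*p = 1.8
assert abs(var_y - lam * p) < 0.1     # Var[Y] = lam*p = 1.8
```

That the mean and variance agree is consistent with Y itself being Poisson(λp).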

6.4 Moment generating functions

The moment generating function or mgf of a random variable X is defined through

MX(t) = 𝖤[e^{tX}] = Σ_i e^{ti} pX(i) if X is a discrete rv with pmf pX(x), or ∫ e^{ts} fX(s) ds if X is a continuous rv with pdf fX(s),

for all real values of t for which the expectation exists.

Moment generating functions can be manipulated in many ways to reveal properties of the underlying probability distributions. They often help in mathematical proofs of probability theorems, and will be used for this purpose in Chapter 9.

Example 6.4.1.

Find the mgf of the random variable following the exponential distribution with parameter β; sketch the mgf when β=4.

Solution.  X ~ 𝖤𝗑𝗉(β), so fX(x) = βe^{−βx} for x > 0. Hence,

MX(t) = ∫_{0}^{∞} e^{tx} βe^{−βx} dx = β ∫_{0}^{∞} e^{−x(β−t)} dx
      = β/(β − t)

for t < β. Note that MX(t) is only defined for t < β, since only in that case does the integral exist. Hence, for β = 4 the mgf looks like:

Unnumbered Figure: sketch of MX(t) = 4/(4 − t), defined for t < 4.

Quiz: Now consider a general rv: can the mgf be negative? No; it is the expectation of a non-negative quantity.

Theorem 6.4.1.

If the mgf is defined in some neighbourhood of the origin, |t| < t0, then the following properties are satisfied:

  1. 1.

    The mgf uniquely determines the distribution of the rv X. That is, if two rvs have the same mgf then they have the same cdf.

  2. 2.

    If Z = a + bX, for a real and b a non-zero real number, then MZ(t) = e^{at} MX(bt).

  3. 3.

    Moments about the origin can be obtained by differentiating the mgf with respect to t and then evaluating the derivatives at zero, i.e.

    MX(0) = 𝖤[X⁰] = 1;  MX′(0) = 𝖤[X];  MX′′(0) = 𝖤[X²].

    Hence the name!

  4. 4.

    Let X,Y be independent rvs with mgf MX(t),MY(t) respectively. Then,

    MX+Y(t)=MX(t)MY(t).
Proof.
  1. 1.

    Proof uses ideas from complex analysis (see Math215).

  2. 2.

    If Z=a+bX, then

    MZ(t) = Ma+bX(t) = 𝖤[e^{(a+bX)t}] = e^{at} 𝖤[e^{bXt}] = e^{at} MX(bt).
  3. 3.

    Since MX(t) = 𝖤[e^{tX}], we have MX′(t) = 𝖤[X e^{tX}], MX′′(t) = 𝖤[X² e^{tX}], etc.; but e^{0·X} = 1, so

    1. MX(0) = 1,

    2. MX′(0) = 𝖤[X],

    3. MX′′(0) = 𝖤[X²],

    and so on.

  4. 4.
    MX+Y(t) = 𝖤[e^{(X+Y)t}]
            = 𝖤[e^{Xt} e^{Yt}]
            = 𝖤[e^{Xt}] 𝖤[e^{Yt}]

by independence, so MX+Y(t)=MX(t)MY(t). ∎

From Part 4, by induction, if X1, X2, …, Xn are independent random variables:

MX1+X2+⋯+Xn(t) = MX1(t) MX2(t) ⋯ MXn(t).
Example 6.4.2.

Using its mgf, find the expectation and the variance of the random variable following the exponential distribution with parameter β.

Solution.  Consider the first two derivatives of the mgf:

MX′(t) = β/(β−t)²,  MX′′(t) = 2β/(β−t)³.

Hence, 𝖤[X] = MX′(0) = 1/β, 𝖤[X²] = MX′′(0) = 2/β² and

𝖵𝖺𝗋[X] = 𝖤[X²] − 𝖤[X]² = 1/β².
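The derivative values at zero can be verified numerically (illustrative only; the step size h is an arbitrary choice) with central finite differences on MX(t) = β/(β − t):

```python
# Finite-difference check that the mgf derivatives at 0 give the
# moments of Exp(beta): M'(0) = 1/beta, M''(0) = 2/beta^2.
beta = 4.0
M = lambda t: beta / (beta - t)   # valid for t < beta
h = 1e-4
M1 = (M(h) - M(-h)) / (2 * h)              # central first derivative
M2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2  # central second derivative

assert abs(M1 - 1 / beta) < 1e-8       # E[X] = 1/4
assert abs(M2 - 2 / beta ** 2) < 1e-6  # E[X^2] = 1/8
```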

The mgf of a Normal random variable. We first consider Z ~ N(0,1). Then

MZ(t) = 𝖤[e^{Zt}] = ∫_{-∞}^{∞} e^{zt} (1/√(2π)) e^{−z²/2} dz
      = (1/√(2π)) ∫_{-∞}^{∞} e^{−(z² − 2zt)/2} dz
      = e^{t²/2} (1/√(2π)) ∫_{-∞}^{∞} e^{−(z−t)²/2} dz

by completing the square. Hence MZ(t) = e^{t²/2} by unit integrability of the N(t,1) density.

So if V = μ + σZ then, by Property 2,

MV(t) = e^{tμ} MZ(tσ) = e^{μt + σ²t²/2}.

For instance, if Z ~ N(0,1) then

  1. MZ(t) = e^{t²/2},

  2. MZ′(t) = t e^{t²/2},

  3. MZ′′(t) = t² e^{t²/2} + e^{t²/2},

  4. MZ′′′(t) = t³ e^{t²/2} + 3t e^{t²/2},

  5. MZ^{(iv)}(t) = t⁴ e^{t²/2} + 6t² e^{t²/2} + 3 e^{t²/2}.

In particular MZ′′(0) = 1 and MZ^{(iv)}(0) = 3, so 𝖤[Z²] = 1 and 𝖤[Z⁴] = 3 as mentioned in Chapter 3.

Unfortunately the mgf is not defined for some rvs.

Example 6.4.3.

Let X ~ 𝖢𝖺𝗎𝖼𝗁𝗒; then

MX(t) = ∫_{-∞}^{∞} e^{tx} / (π(1 + x²)) dx,

which is not defined since, if t > 0, the integrand → ∞ as x → ∞, and if t < 0, the integrand → ∞ as x → −∞.

Theorem 6.4.2.

The sum of two independent Normal random variables is also Normal. Let X1 ~ N(μ1, σ1²) and X2 ~ N(μ2, σ2²) be two independent random variables; then

Y = X1 + X2 ~ N(μ1 + μ2, σ1² + σ2²).
Proof.

MX1(t) = e^{μ1 t + σ1² t²/2} and MX2(t) = e^{μ2 t + σ2² t²/2}, so using mgf property 4,

MY(t) = MX1(t) MX2(t) = e^{μ1 t + σ1² t²/2} × e^{μ2 t + σ2² t²/2} = e^{(μ1+μ2)t + (σ1²+σ2²)t²/2},

which is the mgf of a N(μ1+μ2,σ12+σ22) random variable. The result follows from mgf property 1. ∎

This is called the convolution property of the Normal distribution. There are several proofs of it; the above is the simplest and starts to show the power of mgfs.
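The convolution property is easy to see in simulation (an illustrative sketch, not part of the notes; the means and variances are arbitrary choices): summing draws from N(1, 2²) and N(−2, 1²) should produce a sample with mean −1 and variance 5.

```python
import random
import statistics

# Sum of independent normals: N(1, 2^2) + N(-2, 1^2) should be N(-1, 5).
random.seed(7)
n = 200_000
s = [random.gauss(1, 2) + random.gauss(-2, 1) for _ in range(n)]

assert abs(statistics.fmean(s) - (-1)) < 0.03
assert abs(statistics.variance(s) - 5) < 0.1
```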

Example 6.4.4.

(Exam 2016) For some β > 0, let V = 1/β with probability 1, let Wi ~ 𝖤𝗑𝗉(β) (i = 1, 2, …) and X ~ 𝖭(0,1) be independent of each other. Let Y = X√W1, Z = W1 − W2 and W̄n = (1/n) Σ_{i=1}^{n} Wi. You may take as given that the moment generating function (mgf) of X is MX(t) = 𝖤[e^{Xt}] = e^{t²/2}.

  1. (a)

    Find the mgf of V, MV(t).

    Solution.  Since V = 1/β with probability 1, MV(t) = 𝖤[e^{Vt}] = e^{t/β}.

  2. (b)

    Find the mgf of W1, MW1(t). Be sure to specify the range of t and make clear why this range applies.

    Solution.  This is β/(β-t) (provided t<β); see Example 6.4.1 for detail and reason.

  3. (c)

    Show that, subject to the same range condition on t, MZ(t) = 1/(1 − t²/β²).

    Solution. 

    MZ(t) = 𝖤[e^{(W1−W2)t}] = 𝖤[e^{W1 t}] 𝖤[e^{−W2 t}] = MW(t) MW(−t) = 1/(1 − t/β) × 1/(1 + t/β),

    which gives the required result.

  4. (d)
    1. (i)

      Find the mgf of W¯n; what (if any) condition on the range of t applies?

    2. (ii)

      Find lim_{n→∞} MW̄n(t) and interpret the result heuristically with reference to your answer to an earlier part of this question. (Hint: recall that lim_{n→∞} (1 − x/n)^n = e^{−x}.)

    Solution. 

    1. (i)
      𝖤[e^{W̄n t}] = 𝖤[e^{(t/n) Σ_{i=1}^{n} Wi}]
                  = Π_{i=1}^{n} 𝖤[e^{(t/n) Wi}]
                  = MW(t/n)^n
                  = 1/(1 − t/(βn))^n.

      Need t/n < β, i.e. t < nβ.

    2. (ii)

      For any t, for large enough n, t < nβ. So,

      lim_{n→∞} MW̄n(t) = lim_{n→∞} 1/(1 − t/(βn))^n = 1/e^{−t/β} = e^{t/β}.

      This is the mgf of V, so as n → ∞, W̄n → 1/β (in some sense).

  5. (e)

    Find the mgf of Y and interpret the result with reference to your answer to an earlier part of this question. (Hint: use the tower property of expectations: 𝖤[g(X,W)]=𝖤[𝖤[g(X,W)|W]].)

    Solution. 

    𝖤[e^{Yt}] = 𝖤[e^{X√W1 t}] = 𝖤[𝖤[e^{X√W1 t} | W1]]
              = 𝖤[e^{(1/2) W1 t²}]
              = 1/(1 − t²/(2β)).

    This is like MZ(t) but with β² replaced by 2β. So Y has the same distribution as the difference between two 𝖤𝗑𝗉(√(2β)) random variables. Or, equivalently, the difference between two 𝖤𝗑𝗉(β) random variables has the same distribution as the product of a N(0,1) and the square root of an 𝖤𝗑𝗉(β²/2).
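Part (e) can be checked by simulation (illustrative only, not part of the exam solution; β = 2 and the seed are arbitrary): Y = X√W1 and the difference of two Exp(√(2β)) variables should have matching means and variances (both have mean 0 and variance 1/β).

```python
import math
import random

# Y = X * sqrt(W1) with X ~ N(0,1), W1 ~ Exp(beta), independent.
random.seed(8)
beta, n = 2.0, 200_000
y = [random.gauss(0, 1) * math.sqrt(random.expovariate(beta)) for _ in range(n)]

# Difference of two independent Exp(sqrt(2*beta)) variables.
rate = math.sqrt(2 * beta)
z = [random.expovariate(rate) - random.expovariate(rate) for _ in range(n)]

mean_y = sum(y) / n
var_y = sum(v * v for v in y) / n - mean_y ** 2
mean_z = sum(z) / n
var_z = sum(v * v for v in z) / n - mean_z ** 2

assert abs(mean_y) < 0.01
assert abs(var_y - 1 / beta) < 0.02   # Var[Y] = E[W1] = 1/beta
assert abs(var_y - var_z) < 0.03      # same second moment as the Exp difference
```

Matching low-order moments is of course only consistent with, not a proof of, equality in distribution; the mgf argument above gives the full result.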