
5.5 Marginal Distributions

Given the joint distribution of (X,Y) we may want to find the (marginal) distribution of X or Y alone. The marginal distribution tells us about the behaviour of one random variable alone, i.e. irrespective of the other. We have been studying such distributions in the earlier chapters on univariate variables.

If we have the cdf FXY the marginal cdfs are obtained as follows:

  1. FX(x) = 𝖯(X ≤ x) = 𝖯(X ≤ x, Y < ∞) = FXY(x, ∞),

  2. FY(y) = 𝖯(Y ≤ y) = 𝖯(X < ∞, Y ≤ y) = FXY(∞, y),

because the marginal event {X ≤ x} is the same as the joint event {X ≤ x, Y < ∞}, and the event {Y ≤ y} is the same as the event {X < ∞, Y ≤ y}, as illustrated in Figure 5.2.

Figure 5.2: Left: the event {X ≤ x}. Right: the event {Y ≤ y}.

Just as, for a discrete RV, the marginal pmfs are obtained by summing over the other variable, so, for a continuous RV, the marginal pdfs are obtained by integrating over the other variable.

Theorem 5.5.1.

If X and Y are continuous random variables their marginal pdfs are

fX(x) = ∫_{-∞}^{∞} fXY(x,t) dt,  fY(y) = ∫_{-∞}^{∞} fXY(s,y) ds.
Proof.

For continuous random variables X and Y we have

FX(x) = FXY(x, ∞) = ∫_{-∞}^{x} ∫_{-∞}^{∞} fXY(s,t) dt ds = ∫_{-∞}^{x} { ∫_{-∞}^{∞} fXY(s,t) dt } ds,

and by differentiating both sides with respect to x we get

fX(x) = ∫_{-∞}^{∞} fXY(x,t) dt.

Similarly for Y.

Example 5.5.1.

Find the marginal pdf of X for the joint distribution given in Example 5.5.2.

Solution.  pdf: The joint pdf is

fXY(x,y) = (x+y)/8 for 0 < x < 2, 0 < y < 2, and 0 otherwise.

Hence

fX(x) = (1/8) ∫_{t=0}^{2} (x+t) dt
      = (1/8) [xt + t²/2]_{t=0}^{2}
      = (1/8)(2x + 2) = (x+1)/4

for 0<x<2.
cdf: FX,Y(x,y) = (x²y + xy²)/16 for 0 < x < 2 and 0 < y < 2, so

FX(x) = FXY(x, ∞) = FX,Y(x, 2) = (x² + 2x)/8.

Differentiating gives fX(x) = (x+1)/4 (0 < x < 2).
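As a quick numerical sanity check (illustrative only, not part of the notes; the function names are ours), we can integrate the joint pdf of Example 5.5.1 over y with the trapezoidal rule and compare against the closed form fX(x) = (x+1)/4:

```python
def f_joint(x, y):
    # Joint pdf from Example 5.5.1: (x+y)/8 on 0 < x < 2, 0 < y < 2, else 0.
    return (x + y) / 8 if 0 < x < 2 and 0 < y < 2 else 0.0

def f_marginal_numeric(x, n=10_000):
    # Trapezoidal integration of the joint pdf over y in (0, 2);
    # exact here since the integrand is linear in y.
    h = 2 / n
    total = 0.5 * (f_joint(x, 1e-12) + f_joint(x, 2 - 1e-12))
    for i in range(1, n):
        total += f_joint(x, i * h)
    return total * h

# Compare with the closed form f_X(x) = (x + 1)/4 on 0 < x < 2.
for x in (0.3, 1.0, 1.7):
    assert abs(f_marginal_numeric(x) - (x + 1) / 4) < 1e-6
```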

Example 5.5.2.

The random variables (X,Y) have joint distribution function

FXY(x,y) = xy{1 + α(1−x)(1−y)}

for 0 < x < 1, 0 < y < 1, where −1 ≤ α ≤ 1. Find the marginal distributions of X and Y and identify their forms.

Solution.  Since Y < 1,

FX(x) = FXY(x, ∞) = FXY(x, 1) = x

for 0 < x < 1. By symmetry FY(y) = y for 0 < y < 1. Both marginal distributions are therefore 𝖴𝗇𝗂𝖿(0,1).

Example 5.5.3.

The random variables (X,Y) have joint pdf

fXY(x,y) = βϕ exp(−βx) exp(−ϕy) for x > 0, y > 0, and 0 otherwise,

for β>0 and ϕ>0. Find the marginal distributions of X and Y and identify their forms.

Solution. 

fX(x) = ∫_{t=-∞}^{∞} fXY(x,t) dt
      = ∫_{t=0}^{∞} βϕ exp(−βx) exp(−ϕt) dt
      = β exp(−βx) ∫_{t=0}^{∞} ϕ exp(−ϕt) dt
      = β exp(−βx)

for x > 0 (since ϕ exp(−ϕt) is a density on t > 0), giving X ~ 𝖤𝗑𝗉(β). By symmetry Y ~ 𝖤𝗑𝗉(ϕ).
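The same marginalisation can be checked numerically (an illustrative sketch, not part of the notes; the parameter values are arbitrary): integrating the joint pdf of Example 5.5.3 over t should recover β exp(−βx).

```python
import math

def f_joint(x, y, beta=2.0, phi=5.0):
    # Joint pdf of Example 5.5.3: beta*phi*exp(-beta*x)*exp(-phi*y) on x, y > 0.
    if x > 0 and y > 0:
        return beta * phi * math.exp(-beta * x) * math.exp(-phi * y)
    return 0.0

def f_X_numeric(x, beta=2.0, phi=5.0, upper=20.0, n=200_000):
    # Trapezoidal integration over t in (0, upper); the tail beyond
    # `upper` is negligible for phi = 5.
    h = upper / n
    total = 0.5 * (f_joint(x, 1e-12, beta, phi) + f_joint(x, upper, beta, phi))
    total += sum(f_joint(x, i * h, beta, phi) for i in range(1, n))
    return total * h

# Closed form from the notes: f_X(x) = beta * exp(-beta * x).
x = 0.4
assert abs(f_X_numeric(x) - 2.0 * math.exp(-2.0 * x)) < 1e-4
```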

5.6 Independence

Recall that, formally, we say that two random variables X and Y are independent if the events {X ∈ A} and {Y ∈ B} are independent for all sets A and B, i.e.

𝖯(X ∈ A, Y ∈ B) = 𝖯(X ∈ A) 𝖯(Y ∈ B)

for all sets A and B.

We have seen that when X and Y are both discrete, they are independent if and only if their joint pmf can be factorised as a product of the marginal pmfs:

pXY(x,y)=pX(x)pY(y).

Similarly, when X and Y are both continuous they are independent if and only if their joint pdf can be factorised as a product of the marginal pdfs.

Theorem 5.6.1.

Two continuous random variables X and Y are independent if and only if

fXY(x,y)=fX(x)fY(y).
Proof.

(⇒) If X and Y are independent then, whatever the values of x and y, take Ax = {s : s ≤ x} and By = {t : t ≤ y}. Then

FXY(x,y) = 𝖯(X ∈ Ax, Y ∈ By) = 𝖯(X ∈ Ax) 𝖯(Y ∈ By) = FX(x) FY(y).

This is true for all x, y, and so we may differentiate both sides with respect to x and y to obtain

fXY(x,y)=fX(x)fY(y).

(⇐) If the joint pdf factorises then, for arbitrary sets A and B,

𝖯(X ∈ A, Y ∈ B) = ∫_{s∈A} ∫_{t∈B} fXY(s,t) dt ds
                = ∫_A { ∫_B fX(s) fY(t) dt } ds
                = ∫_A fX(s) { ∫_B fY(t) dt } ds
                = 𝖯(X ∈ A) 𝖯(Y ∈ B). ∎

Factorisation. To check for independence: if we have the joint pdf (or pmf) it is enough to check that it can be factorised as a function of x times a function of y:

fXY(x,y)=g(x)h(y),

and that the range of X does not depend on Y (see CW question). We do not have to show that the functions g and h are themselves densities. Also if the range of X does not depend on Y then the range of Y does not depend on X, so we only need to check one of the two possibilities.

If the range of X does not depend on Y (and vice versa) we say that X and Y are variationally independent.

Example 5.6.1.

The figure below illustrates the joint density

fXY(x,y) = (1/|A|) 1A(x,y),

where the function 1A(x,y) is one when (x,y) ∈ A and zero otherwise, for four different regions A. In which cases are X and Y independent?

Unnumbered Figure: the four regions A (top-left, top-right, bottom-left, bottom-right).

Solution.  TL: independent; TR: not independent; BL: not independent; BR: independent.

Given a joint pdf, a standard way to prove independence is to show factorisation and variational independence. To disprove independence, a counterexample to either suffices. This is straightforward for variational independence, but disproving factorisation is less obvious. The following method is recommended. An alternative is to show that a conditional distribution is not the same as a marginal distribution, but that usually involves more work.

Two-point method: note that fXY can be factorised as a function of x times a function of y if and only if, for all x1, x2, y1, y2,

fXY(x1,y1) fXY(x2,y2) = fXY(x1,y2) fXY(x2,y1),

since, in the case of independence, both sides equal fX(x1) fY(y1) fX(x2) fY(y2).

This is particularly useful for proving that a given joint pdf fXY does not factorise as above. Simply find (x1,y1) and (x2,y2) such that the two sides above are different.
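The two-point check is easy to automate. The snippet below (illustrative only; the function name `factorises_at` is ours) tests the identity at one chosen pair of points, for a density that factorises and one that does not:

```python
def factorises_at(f, x1, y1, x2, y2, tol=1e-12):
    # Two-point identity: for a factorisable density,
    # f(x1,y1)*f(x2,y2) == f(x1,y2)*f(x2,y1).
    return abs(f(x1, y1) * f(x2, y2) - f(x1, y2) * f(x2, y1)) < tol

# Factorisable: 12*x*y*(1-y) on the unit square.
fa = lambda x, y: 12 * x * y * (1 - y)
# Not factorisable: x + y on the unit square.
fc = lambda x, y: x + y

assert factorises_at(fa, 1/3, 1/3, 1/2, 1/2)
assert not factorises_at(fc, 1/3, 1/3, 1/2, 1/2)  # 2/3 vs 25/36
```

A single failing pair of points disproves factorisation, but a passing pair proves nothing by itself: the identity must hold for all choices.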

Example 5.6.2.

Are the following pairs of random variables independent?

  1. (a)

fXY(x,y) = 12xy(1−y) for 0 < x < 1, 0 < y < 1,

  2. (b)

fXY(x,y) = 2 exp(−x−y) for 0 < x < y < ∞,

  3. (c)

fXY(x,y) = x + y for 0 < x < 1, 0 < y < 1.

Solution. 

  1. (a)

    Independent: variationally independent and fXY(x,y)=12x×y(1-y), so the joint density factorises.

  2. (b)

Not independent: fXY = 2e^{−x} × e^{−y} factorises, BUT the range of X depends on Y.

  3. (c)

    Not independent: variationally independent BUT with x1=y1=1/3 and x2=y2=1/2 we have

fXY(x1,y1) fXY(x2,y2) = 2/3 × 1 = 2/3 ≠ 25/36 = 5/6 × 5/6 = fXY(x1,y2) fXY(x2,y1).

    Note: given variational independence, we first try to factorise x+y; when we cannot, we look for a counter-example.

Fewer sets A and B: as in the proof of Theorem 5.6.1, setting Ax = {s : s ≤ x} and By = {t : t ≤ y} shows that if X and Y are independent then for all x, y, FX,Y(x,y) = FX(x) FY(y). It turns out (we will not prove this) that for any pair of random variables, whether discrete, continuous or more complicated, 'FX,Y(x,y) = FX(x) FY(y) for all x, y' is equivalent to X and Y being independent (i.e. one need only consider a subset of the possible sets A and B).

Setting Ax = {s : s > x} and By = {t : t > y} shows that the independence of X and Y implies SX,Y(x,y) = SX(x) SY(y) for all x, y; again, it can be shown that independence is equivalent to the factorisation of the survivor functions.

Example 5.6.3.

Let X and Y be independent exponential random variables with parameters β and ϕ respectively. Find 𝖯(X>x,Y>y).

Solution.  By independence, 𝖯(X > x, Y > y) = 𝖯(X > x) 𝖯(Y > y) for x > 0, y > 0. So

𝖯(X > x, Y > y) = [1 − FX(x)][1 − FY(y)]
                = exp(−βx) exp(−ϕy) = exp(−(βx + ϕy)).
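A short Monte Carlo check of this survivor-function factorisation (illustrative only; the parameter values and seed are arbitrary):

```python
import math
import random

# Independent X ~ Exp(beta), Y ~ Exp(phi): estimate P(X > x0, Y > y0)
# and compare with exp(-(beta*x0 + phi*y0)) from Example 5.6.3.
random.seed(1)
beta, phi, x0, y0 = 1.0, 2.0, 0.5, 0.3
n = 200_000
hits = sum(1 for _ in range(n)
           if random.expovariate(beta) > x0 and random.expovariate(phi) > y0)
estimate = hits / n
exact = math.exp(-(beta * x0 + phi * y0))
assert abs(estimate - exact) < 0.01
```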

5.7 Conditional Distributions

Suppose we know the joint distribution of (X,Y) but then we find out the value of one of the random variables. What can we say about the other random variable?

We consider the conditional distributions X|Y=y, i.e. the distribution of X given that Y = y, and Y|X=x, i.e. the distribution of Y given that X = x.

Recall that when X and Y were discrete random variables the conditional pmfs were:

pX|Y(x|y) = pXY(x,y)/pY(y),  pY|X(y|x) = pXY(x,y)/pX(x).

Similarly when X and Y are continuous random variables the conditional pdfs are

fX|Y(x|y) = fXY(x,y)/fY(y),  fY|X(y|x) = fXY(x,y)/fX(x).

Note that since we can only condition on possible values, we do not have to worry about zeros in the denominators: the marginal pmf/pdf has to be positive for the value to occur.

Also note that the conditional pdfs are themselves valid pdfs: they are non-negative and they integrate to 1. For instance,

∫_{s=-∞}^{∞} fX|Y(s|y) ds = ∫_{s=-∞}^{∞} fXY(s,y)/fY(y) ds
                          = (1/fY(y)) ∫_{s=-∞}^{∞} fXY(s,y) ds
                          = (1/fY(y)) fY(y) = 1.

Similarly, conditional pmfs sum to 1.

When the variables (X,Y) are independent discrete RVs then for all x, y, recall that

  1. pX|Y(x|y) = pX(x)pY(y)/pY(y) = pX(x),

  2. pY|X(y|x) = pX(x)pY(y)/pX(x) = pY(y).

Similarly if (X,Y) are independent continuous RVs then for all x, y,

  1. fX|Y(x|y) = fX(x)fY(y)/fY(y) = fX(x),

  2. fY|X(y|x) = fX(x)fY(y)/fX(x) = fY(y).

These results conform with intuition: when X and Y are independent, knowing the value of X should tell us nothing about Y, and vice versa.

The converse is also true: If the conditional distribution of X given Y=y is independent of y or, equivalently, the conditional distribution of Y given X=x is independent of x, then X and Y are independent.

Example 5.7.1.

A piece of string of unit length is tied at one end to a hook. The string is cut at a (uniform) random distance X from the hook. The piece remaining is then cut again at a (uniform) random distance Y from the hook. Given that the remaining length tied to the hook has length y, find the pdf of the position of the first cut.


Solution.  Model with X ~ 𝖴𝗇𝗂𝖿(0,1) and Y|X=x ~ 𝖴𝗇𝗂𝖿(0,x). We know fX(x) = 1 for 0 < x < 1, and fY|X(y|x) = 1/x for 0 < y < x < 1.

Now fX|Y(x|y)=fXY(x,y)/fY(y).

We know fXY(x,y) = fX(x) fY|X(y|x) = 1/x for 0 < y < x < 1, and 0 otherwise.

So we need fY(y).

fY(y) = ∫_{s=y}^{1} fXY(s,y) ds
      = ∫_{s=y}^{1} (1/s) ds
      = [log s]_{s=y}^{1}
      = −log(y)

for 0 < y < 1. Hence fX|Y(x|y) = −1/(x log(y)) for y < x < 1.
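The marginal of Y can be checked by simulation (illustrative only; the cut-off y0 is arbitrary): integrating fY(t) = −log(t) gives the cdf FY(y) = y − y log(y), which a direct sample of the two-cut experiment should reproduce.

```python
import math
import random

# X ~ Unif(0,1), then Y | X = x ~ Unif(0, x), as in Example 5.7.1.
random.seed(2)
n = 200_000
ys = []
for _ in range(n):
    x = random.random()
    ys.append(random.uniform(0, x))

# Empirical P(Y < y0) versus the cdf F_Y(y0) = y0 - y0*log(y0).
y0 = 0.3
estimate = sum(1 for y in ys if y < y0) / n
exact = y0 - y0 * math.log(y0)
assert abs(estimate - exact) < 0.01
```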

Example 5.7.2.

Continuous random variables X and Y have joint pdf

fXY(x,y) = exp(−x/y) exp(−y)/y for 0 < x < ∞, 0 < y < ∞, and 0 otherwise.

Find

  1. 1.

    the conditional pdf of X given Y=y,

  2. 2.

𝖯(X > 1 | Y = 1).

Solution. 

  1. 1.

    Since fX|Y(x|y)=fXY(x,y)/fY(y) we need the marginal pdf fY(y).

    fY(y) = ∫_{s=0}^{∞} exp(−s/y) exp(−y)/y ds
          = [−exp(−s/y − y)]_{s=0}^{∞}
          = exp(−y)

    for y > 0. Hence for x > 0

    fX|Y(x|y) = exp(−x/y) exp(−y)/y / exp(−y) = exp(−x/y)/y.
  2. 2.

    When Y=1 we have fX|Y(x|Y=1)=exp(-x), so

    𝖯(X > 1 | Y = 1) = ∫_{1}^{∞} exp(−s) ds = [−exp(−s)]_{1}^{∞} = e^{−1}.

5.8 Key definitions and Relationships

Let (X,Y) be a bivariate rv.

  1. 1.

    The joint cdf is FX,Y(x,y) = 𝖯(X ≤ x, Y ≤ y). The marginal cdf of X is FX(x) = FX,Y(x, ∞).

  2. 2.

    For a discrete rv, the joint pmf is pX,Y(x,y)=𝖯(X=x,Y=y).

  3. 3.

    For a continuous rv, the joint pdf is fX,Y(x,y) = ∂²FX,Y(x,y)/∂x∂y.

  4. 4.

    For discrete rvs X and Y, the marginal pmf of X is pX(x) = Σ_{j=-∞}^{∞} pX,Y(x,j), and the conditional pmf of X given Y=y is pX|Y(x|y) = pX,Y(x,y)/pY(y).

  5. 5.

    For continuous rvs X and Y, the marginal pdf of X is fX(x) = ∫_{t=-∞}^{∞} fX,Y(x,t) dt, and the conditional pdf of X given Y=y is fX|Y(x|y) = fX,Y(x,y)/fY(y).

  6. 6.

    X and Y are independent if and only if the events {X ∈ A} and {Y ∈ B} are independent for all sets A and B: 𝖯(X ∈ A, Y ∈ B) = 𝖯(X ∈ A) 𝖯(Y ∈ B) for all A, B.

  7. 7.

    An equivalent, but easier to check, condition for independence (of discrete or continuous rvs) is: FX,Y(x,y) = FX(x) FY(y) for all x, y. For discrete rvs, independence is also equivalent to pX,Y(x,y) = pX(x) pY(y), whereas for continuous rvs it is equivalent to fX,Y(x,y) = fX(x) fY(y). When checking factorisation only within the range where the pmf/pdf is non-zero, variational independence must also be verified.

  8. 8.

    Lack of independence can be shown using the two-point method: showing that fX,Y(x1,y1) fX,Y(x2,y2) ≠ fX,Y(x1,y2) fX,Y(x2,y1) for some x1, x2, y1, y2. Alternatively, show that fX|Y(x|y) ≠ fX(x) for some x, y.

Chapter 6 Expectation (II)

We have already encountered the expectation 𝖤[X] and variance 𝖵𝖺𝗋[X] of a univariate random variable, X. In this chapter we examine the corresponding measures for multivariate random variables. We also investigate a particularly useful expectation: the moment generating function.

6.1 Bivariate Expectations

We know how to obtain expectations for univariate random variables. The definition extends easily to bivariate random variables. The expectation of any function g(X,Y) is given by:

Discrete random variables:
𝖤[g(X,Y)] = Σ_{s=-∞}^{∞} Σ_{t=-∞}^{∞} g(s,t) pXY(s,t).
Continuous random variables:
𝖤[g(X,Y)] = ∫_{s=-∞}^{∞} ∫_{t=-∞}^{∞} g(s,t) fXY(s,t) dt ds.

In the rest of this section results are given for the continuous case only; however, these extend immediately to discrete random variables.

Moments of either variable alone can be obtained from the joint distribution or from the relevant marginal.

𝖤[X] = ∫_{s=-∞}^{∞} ∫_{t=-∞}^{∞} s fXY(s,t) dt ds = ∫_{-∞}^{∞} s { ∫_{-∞}^{∞} fXY(s,t) dt } ds = ∫_{-∞}^{∞} s fX(s) ds,

and, more generally, for a function g,

𝖤[g(X)] = ∫_{s=-∞}^{∞} ∫_{t=-∞}^{∞} g(s) fXY(s,t) dt ds = ∫_{-∞}^{∞} g(s) fX(s) ds.

Similarly for Y and any function h (including h(Y)=Y),

𝖤[h(Y)] = ∫_{t=-∞}^{∞} ∫_{s=-∞}^{∞} h(t) fXY(s,t) ds dt = ∫_{-∞}^{∞} h(t) fY(t) dt.

Using linearity of integrals we also have for any functions g and h

𝖤[g(X)+h(Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} [g(s) + h(t)] fXY(s,t) dt ds
             = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(s) fXY(s,t) dt ds + ∫_{-∞}^{∞} ∫_{-∞}^{∞} h(t) fXY(s,t) dt ds
             = 𝖤[g(X)] + 𝖤[h(Y)].

In particular

𝖤[X+Y]=𝖤[X]+𝖤[Y],

regardless of the joint distribution of (X,Y).

If X and Y are independent we also have for any functions g and h

𝖤[g(X)h(Y)] = ∫_{s=-∞}^{∞} ∫_{t=-∞}^{∞} g(s) h(t) fXY(s,t) dt ds
            = ∫_{s=-∞}^{∞} ∫_{t=-∞}^{∞} g(s) h(t) fX(s) fY(t) dt ds
            = ∫_{s=-∞}^{∞} g(s) fX(s) { ∫_{t=-∞}^{∞} h(t) fY(t) dt } ds
            = { ∫_{-∞}^{∞} g(s) fX(s) ds } { ∫_{-∞}^{∞} h(t) fY(t) dt }
            = 𝖤[g(X)] 𝖤[h(Y)].

In particular, if X and Y are independent, then

𝖤[XY]=𝖤[X]𝖤[Y].

Firstly we note that for dependent random variables 𝖤[XY] ≠ 𝖤[X]𝖤[Y], in general. For example, setting Y = X gives

𝖤[XY] = 𝖤[X²] ≥ 𝖤[X]² = 𝖤[X]𝖤[Y],

the difference between the two being 𝖵𝖺𝗋[X].

More subtly, even when 𝖤[XY]=𝖤[X]𝖤[Y], X and Y need not be independent.

Example 6.1.1.

Let X ~ N(0,1) and Y = X² − 1. Find 𝖤[XY] and 𝖤[X]𝖤[Y].

Solution.  𝖤[X]=0, so 𝖤[X]𝖤[Y]=0. Also

𝖤[XY] = 𝖤[X³ − X] = 𝖤[X³] − 𝖤[X] = 0 − 0 = 0,

since 𝖤[X^r] = 0 for r an odd integer. So 𝖤[XY] = 𝖤[X]𝖤[Y] = 0.

The joint distribution of (X,Y) is illustrated in Figure 6.1. Clearly the variables X and Y are strongly related, as given X we know Y exactly.

Figure 6.1: 1000 realisations of (X,Y), where X ~ N(0,1) and Y = X² − 1. X and Y are uncorrelated (ρ = 0) but not independent.
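A small simulation (illustrative, not part of the notes; the seed and cut-off are arbitrary) reproduces this: the sample mean of XY is near zero, yet conditioning on |X| > 1 shifts Y away from its mean, exposing the dependence.

```python
import random

# X ~ N(0,1), Y = X^2 - 1: uncorrelated but completely dependent.
random.seed(3)
n = 200_000
pairs = [(x, x * x - 1) for x in (random.gauss(0, 1) for _ in range(n))]

# Sample E[XY] should be near 0, matching E[X]E[Y] = 0.
mean_xy = sum(x * y for x, y in pairs) / n
assert abs(mean_xy) < 0.05

# Dependence: the mean of Y over the event |X| > 1 is well above E[Y] = 0.
tail = [y for x, y in pairs if abs(x) > 1]
assert sum(tail) / len(tail) > 0.5
```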
Example 6.1.2.

Find the expected value of X-Y if 𝖤[X]=𝖤[Y]. Does this result depend on other features of the joint distribution of (X,Y)?

Solution.  𝖤[X-Y]=𝖤[X]+𝖤[-Y]=𝖤[X]-𝖤[Y]=0. No other assumptions are needed.

Example 6.1.3.

The random variables (X,Y) have joint pdf

fXY(x,y) = 1/2 for 0 < x < y, 0 < y < 2, and 0 otherwise.

Find 𝖤[X], 𝖤[Y] and 𝖤[XY]. Does 𝖤[XY]=𝖤[X]𝖤[Y]?

Solution. 

Unnumbered Figure: the region 0 < x < y < 2.

𝖤[X] = ∫_{s=0}^{2} ∫_{t=s}^{2} s (1/2) dt ds
     = (1/2) ∫_{s=0}^{2} (2s − s²) ds
     = (1/2) [s² − s³/3]_{0}^{2}
     = 2/3,
𝖤[Y] = ∫_{s=0}^{2} ∫_{t=s}^{2} t (1/2) dt ds
     = (1/2) ∫_{s=0}^{2} (2² − s²)/2 ds
     = (1/2) [2s − s³/6]_{0}^{2}
     = 4/3,
𝖤[XY] = ∫_{s=0}^{2} ∫_{t=s}^{2} st (1/2) dt ds
      = ∫_{s=0}^{2} [st²/4]_{t=s}^{2} ds
      = ∫_{s=0}^{2} (s − s³/4) ds
      = [s²/2 − s⁴/16]_{s=0}^{2}
      = 1 ≠ 𝖤[X]𝖤[Y] = 8/9.
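These three integrals can be checked by Monte Carlo (an illustrative sketch, not part of the notes): the joint pdf is the uniform density 1/2 on the triangle 0 < x < y < 2, so we sample uniformly from the square and keep points with x < y.

```python
import random

# Rejection sampling from the uniform density on {0 < x < y < 2}.
random.seed(4)
n = 400_000
pts = []
while len(pts) < n:
    x, y = 2 * random.random(), 2 * random.random()
    if x < y:
        pts.append((x, y))

ex = sum(x for x, _ in pts) / n
ey = sum(y for _, y in pts) / n
exy = sum(x * y for x, y in pts) / n

assert abs(ex - 2 / 3) < 0.01    # E[X] = 2/3
assert abs(ey - 4 / 3) < 0.01    # E[Y] = 4/3
assert abs(exy - 1.0) < 0.01     # E[XY] = 1 != E[X]E[Y] = 8/9
```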

6.2 Conditional Expectations

Expectations for conditional random variables are defined in the obvious way. Conditional expectations are given by

  1. 𝖤[X | Y=y] = ∫_{-∞}^{∞} s fX|Y(s|y) ds,

  2. 𝖤[Y | X=x] = ∫_{-∞}^{∞} t fY|X(t|x) dt.

𝖤[Y | X=x] is a function, g(x) say, of x (a real number). If we have not yet seen x then this becomes a function g(X) of the random variable X; i.e. 𝖤[Y|X] is a random variable because it is a function of the random variable X.

Sometimes conditioning provides an easy way to obtain the expectations of the marginal variables. Consider the random variable 𝖤[h(Y)|X], which is a function of X. Just as 𝖤[g(X)] = ∫_{-∞}^{∞} g(s) fX(s) ds, so the expectation of 𝖤[h(Y)|X] is

𝖤[𝖤[h(Y)|X]] = ∫_{-∞}^{∞} 𝖤[h(Y) | X=s] fX(s) ds
             = ∫_{s=-∞}^{∞} { ∫_{t=-∞}^{∞} h(t) fY|X(t|s) dt } fX(s) ds
             = ∫_{-∞}^{∞} ∫_{-∞}^{∞} h(t) fXY(s,t) ds dt
             = 𝖤[h(Y)].

Now consider 𝖤[g(X)h(Y)|X], which is a random variable, since it is a function of the random variable X.

𝖤[g(X)h(Y)|X] = ∫_{t=-∞}^{∞} g(X) h(t) fY|X(t|X) dt
              = g(X) ∫_{t=-∞}^{∞} h(t) fY|X(t|X) dt
              = g(X) 𝖤[h(Y)|X].

Intuitively, by conditioning on the unknown X it becomes an unknown constant as far as the expectation is concerned and so it can be taken outside the expectation.

Example 6.2.1.

The rvs X and Y follow a distribution specified by X ~ 𝖭(0,1) and Y|X=x ~ 𝖭(αx, 1).

  1. (a)

    Write down 𝖤[Y|X=x] and 𝖵𝖺𝗋[Y|X=x].

  2. (b)

    Find 𝖤[X] and 𝖤[Y].

  3. (c)

    Find 𝖤[XY].

Solution. 

  1. (a)

    𝖤[Y|X=x]=αx and 𝖵𝖺𝗋[Y|X=x]=1.

  2. (b)

    𝖤[X]=0 and

    𝖤[Y] = 𝖤[𝖤[Y|X]]
         = 𝖤[αX] = α𝖤[X] = 0.
  3. (c)
    𝖤[XY] = 𝖤[𝖤[XY|X]]
          = 𝖤[X 𝖤[Y|X]]
          = 𝖤[αX²]
          = α.

    Note that 𝖤[XY] − 𝖤[X]𝖤[Y] = α.
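A simulation check of part (c) (illustrative only; α = 0.7 and the seed are arbitrary choices): since Y|X=x ~ 𝖭(αx, 1), we can generate Y as αX plus independent standard normal noise, and the sample mean of XY should be close to α.

```python
import random

# X ~ N(0,1), Y = alpha*X + eps with eps ~ N(0,1) independent of X,
# which matches Y | X = x ~ N(alpha*x, 1). Then E[XY] = alpha.
random.seed(5)
alpha, n = 0.7, 200_000
total = 0.0
for _ in range(n):
    x = random.gauss(0, 1)
    y = alpha * x + random.gauss(0, 1)
    total += x * y
mean_xy = total / n
assert abs(mean_xy - alpha) < 0.02
```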

The conditional variances are given by

𝖵𝖺𝗋[X | Y=y] = ∫_{-∞}^{∞} (s − 𝖤[X | Y=y])² fX|Y(s|y) ds
             = 𝖤[X² | Y=y] − 𝖤[X | Y=y]²,
𝖵𝖺𝗋[Y | X=x] = ∫_{-∞}^{∞} (t − 𝖤[Y | X=x])² fY|X(t|x) dt
             = 𝖤[Y² | X=x] − 𝖤[Y | X=x]².

If X and Y are independent the conditional distributions are the same as the marginal distributions (fX|Y(x|y)=fX(x) and fY|X(y|x)=fY(y)), so that in particular

  1. 𝖤[X | Y=y] = 𝖤[X],

  2. 𝖵𝖺𝗋[X | Y=y] = 𝖵𝖺𝗋[X],

  3. 𝖤[Y | X=x] = 𝖤[Y],

  4. 𝖵𝖺𝗋[Y | X=x] = 𝖵𝖺𝗋[Y].

6.3 Decomposition of the marginal variance

We have seen that the marginal expectations can be obtained from the conditional expectations. We can also obtain the marginal variances from the conditional expectations and variances by the following formula:

𝖤[𝖵𝖺𝗋[Y|X]] + 𝖵𝖺𝗋[𝖤[Y|X]]
= 𝖤[𝖤[Y²|X] − 𝖤[Y|X]²] + 𝖤[𝖤[Y|X]²] − 𝖤[𝖤[Y|X]]²
= 𝖤[Y²] − 𝖤[Y]²
= 𝖵𝖺𝗋[Y].

These formulae are particularly useful when a random variable Y is given as a mixture of distributions. This is most easily illustrated by an example.

Example 6.3.1.

Let X be a Poisson(λ) random variable and, given X takes the value x, let Y be Binomial(x,p)-distributed, i.e. Y|X=x ~ Binomial(x,p). Find the expectation and variance of Y.

Solution.  From properties of the Binomial distribution we have

  1. 𝖤[Y | X=x] = xp,

  2. 𝖵𝖺𝗋[Y | X=x] = xp(1−p).

Hence, using properties of the Poisson distribution we obtain

𝖤[Y] = 𝖤[𝖤[Y|X]] = 𝖤[Xp] = λp,
𝖵𝖺𝗋[Y] = 𝖤[𝖵𝖺𝗋[Y|X]] + 𝖵𝖺𝗋[𝖤[Y|X]]
       = 𝖤[Xp(1−p)] + 𝖵𝖺𝗋[Xp]
       = λp(1−p) + λp²
       = λp.

In fact, it can be shown that Y ~ Poisson(λp).
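This Poisson thinning calculation can be checked by simulation (illustrative only, not part of the notes; the sampler name and parameter values are ours): the sample mean and variance of Y should both be close to λp.

```python
import math
import random

def poisson_sample(lam, rng):
    # Knuth's multiplication method; adequate for moderate lam.
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

# X ~ Poisson(lam), then Y | X = x ~ Binomial(x, p), as in Example 6.3.1.
rng = random.Random(6)
lam, p, n = 6.0, 0.3, 100_000
ys = []
for _ in range(n):
    x = poisson_sample(lam, rng)
    ys.append(sum(1 for _ in range(x) if rng.random() < p))

mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n
assert abs(mean_y - lam * p) < 0.05   # E[Y] = lam*p = 1.8
assert abs(var_y - lam * p) < 0.1     # Var[Y] = lam*p = 1.8
```

That the mean and variance agree is consistent with Y itself being Poisson(λp).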

6.4 Moment generating functions

The moment generating function or mgf of a random variable X is defined through

MX(t) = 𝖤[e^{tX}] = Σ_i e^{ti} pX(i) if X is a discrete rv with pmf pX(x), or ∫ e^{ts} fX(s) ds if X is a continuous rv with pdf fX(s),

for all real values of t for which the expectation exists.

Moment generating functions can be manipulated in many ways to reveal properties of the underlying probability distributions. They often help in mathematical proofs of probability theorems, and will be used for this purpose in Chapter 9.

Example 6.4.1.

Find the mgf of the random variable following the exponential distribution with parameter β; sketch the mgf when β=4.

Solution.  X ~ 𝖤𝗑𝗉(β), so fX(x) = βe^{−βx} for x > 0. Hence,

MX(t) = ∫_{0}^{∞} e^{tx} βe^{−βx} dx = β ∫_{0}^{∞} e^{−x(β−t)} dx
      = β/(β − t)

for t < β. Note that MX(t) is only defined for t < β, since only in that case does the integral exist. Hence, for β = 4 the mgf looks like:

Unnumbered Figure: sketch of MX(t) = 4/(4 − t), defined for t < 4.

Quiz: Now consider a general rv: can the mgf be negative? No; it is the expectation of a non-negative quantity.

Theorem 6.4.1.

If the mgf is defined in some neighbourhood of the origin, |t| < t0, then the following properties are satisfied:

  1. 1.

    The mgf uniquely determines the distribution of the rv X. That is, if two rvs have the same mgf then they have the same cdf.

  2. 2.

    If Z = a + bX, for a real and b a non-zero real number, then MZ(t) = e^{at} MX(bt).

  3. 3.

    Moments about the origin can be obtained by differentiating the mgf with respect to t and then evaluating the derivatives at zero, i.e.

    MX(0) = 𝖤[X⁰] = 1;  MX′(0) = 𝖤[X];  MX′′(0) = 𝖤[X²].

    Hence the name!

  4. 4.

    Let X,Y be independent rvs with mgf MX(t),MY(t) respectively. Then,

    MX+Y(t)=MX(t)MY(t).
Proof.
  1. 1.

    Proof uses ideas from complex analysis (see Math215).

  2. 2.

    If Z=a+bX, then

    MZ(t) = Ma+bX(t) = 𝖤[e^{(a+bX)t}] = e^{at} 𝖤[e^{bXt}] = e^{at} MX(bt).
  3. 3.

    Since MX(t) = 𝖤[e^{tX}], we have MX′(t) = 𝖤[X e^{tX}], MX′′(t) = 𝖤[X² e^{tX}], etc.; but e^{0·X} = 1, so

    1. MX(0) = 1,

    2. MX′(0) = 𝖤[X],

    3. MX′′(0) = 𝖤[X²],

    and so on.

  4. 4.
    MX+Y(t) = 𝖤[e^{(X+Y)t}]
            = 𝖤[e^{Xt} e^{Yt}]
            = 𝖤[e^{Xt}] 𝖤[e^{Yt}]

by independence, so MX+Y(t)=MX(t)MY(t). ∎

From Part 4, by induction, if X1, X2, …, Xn are independent random variables:

MX1+X2+⋯+Xn(t) = MX1(t) MX2(t) ⋯ MXn(t).
Example 6.4.2.

Using its mgf, find the expectation and the variance of the random variable following the exponential distribution with parameter β.

Solution.  Consider the first two derivatives of the mgf:

MX′(t) = β/(β−t)²,  MX′′(t) = 2β/(β−t)³.

Hence, 𝖤[X] = MX′(0) = 1/β, 𝖤[X²] = MX′′(0) = 2/β² and

𝖵𝖺𝗋[X] = 𝖤[X²] − 𝖤[X]² = 1/β².
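The derivative values at zero can be verified numerically (illustrative only; the step size h is an arbitrary choice) with central finite differences on MX(t) = β/(β − t):

```python
# Finite-difference check that the mgf derivatives at 0 give the
# moments of Exp(beta): M'(0) = 1/beta, M''(0) = 2/beta^2.
beta = 4.0
M = lambda t: beta / (beta - t)   # valid for t < beta
h = 1e-4
M1 = (M(h) - M(-h)) / (2 * h)              # central first derivative
M2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2  # central second derivative

assert abs(M1 - 1 / beta) < 1e-8       # E[X] = 1/4
assert abs(M2 - 2 / beta ** 2) < 1e-6  # E[X^2] = 1/8
```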

The mgf of a Normal random variable. We first consider Z ~ N(0,1). Then

MZ(t) = 𝖤[e^{Zt}] = ∫_{-∞}^{∞} e^{zt} (1/√(2π)) e^{−z²/2} dz
      = (1/√(2π)) ∫_{-∞}^{∞} e^{−(z² − 2zt)/2} dz
      = e^{t²/2} (1/√(2π)) ∫_{-∞}^{∞} e^{−(z−t)²/2} dz

by completing the square. Hence MZ(t) = e^{t²/2} by unit integrability of the N(t,1) density.

So if V = μ + σZ then, by Property 2,

MV(t) = e^{tμ} MZ(tσ) = e^{μt + σ²t²/2}.

For instance, if Z ~ N(0,1) then

  1. MZ(t) = e^{t²/2},

  2. MZ′(t) = t e^{t²/2},

  3. MZ′′(t) = t² e^{t²/2} + e^{t²/2},

  4. MZ′′′(t) = t³ e^{t²/2} + 3t e^{t²/2},

  5. MZ^{(iv)}(t) = t⁴ e^{t²/2} + 6t² e^{t²/2} + 3 e^{t²/2}.

In particular MZ′′(0) = 1 and MZ^{(iv)}(0) = 3, so 𝖤[Z²] = 1 and 𝖤[Z⁴] = 3 as mentioned in Chapter 3.

Unfortunately the mgf is not defined for some rvs.

Example 6.4.3.

Let X ~ 𝖢𝖺𝗎𝖼𝗁𝗒; then

MX(t) = ∫_{-∞}^{∞} e^{tx} / (π(1 + x²)) dx,

which is not defined since, if t > 0, the integrand → ∞ as x → ∞, and if t < 0, the integrand → ∞ as x → −∞.

Theorem 6.4.2.

The sum of two independent Normal random variables is also Normal. Let X1 ~ N(μ1, σ1²) and X2 ~ N(μ2, σ2²) be two independent random variables; then

Y = X1 + X2 ~ N(μ1 + μ2, σ1² + σ2²).
Proof.

MX1(t) = e^{μ1 t + σ1² t²/2} and MX2(t) = e^{μ2 t + σ2² t²/2}, so using mgf property 4,

MY(t) = MX1(t) MX2(t) = e^{μ1 t + σ1² t²/2} × e^{μ2 t + σ2² t²/2} = e^{(μ1+μ2)t + (σ1²+σ2²)t²/2},

which is the mgf of a N(μ1+μ2,σ12+σ22) random variable. The result follows from mgf property 1. ∎

This is called the convolution property of the Normal distribution. There are several proofs of it; the above is the simplest and starts to show the power of mgfs.
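The convolution property is easy to see in simulation (an illustrative sketch, not part of the notes; the means and variances are arbitrary choices): summing draws from N(1, 2²) and N(−2, 1²) should produce a sample with mean −1 and variance 5.

```python
import random
import statistics

# Sum of independent normals: N(1, 2^2) + N(-2, 1^2) should be N(-1, 5).
random.seed(7)
n = 200_000
s = [random.gauss(1, 2) + random.gauss(-2, 1) for _ in range(n)]

assert abs(statistics.fmean(s) - (-1)) < 0.03
assert abs(statistics.variance(s) - 5) < 0.1
```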

Example 6.4.4.

(Exam 2016) For some β > 0, let V = 1/β with probability 1, let Wi ~ 𝖤𝗑𝗉(β) (i = 1, 2, …) and X ~ 𝖭(0,1) be independent of each other. Let Y = X√W1, Z = W1 − W2 and W̄n = (1/n) Σ_{i=1}^{n} Wi. You may take as given that the moment generating function (mgf) of X is MX(t) = 𝖤[e^{Xt}] = e^{t²/2}.

  1. (a)

    Find the mgf of V, MV(t).

    Solution.  Since V = 1/β with probability 1, MV(t) = 𝖤[e^{Vt}] = e^{t/β}.

  2. (b)

    Find the mgf of W1, MW1(t). Be sure to specify the range of t and make clear why this range applies.

    Solution.  This is β/(β-t) (provided t<β); see Example 6.4.1 for detail and reason.

  3. (c)

    Show that, subject to the same range condition on t, MZ(t) = 1/(1 − t²/β²).

    Solution. 

    MZ(t) = 𝖤[e^{(W1−W2)t}] = 𝖤[e^{W1 t}] 𝖤[e^{−W2 t}] = MW(t) MW(−t) = 1/(1 − t/β) × 1/(1 + t/β),

    which gives the required result.

  4. (d)
    1. (i)

      Find the mgf of W¯n; what (if any) condition on the range of t applies?

    2. (ii)

      Find lim_{n→∞} MW̄n(t) and interpret the result heuristically with reference to your answer to an earlier part of this question. (Hint: recall that lim_{n→∞} (1 − x/n)^n = e^{−x}.)

    Solution. 

    1. (i)
      𝖤[e^{W̄n t}] = 𝖤[e^{(t/n) Σ_{i=1}^{n} Wi}]
                  = Π_{i=1}^{n} 𝖤[e^{(t/n) Wi}]
                  = MW(t/n)^n
                  = 1/(1 − t/(βn))^n.

      Need t/n < β, i.e. t < nβ.

    2. (ii)

      For any t, for large enough n, t < nβ. So,

      lim_{n→∞} MW̄n(t) = lim_{n→∞} 1/(1 − t/(βn))^n = 1/e^{−t/β} = e^{t/β}.

      This is the mgf of V, so as n → ∞, W̄n → 1/β (in some sense).

  5. (e)

    Find the mgf of Y and interpret the result with reference to your answer to an earlier part of this question. (Hint: use the tower property of expectations: 𝖤[g(X,W)]=𝖤[𝖤[g(X,W)|W]].)

    Solution. 

    𝖤[e^{Yt}] = 𝖤[e^{X√W1 t}] = 𝖤[𝖤[e^{X√W1 t} | W1]]
              = 𝖤[e^{(1/2) W1 t²}]
              = 1/(1 − t²/(2β)).

    This is like MZ(t) but with β² replaced by 2β. So Y has the same distribution as the difference between two 𝖤𝗑𝗉(√(2β)) random variables. Or, equivalently, the difference between two 𝖤𝗑𝗉(β) random variables has the same distribution as the product of a N(0,1) and the square root of an 𝖤𝗑𝗉(β²/2).
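Part (e) can be checked by simulation (illustrative only, not part of the exam solution; β = 2 and the seed are arbitrary): Y = X√W1 and the difference of two Exp(√(2β)) variables should have matching means and variances (both have mean 0 and variance 1/β).

```python
import math
import random

# Y = X * sqrt(W1) with X ~ N(0,1), W1 ~ Exp(beta), independent.
random.seed(8)
beta, n = 2.0, 200_000
y = [random.gauss(0, 1) * math.sqrt(random.expovariate(beta)) for _ in range(n)]

# Difference of two independent Exp(sqrt(2*beta)) variables.
rate = math.sqrt(2 * beta)
z = [random.expovariate(rate) - random.expovariate(rate) for _ in range(n)]

mean_y = sum(y) / n
var_y = sum(v * v for v in y) / n - mean_y ** 2
mean_z = sum(z) / n
var_z = sum(v * v for v in z) / n - mean_z ** 2

assert abs(mean_y) < 0.01
assert abs(var_y - 1 / beta) < 0.02   # Var[Y] = E[W1] = 1/beta
assert abs(var_y - var_z) < 0.03      # same second moment as the Exp difference
```

Matching low-order moments is of course only consistent with, not a proof of, equality in distribution; the mgf argument above gives the full result.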