Bayesian Statistics 331, Week 4: Multi-parameter models

4.2 The Normal distribution


The normal distribution is commonly encountered (because of the central limit theorem) and is therefore important. Bayesian inference for the normal becomes a little more difficult because there are at least two unknowns rather than one. There are a variety of ways of carrying out Bayesian inference on these two parameters, and the method depends on the priors being used. Let Y_1, \ldots, Y_n be a set of independent variables from Normal(\mu, 1/\tau), where \mu is the mean and 1/\tau the variance. Then,

f(y_i \mid \mu, \tau) = \frac{\tau^{1/2}}{\sqrt{2\pi}} \exp\left[-\frac{\tau (y_i - \mu)^2}{2}\right] \qquad (4.2)

so

f(\mathbf{y} \mid \mu, \tau) \propto \tau^{n/2} \exp\left[-\frac{\tau \sum_{i=1}^n (y_i - \mu)^2}{2}\right]. \qquad (4.3)


The two parameter exponential family

The normal is a two-parameter exponential family with

T(y) = \begin{bmatrix} y \\ y^2 \end{bmatrix} \qquad \text{and} \qquad \eta(\theta) = \begin{bmatrix} \tau\mu \\ -\tau/2 \end{bmatrix}

and therefore has fully conjugate priors.

4.2.1 The normal distribution with improper priors


Again assume an i.i.d. Gaussian likelihood, which can be expressed as

Y_i \mid \mu, \tau \sim \text{Normal}\left(\mu, \frac{1}{\tau}\right), \qquad i = 1, 2, \ldots, n

Also let the sample mean and the sum of squares be, respectively, \bar{y} = \sum_{i=1}^n y_i / n and S^2 = \sum_{i=1}^n (y_i - \bar{y})^2. The likelihood can then be expressed as

f(y_{1:n} \mid \mu, \tau) \propto \tau^{n/2} \exp\left\{-\frac{\tau}{2} \sum_{i=1}^n (y_i - \mu)^2\right\}
= \tau^{n/2} \exp\left\{-\frac{\tau}{2} \sum_{i=1}^n (y_i - \bar{y} + \bar{y} - \mu)^2\right\}
= \tau^{n/2} \exp\left\{-\frac{\tau}{2} \left[\sum_{i=1}^n (y_i - \bar{y})^2 + n(\bar{y} - \mu)^2\right]\right\}
= \tau^{\frac{n-1}{2}} \exp\left\{-\tfrac{1}{2} S^2 \tau\right\} \; \tau^{1/2} \exp\left\{-\frac{n}{2}\tau(\mu - \bar{y})^2\right\}
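The decomposition used in the last two steps, \sum_{i=1}^n (y_i - \mu)^2 = S^2 + n(\bar{y} - \mu)^2, is easy to check numerically. A quick sketch in Python (the data and the value of \mu are arbitrary; the lecture's own code is in R):

```python
import numpy as np

y = np.array([1.2, 0.7, 2.1, 1.5, 0.9])   # arbitrary sample
mu = 0.8                                   # arbitrary value of the mean parameter
ybar = y.mean()
S2 = np.sum((y - ybar) ** 2)               # sum of squares about the sample mean

lhs = np.sum((y - mu) ** 2)                # direct sum of squares about mu
rhs = S2 + len(y) * (ybar - mu) ** 2       # decomposed form

print(abs(lhs - rhs) < 1e-9)  # True
```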


If we choose the improper prior \pi(\mu, \tau) \propto 1/\tau, the joint posterior becomes

\pi(\mu, \tau \mid y) \propto \pi(\mu, \tau)\, f(y_{1:n} \mid \mu, \tau)
\propto \frac{1}{\tau}\, \tau^{\frac{n-1}{2}} \exp\left\{-\tfrac{1}{2} S^2 \tau\right\} \tau^{1/2} \exp\left\{-\frac{n}{2}\tau(\mu - \bar{y})^2\right\}
\propto \underbrace{\tau^{\frac{n-1}{2} - 1} \exp\left\{-\tfrac{1}{2} S^2 \tau\right\}}_{\text{Gamma}\left(\frac{n-1}{2}, \frac{S^2}{2}\right)} \; \underbrace{\tau^{1/2} \exp\left\{-\frac{n}{2}\tau(\mu - \bar{y})^2\right\}}_{\text{Normal}\left(\bar{y}, \frac{1}{n\tau}\right)}

The joint posterior π(μ,τy)=π(μτ,y)π(τy) is a product of these two distributions.

\mu \mid \tau, y \sim \text{Normal}\left(\bar{y}, \frac{1}{n\tau}\right)
\tau \mid y \sim \text{Gamma}\left(\frac{n-1}{2}, \frac{S^2}{2}\right) \qquad (4.4)
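Because the posterior factorises as in (4.4), exact draws from the joint posterior are simple: sample \tau from the Gamma, then \mu from the Normal given \tau. A minimal Python sketch with simulated data (NumPy's gamma generator takes a scale parameter, so the rate S^2/2 enters as scale = 2/S^2):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=50)    # simulated data
n, ybar = len(y), y.mean()
S2 = np.sum((y - ybar) ** 2)

# tau | y ~ Gamma((n-1)/2, rate S2/2)
tau = rng.gamma(shape=(n - 1) / 2, scale=2 / S2, size=100_000)
# mu | tau, y ~ Normal(ybar, 1/(n tau))
mu = rng.normal(loc=ybar, scale=1 / np.sqrt(n * tau))

# the posterior of mu is centred on the sample mean
print(abs(mu.mean() - ybar) < 0.02)  # True
```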

The marginal posterior of μ

The marginal distribution of \mu can be obtained by marginalising \tau out of the joint distribution.

\pi(\mu \mid y) = \int \pi(\mu, \tau \mid y)\, d\tau
\propto \int \tau^{\frac{n}{2} - 1} \exp\left\{-\frac{\tau}{2}\left[S^2 + n(\mu - \bar{y})^2\right]\right\} d\tau

The integrand is the kernel of a Gamma distribution with \alpha_p = \frac{n}{2} and \beta_p = \frac{1}{2}\left[S^2 + n(\mu - \bar{y})^2\right], so

\pi(\mu \mid y) \propto \frac{\Gamma(\alpha_p)}{\beta_p^{\alpha_p}} \propto \left[S^2 + n(\mu - \bar{y})^2\right]^{-\frac{n}{2}}


Finally, consider a location and scale change of \mu:

t = \frac{\mu - \bar{y}}{s/\sqrt{n}}, \qquad \text{where } s^2 = \frac{S^2}{n-1}.

Then

\pi(t \mid y) \propto \frac{1}{\left\{(n-1)s^2 + s^2 t^2\right\}^{n/2}}
\propto \left\{1 + \frac{t^2}{n-1}\right\}^{-n/2}.

This is the density of a t–distribution with n-1 degrees of freedom. That is:

t \mid y \sim t_{n-1}.
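The t result can be checked by simulation: draw (\tau, \mu) from the joint posterior of Equation (4.4), apply the location-scale change, and compare empirical quantiles with those of t_{n-1} from scipy.stats. A sketch with simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.normal(size=30)                        # simulated data
n, ybar = len(y), y.mean()
S2 = np.sum((y - ybar) ** 2)
s = np.sqrt(S2 / (n - 1))

# exact draws from the joint posterior under the improper prior
tau = rng.gamma(shape=(n - 1) / 2, scale=2 / S2, size=200_000)
mu = rng.normal(loc=ybar, scale=1 / np.sqrt(n * tau))
t_draws = (mu - ybar) / (s / np.sqrt(n))       # the location-scale change

# empirical quantiles should match the t_{n-1} quantiles
for p in (0.05, 0.5, 0.95):
    print(p, round(np.quantile(t_draws, p), 2), round(stats.t.ppf(p, df=n - 1), 2))
```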

The distribution of a future observation (The predictive)

To find the predictive distribution f(yn+1y1:n) a double integral must be carried out.

f(y_{n+1} \mid y_{1:n}) = \int_\tau \int_\mu \pi(\mu, \tau \mid y_{1:n})\, f(y_{n+1} \mid \mu, \tau)\, d\mu\, d\tau

This is easily done in two steps:

  1. Condition on \tau and integrate out \mu:

     f(y_{n+1} \mid y, \tau) = \int_\mu \pi(\mu \mid \tau, y_{1:n})\, f(y_{n+1} \mid \mu, \tau)\, d\mu

  2. Then marginalise over \tau:

     f(y_{n+1} \mid y) = \int_\tau f(y_{n+1} \mid y, \tau)\, \pi(\tau \mid y)\, d\tau

     where \pi(\tau \mid y) is given by Equation (4.4).


The integral in (1) can be done easily using the identities for marginal means and variances. The integral in (2) is straightforward. These integrals are not done here; they are part of the coursework for this chapter. It can be shown that

y_{n+1} \mid y \sim t_{n-1}\left(\bar{y},\; s^2\left(1 + \frac{1}{n}\right)\right)
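In this form a prediction interval requires only t quantiles, shifted by \bar{y} and scaled by s\sqrt{1 + 1/n}. A Python sketch using scipy.stats.t's loc/scale arguments (the data values are arbitrary):

```python
import numpy as np
from scipy import stats

y = np.array([4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 5.2])   # arbitrary data
n, ybar = len(y), y.mean()
s = y.std(ddof=1)

# y_{n+1} | y ~ t_{n-1}(ybar, s^2 (1 + 1/n))
scale = s * np.sqrt(1 + 1 / n)
lo, hi = stats.t.interval(0.90, df=n - 1, loc=ybar, scale=scale)
print(lo < ybar < hi)  # the interval is centred on ybar -> True
```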

4.2.2 The fully conjugate Normal-gamma prior

Using the fully conjugate, four parameter Normal-gamma prior

Y_i \mid \mu, \tau \sim \text{Normal}\left(\mu, \frac{1}{\tau}\right), \qquad i = 1, 2, \ldots, n

Also let the sample mean and variance respectively be \bar{y} = \sum_{i=1}^n y_i / n and s^2 = \sum_{i=1}^n (y_i - \bar{y})^2 / n. The likelihood can then be expressed as

f(y_{1:n} \mid \mu, \tau) \propto \tau^{n/2} \exp\left\{-\frac{\tau}{2} \sum_{i=1}^n (y_i - \mu)^2\right\}
= \tau^{n/2} \exp\left\{-\frac{\tau}{2}\left[\sum_{i=1}^n (y_i - \bar{y})^2 + n(\bar{y} - \mu)^2\right]\right\}
= \tau^{\frac{n-1}{2}} \exp\left\{-\tfrac{1}{2} n s^2 \tau\right\} \tau^{1/2} \exp\left\{-\frac{n}{2}\tau(\mu - \bar{y})^2\right\}

As a function of \mu and \tau this is the product of the kernel of a Normal(\bar{y}, (n\tau)^{-1}) for \mu and the kernel of a Gamma\left(\frac{n+1}{2}, \frac{ns^2}{2}\right) for \tau.


This parametrization of the likelihood suggests the following steps.

  • Express the priors of μ and τ as a four parameter Normal-Gamma distribution

  • Rewrite the likelihood so it has the same form.

  • Multiply prior and likelihood and read off the four-parameter Normal-Gamma form of the posterior.


We will assume μ and τ have the hierarchical conjugate priors

\tau \sim \text{Gamma}(a_0, b_0) \qquad (4.5)
\mu \mid \tau \sim \text{Normal}\left(\mu_0, (n_0\tau)^{-1}\right) \qquad (4.6)

and will show that the conjugate posteriors using Equations (4.5) and (4.6) have the form

\tau \mid y \sim \text{Gamma}(a_n, b_n) \qquad (4.7)
\mu \mid \tau, y \sim \text{Normal}\left(\mu_n, (n_n\tau)^{-1}\right) \qquad (4.8)


Expressions can be developed for the four parameters \mu_n, n_n, a_n, b_n. Equations (4.5) and (4.6) represent the Normal-Gamma family

\pi(\mu, \tau) = \pi(\mu \mid \tau)\, \pi(\tau)
\propto \tau^{a_0 - 1} \exp\{-b_0 \tau\}\; \tau^{1/2} \exp\left\{-\frac{\tau n_0}{2}(\mu - \mu_0)^2\right\}

By integrating over \tau we can find the marginal prior of \mu, \pi(\mu) = \int_{\tau=0}^\infty \pi(\mu, \tau)\, d\tau, and show that it has the form of a t-distribution:

\mu \sim t\left(\mu_0,\; \frac{b_0}{n_0 a_0},\; 2a_0\right).

This leads to a posterior distribution of

\mu \mid y \sim t\left(\mu_n,\; \frac{b_n}{n_n a_n},\; 2a_n\right).


The likelihood can be rewritten as

\tau^{\frac{n-1}{2}} \exp\left\{-\tfrac{1}{2} S^2 \tau\right\}\; \tau^{1/2} \exp\left\{-\frac{n}{2}\tau(\mu - \bar{y})^2\right\}

This suggests a posterior of the same Normal-Gamma form,

\pi(\mu, \tau \mid y) = \pi(\tau \mid y)\, \pi(\mu \mid \tau, y)
\propto \tau^{a_n - 1} \exp\{-b_n \tau\}\; \tau^{1/2} \exp\left\{-\frac{\tau n_n}{2}(\mu - \mu_n)^2\right\}

where the updates to the sufficient statistics are

a_n = a_0 + \frac{n}{2}
b_n = b_0 + \frac{(n-1)s^2}{2} + \frac{1}{2}\,\frac{n_0 n}{n_0 + n}(\mu_0 - \bar{y})^2
\mu_n = \frac{n\bar{y} + n_0 \mu_0}{n_0 + n}
n_n = n_0 + n.
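These four updates are easy to wrap in a function. A Python sketch (the function name and argument names are mine; the formulas are the ones above):

```python
import numpy as np

def normal_gamma_update(y, mu0, n0, a0, b0):
    """Posterior hyperparameters of the Normal-Gamma prior given data y."""
    y = np.asarray(y, dtype=float)
    n, ybar = len(y), y.mean()
    s2 = y.var(ddof=1)                  # so that (n-1)s^2 = sum (y_i - ybar)^2
    an = a0 + n / 2
    bn = b0 + (n - 1) * s2 / 2 + 0.5 * n0 * n / (n0 + n) * (mu0 - ybar) ** 2
    mun = (n * ybar + n0 * mu0) / (n0 + n)
    nn = n0 + n
    return mun, nn, an, bn

# posterior mean shrinks the sample mean ybar = 2 toward the prior mean mu0 = 0
mun, nn, an, bn = normal_gamma_update([1.0, 2.0, 3.0], mu0=0.0, n0=1, a0=2, b0=1)
print(mun, nn, an, bn)  # prints: 1.5 4 3.5 3.5
```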

The predictive

In addition, the predictive and the marginal are also available in closed form:

  1. y^* \mid y \sim t\left(\mu_n,\; \frac{b_n(1 + n_n)}{n_n a_n},\; 2a_n\right) (the predictive)

  2. y \sim t\left(\mu_0,\; \frac{b_0(1 + n_0)}{n_0 a_0},\; 2a_0\right) (the marginal)

Derivation of four parameter updates

f(y \mid \mu, \tau) \propto \tau^{\frac{n-1}{2}} \exp\left\{-\tfrac{1}{2}(n-1)s^2 \tau\right\}\; \tau^{1/2} \exp\left\{-\frac{n}{2}\tau(\mu - \bar{y})^2\right\}
\pi(\mu, \tau) \propto \tau^{a_0 - 1} \exp\{-b_0 \tau\}\; \tau^{1/2} \exp\left\{-\frac{\tau n_0}{2}(\mu - \mu_0)^2\right\}
\pi(\mu, \tau \mid y_{1:n}) \propto \tau^{\frac{n}{2} + a_0 - 1} \exp\left\{-\tfrac{1}{2}\left((n-1)s^2 + 2b_0\right)\tau\right\}
\times \tau^{1/2} \exp\left\{-\tfrac{\tau}{2}\left(n(\mu - \bar{y})^2 + n_0(\mu - \mu_0)^2\right)\right\}

By completing the square we can show that

\pi(\mu, \tau \mid y) \propto \tau^{\frac{n}{2} + a_0 - 1} \exp\left\{-\tfrac{1}{2}\left((n-1)s^2 + 2b_0\right)\tau\right\}
\times \tau^{1/2} \exp\left\{-\frac{\tau}{2}\left[\frac{n n_0(\mu_0 - \bar{y})^2}{n + n_0} + (n + n_0)\left(\mu - \frac{n\bar{y} + n_0\mu_0}{n + n_0}\right)^2\right]\right\}
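The completing-the-square identity used here, n(\mu - \bar{y})^2 + n_0(\mu - \mu_0)^2 = (n + n_0)\left(\mu - \frac{n\bar{y} + n_0\mu_0}{n + n_0}\right)^2 + \frac{n n_0(\mu_0 - \bar{y})^2}{n + n_0}, can be verified symbolically; a sketch with sympy:

```python
import sympy as sp

mu, ybar, mu0 = sp.symbols('mu ybar mu0')
n, n0 = sp.symbols('n n0', positive=True)

lhs = n * (mu - ybar) ** 2 + n0 * (mu - mu0) ** 2
rhs = ((n + n0) * (mu - (n * ybar + n0 * mu0) / (n + n0)) ** 2
       + n * n0 * (mu0 - ybar) ** 2 / (n + n0))

print(sp.simplify(lhs - rhs))  # 0
```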


It can be further shown that

\pi(\mu, \tau \mid y) \propto \tau^{\frac{n}{2} + a_0 - 1} \exp\left\{-\frac{\tau}{2}\left((n-1)s^2 + 2b_0 + \frac{n_0 n(\mu_0 - \bar{y})^2}{n_0 + n}\right)\right\}
\times \tau^{1/2} \exp\left\{-\frac{\tau}{2}(n_0 + n)\left(\mu - \frac{n\bar{y} + n_0\mu_0}{n + n_0}\right)^2\right\}

By comparing with the prior

\pi(\mu, \tau) \propto \tau^{a_0 - 1} \exp\{-b_0 \tau\}\; \tau^{1/2} \exp\left\{-\frac{\tau n_0}{2}(\mu - \mu_0)^2\right\}

we can show that

a_n = a_0 + \frac{n}{2}
b_n = b_0 + \frac{(n-1)s^2}{2} + \frac{1}{2}\,\frac{n_0 n}{n_0 + n}(\mu_0 - \bar{y})^2
\mu_n = \frac{n\bar{y} + n_0 \mu_0}{n_0 + n}
n_n = n_0 + n

Updates of the sufficient statistics for the fully conjugate model

μ and τ have the hierarchical prior structure:

\tau \sim \text{Gamma}(a_0, b_0)
\mu \mid \tau \sim \text{Normal}\left(\mu_0, (n_0\tau)^{-1}\right)
\mu \sim t\left(\mu_0,\; \frac{b_0}{n_0 a_0},\; 2a_0\right) \qquad (4.9)

Multiplying by the likelihood

\tau^{\frac{n-1}{2}} \exp\left\{-\tfrac{1}{2} S^2 \tau\right\}\; \tau^{1/2} \exp\left\{-\frac{n}{2}\tau(\mu - \bar{y})^2\right\}

We get

\tau \mid y \sim \text{Gamma}(a_n, b_n)
\mu \mid \tau, y \sim \text{Normal}\left(\mu_n, (n_n\tau)^{-1}\right)
\mu \mid y \sim t\left(\mu_n,\; \frac{b_n}{n_n a_n},\; 2a_n\right) \qquad (4.10)


where the updates to the sufficient statistics are

a_n = a_0 + \frac{n}{2}
b_n = b_0 + \frac{(n-1)s^2}{2} + \frac{1}{2}\,\frac{n_0 n}{n_0 + n}(\mu_0 - \bar{y})^2
\mu_n = \frac{n\bar{y} + n_0 \mu_0}{n_0 + n}
n_n = n_0 + n

Sequential updates to the sufficient statistics

\tau \mid y_{1:i-1} \sim \text{Gamma}(a_{i-1}, b_{i-1})
\mu \mid \tau, y_{1:i} \sim \text{Normal}\left(\mu_i, (n_i\tau)^{-1}\right)
\mu \mid y_{1:i} \sim t\left(\mu_i,\; \frac{b_i}{n_i a_i},\; 2a_i\right), \qquad i = 1, 2, \ldots, n

The updates are

a_i = a_{i-1} + \tfrac{1}{2}
b_i = b_{i-1} + \tfrac{1}{2}\,\frac{n_{i-1}}{1 + n_{i-1}}\,(y_i - \mu_{i-1})^2 \qquad \text{(squared prediction error)}
\mu_i = \mu_{i-1} + \frac{1}{1 + n_{i-1}}\,(y_i - \mu_{i-1}) \qquad \text{(prediction error)}
n_i = n_{i-1} + 1, \qquad i = 1, 2, \ldots, n
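By conjugacy, applying these one-step updates through the whole sample must reproduce the batch updates a_n, b_n, \mu_n, n_n derived earlier. A Python sketch checking this agreement (no forgetting yet; the variable names are mine):

```python
import numpy as np

def sequential_updates(y, mu0, n0, a0, b0):
    """Apply the one-observation hyperparameter updates in turn."""
    m, nn, a, b = mu0, n0, a0, b0
    for yi in y:
        e = yi - m                                # one-step prediction error
        b = b + 0.5 * nn / (1 + nn) * e ** 2      # uses the old m and nn
        m = m + e / (1 + nn)
        a = a + 0.5
        nn = nn + 1
    return m, nn, a, b

y = np.array([0.52, 0.43, 0.48, 0.83, 0.62])
mu0, n0, a0, b0 = 0.5, 1.0, 2.0, 1.0
m, nn, a, b = sequential_updates(y, mu0, n0, a0, b0)

# batch updates from the conjugate analysis
n, ybar = len(y), y.mean()
s2 = y.var(ddof=1)
b_batch = b0 + (n - 1) * s2 / 2 + 0.5 * n0 * n / (n0 + n) * (mu0 - ybar) ** 2
m_batch = (n * ybar + n0 * mu0) / (n0 + n)
print(np.isclose(m, m_batch), np.isclose(b, b_batch))  # True True
```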

Sequential Updates data and code

#Australian deaths
y=c(0.522, 0.425, 0.425, 0.477, 0.828, 0.616, 0.367, 0.431, 0.281,
0.465, 0.269, 0.578, 0.566, 0.508, 0.751, 0.681, 0.766, 0.456,
0.498, 0.419, 0.61, 0.457, 0.571, 0.348, 0.387, 0.582, 0.239,
0.237, 0.263, 0.424, 0.365, 0.375, 0.409, 0.389, 0.24, 0.159,
0.439, 0.509, 0.374, 0.434, 0.413, 0.329, 0.519, 0.549, 0.547,
0.496, 0.531, 0.596, 0.557, 0.573, 0.501, 0.543, 0.559, 0.691,
0.44, 0.568, 0.597, 0.474, 0.592, 0.598, 0.633, 0.606, 0.705,
0.481, 0.703, 0.701, 0.603, 0.698, 0.598, 0.802, 0.602, 0.599,
0.603, 0.702, 0.5, 0.498, 0.498, 0.6, 0.334, 0.274, 0.321, 0.541,
0.405, 0.289, 0.328, 0.313, 0.258, 0.214, 0.186, 0.159)


omega=.5  #The forgetting parameter.
m=1; n=1; a=2; b=1  #Prior hyperparameters mu0, n0, a0, b0
mn=rep(0,length(y)); uq=rep(0,length(y)); lq=rep(0,length(y))
for (i in 1:length(y))
{
  #Discount the posterior-to-prior precision (forgetting)
  m0=m; n0=n
  a0=omega*a; b0=omega*b

  #Median and 90% limits of the prediction before seeing y[i]
  mn[i]=qnorm(.5,m0,sqrt(b0/a0))
  uq[i]=qnorm(.95,m0,sqrt(b0/a0))
  lq[i]=qnorm(.05,m0,sqrt(b0/a0))

  #Sequential conjugate updates with the new observation y[i]
  m=m0+(1/(1+n0))*(y[i]-m0)
  n=n0+1
  a=a0+.5
  b=b0+(.5*n0)/(n0+1)*(y[i]-m0)^2
}

Predicting ahead without forgetting

Figure 4.4: Without forgetting, the precision of the predictions simply increases.

Predicting ahead with forgetting

Figure 4.5: Here we allow forgetting, so that posterior-to-prior precision is lost. This means that more weight is given to the current observation and less to the information stored in the prior.