3 Week 3: Bayesian statistics: Prediction

3.1 Examples

3.1.1 A binomial likelihood with a beta posterior

The predictive from a binomial likelihood with a beta prior

Suppose $y \sim \text{Binomial}(n, \pi)$ and our (conjugate) prior for $\pi$ is $\pi \sim \text{Beta}(p, q)$. The posterior for $\pi$ is given by:

$$\pi \mid y \sim \text{Beta}(p + y,\ q + n - y).$$

We let $y^*$ be the number of successes in $N$ future trials, so that

$$f(y^* \mid \pi) = \binom{N}{y^*} \pi^{y^*} (1 - \pi)^{N - y^*}.$$

\begin{align*}
f(y^* \mid y) &= \int_0^1 \binom{N}{y^*} \pi^{y^*} (1 - \pi)^{N - y^*} \times \frac{\pi^{p+y-1} (1 - \pi)^{q+n-y-1}}{B(p+y,\ q+n-y)} \, d\pi \\
&= \binom{N}{y^*} \frac{1}{B(p+y,\ q+n-y)} \int_0^1 \pi^{y^*+p+y-1} (1 - \pi)^{N-y^*+q+n-y-1} \, d\pi \\
&= \binom{N}{y^*} \frac{B(y^*+p+y,\ N-y^*+q+n-y)}{B(p+y,\ q+n-y)}
\end{align*}

This is known as a Beta-Binomial distribution: $y^* \mid y \sim \text{Beta-Binomial}(N, P, Q)$, where $P = p + y$ and $Q = q + n - y$.
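The result can be checked numerically. Below is a minimal sketch (assuming illustrative values $n = 10$, $y = 7$, $p = q = 1$ and $N = 5$; none of these come from the notes) comparing the closed form above with SciPy's built-in beta-binomial pmf.

```python
import numpy as np
from scipy.special import comb, betaln
from scipy.stats import betabinom

n, y, p, q, N = 10, 7, 1, 1, 5   # illustrative assumptions
P, Q = p + y, q + n - y          # posterior Beta parameters

ystar = np.arange(N + 1)
# Closed form derived above: C(N, y*) B(y* + P, N - y* + Q) / B(P, Q),
# with the beta functions evaluated on the log scale for stability
closed = comb(N, ystar) * np.exp(betaln(ystar + P, N - ystar + Q) - betaln(P, Q))
print(np.allclose(closed, betabinom.pmf(ystar, N, P, Q)))  # True
```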

The predictive from a binomial likelihood with a beta prior

Figure 3.2: The diagram compares the binomial and the beta-binomial distributions. The beta-binomial can have more variation than the binomial.

Prior, likelihood, posterior and predictive

Conjugate Bayesian predicting with a Binomial likelihood

$$\begin{aligned}
Y \mid \theta &\sim \text{Binomial}(n, \theta), \quad 0 < \theta < 1 \\
\theta &\sim \text{Beta}(p, q) \\
\theta \mid y &\sim \text{Beta}(p + y,\ q + n - y) \\
Y^* \mid \theta &\sim \text{Binomial}(N, \theta) \\
Y^* \mid y &\sim \text{Beta-Binomial}(N,\ p + y,\ q + n - y)
\end{aligned}$$
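The same predictive can also be reached by simulation: draw $\theta$ from the posterior, then draw $y^*$ from the likelihood. A sketch (same illustrative values as before) showing the Monte Carlo frequencies matching the Beta-Binomial pmf:

```python
import numpy as np
from scipy.stats import betabinom

rng = np.random.default_rng(1)
n, y, p, q, N = 10, 7, 1, 1, 5                     # illustrative assumptions
theta = rng.beta(p + y, q + n - y, size=100_000)   # draws from the posterior
ystar = rng.binomial(N, theta)                     # draws from the predictive

mc = np.bincount(ystar, minlength=N + 1) / ystar.size
print(np.round(mc, 3))
print(np.round(betabinom.pmf(np.arange(N + 1), N, p + y, q + n - y), 3))
```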

3.1.2 A Poisson likelihood with a gamma posterior

A Poisson likelihood with a gamma prior

This is the probability of a future observation $y^*$ given that counts $y_1, y_2, \ldots, y_n$ have been observed. For the predictive, we integrate the likelihood over the posterior $\text{Gamma}(P, Q)$, where $P = p + \sum_{i=1}^n y_i$ and $Q = q + n$.

\begin{align*}
f(y^* \mid y) &= \int_{\theta=0}^{\infty} f(y^* \mid \theta)\, \pi(\theta \mid y) \, d\theta \\
&= \int_{\theta=0}^{\infty} \frac{1}{y^*!} e^{-\theta} \theta^{y^*} \frac{Q^P}{\Gamma(P)} \theta^{P-1} e^{-Q\theta} \, d\theta \\
&= \frac{Q^P}{y^*!\, \Gamma(P)} \int_{\theta=0}^{\infty} \theta^{y^*+P-1} e^{-\theta(Q+1)} \, d\theta \\
&= \frac{Q^P}{y^*!\, \Gamma(P)} \cdot \frac{\Gamma(y^*+P)}{(Q+1)^{y^*+P}} \\
&= \frac{\Gamma(y^*+P)}{\Gamma(P)\, y^*!} \left(\frac{Q}{Q+1}\right)^P \left(\frac{1}{Q+1}\right)^{y^*}
\end{align*}

so that

$$y^* \mid y \sim \text{Negative-Binomial}\left(P,\ \frac{Q}{Q+1}\right).$$
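As a sanity check, the integral on the first line can be evaluated numerically and compared against the negative-binomial pmf. The data and prior below are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma, nbinom, poisson

y = np.array([2, 4, 3, 5])       # illustrative counts
p, q = 1, 1                      # illustrative prior
P, Q = p + y.sum(), q + len(y)

for ystar in range(6):
    # integrate Poisson(y* | theta) against the Gamma(P, Q) posterior
    integral, _ = quad(lambda t: poisson.pmf(ystar, t) * gamma.pdf(t, P, scale=1/Q),
                       0, np.inf)
    print(ystar, round(integral, 6), round(nbinom.pmf(ystar, P, Q / (Q + 1)), 6))
```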

A Poisson likelihood with a gamma posterior

Figure 3.3: The diagram compares the Poisson and the negative-binomial distributions. The negative-binomial can have more variation than the Poisson.

Prior, likelihood, posterior and predictive

Bayesian prediction with a Poisson likelihood

$$\begin{aligned}
Y_i \mid \theta &\sim \text{Poisson}(\theta), \quad \theta > 0,\ y_i \ge 0,\ i = 1, 2, \ldots, n \\
\theta &\sim \text{Gamma}(p, q) \\
\theta \mid y &\sim \text{Gamma}\left(p + \sum_{i=1}^n y_i,\ q + n\right) \\
Y^* \mid \theta &\sim \text{Poisson}(\theta) \\
Y^* \mid y &\sim \text{Negative-Binomial}\left(p + \sum_{i=1}^n y_i,\ \frac{q + n}{1 + q + n}\right)
\end{aligned}$$
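As with the binomial case, the predictive can be simulated by drawing $\theta$ from the Gamma posterior and then $y^*$ from the Poisson likelihood; a sketch using the same illustrative data as above:

```python
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(2)
y = np.array([2, 4, 3, 5])                            # illustrative counts
p, q = 1, 1                                           # illustrative prior
P, Q = p + y.sum(), q + len(y)

theta = rng.gamma(shape=P, scale=1/Q, size=100_000)   # posterior draws
ystar = rng.poisson(theta)                            # predictive draws

mc = np.bincount(ystar, minlength=15)[:15] / ystar.size
print(np.round(mc, 3))
print(np.round(nbinom.pmf(np.arange(15), P, Q / (Q + 1)), 3))
```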

Identities useful for finding predictive variance

Identities for finding the marginal mean and variance of the predictive from the conditional means and variances

The following identities (the laws of total expectation and total variance) are particularly useful with the Normal distribution, because the Normal distribution is completely described by its mean and variance.

\begin{align}
\mathbb{E}(y) &= \mathbb{E}_\lambda\left[\mathbb{E}(y \mid \lambda)\right] \tag{3.2} \\
\text{Var}(y) &= \mathbb{E}_\lambda\left[\text{Var}(y \mid \lambda)\right] + \text{Var}_\lambda\left[\mathbb{E}(y \mid \lambda)\right]. \tag{3.3}
\end{align}
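A quick numerical illustration of Equations (3.2) and (3.3), using an arbitrary two-stage model ($\lambda \sim \text{Gamma}(3, 2)$ and $y \mid \lambda \sim \text{Poisson}(\lambda)$; the values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
lam = rng.gamma(shape=3.0, scale=1/2.0, size=1_000_000)  # lambda ~ Gamma(3, 2)
y = rng.poisson(lam)                                     # y | lambda ~ Poisson(lambda)

# (3.2): E(y) = E[E(y | lambda)] = E(lambda)
print(y.mean(), lam.mean())
# (3.3): Var(y) = E[Var(y | lambda)] + Var[E(y | lambda)] = E(lambda) + Var(lambda)
print(y.var(), lam.mean() + lam.var())
```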

The mean and variance of the predictive with a Poisson likelihood

Example

Show that the mean and variance of the predictive of a Poisson likelihood with a gamma prior can be expressed as (where $P = p + \sum_i y_i$ and $Q = q + n$):

\begin{align*}
\mathbb{E}(y^*) &= \frac{P}{Q} \\
\text{Var}(y^*) &= \frac{P}{Q} + \frac{P}{Q^2}
\end{align*}

Hint: use the identities in Equations (3.2) and (3.3), where $\mathbb{E}(y^* \mid \lambda) = \lambda$, $\text{Var}(y^* \mid \lambda) = \lambda$, $\mathbb{E}(\lambda) = P/Q$ and $\text{Var}(\lambda) = P/Q^2$.
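A sketch of the argument, applying the hint directly:

\begin{align*}
\mathbb{E}(y^*) &= \mathbb{E}_\lambda\left[\mathbb{E}(y^* \mid \lambda)\right] = \mathbb{E}(\lambda) = \frac{P}{Q} \\
\text{Var}(y^*) &= \mathbb{E}_\lambda\left[\text{Var}(y^* \mid \lambda)\right] + \text{Var}_\lambda\left[\mathbb{E}(y^* \mid \lambda)\right] = \mathbb{E}(\lambda) + \text{Var}(\lambda) = \frac{P}{Q} + \frac{P}{Q^2}
\end{align*}

These agree with the mean and variance of the $\text{Negative-Binomial}(P, \frac{Q}{Q+1})$ distribution derived earlier.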

The extra variance of the predictive over the estimative

Note that the uncertainty in a future observation is given by $\text{Var}(y^*)$ and that

$$\lim_{p, q \to 0} \text{Var}(y^*) = \frac{\sum_i y_i}{n} + \frac{\sum_i y_i}{n^2}.$$
  • The uncertainty of a future observation can be split into two parts (see the numerical sketch after this list):

    1. Uncertainty of the sampling distribution: $\frac{\sum_i y_i}{n}$

    2. Parameter uncertainty: $\frac{\sum_i y_i}{n^2}$

  • Predicting from the MLE alone gives (1) but fails to take (2) into account.

  • Parameter uncertainty gets smaller for large samples: $\lim_{n \to \infty} \frac{\sum_i y_i}{n^2} = 0$.
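A numerical sketch of this decomposition, holding the sample mean fixed at $\bar{y} = 3$ (an illustrative assumption) while $n$ grows:

```python
ybar = 3.0  # illustrative fixed sample mean, so sum(y_i) = n * ybar

for n in (5, 50, 500):
    sampling = ybar         # part (1): sum(y_i) / n
    parameter = ybar / n    # part (2): sum(y_i) / n^2
    print(n, sampling, parameter, sampling + parameter)
# The parameter-uncertainty term shrinks like 1/n;
# the sampling-distribution term does not shrink at all.
```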

3.1.3 A Normal likelihood with a normal posterior

The predictive for a Normal likelihood

Recall that the sum of independent normal random variables is also normal. Therefore, since both $\mu$ and $\tilde{\epsilon}$, conditional on $\mathbf{y}$ and $\sigma^2$, are normally distributed, so is $Y^* = \mu + \tilde{\epsilon}$. The predictive distribution is therefore

$$Y^* \mid \mathbf{y}, \sigma \sim \text{Normal}(\mu_p,\ \sigma_p^2 + \sigma^2),$$

where $\mu_p$ and $\sigma_p^2$ are the posterior mean and variance of $\mu$.

It is worthwhile to have some intuition about the form of the variance of $Y^*$: in general, our uncertainty about a new sample $Y^*$ is a function of our uncertainty about the centre of the population ($\sigma_p^2$) as well as of how variable the population is ($\sigma^2$). As $n \to \infty$ we become more and more certain about where $\mu$ is, and the posterior variance $\sigma_p^2$ of $\mu$ goes to zero. But certainty about $\mu$ does not reduce the sampling variability $\sigma^2$, and so our uncertainty about $Y^*$ never goes below $\sigma^2$.
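This additivity of the two variance components is easy to confirm by simulation (the sketch assumes illustrative values for $\mu_p$, $\sigma_p$ and $\sigma$):

```python
import numpy as np

rng = np.random.default_rng(4)
mu_p, sigma_p, sigma = 1.5, 0.4, 1.0              # illustrative assumptions

mu = rng.normal(mu_p, sigma_p, size=1_000_000)    # posterior draws of mu
ystar = rng.normal(mu, sigma)                     # Y* = mu + eps

print(ystar.var(), sigma_p**2 + sigma**2)         # both close to 1.16
```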

Prior, likelihood, posterior and predictive

Bayesian predicting with Normal observations (τ known)

$$\begin{aligned}
Y_i \mid \mu, \tau &\sim \text{Normal}\left(\mu, \frac{1}{\tau}\right), \quad \tau > 0,\ i = 1, 2, \ldots, n \\
\mu &\sim \text{Normal}\left(\mu_0, \frac{1}{\tau_0}\right) \\
\mu \mid \mathbf{y}, \tau &\sim \text{Normal}\left(\frac{\mu_0 \tau_0 + n\tau\bar{y}}{\tau_0 + n\tau},\ \frac{1}{\tau_0 + n\tau}\right) \\
Y^* \mid \mu, \tau &\sim \text{Normal}\left(\mu, \frac{1}{\tau}\right) \\
Y^* \mid \mathbf{y}, \tau &\sim \text{Normal}\left(\frac{\mu_0 \tau_0 + n\tau\bar{y}}{\tau_0 + n\tau},\ \frac{1}{\tau_0 + n\tau} + \frac{1}{\tau}\right)
\end{aligned}$$

Note that as $\tau_0 \to 0$,

$$Y^* \mid \mathbf{y}, \tau \sim \text{Normal}\left(\bar{y},\ \left(\frac{1}{n} + 1\right)\frac{1}{\tau}\right).$$

When $\tau$ is unknown, it can be shown that

$$Y^* \mid \mathbf{y} \sim t\left(\bar{y},\ s^2\left(1 + \frac{1}{n}\right),\ n - 1\right),$$

a Student-$t$ distribution with location $\bar{y}$, scale $s^2(1 + \frac{1}{n})$ and $n - 1$ degrees of freedom, where $s^2$ is the sample variance.
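A simulation sketch of the unknown-$\tau$ case, assuming the usual noninformative prior $p(\mu, \sigma^2) \propto 1/\sigma^2$ (an assumption consistent with, but not stated alongside, the $t$ result above) and illustrative data:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(5)
y = np.array([4.1, 5.3, 3.8, 4.9, 5.6, 4.4])   # illustrative data
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

# Posterior draws under p(mu, sigma^2) ∝ 1/sigma^2:
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=500_000)  # sigma^2 | y
mu = rng.normal(ybar, np.sqrt(sigma2 / n))                  # mu | sigma^2, y
ystar = rng.normal(mu, np.sqrt(sigma2))                     # y* | mu, sigma^2

pred = t(df=n - 1, loc=ybar, scale=np.sqrt(s2 * (1 + 1/n)))
print(np.quantile(ystar, [0.05, 0.5, 0.95]))
print(pred.ppf([0.05, 0.5, 0.95]))   # the two sets of quantiles should agree
```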

The heavier tails of the predictive compared to the posterior

Figure 3.4: The diagram compares the posterior distribution to the predictive. Note the increased tails of the predictive over the posterior, particularly when the variance is unknown.