3 Week 3: Bayesian statistics: Prediction

3.1 Examples

3.1.1 A binomial likelihood with a beta posterior

The predictive from a binomial likelihood with a beta prior

Suppose $y \sim \text{Binomial}(n, \pi)$ and our (conjugate) prior for $\pi$ is $\pi \sim \text{Beta}(p, q)$. The posterior for $\pi$ is given by:

$$\pi \mid y \sim \text{Beta}(p + y,\ q + n - y).$$

We let $y^*$ be the number of successes in $N$ future trials, so that

$$f(y^* \mid \pi) = \binom{N}{y^*} \pi^{y^*} (1 - \pi)^{N - y^*}.$$

\begin{align*}
f(y^* \mid y) &= \int_0^1 \binom{N}{y^*} \pi^{y^*} (1 - \pi)^{N - y^*} \times \frac{\pi^{p+y-1} (1 - \pi)^{q+n-y-1}}{B(p+y,\ q+n-y)} \, d\pi \\
&= \binom{N}{y^*} \frac{1}{B(p+y,\ q+n-y)} \int_0^1 \pi^{y^*+p+y-1} (1 - \pi)^{N-y^*+q+n-y-1} \, d\pi \\
&= \binom{N}{y^*} \frac{B(y^*+p+y,\ N-y^*+q+n-y)}{B(p+y,\ q+n-y)}
\end{align*}

This is known as a Beta-Binomial distribution: $y^* \mid y \sim \text{Beta-Binomial}(N, P, Q)$, where $P = p + y$ and $Q = q + n - y$.
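The result can be checked numerically. Below is a minimal sketch (assuming illustrative values $n = 10$, $y = 7$, $p = q = 1$ and $N = 5$; none of these come from the notes) comparing the closed form above with SciPy's built-in beta-binomial pmf.

```python
import numpy as np
from scipy.special import comb, betaln
from scipy.stats import betabinom

n, y, p, q, N = 10, 7, 1, 1, 5   # illustrative assumptions
P, Q = p + y, q + n - y          # posterior Beta parameters

ystar = np.arange(N + 1)
# Closed form derived above: C(N, y*) B(y* + P, N - y* + Q) / B(P, Q),
# with the beta functions evaluated on the log scale for stability
closed = comb(N, ystar) * np.exp(betaln(ystar + P, N - ystar + Q) - betaln(P, Q))
print(np.allclose(closed, betabinom.pmf(ystar, N, P, Q)))  # True
```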

The predictive from a binomial likelihood with a beta prior

Figure 3.2: The diagram compares the binomial and the beta-binomial distributions. The beta-binomial can have more variation than the binomial.

Prior, likelihood, posterior and predictive

Conjugate Bayesian predicting with a Binomial likelihood

$$\begin{aligned}
Y \mid \theta &\sim \text{Binomial}(n, \theta), \quad 0 < \theta < 1 \\
\theta &\sim \text{Beta}(p, q) \\
\theta \mid y &\sim \text{Beta}(p + y,\ q + n - y) \\
Y^* \mid \theta &\sim \text{Binomial}(N, \theta) \\
Y^* \mid y &\sim \text{Beta-Binomial}(N,\ p + y,\ q + n - y)
\end{aligned}$$
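The same predictive can also be reached by simulation: draw $\theta$ from the posterior, then draw $y^*$ from the likelihood. A sketch (same illustrative values as before) showing the Monte Carlo frequencies matching the Beta-Binomial pmf:

```python
import numpy as np
from scipy.stats import betabinom

rng = np.random.default_rng(1)
n, y, p, q, N = 10, 7, 1, 1, 5                     # illustrative assumptions
theta = rng.beta(p + y, q + n - y, size=100_000)   # draws from the posterior
ystar = rng.binomial(N, theta)                     # draws from the predictive

mc = np.bincount(ystar, minlength=N + 1) / ystar.size
print(np.round(mc, 3))
print(np.round(betabinom.pmf(np.arange(N + 1), N, p + y, q + n - y), 3))
```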

3.1.2 A Poisson likelihood with a gamma posterior

A Poisson likelihood with a gamma prior

This is the probability of a future observation $y^*$ given that counts $y_1, y_2, \ldots, y_n$ have been observed. For the predictive, we integrate the likelihood over the posterior $\text{Gamma}(P, Q)$, where $P = p + \sum_{i=1}^n y_i$ and $Q = q + n$.

\begin{align*}
f(y^* \mid y) &= \int_{\theta=0}^{\infty} f(y^* \mid \theta)\, \pi(\theta \mid y) \, d\theta \\
&= \int_{\theta=0}^{\infty} \frac{1}{y^*!} e^{-\theta} \theta^{y^*} \frac{Q^P}{\Gamma(P)} \theta^{P-1} e^{-Q\theta} \, d\theta \\
&= \frac{Q^P}{y^*!\, \Gamma(P)} \int_{\theta=0}^{\infty} \theta^{y^*+P-1} e^{-\theta(Q+1)} \, d\theta \\
&= \frac{Q^P}{y^*!\, \Gamma(P)} \cdot \frac{\Gamma(y^*+P)}{(Q+1)^{y^*+P}} \\
&= \frac{\Gamma(y^*+P)}{\Gamma(P)\, y^*!} \left(\frac{Q}{Q+1}\right)^P \left(\frac{1}{Q+1}\right)^{y^*}
\end{align*}

so that

$$y^* \mid y \sim \text{Negative-Binomial}\left(P,\ \frac{Q}{Q+1}\right).$$
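As a sanity check, the integral on the first line can be evaluated numerically and compared against the negative-binomial pmf. The data and prior below are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import gamma, nbinom, poisson

y = np.array([2, 4, 3, 5])       # illustrative counts
p, q = 1, 1                      # illustrative prior
P, Q = p + y.sum(), q + len(y)

for ystar in range(6):
    # integrate Poisson(y* | theta) against the Gamma(P, Q) posterior
    integral, _ = quad(lambda t: poisson.pmf(ystar, t) * gamma.pdf(t, P, scale=1/Q),
                       0, np.inf)
    print(ystar, round(integral, 6), round(nbinom.pmf(ystar, P, Q / (Q + 1)), 6))
```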

A Poisson likelihood with a gamma posterior

Figure 3.3: The diagram compares the Poisson and the negative-binomial distributions. The negative-binomial can have more variation than the Poisson.

Prior, likelihood, posterior and predictive

Bayesian prediction with a Poisson likelihood

$$\begin{aligned}
Y_i \mid \theta &\sim \text{Poisson}(\theta), \quad \theta > 0,\ y_i \ge 0,\ i = 1, 2, \ldots, n \\
\theta &\sim \text{Gamma}(p, q) \\
\theta \mid y &\sim \text{Gamma}\left(p + \sum_{i=1}^n y_i,\ q + n\right) \\
Y^* \mid \theta &\sim \text{Poisson}(\theta) \\
Y^* \mid y &\sim \text{Negative-Binomial}\left(p + \sum_{i=1}^n y_i,\ \frac{q + n}{1 + q + n}\right)
\end{aligned}$$
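As with the binomial case, the predictive can be simulated by drawing $\theta$ from the Gamma posterior and then $y^*$ from the Poisson likelihood; a sketch using the same illustrative data as above:

```python
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(2)
y = np.array([2, 4, 3, 5])                            # illustrative counts
p, q = 1, 1                                           # illustrative prior
P, Q = p + y.sum(), q + len(y)

theta = rng.gamma(shape=P, scale=1/Q, size=100_000)   # posterior draws
ystar = rng.poisson(theta)                            # predictive draws

mc = np.bincount(ystar, minlength=15)[:15] / ystar.size
print(np.round(mc, 3))
print(np.round(nbinom.pmf(np.arange(15), P, Q / (Q + 1)), 3))
```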

Identities useful for finding predictive variance

Identities for finding the marginal mean and variance of the predictive from the conditional means and variances

The following identities (the laws of total expectation and total variance) are particularly useful with the Normal distribution, because the Normal distribution is completely described by its mean and variance.

\begin{align}
\mathbb{E}(y) &= \mathbb{E}_\lambda\left[\mathbb{E}(y \mid \lambda)\right] \tag{3.2} \\
\text{Var}(y) &= \mathbb{E}_\lambda\left[\text{Var}(y \mid \lambda)\right] + \text{Var}_\lambda\left[\mathbb{E}(y \mid \lambda)\right]. \tag{3.3}
\end{align}
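A quick numerical illustration of Equations (3.2) and (3.3), using an arbitrary two-stage model ($\lambda \sim \text{Gamma}(3, 2)$ and $y \mid \lambda \sim \text{Poisson}(\lambda)$; the values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
lam = rng.gamma(shape=3.0, scale=1/2.0, size=1_000_000)  # lambda ~ Gamma(3, 2)
y = rng.poisson(lam)                                     # y | lambda ~ Poisson(lambda)

# (3.2): E(y) = E[E(y | lambda)] = E(lambda)
print(y.mean(), lam.mean())
# (3.3): Var(y) = E[Var(y | lambda)] + Var[E(y | lambda)] = E(lambda) + Var(lambda)
print(y.var(), lam.mean() + lam.var())
```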

The mean and variance of the predictive with a Poisson likelihood

Example

Show that the mean and variance of the predictive of a Poisson likelihood with a gamma prior can be expressed as (where $P = p + \sum_i y_i$ and $Q = q + n$):

\begin{align*}
\mathbb{E}(y^*) &= \frac{P}{Q} \\
\text{Var}(y^*) &= \frac{P}{Q} + \frac{P}{Q^2}
\end{align*}

Hint: use the identities in Equations (3.2) and (3.3), where $\mathbb{E}(y^* \mid \lambda) = \lambda$, $\text{Var}(y^* \mid \lambda) = \lambda$, $\mathbb{E}(\lambda) = P/Q$ and $\text{Var}(\lambda) = P/Q^2$.
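A sketch of the argument, applying the hint directly:

\begin{align*}
\mathbb{E}(y^*) &= \mathbb{E}_\lambda\left[\mathbb{E}(y^* \mid \lambda)\right] = \mathbb{E}(\lambda) = \frac{P}{Q} \\
\text{Var}(y^*) &= \mathbb{E}_\lambda\left[\text{Var}(y^* \mid \lambda)\right] + \text{Var}_\lambda\left[\mathbb{E}(y^* \mid \lambda)\right] = \mathbb{E}(\lambda) + \text{Var}(\lambda) = \frac{P}{Q} + \frac{P}{Q^2}
\end{align*}

These agree with the mean and variance of the $\text{Negative-Binomial}(P, \frac{Q}{Q+1})$ distribution derived earlier.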

The extra variance of the predictive over the estimative

Note that the uncertainty in a future observation is given by $\text{Var}(y^*)$ and that

$$\lim_{p, q \to 0} \text{Var}(y^*) = \frac{\sum_i y_i}{n} + \frac{\sum_i y_i}{n^2}.$$
  • The uncertainty of a future observation can be split into two parts (see the numerical sketch after this list):

    1. Uncertainty of the sampling distribution: $\frac{\sum_i y_i}{n}$

    2. Parameter uncertainty: $\frac{\sum_i y_i}{n^2}$

  • Predicting from the MLE alone gives (1) but fails to take (2) into account.

  • Parameter uncertainty gets smaller for large samples: $\lim_{n \to \infty} \frac{\sum_i y_i}{n^2} = 0$.
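A numerical sketch of this decomposition, holding the sample mean fixed at $\bar{y} = 3$ (an illustrative assumption) while $n$ grows:

```python
ybar = 3.0  # illustrative fixed sample mean, so sum(y_i) = n * ybar

for n in (5, 50, 500):
    sampling = ybar         # part (1): sum(y_i) / n
    parameter = ybar / n    # part (2): sum(y_i) / n^2
    print(n, sampling, parameter, sampling + parameter)
# The parameter-uncertainty term shrinks like 1/n;
# the sampling-distribution term does not shrink at all.
```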

3.1.3 A Normal likelihood with a normal posterior

The predictive for a Normal likelihood

Recall that the sum of independent normal random variables is also normal. Therefore, since both $\mu$ and $\tilde{\epsilon}$, conditional on $\mathbf{y}$ and $\sigma^2$, are normally distributed, so is $Y^* = \mu + \tilde{\epsilon}$. The predictive distribution is therefore

$$Y^* \mid \mathbf{y}, \sigma \sim \text{Normal}(\mu_p,\ \sigma_p^2 + \sigma^2),$$

where $\mu_p$ and $\sigma_p^2$ are the posterior mean and variance of $\mu$.

It is worthwhile to have some intuition about the form of the variance of $Y^*$: in general, our uncertainty about a new sample $Y^*$ is a function of our uncertainty about the centre of the population ($\sigma_p^2$) as well as of how variable the population is ($\sigma^2$). As $n \to \infty$ we become more and more certain about where $\mu$ is, and the posterior variance $\sigma_p^2$ of $\mu$ goes to zero. But certainty about $\mu$ does not reduce the sampling variability $\sigma^2$, and so our uncertainty about $Y^*$ never goes below $\sigma^2$.
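This additivity of the two variance components is easy to confirm by simulation (the sketch assumes illustrative values for $\mu_p$, $\sigma_p$ and $\sigma$):

```python
import numpy as np

rng = np.random.default_rng(4)
mu_p, sigma_p, sigma = 1.5, 0.4, 1.0              # illustrative assumptions

mu = rng.normal(mu_p, sigma_p, size=1_000_000)    # posterior draws of mu
ystar = rng.normal(mu, sigma)                     # Y* = mu + eps

print(ystar.var(), sigma_p**2 + sigma**2)         # both close to 1.16
```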

Prior, likelihood, posterior and predictive

Bayesian predicting with Normal observations (τ known)

$$\begin{aligned}
Y_i \mid \mu, \tau &\sim \text{Normal}\left(\mu, \frac{1}{\tau}\right), \quad \tau > 0,\ i = 1, 2, \ldots, n \\
\mu &\sim \text{Normal}\left(\mu_0, \frac{1}{\tau_0}\right) \\
\mu \mid \mathbf{y}, \tau &\sim \text{Normal}\left(\frac{\mu_0 \tau_0 + n\tau\bar{y}}{\tau_0 + n\tau},\ \frac{1}{\tau_0 + n\tau}\right) \\
Y^* \mid \mu, \tau &\sim \text{Normal}\left(\mu, \frac{1}{\tau}\right) \\
Y^* \mid \mathbf{y}, \tau &\sim \text{Normal}\left(\frac{\mu_0 \tau_0 + n\tau\bar{y}}{\tau_0 + n\tau},\ \frac{1}{\tau_0 + n\tau} + \frac{1}{\tau}\right)
\end{aligned}$$

Note that as $\tau_0 \to 0$,

$$Y^* \mid \mathbf{y}, \tau \sim \text{Normal}\left(\bar{y},\ \left(\frac{1}{n} + 1\right)\frac{1}{\tau}\right).$$

When $\tau$ is unknown, it can be shown that

$$Y^* \mid \mathbf{y} \sim t\left(\bar{y},\ s^2\left(1 + \frac{1}{n}\right),\ n - 1\right),$$

a Student-$t$ distribution with location $\bar{y}$, scale $s^2(1 + \frac{1}{n})$ and $n - 1$ degrees of freedom, where $s^2$ is the sample variance.
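A simulation sketch of the unknown-$\tau$ case, assuming the usual noninformative prior $p(\mu, \sigma^2) \propto 1/\sigma^2$ (an assumption consistent with, but not stated alongside, the $t$ result above) and illustrative data:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(5)
y = np.array([4.1, 5.3, 3.8, 4.9, 5.6, 4.4])   # illustrative data
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

# Posterior draws under p(mu, sigma^2) ∝ 1/sigma^2:
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=500_000)  # sigma^2 | y
mu = rng.normal(ybar, np.sqrt(sigma2 / n))                  # mu | sigma^2, y
ystar = rng.normal(mu, np.sqrt(sigma2))                     # y* | mu, sigma^2

pred = t(df=n - 1, loc=ybar, scale=np.sqrt(s2 * (1 + 1/n)))
print(np.quantile(ystar, [0.05, 0.5, 0.95]))
print(pred.ppf([0.05, 0.5, 0.95]))   # the two sets of quantiles should agree
```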

The heavier tails of the predictive compared to the posterior

Figure 3.4: The diagram compares the posterior distribution to the predictive. Note the increased tails of the predictive over the posterior, particularly when the variance is unknown.