2 Bayesian statistics 331 - Week 2

2.4 Objective priors

Representations of ignorance

We will see in the following subsections that, when we seek a prior that is non-informative, we often end up with one that has the unfortunate property of being improper.

An improper prior

An improper prior violates the axiom of probability theory that all probabilities must sum or integrate to 1:

∫_Θ π(θ) dθ = ∞.
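For example, a flat prior over the whole real line is improper: for any constant c > 0,

∫_{−∞}^{∞} c dθ = ∞,

so no choice of c can make the density integrate to 1.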

2.4.1 Laplacian priors

Laplacian priors

A parameter μ on the real line can be given the prior

π(μ) ∝ 1.

For a normal distribution with the variance assumed known, this leads to

π(μ|𝐲) ∝ P(𝐲|μ);

that is, the posterior is proportional to the likelihood.
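As a quick check (a standard calculation, assuming an i.i.d. sample y₁, …, yₙ from N(μ, σ²) with σ² known):

π(μ|𝐲) ∝ exp{−∑(yᵢ − μ)²/(2σ²)} ∝ exp{−n(μ − ȳ)²/(2σ²)},

so μ|𝐲 ∼ N(ȳ, σ²/n), the same inference the likelihood alone would give.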

For a positive parameter λ, it is possible to transform to the real line using the log transformation γ = log(λ) and allocate the transformed parameter a uniform prior p(γ) ∝ 1. This means

π(λ) = |dγ/dλ| p(γ)
     ∝ 1/λ.

An example is σ², the variance parameter of a normal distribution:

P(σ²) ∝ 1/σ².
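Reassuringly (a standard change-of-variables check, not an extra assumption), this construction is self-consistent across powers of σ: writing τ = σ²,

P(σ) = P(τ) |dτ/dσ| ∝ (1/σ²) × 2σ ∝ 1/σ,

which is again flat on the log scale, since log σ = (1/2) log σ².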

Laplacian priors

For a proportion π, it is possible to transform to the real line using the logit transformation δ = log(π/(1 − π)) and allocate the transformed parameter a uniform prior p(δ) ∝ 1. This means

P(π) = |dδ/dπ| p(δ)
     ∝ 1/π + 1/(1 − π)
     = 1/(π(1 − π)).

This prior is improper and is known as Haldane’s prior.
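A useful consequence (a standard result, not derived in these notes): combined with a Binomial(n, π) likelihood in which x successes are observed, Haldane’s prior yields the posterior

π|x ∼ Beta(x, n − x),

which is proper only when 0 < x < n, that is, only once both outcomes have actually been observed.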

2.4.2 Jeffreys’ priors

Jeffreys’ priors

Consider that we might have specified a prior f_Θ(θ) for a parameter θ in a model. It is quite reasonable to decide to use instead the parameter ϕ = 1/θ. For example, θ may be the parameter of the exponential distribution of inter-arrival times in a queue, in which case θ represents the arrival rate and ϕ represents the mean inter-arrival time. By probability theory, the corresponding prior density for ϕ must be given by

f_Φ(ϕ) = f_Θ(θ) × |dθ/dϕ|
       = f_Θ(1/ϕ) × 1/ϕ².

If we decided that we wished to express our ignorance about θ by choosing f_Θ(θ) ∝ 1, then we are forced to take f_Φ(ϕ) ∝ 1/ϕ².

The invariance property of Jeffreys’ priors

But if we are ignorant about θ, we are surely equally ignorant about ϕ, and so might equally have made the specification f_Φ(ϕ) ∝ 1. Thus prior ignorance, as represented by uniformity of belief, is not preserved under re-parametrization.

Jeffreys’ prior

Jeffreys’ prior is invariant under a parameter transformation ϕ(θ) and satisfies

J_Φ(ϕ) = J_Θ(θ) |dθ/dϕ|.

(A short justification is given below, once Fisher information has been defined.)

Jeffreys’ priors and Fisher’s information

There is one way of using the likelihood L(θ;x) = f(x|θ), or more accurately the log-likelihood ℓ(θ) = log L(θ;x), to specify a prior which is consistent across one-to-one parameter transformations. This is the ‘Jeffreys prior’, and it is based on the concept of Fisher information:

I(θ) = −E[d²ℓ(θ)/dθ²] = E[(dℓ(θ)/dθ)²].

Jeffreys’ prior

Jeffreys’ prior can be defined as

J_Θ(θ) ∝ |I(θ)|^{1/2}.
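The invariance claimed above now follows from the chain rule (a one-line argument filling in a step the notes leave implicit). For a one-to-one transformation ϕ(θ),

dℓ/dϕ = (dℓ/dθ)(dθ/dϕ),

so

I(ϕ) = E[(dℓ/dϕ)²] = I(θ) (dθ/dϕ)²,

and taking square roots gives J_Φ(ϕ) = J_Θ(θ)|dθ/dϕ|, which is exactly the change-of-variables rule for densities.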

2.4.3 Examples

Example 1

Binomial sample.

Suppose x|θ ∼ Binomial(n, θ). Find Jeffreys’ prior. Is it proper?

d²ℓ(θ)/dθ² = −x/θ² − (n − x)/(1 − θ)²,

and since E(x) = nθ,

I(θ) = nθ/θ² + (n − nθ)/(1 − θ)²
     = n θ^{-1}(1 − θ)^{-1},

leading to J_Θ(θ) ∝ θ^{-1/2}(1 − θ)^{-1/2}, which in this case is the proper distribution

θ ∼ Beta(1/2, 1/2).
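The same answer can be reached more quickly from the second form of the information (a standard shortcut, using the facts that the score has mean zero and Var(x) = nθ(1 − θ)):

I(θ) = Var(dℓ/dθ) = Var(x/θ − (n − x)/(1 − θ)) = Var(x)/(θ²(1 − θ)²) = n/(θ(1 − θ)).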

Example 2

Geometric distribution

Find the Jeffreys prior for θ in the geometric model: f(x|θ) = (1 − θ)^{x−1} θ, x = 1, 2, …. (Note E(X) = 1/θ.)

ℓ(θ) = (x − 1) log(1 − θ) + log(θ)
ℓ′(θ) = −(x − 1)/(1 − θ) + 1/θ
ℓ″(θ) = −(x − 1)/(1 − θ)² − 1/θ²

Example 2 (continued)

I(θ) = E[−ℓ″(θ)]
     = E[(x − 1)/(1 − θ)² + 1/θ²]
     = (1/θ − 1)/(1 − θ)² + 1/θ²
     = (1 − θ)/(θ(1 − θ)²) + 1/θ²
     = 1/(θ²(1 − θ))

J(θ) = |I(θ)|^{1/2} = θ^{-1}(1 − θ)^{-1/2}

This is the kernel of a Beta(0, 1/2) distribution, which is not proper.
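To see the impropriety directly (a quick check): near θ = 0 the kernel behaves like θ^{-1}, and ∫₀^ε θ^{-1} dθ = ∞, so the prior cannot be normalized. Even so, a single geometric observation x already repairs it, since the posterior kernel is

θ^{1−1}(1 − θ)^{(x−1/2)−1}, i.e. θ|x ∼ Beta(1, x − 1/2),

which is proper for every x ≥ 1.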

Priors: a summary

  • Conjugate priors are often convenient because the posterior, marginal likelihood, and predictive distribution correspond (for likelihoods from the exponential family) to known distributions.

  • However conjugate priors cannot represent total ignorance.

  • Improper priors can do this, but they have problems: for instance, the marginal likelihood does not exist and Bayes factors cannot be calculated.

  • Jeffreys’ priors are invariant to monotonic transformations.

  • Laplacian priors are obtained by transforming the parameter to the real line and giving the transformed parameter a flat prior.

  • Priors are both the greatest strength of Bayesian statistics and its greatest weakness.