2 Bayesian statistics 331 - Week 2

2.4 Objective priors

Representations of ignorance

We will see in the following subsections that, when we seek a prior that is non-informative, we often end up with one that has the unfortunate property of being improper.

An improper prior

An improper prior violates the axiom of probability theory that all probabilities must sum or integrate to 1:

∫_Θ π(θ) dθ = ∞.
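For example, a flat prior over the whole real line is improper: for any constant c > 0,

∫_{−∞}^{∞} c dθ = ∞,

so no choice of c can make the density integrate to 1.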

2.4.1 Laplacian priors

Laplacian priors

A parameter μ on the real line can be given the prior

π(μ) ∝ 1.

For a normal distribution with the variance assumed known, this leads to

π(μ|𝐲) ∝ P(𝐲|μ);

that is, the posterior is proportional to the likelihood.
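As a quick check (a standard calculation, assuming an i.i.d. sample y₁, …, yₙ from N(μ, σ²) with σ² known):

π(μ|𝐲) ∝ exp{−∑(yᵢ − μ)²/(2σ²)} ∝ exp{−n(μ − ȳ)²/(2σ²)},

so μ|𝐲 ∼ N(ȳ, σ²/n), the same inference the likelihood alone would give.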

For a positive parameter λ, it is possible to transform to the real line using the log transformation γ = log(λ) and allocate the transformed parameter a uniform prior p(γ) ∝ 1. This means

π(λ) = |dγ/dλ| p(γ)
     ∝ 1/λ.

An example is σ², the variance parameter of a normal distribution:

P(σ²) ∝ 1/σ².
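Reassuringly (a standard change-of-variables check, not an extra assumption), this construction is self-consistent across powers of σ: writing τ = σ²,

P(σ) = P(τ) |dτ/dσ| ∝ (1/σ²) × 2σ ∝ 1/σ,

which is again flat on the log scale, since log σ = (1/2) log σ².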

Laplacian priors

For a proportion π, it is possible to transform to the real line using the logit transformation δ = log(π/(1 − π)) and allocate the transformed parameter a uniform prior p(δ) ∝ 1. This means

P(π) = |dδ/dπ| p(δ)
     ∝ 1/π + 1/(1 − π)
     = 1/(π(1 − π)).

This prior is improper and is known as Haldane’s prior.
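A useful consequence (a standard result, not derived in these notes): combined with a Binomial(n, π) likelihood in which x successes are observed, Haldane’s prior yields the posterior

π|x ∼ Beta(x, n − x),

which is proper only when 0 < x < n, that is, only once both outcomes have actually been observed.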

2.4.2 Jeffreys’ priors

Jeffreys’ priors

Consider that we might have specified a prior f_Θ(θ) for a parameter θ in a model. It is quite reasonable to decide to use instead the parameter ϕ = 1/θ. For example, θ may be the parameter of the exponential distribution of inter-arrival times in a queue, in which case θ represents the arrival rate and ϕ represents the mean inter-arrival time. By probability theory, the corresponding prior density for ϕ must be given by

f_Φ(ϕ) = f_Θ(θ) × |dθ/dϕ|
       = f_Θ(1/ϕ) × 1/ϕ².

If we decided that we wished to express our ignorance about θ by choosing f_Θ(θ) ∝ 1, then we are forced to take f_Φ(ϕ) ∝ 1/ϕ².

The invariance property of Jeffreys’ priors

But if we are ignorant about θ, we are surely equally ignorant about ϕ, and so might equally have made the specification f_Φ(ϕ) ∝ 1. Thus prior ignorance, as represented by uniformity of belief, is not preserved under re-parametrization.

Jeffreys’ prior

Jeffreys’ prior is invariant under a parameter transformation ϕ(θ) and satisfies

J_Φ(ϕ) = J_Θ(θ) |dθ/dϕ|.

(A short justification is given below, once Fisher information has been defined.)

Jeffreys’ priors and Fisher’s information

There is one way of using the likelihood L(θ;x) = f(x|θ), or more accurately the log-likelihood ℓ(θ) = log L(θ;x), to specify a prior which is consistent across one-to-one parameter transformations. This is the ‘Jeffreys prior’, and it is based on the concept of Fisher information:

I(θ) = −E[d²ℓ(θ)/dθ²] = E[(dℓ(θ)/dθ)²].

Jeffreys’ prior

Jeffreys’ prior can be defined as

J_Θ(θ) ∝ |I(θ)|^{1/2}.
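The invariance claimed above now follows from the chain rule (a one-line argument filling in a step the notes leave implicit). For a one-to-one transformation ϕ(θ),

dℓ/dϕ = (dℓ/dθ)(dθ/dϕ),

so

I(ϕ) = E[(dℓ/dϕ)²] = I(θ) (dθ/dϕ)²,

and taking square roots gives J_Φ(ϕ) = J_Θ(θ)|dθ/dϕ|, which is exactly the change-of-variables rule for densities.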

2.4.3 Examples

Example 1

Binomial sample.

Suppose x|θ ∼ Binomial(n, θ). Find Jeffreys’ prior. Is it proper?

d²ℓ(θ)/dθ² = −x/θ² − (n − x)/(1 − θ)²,

and since E(x) = nθ,

I(θ) = nθ/θ² + (n − nθ)/(1 − θ)²
     = n θ^{-1}(1 − θ)^{-1},

leading to J_Θ(θ) ∝ θ^{-1/2}(1 − θ)^{-1/2}, which in this case is the proper distribution

θ ∼ Beta(1/2, 1/2).
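The same answer can be reached more quickly from the second form of the information (a standard shortcut, using the facts that the score has mean zero and Var(x) = nθ(1 − θ)):

I(θ) = Var(dℓ/dθ) = Var(x/θ − (n − x)/(1 − θ)) = Var(x)/(θ²(1 − θ)²) = n/(θ(1 − θ)).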

Example 2

Geometric distribution

Find the Jeffreys prior for θ in the geometric model: f(x|θ) = (1 − θ)^{x−1} θ, x = 1, 2, …. (Note E(X) = 1/θ.)

ℓ(θ) = (x − 1) log(1 − θ) + log(θ)
ℓ′(θ) = −(x − 1)/(1 − θ) + 1/θ
ℓ″(θ) = −(x − 1)/(1 − θ)² − 1/θ²

Example 2 (continued)

I(θ) = E[−ℓ″(θ)]
     = E[(x − 1)/(1 − θ)² + 1/θ²]
     = (1/θ − 1)/(1 − θ)² + 1/θ²
     = (1 − θ)/(θ(1 − θ)²) + 1/θ²
     = 1/(θ²(1 − θ))

J(θ) = |I(θ)|^{1/2} = θ^{-1}(1 − θ)^{-1/2}

This is the kernel of a Beta(0, 1/2) distribution, which is not proper.
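To see the impropriety directly (a quick check): near θ = 0 the kernel behaves like θ^{-1}, and ∫₀^ε θ^{-1} dθ = ∞, so the prior cannot be normalized. Even so, a single geometric observation x already repairs it, since the posterior kernel is

θ^{1−1}(1 − θ)^{(x−1/2)−1}, i.e. θ|x ∼ Beta(1, x − 1/2),

which is proper for every x ≥ 1.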

Priors: a summary

  • Conjugate priors are often convenient because the posterior, marginal likelihood, and predictive distribution correspond (for likelihoods from the exponential family) to known distributions.

  • However conjugate priors cannot represent total ignorance.

  • Improper priors can do this, but they have problems: for instance, the marginal likelihood does not exist and Bayes factors cannot be calculated.

  • Jeffreys’ priors are invariant to monotonic transformations.

  • Laplacian priors are obtained by transforming the parameter to the real line and giving the transformed parameter a flat prior.

  • Priors are both the greatest strength of Bayesian statistics and its greatest weakness.