We will see in the next section that, when seeking a prior that is non-informative, we end up with the unfortunate property that the prior is improper.
An improper prior violates the axiom of probability theory that all probabilities must sum or integrate to 1.
A parameter $\theta$ taking values on the whole real line $(-\infty, \infty)$ can be given the prior
$$p(\theta) \propto 1.$$
For a normal distribution with variance $\sigma^2$ assumed known, this leads to
$$p(\theta \mid x) \propto p(x \mid \theta),$$
that is, the posterior is proportional to the likelihood.
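To make this concrete, here is the standard calculation: suppose $x_1, \ldots, x_n \sim N(\theta, \sigma^2)$ independently, with $\sigma^2$ known, and take $p(\theta) \propto 1$. Then
$$p(\theta \mid x) \propto \prod_{i=1}^{n} \exp\left\{-\frac{(x_i - \theta)^2}{2\sigma^2}\right\} \propto \exp\left\{-\frac{n(\theta - \bar{x})^2}{2\sigma^2}\right\},$$
so the posterior is $N(\bar{x}, \sigma^2/n)$: proper, even though the prior is not.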
For a parameter $\theta$ taking values on $(0, \infty)$, it is possible to transfer to the real line using the log transformation $\phi = \log \theta$ and allocate that a uniform prior $p(\phi) \propto 1$. This means
$$p(\theta) \propto \frac{1}{\theta}.$$
An example is $\sigma^2$, the variance parameter of a normal distribution $N(\mu, \sigma^2)$:
$$p(\sigma^2) \propto \frac{1}{\sigma^2}.$$
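These forms follow from the usual change-of-variables rule: with $\phi = \log \theta$ and $p(\phi) \propto 1$,
$$p(\theta) = p(\phi)\left|\frac{d\phi}{d\theta}\right| \propto \frac{1}{\theta},$$
and the identical calculation with $\phi = \log \sigma^2$ gives $p(\sigma^2) \propto 1/\sigma^2$.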
For a parameter $\theta$ taking values on $(0, 1)$, it is possible to transfer to the real line using the logit transformation $\phi = \log\dfrac{\theta}{1 - \theta}$ and allocate that a uniform prior $p(\phi) \propto 1$. This means
$$p(\theta) \propto \frac{1}{\theta(1 - \theta)}.$$
This prior is improper and is called Haldane’s prior.
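The Jacobian of the logit is $d\phi/d\theta = \dfrac{1}{\theta(1 - \theta)}$, which gives the form above. Impropriety can be checked directly: near $\theta = 0$ the density behaves like $1/\theta$, so
$$\int_0^1 \frac{d\theta}{\theta(1 - \theta)} = \infty,$$
with the same divergence at $\theta = 1$. Haldane’s prior is the limiting $\mathrm{Beta}(0, 0)$ member of the conjugate beta family.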
Consider that we might have specified a prior $p_\theta(\theta)$ for a parameter $\theta$ in a model. It is quite reasonable to decide to use instead the parameter $\phi = 1/\theta$. For example, $\theta$ may be the parameter of the exponential distribution of inter-arrival times in a queue, and represents the arrival rate. Then $\phi = 1/\theta$ represents the mean inter-arrival time. By probability theory the corresponding prior density for $\phi$ must be given by
$$p_\phi(\phi) = p_\theta(1/\phi)\left|\frac{d\theta}{d\phi}\right| = \frac{p_\theta(1/\phi)}{\phi^2}.$$
If we decided that we wished to express our ignorance about $\theta$ by choosing $p_\theta(\theta) \propto 1$, then we are forced to take $p_\phi(\phi) \propto \phi^{-2}$.
But if we are ignorant about $\theta$, we are surely equally ignorant about $\phi$, and so might equally have made the specification $p_\phi(\phi) \propto 1$. Thus prior ignorance, as represented by uniformity of belief, is not preserved under re-parametrization.
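A minimal SymPy sketch, assuming only the transformation $\phi = 1/\theta$ above (the symbol names are ours), makes the point concrete:

    import sympy as sp

    # Change of variables phi = 1/theta applied to a flat prior on theta
    theta, phi = sp.symbols("theta phi", positive=True)

    p_theta = sp.Integer(1)                          # flat prior, constant in theta
    theta_of_phi = 1 / phi                           # inverse map theta = 1/phi
    jacobian = sp.Abs(sp.diff(theta_of_phi, phi))    # |d theta / d phi|

    p_phi = p_theta.subs(theta, theta_of_phi) * jacobian
    print(sp.simplify(p_phi))                        # phi**(-2): flatness is not preserved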
There is one way of using the likelihood $L(\theta; x)$, or more accurately the log-likelihood $\ell(\theta) = \log L(\theta; x)$, to specify a prior which is consistent across one-to-one parameter transformations. This is ‘Jeffreys’ prior’, and it is based on the concept of Fisher information:
$$I(\theta) = -E\left[\frac{\partial^2 \ell(\theta)}{\partial \theta^2}\right].$$
Jeffreys’ prior can be defined as
$$p(\theta) \propto \sqrt{I(\theta)},$$
and it is invariant under one-to-one parameter transformations.
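To see why, recall how Fisher information behaves under a one-to-one transformation $\phi = h(\theta)$ (a standard argument, sketched here): since $E[\partial \ell/\partial \theta] = 0$, the chain rule gives
$$I(\phi) = I(\theta)\left(\frac{d\theta}{d\phi}\right)^2, \qquad \text{so} \qquad \sqrt{I(\phi)} = \sqrt{I(\theta)}\left|\frac{d\theta}{d\phi}\right|.$$
This is exactly the change-of-variables rule for densities, so applying Jeffreys’ rule to $\phi$ yields the same prior as applying it to $\theta$ and then transforming.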
Suppose $x \mid \theta \sim \mathrm{Binomial}(n, \theta)$. Find Jeffreys’ prior. Is it proper?
The log-likelihood is $\ell(\theta) = x \log \theta + (n - x) \log(1 - \theta) + \text{constant}$, so
$$\frac{\partial^2 \ell}{\partial \theta^2} = -\frac{x}{\theta^2} - \frac{n - x}{(1 - \theta)^2},$$
and since $E[x \mid \theta] = n\theta$,
$$I(\theta) = \frac{n}{\theta} + \frac{n}{1 - \theta} = \frac{n}{\theta(1 - \theta)},$$
leading to $p(\theta) \propto \theta^{-1/2}(1 - \theta)^{-1/2}$, which in this case is the proper distribution $\mathrm{Beta}\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right)$.
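As a cross-check, a few lines of SymPy (a sketch under the same assumptions, with our own symbol names) reproduce the Fisher information and confirm properness:

    import sympy as sp

    # Jeffreys' prior for the Binomial(n, theta) example, done symbolically
    n, x, theta = sp.symbols("n x theta", positive=True)

    loglik = x * sp.log(theta) + (n - x) * sp.log(1 - theta)   # up to a constant
    d2 = sp.diff(loglik, theta, 2)

    fisher = sp.simplify(-d2.subs(x, n * theta))   # uses E[x | theta] = n*theta
    print(fisher)                                  # equals n/(theta*(1 - theta))

    # Properness: the kernel theta**(-1/2)*(1-theta)**(-1/2) has a finite integral
    kernel = 1 / sp.sqrt(theta * (1 - theta))
    print(sp.integrate(kernel, (theta, 0, 1)))     # pi, so Beta(1/2, 1/2) is proper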
Find the Jeffreys prior for $\theta$ in the geometric model: $p(x \mid \theta) = \theta(1 - \theta)^{x - 1}$, $x = 1, 2, \ldots$ (Note $E[x \mid \theta] = 1/\theta$.) The same steps give
$$I(\theta) = \frac{1}{\theta^2(1 - \theta)},$$
so $p(\theta) \propto \theta^{-1}(1 - \theta)^{-1/2}$, which is not proper.
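The same symbolic check, adapted to the geometric model (again a SymPy sketch with our own symbol names), confirms the impropriety:

    import sympy as sp

    # Jeffreys' prior for the geometric model p(x | theta) = theta*(1-theta)**(x-1)
    x, theta = sp.symbols("x theta", positive=True)

    loglik = sp.log(theta) + (x - 1) * sp.log(1 - theta)
    d2 = sp.diff(loglik, theta, 2)

    fisher = sp.simplify(-d2.subs(x, 1 / theta))   # uses E[x | theta] = 1/theta
    print(fisher)                                  # equals 1/(theta**2*(1 - theta))

    # The prior kernel 1/(theta*sqrt(1 - theta)) diverges at theta = 0
    kernel = 1 / (theta * sp.sqrt(1 - theta))
    print(sp.integrate(kernel, (theta, 0, 1)))     # oo: the prior is improper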
Conjugate priors are often convenient because the posterior, marginal likelihood and predictive distribution correspond (in the case of likelihoods from the exponential family) to known distributions.
However, conjugate priors cannot represent total ignorance.
Improper priors can represent ignorance, but they have problems: for instance, the marginal likelihood does not exist, so Bayes factors cannot be calculated.
Jeffreys’ priors are invariant under one-to-one (monotonic) parameter transformations.
Laplacian priors are calculated by transforming the parameter to the real line and giving the transformed parameter a flat prior.
Priors are both the greatest strength of Bayesian statistics and its greatest weakness.