The marginal likelihood, $p(y)$ (the denominator in Bayes theorem), is used by Bayesians for model comparison. It is also known as the evidence in favour of the model. For model selection, classical statisticians use measures of fit penalised by a measure of complexity. The marginal likelihood does this automatically: it uses the principle of Occam's Razor to penalise large models.
Complex models with more parameters fit better and have a higher likelihood.
However, complex models can predict poorly; Occam's razor applies to prediction.
The marginal likelihood penalises the fit by a measure of complexity.
Using Bayes theorem the posterior is
$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}.$$
Rewriting this with the marginal likelihood as the subject we get
$$p(y) = \frac{p(y \mid \theta)\, p(\theta)}{p(\theta \mid y)}.$$
Writing this in log form we have
$$\log p(y) = \log p(y \mid \theta) + \log p(\theta) - \log p(\theta \mid y).$$
This is true for any $\theta$, including the MLE $\hat\theta$. The fit term $\log p(y \mid \hat\theta)$ increases with model complexity, but so does the penalty $\log p(\hat\theta \mid y) - \log p(\hat\theta)$.
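As a quick numerical check of this identity, the following sketch evaluates the right-hand side at several values of $\theta$ for a conjugate Beta-binomial model (the Beta(2,2) prior and the data are chosen purely for illustration) and compares it with the closed-form marginal likelihood.

```python
# Numerical check of  log p(y) = log p(y|theta) + log p(theta) - log p(theta|y)
# for a conjugate Beta prior and binomial likelihood (illustrative choices only).
import numpy as np
from scipy.stats import beta, binom
from scipy.special import betaln, comb

n, y = 10, 9          # number of tosses, number of heads (example data)
a, b = 2.0, 2.0       # Beta prior hyperparameters (assumed for illustration)

# Closed-form log marginal likelihood: log[ C(n,y) B(a+y, b+n-y) / B(a,b) ]
log_marg = np.log(comb(n, y)) + betaln(a + y, b + n - y) - betaln(a, b)

for theta in [0.3, 0.5, 0.9]:                         # any theta, including the MLE 0.9
    log_lik = binom.logpmf(y, n, theta)               # log p(y | theta)
    log_prior = beta.logpdf(theta, a, b)              # log p(theta)
    log_post = beta.logpdf(theta, a + y, b + n - y)   # log p(theta | y), conjugate update
    print(theta, log_lik + log_prior - log_post, log_marg)   # last two columns agree
```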
For example, with a binomial likelihood and a $\text{Beta}(a,b)$ prior the marginal likelihood is
$$p(y) = \binom{n}{y}\, \frac{B(a + y,\, b + n - y)}{B(a, b)}.$$
This is the Beta-binomial distribution, which has more variation than the binomial.
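A small simulation makes the extra variation visible: drawing $\theta$ from the prior before generating the counts inflates the variance relative to a binomial with the same mean (the hyperparameters below are arbitrary illustrative choices).

```python
# Over-dispersion of the Beta-binomial: draw theta from the prior, then y | theta,
# and compare the variance with a binomial that has the same mean.
import numpy as np

rng = np.random.default_rng(0)
n, a, b = 10, 2.0, 2.0                      # illustrative hyperparameters
p = a / (a + b)                             # matching binomial success probability

theta = rng.beta(a, b, size=100_000)        # theta ~ Beta(a, b)
y_betabin = rng.binomial(n, theta)          # y | theta ~ Bin(n, theta)
y_bin = rng.binomial(n, p, size=100_000)    # fixed-theta binomial, same mean

print(y_betabin.var(), y_bin.var())         # Beta-binomial variance is larger
# Analytic comparison: n p (1-p) (a+b+n)/(a+b+1)  vs  n p (1-p)
print(n * p * (1 - p) * (a + b + n) / (a + b + 1), n * p * (1 - p))
```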
Bayesian inference is obtained by updating our belief in a parameter by multiplying our current belief by the likelihood of the current observation(s) and then normalising. This yields the posterior distribution which can become the prior for the next observation.
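A minimal sketch of this sequential updating for a conjugate Beta-Bernoulli model (the prior and the data below are illustrative assumptions) shows that one-observation-at-a-time updates reproduce a single batch update.

```python
# Sequential conjugate updating with a Beta-Bernoulli model (illustrative prior/data):
# the posterior after each toss becomes the prior for the next toss.
a, b = 1.0, 1.0                             # Beta(1, 1) prior (assumed)
tosses = [1, 1, 0, 1, 1, 1, 1, 1, 1, 1]     # 1 = head, 0 = tail (9 heads, 1 tail)

for x in tosses:
    a, b = a + x, b + (1 - x)               # posterior becomes the new prior

print(a, b)                                 # Beta(10, 2), same as one batch update on all 10 tosses
```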
Updating belief in a model is no different, in principle, to updating a parameter.
The marginal likelihood is the denominator in Bayes theorem.
The ratio of marginal likelihoods gives us a Bayes factor, which compares the weight of evidence in favour of two given models.
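In symbols, writing $M_1$ and $M_2$ for the two models under comparison, the Bayes factor is
$$B_{12} = \frac{p(y \mid M_1)}{p(y \mid M_2)},$$
and multiplying it by the prior model odds gives the posterior model odds:
$$\frac{p(M_1 \mid y)}{p(M_2 \mid y)} = B_{12}\, \frac{p(M_1)}{p(M_2)}.$$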
Bayesian model selection is coherent and uses only the laws of probability.
A coin is tossed 10 times, resulting in 9 heads and one tail. The question of interest is whether the coin is biased or not. The null hypothesis, $H_0$, is that the coin is fair, $\theta = \tfrac{1}{2}$, while the alternative hypothesis, $H_1$, is that the coin is biased ($\theta \neq \tfrac{1}{2}$).
Define the $p$-value used in classical hypothesis testing. What is the implication of a low $p$-value?
Use a classical hypothesis test to test whether the coin is fair at a 5% significance level.
If I assume (before the experiment) that each hypothesis is equally likely, calculate the Bayes factor for $H_1$ relative to $H_0$ (if necessary, state any further assumptions that you might need to make).
What is the probability that the coin is biased?
A $p$-value is the probability of getting a test statistic at least as extreme as that observed, under the null hypothesis, by pure chance. A low $p$-value is evidence against the null hypothesis.
Under $H_0$ the number of heads $X \sim \text{Bin}(10, \tfrac{1}{2})$. The test is two-tailed. Therefore the $p$-value is
$$p = P(X \geq 9) + P(X \leq 1) = \frac{11}{1024} + \frac{11}{1024} \approx 0.021.$$
Since $p < 0.05$, reject $H_0$ in favour of $H_1$ at the 5% significance level.
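The arithmetic can be checked numerically with a short sketch using scipy's binomial distribution.

```python
# Check of the two-tailed p-value using scipy's binomial distribution.
from scipy.stats import binom

n, k = 10, 9
p_upper = binom.sf(k - 1, n, 0.5)    # P(X >= 9) = 11/1024
p_lower = binom.cdf(n - k, n, 0.5)   # P(X <= 1) = 11/1024 (symmetric lower tail)
print(p_upper + p_lower)             # ~0.0215, below 0.05
```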
If we assume that under $H_1$, $\theta \sim \text{Beta}(a, b)$ for some $a, b > 0$, then, since the normalising constant for the binomial is $\binom{10}{9} = 10$, the two marginal likelihoods are
$$p(y \mid H_0) = \binom{10}{9}\left(\tfrac{1}{2}\right)^{10} = \frac{10}{1024}, \qquad p(y \mid H_1) = \binom{10}{9} \int_0^1 \theta^9 (1 - \theta)\, p(\theta)\, d\theta = 10\, \frac{B(a + 9,\, b + 1)}{B(a, b)}.$$
Therefore if $a = b = 1$ (a uniform prior on $\theta$) then
$$p(y \mid H_1) = 10\, B(10, 2) = \frac{1}{11}, \qquad B_{10} = \frac{p(y \mid H_1)}{p(y \mid H_0)} = \frac{1/11}{10/1024} = \frac{1024}{110} \approx 9.3.$$
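The same calculation can be sketched numerically. Assuming the uniform Beta(1,1) prior under $H_1$ and, as in the question, equal prior probabilities for the two hypotheses, the posterior probability that the coin is biased is $B_{10}/(1 + B_{10}) \approx 0.90$, which answers the last part.

```python
# Bayes factor for H1 vs H0 and posterior probability that the coin is biased,
# assuming a uniform Beta(1,1) prior under H1 and equal prior model probabilities.
import numpy as np
from scipy.special import betaln, comb

n, y = 10, 9
a, b = 1.0, 1.0                                   # uniform prior under H1

marg_h0 = comb(n, y) * 0.5 ** n                   # p(y | H0) = 10/1024
marg_h1 = np.exp(np.log(comb(n, y)) + betaln(a + y, b + n - y) - betaln(a, b))  # = 1/11

bf_10 = marg_h1 / marg_h0                         # ~9.31 in favour of H1
post_h1 = bf_10 / (1 + bf_10)                     # P(H1 | y) with equal prior odds, ~0.90
print(bf_10, post_h1)
```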