1 Week 1 - Bayesian inference

1.1 Introduction

The Bayesian universe

Figure 1.1: Link, Caption: The Bayesian universe. The observer acts on belief (probability) and utility.

1.1.1 An example

Illustration of Bayes' Theorem

Example 1.1.1.
  • 1% of women have breast cancer

  • 80% of mammograms detect breast cancer when it is there

  • 10% of mammograms detect breast cancer when it’s not there

Given that a patient tests positive, what is the probability she has breast cancer?

Figure 1.2: Link, Caption: A tree diagram showing all possible outcomes. The prior is on the left and the likelihood on the right. Outcome F denotes a false negative and outcome J denotes a false positive.
Figure 1.3: Link, Caption: Now assume that the test is positive. Applying Bayes' theorem, we condition on the data and exclude the negative test outcomes F and I.
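Working through the numbers in Example 1.1.1 (writing C for "the patient has breast cancer" and + for "the mammogram is positive"):

    P(C | +) = P(C) P(+ | C) / [ P(C) P(+ | C) + P(not C) P(+ | not C) ]
             = (0.01 × 0.8) / (0.01 × 0.8 + 0.99 × 0.1)
             = 0.008 / 0.107 ≈ 0.075,

so even after a positive mammogram the probability of breast cancer is only about 7.5%, because the disease is rare and false positives are relatively common.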

Bayes' theorem

Figure 1.4: Link, Caption: The diagram shows how Bayes' theorem can be proved by expressing the joint distribution in two ways: f(y,θ) = π(θ)f(y|θ) = m(y)π(θ|y).
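In symbols: since the joint density can be factorised either way, f(y,θ) = π(θ)f(y|θ) = m(y)π(θ|y), dividing both sides by m(y) (assumed positive) gives Bayes' theorem, π(θ|y) = π(θ)f(y|θ) / m(y).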

1.1.2 The essence of Bayesian thinking

Bayes

The essence of the Bayesian approach is to treat the unknown parameter θ as a random variable, specify a prior distribution for θ representing your beliefs about θ prior to having seen the data, use Bayes’ Theorem to update prior beliefs into posterior probabilities, and draw appropriate inferences.

The Bayesian view of uncertainty

  (a) Statistics is the study of reasoning in the presence of uncertainty.

  (b) Uncertainty should be measured only by conditional probability.

  (c) A probability distribution carries information. The amount of information is proportional to the precision, that is, the reciprocal of the variance.

  (d) Data uncertainty is measured in the same way, conditional on the parameters (through the likelihood function).

  (e) It is only by using probability that we can achieve coherence (logical connectedness and consistency).

  (f) Rational belief (or knowledge) is updated upon fresh observations using Bayes' theorem.

Informative and non-informative distributions

Figure 1.5: Link, Caption: Bayesians see probability densities as carriers of information. High variance densities carry little information. Low variance densities carry more information.
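As a small numerical illustration of this idea (a sketch under assumed values, not from the notes), compare how much probability mass a high-variance and a low-variance normal density place near their common mean:

    from scipy.stats import norm

    # Two densities for the same unknown quantity: one vague, one sharp
    vague = norm(loc=0, scale=10)   # variance 100, precision 0.01
    sharp = norm(loc=0, scale=0.5)  # variance 0.25, precision 4

    for name, d in [("vague", vague), ("sharp", sharp)]:
        precision = 1 / d.var()
        mass = d.cdf(1) - d.cdf(-1)  # probability assigned to the interval (-1, 1)
        print(f"{name}: precision = {precision:.2f}, P(-1 < theta < 1) = {mass:.3f}")

The sharp (high-precision) density concentrates nearly all of its mass near the mean, so it carries far more information about the quantity than the vague one.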

The Bayesian universe

Figure 1.6: Link, Caption: The Bayesian scheme for updating and acting on knowledge obtained from fresh observations. Circles depict unknowns, whereas rectangles depict known quantities. The uncertainty governing the unknowns is expressed through a probability density, which represents the degree of rational or scientific belief given the knowledge and data in hand.

Disagreements between Bayesian and classical thinking

Bayesian and classical statisticians deal with the following statistical concepts and problems in fundamentally different ways.

  • What is probability?

  • What is fixed and what is random?

  • The nature of uncertainty

  • How uncertainty is expressed

  • What an interval means

  • How to deal with nuisance parameters

  • How prior scientific knowledge is made use of

  • How expected utility or loss is calculated

Bayesian inference: Summary

  • For each numerical value θ ∈ Θ, our prior distribution π(θ) describes our belief that θ is the true parameter value.

  • For each θ ∈ Θ, our sampling model f(y|θ) describes our belief that y would be the outcome of our study if we knew θ to be true.

  • Once we obtain the data y, the last step is to update our beliefs about θ. For each numerical value of θ ∈ Θ, our posterior distribution π(θ|y) describes our updated belief that θ is the true value, having observed the data set y. This is obtained via Bayes' rule:

    π(θ|y) = π(θ)f(y|θ)/m(y) = f(y,θ)/m(y) = Joint/Marginal

    m(y) is called the marginal likelihood and represents the evidence in favour of the model being considered; when θ is continuous it is given by m(y) = ∫ π(θ)f(y|θ) dθ.

  • Because the marginal likelihood m(y) does not involve θ, Bayes' theorem is often written as

    π(θ|y) ∝ π(θ)f(y|θ)
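A minimal numerical sketch of this updating rule, using a discretised grid of θ values for a binomial sampling model with a uniform prior (the grid and data here are illustrative, not from the notes):

    import numpy as np
    from scipy.stats import binom

    # Grid of candidate values for theta (a probability of success)
    theta = np.linspace(0.001, 0.999, 999)

    prior = np.ones_like(theta)          # flat prior pi(theta), up to a constant
    prior /= prior.sum()

    y, n = 7, 20                         # observed: 7 successes in 20 trials (made up)
    likelihood = binom.pmf(y, n, theta)  # f(y | theta)

    unnormalised = prior * likelihood    # pi(theta) * f(y | theta)
    m_y = unnormalised.sum()             # marginal likelihood m(y), discrete approximation
    posterior = unnormalised / m_y       # pi(theta | y)

    print("posterior mode:", theta[posterior.argmax()])  # close to y/n = 0.35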

1.1.3 Some historical perspectives

The history and evolution of statistical reasoning.

In the middle of the last century several brilliant statisticians struggled to formalise a clear set of fundamental rules for inductive (or statistical) thinking. There were two main groups with contrasting philosophies: the classical school and the Bayesian school. However, even within each of these main groups there were disagreements, and these splits still exist today. The champion of the classical school was Ronald Fisher, who developed and promoted his methods prior to the Second World War. These ideas quickly spread and became widely used by the statistical community. Dennis Lindley was the champion of the Bayesian school. He was driven by a desire to make statistics a formal, axiomatic and coherent system. To do this he used only the axioms of probability (formulated by Kolmogorov) and the concept of utility from the work of Savage.

Fisher and the development of likelihood theory

Figure 1.7: Link, Caption: Ronald Fisher (1890-1962) developed likelihood statistics

Ronald Fisher had an enormous impact on statistical thinking and practice with his theory of likelihood. This theory revolutionized statistical thinking at the time, and methods based on it came into widespread use. Like many theories before it, it had its weaknesses and did not please everybody. Fisher's theory of hypothesis testing came under sustained attack even within the classical statistical community. Many of Fisher's detractors argue that some of his ideas are responsible for widespread confusion and misunderstanding among scientists; the term "statistical significance", measured by a p-value, still causes confusion today.


Some essential elements of Fisher's theory are the following:

  • Probability was seen as a long-run proportion and not as a degree of rational belief.

  • Parameters were treated as fixed but unknown. Because the parameters are fixed it made no sense to give them probability distributions.

  • The likelihood of a parameter was the fundamental measure of uncertainty.

  • Uncertainty of a parameter is related to the variability of a sample statistic (and described by Fisher’s information).

  • Uncertainty of a hypothesis is measured by a p-value, which gives a measure of the strength of evidence against the null hypothesis. The p-value is defined as the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true. The rejection of this idea led to a big split among classical statisticians and to the reformulation of hypothesis testing by Neyman and Pearson as a decision problem with Type I and Type II errors.

Lindley and the subjective Bayesian approach

Figure 1.8: Link, Caption: Dennis Lindley, 1923-2014.
  • Motivated by the need for an axiomatic system for statistics.

  • A wider definition of uncertainty than merely sampling uncertainty: all kinds of uncertainty, including parameter, model and measurement uncertainty. All of these can only be measured by probability.

  • Probability means degree of rational belief and is updated as new observations become available.

  • Probability is subjective and is conditional on the knowledge, 𝒦, or experience of the individual.

https://www.youtube.com/watch?v=YsJ4W1k0hUg&index=3&list=PLFDbGp5YzjqXQ4oE4w9GVWdiokWB9gEpm

Jeffreys and the objective Bayesian approach

Figure 1.9: Link, Caption: Harold Jeffreys (1891-1989)
  • Defined an objective prior to represent ignorance or lack of any prior information.

  • Objective priors such as the Jeffreys prior (see the example after this list) have excellent frequentist properties, such as good coverage.

  • Many statisticians from the objective school attempted to unify Bayesian and classical statistics using objective priors. These attempts have only been partially successful.
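As a standard illustration of the Jeffreys prior mentioned above (a textbook example, not specific to these notes): for a Bernoulli(θ) observation the Fisher information is I(θ) = 1/(θ(1−θ)), so the Jeffreys prior is π(θ) ∝ √I(θ) = θ^(−1/2)(1−θ)^(−1/2), which is the Beta(1/2, 1/2) distribution.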

Inference

Bayesian analysis gives a more complete inference in the sense that all knowledge about θ available from the prior and the data is represented in the posterior distribution. That is, π(θ|y) is the inference. Still, it is often desirable to summarize that inference in the form of a point estimate or an interval estimate.
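For example, reusing the illustrative grid posterior sketched in the summary of Section 1.1.2 (the data values are again made up), point and interval summaries can be read off the posterior directly:

    import numpy as np
    from scipy.stats import binom

    theta = np.linspace(0.001, 0.999, 999)
    posterior = binom.pmf(7, 20, theta)         # flat prior, so the posterior is proportional to the likelihood
    posterior /= posterior.sum()                # normalise on the grid

    post_mean = np.sum(theta * posterior)       # point estimate: posterior mean
    cdf = np.cumsum(posterior)
    lower = theta[np.searchsorted(cdf, 0.025)]  # 95% equal-tail credible interval
    upper = theta[np.searchsorted(cdf, 0.975)]

    print(f"posterior mean = {post_mean:.3f}, 95% interval = ({lower:.3f}, {upper:.3f})")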