1 Modelling and Statistical Inference

Philosophical Aside: Does God exist?

Suppose I am interested in P[God exists]. As a frequentist, I am stumped: there is no repeatable experiment I can do and hence I simply cannot evaluate this probability.

Evidential probability, on the other hand, allows us to assign probability to any event, as a degree of belief. This is far more flexible but does cause problems: for example, evidential probability is subjective. Bayesians allow this more general definition of probability.

MATH331 Bayesian Inference considers evidential probability, and compares and contrasts the two approaches further. (For a popular-science account, see Nate Silver’s The Signal and the Noise: The Art and Science of Prediction.)

In this course we will be frequentists, and only allow the physical interpretation of probability.

Note that the frequency-based interpretation of probability can be justified through the weak law of large numbers (WLLN), introduced and proved in MATH230. Recall: the WLLN says that if X_1, …, X_n is a sequence of independent and identically distributed random variables with mean μ and finite variance σ², and X̄_n = (X_1 + … + X_n)/n denotes the sample mean, then for any ϵ > 0 we have

P[ |X̄_n - μ| > ϵ ] → 0

as n → ∞.

For the coin tossing example we can define

X_i = {1 when toss i is a head; 0 when toss i is a tail},

then X_i ∼ Bernoulli(θ_0), with mean θ_0 and finite variance θ_0(1 - θ_0), for i = 1, …, n. Writing r = X_1 + … + X_n for the number of heads in n tosses, the sample mean is X̄_n = r/n, so the WLLN applies and gives, for all ϵ > 0,

P[ |r/n - θ_0| > ϵ ] → 0

as n → ∞.
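This convergence is easy to see empirically. The following sketch (Python with NumPy) simulates n coin tosses and prints the proportion of heads r/n; the true value θ_0 = 0.3 and the sample sizes are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
theta_0 = 0.3  # hypothetical "true" probability of a head, chosen for illustration

# For increasing n, simulate n tosses and report the proportion of heads r/n.
for n in [10, 100, 1_000, 10_000, 100_000]:
    tosses = rng.binomial(1, theta_0, size=n)  # X_1, ..., X_n ~ Bernoulli(theta_0)
    r = tosses.sum()                           # r = number of heads
    print(f"n = {n:>6}:  r/n = {r / n:.4f}")
```

As n grows, the printed values of r/n settle down near θ_0, exactly as the WLLN predicts.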

Rules of the Frequentist Approach

What does this all mean in terms of statistical inference? We have collected data, x = (x_1, …, x_n), which we assume derives from some probability model, with density (in the continuous case) or mass function (in the discrete case) f(x | θ). In order to answer our subject-matter question, we typically want to estimate the parameter(s) θ.

As frequentists, we can make the statement

P[ x | θ = 2 ],

because we can imagine, hypothetically at least, taking more samples x: a repeatable experiment. Indeed, we therefore consider x as a realisation of a random variable X, which describes the distribution of all possible samples we could have obtained.
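To make this concrete for the coin-tossing model: given a hypothesised value of θ, the probability of the observed sample is a product of Bernoulli probabilities. A minimal sketch, with a made-up sample and the hypothetical value θ = 0.5 (the value θ = 2 in the display above is just a placeholder; for a Bernoulli model θ must lie in [0, 1]):

```python
import numpy as np

x = np.array([1, 0, 0, 1, 1, 0, 1, 1])  # hypothetical observed sample x_1, ..., x_n
theta = 0.5                             # hypothesised parameter value

# Independence gives P[x | theta] as a product of Bernoulli mass functions:
# P[x | theta] = prod_i theta^{x_i} * (1 - theta)^{1 - x_i}.
p = np.prod(theta**x * (1 - theta) ** (1 - x))
print(p)  # 0.5**8 = 0.00390625
```

Viewed as a function of θ with x held fixed, this same quantity is the likelihood, which underlies the likelihood techniques mentioned at the end of this section.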

As frequentists, we CANNOT make the statement

P[ θ = 2 | x ],

because we cannot take repeats of θ: it has a true value that is fixed and unknown (either θ = 2 or θ ≠ 2), and there is no repeatable experiment we can do. Bayesians, however, with their more flexible interpretation of probability, can make this statement; see MATH331.

We therefore need a way to make inference about θ despite not being able to make probability statements about it.

We are generally interested in making inference (loosely, learning) about the parameter θ. We let θ_0 denote the true value of the unknown parameter θ.

An estimator of θ is a function, T(X), of the random sample X; the particular value for the observed sample x, T(x), is an estimate. In MATH235 we looked at various criteria for judging the quality of an estimator:

Unbiasedness:

E(T(X)) = θ_0.

Consistency:

T(X) → θ_0 in probability

as n → ∞,

and at different techniques for constructing estimators (method-of-moments, maximum likelihood, …). As before, we will focus mainly on likelihood techniques.
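For the coin-tossing model the obvious estimator is T(X) = X̄_n = r/n (which is also the maximum likelihood estimator of θ_0). Both criteria can be checked by simulation; the sketch below again uses a hypothetical true value θ_0 = 0.3 and arbitrary sample sizes.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
theta_0 = 0.3  # hypothetical true parameter value, chosen for illustration

# Unbiasedness: approximate E[T(X)] by averaging T(x) = r/n over many
# repeated samples of the same fixed size n.
n, reps = 50, 100_000
estimates = rng.binomial(n, theta_0, size=reps) / n  # one value of r/n per repeat
print(f"approx E[T(X)] = {estimates.mean():.4f}  (theta_0 = {theta_0})")

# Consistency: a single estimate computed from ever-larger samples should
# settle down near theta_0.
for n in [10, 1_000, 100_000]:
    print(f"n = {n:>6}:  T(x) = {rng.binomial(n, theta_0) / n:.4f}")
```

The Monte Carlo average in the first part sits very close to θ_0 (unbiasedness), while the second part shows the single estimate stabilising near θ_0 as n grows (consistency), mirroring the WLLN argument earlier in the section.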