1 Modelling and Statistical Inference

Philosophical Aside: Does God exist?

Suppose I am interested in P[God exists]. As a frequentist, I am stumped: there is no repeatable experiment I can do and hence I simply cannot evaluate this probability.

Evidential probability, on the other hand, allows us to assign probability to any event, as a degree of belief. This is far more flexible but does cause problems: for example, evidential probability is subjective. Bayesians allow this more general definition of probability.

MATH331 Bayesian Inference considers evidential probability, and compares and contrasts the two approaches further. (For a popular-science account, see Nate Silver’s The Signal and the Noise: The Art and Science of Prediction.)

In this course we will be frequentists, and only allow the physical interpretation of probability.

Note that the frequency-based interpretation of probability can be justified through the weak law of large numbers (WLLN), introduced and proved in MATH230. Recall: the WLLN says that if X_1, …, X_n is a sequence of independent and identically distributed random variables with mean μ and finite variance σ², and X̄_n = (X_1 + … + X_n)/n denotes the sample mean, then for any ϵ > 0 we have

P[ |X̄_n - μ| > ϵ ] → 0

as n → ∞.

For the coin tossing example we can define

X_i = {1 when toss i is a head; 0 when toss i is a tail},

then X_i ∼ Bernoulli(θ_0), with mean θ_0 and finite variance θ_0(1 - θ_0), for i = 1, …, n. Writing r = X_1 + … + X_n for the number of heads in n tosses, the sample mean is X̄_n = r/n, so the WLLN applies and gives, for all ϵ > 0,

P[ |r/n - θ_0| > ϵ ] → 0

as n → ∞.
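This convergence is easy to see empirically. The following sketch (Python with NumPy) simulates n coin tosses and prints the proportion of heads r/n; the true value θ_0 = 0.3 and the sample sizes are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
theta_0 = 0.3  # hypothetical "true" probability of a head, chosen for illustration

# For increasing n, simulate n tosses and report the proportion of heads r/n.
for n in [10, 100, 1_000, 10_000, 100_000]:
    tosses = rng.binomial(1, theta_0, size=n)  # X_1, ..., X_n ~ Bernoulli(theta_0)
    r = tosses.sum()                           # r = number of heads
    print(f"n = {n:>6}:  r/n = {r / n:.4f}")
```

As n grows, the printed values of r/n settle down near θ_0, exactly as the WLLN predicts.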

Rules of the Frequentist Approach

What does this all mean in terms of statistical inference? We have collected data, x = (x_1, …, x_n), which we assume derives from some probability model, with density (in the continuous case) or mass function (in the discrete case) f(x | θ). In order to answer our subject-matter question, we typically want to estimate the parameter(s) θ.

As frequentists, we can make the statement

P[ x | θ = 2 ],

because we can imagine, hypothetically at least, taking more samples x: a repeatable experiment. Indeed, we therefore consider x as a realisation of a random variable X, which describes the distribution of all possible samples we could have obtained.
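To make this concrete for the coin-tossing model: given a hypothesised value of θ, the probability of the observed sample is a product of Bernoulli probabilities. A minimal sketch, with a made-up sample and the hypothetical value θ = 0.5 (the value θ = 2 in the display above is just a placeholder; for a Bernoulli model θ must lie in [0, 1]):

```python
import numpy as np

x = np.array([1, 0, 0, 1, 1, 0, 1, 1])  # hypothetical observed sample x_1, ..., x_n
theta = 0.5                             # hypothesised parameter value

# Independence gives P[x | theta] as a product of Bernoulli mass functions:
# P[x | theta] = prod_i theta^{x_i} * (1 - theta)^{1 - x_i}.
p = np.prod(theta**x * (1 - theta) ** (1 - x))
print(p)  # 0.5**8 = 0.00390625
```

Viewed as a function of θ with x held fixed, this same quantity is the likelihood, which underlies the likelihood techniques mentioned at the end of this section.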

As frequentists, we CANNOT make the statement

P[ θ = 2 | x ],

because we cannot take repeats of θ: it has a true value that is fixed and unknown (either θ = 2 or θ ≠ 2), and there is no repeatable experiment we can do. Bayesians, however, with their more flexible interpretation of probability, can make this statement; see MATH331.

We therefore need a way to make inference about θ despite not being able to make probability statements about it.

We are generally interested in making inference (loosely, learning) about the parameter θ. We let θ_0 denote the true value of the unknown parameter θ.

An estimator of θ is a function, T(X), of the random sample X; the particular value for the observed sample x, T(x), is an estimate. In MATH235 we looked at various criteria for judging the quality of an estimator:

Unbiasedness:

E(T(X)) = θ_0.

Consistency:

T(X) → θ_0 in probability

as n → ∞,

and at different techniques for constructing estimators (method-of-moments, maximum likelihood, …). As before, we will focus mainly on likelihood techniques.
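For the coin-tossing model the obvious estimator is T(X) = X̄_n = r/n (which is also the maximum likelihood estimator of θ_0). Both criteria can be checked by simulation; the sketch below again uses a hypothetical true value θ_0 = 0.3 and arbitrary sample sizes.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
theta_0 = 0.3  # hypothetical true parameter value, chosen for illustration

# Unbiasedness: approximate E[T(X)] by averaging T(x) = r/n over many
# repeated samples of the same fixed size n.
n, reps = 50, 100_000
estimates = rng.binomial(n, theta_0, size=reps) / n  # one value of r/n per repeat
print(f"approx E[T(X)] = {estimates.mean():.4f}  (theta_0 = {theta_0})")

# Consistency: a single estimate computed from ever-larger samples should
# settle down near theta_0.
for n in [10, 1_000, 100_000]:
    print(f"n = {n:>6}:  T(x) = {rng.binomial(n, theta_0) / n:.4f}")
```

The Monte Carlo average in the first part sits very close to θ_0 (unbiasedness), while the second part shows the single estimate stabilising near θ_0 as n grows (consistency), mirroring the WLLN argument earlier in the section.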