Suppose I am interested in the probability of some one-off event. As a frequentist, I am stumped: there is no repeatable experiment I can do and hence I simply cannot evaluate this probability.
Evidential probability, on the other hand, allows us to assign a probability to any event, as a degree of belief.
This is far more flexible, but it does cause problems: for example, evidential probability is subjective. Bayesians allow this more general definition of probability.
MATH331 Bayesian Inference considers evidential probability, and compares and contrasts the two approaches further. (For a popular account, see also Nate Silver's The Signal and the Noise: The Art and Science of Prediction.)
In this course we will be frequentists, and only allow the physical interpretation of probability.
Note that the frequency-based interpretation of probability can be justified through the weak law of large numbers (WLLN), introduced and proved in MATH230. Recall: the WLLN says that for a sequence $X_1, X_2, \ldots$ of independent and identically distributed random variables, with mean $\mu$ and finite variance $\sigma^2$, then, for any $\varepsilon > 0$, we have
\[
\Pr\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right| > \varepsilon \right) \to 0
\]
as $n \to \infty$.
For the coin tossing example we can define
\[
X_i = \begin{cases} 1 & \text{if the $i$th toss lands heads,} \\ 0 & \text{otherwise,} \end{cases}
\]
then $X_1, X_2, \ldots$ are independent and identically distributed Bernoulli$(p)$ random variables, and each has mean $p$ and finite variance $p(1-p)$. Therefore the WLLN applies and we have that, for all $\varepsilon > 0$,
\[
\Pr\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - p \right| > \varepsilon \right) \to 0
\]
as $n \to \infty$. Since $\frac{1}{n} \sum_{i=1}^{n} X_i$ is the proportion of heads in the first $n$ tosses, this says the long-run proportion of heads converges (in probability) to $p$, which is exactly the frequency interpretation.
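Not part of the notes, but a quick illustration: the following minimal Python sketch (the seed, the choice $p = 0.3$, and the use of numpy are all illustrative assumptions) simulates coin tosses and tracks the running proportion of heads, which the WLLN says should settle near $p$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed for reproducibility

p = 0.3          # illustrative "true" probability of heads (assumed value)
n = 100_000      # number of simulated tosses

# Simulate n independent Bernoulli(p) tosses: 1 = heads, 0 = tails.
tosses = rng.binomial(n=1, p=p, size=n)

# Running proportion of heads after each toss: (1/m) * sum of the first m X_i.
proportions = np.cumsum(tosses) / np.arange(1, n + 1)

# The WLLN predicts these proportions converge (in probability) to p.
for m in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {m:>7} tosses: proportion of heads = {proportions[m - 1]:.4f}")
```

As the number of tosses grows, the printed proportions settle near $p = 0.3$, as the WLLN predicts.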
What does this all mean in terms of statistical inference? We have collected data, $\boldsymbol{x} = (x_1, \ldots, x_n)$, which we assume derive from some probability model, with density (in the continuous case) or mass function (in the discrete case) $f(\boldsymbol{x}; \theta)$. In order to answer our subject-matter question, we typically want to estimate the parameter(s) $\theta$.
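For instance (a concrete illustration, continuing the coin tossing example above), each toss $x_i \in \{0,1\}$ may be modelled as Bernoulli with mass function
\[
f(x_i; \theta) = \theta^{x_i} (1-\theta)^{1-x_i}, \qquad x_i \in \{0, 1\},
\]
where the parameter $\theta$ (previously written $p$) is the unknown probability of heads.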
As frequentists, we can make probability statements about the data, such as $\Pr(\boldsymbol{X} \in A)$ for a set of possible samples $A$,
because we can imagine (hypothetically at least) taking more samples (a repeatable experiment). Indeed, we therefore also consider $\boldsymbol{x}$ as a realisation of a random variable $\boldsymbol{X}$, which describes the distribution of all possible samples we could have obtained.
As frequentists, we CANNOT make probability statements about the parameter, such as $\Pr(\theta \in A)$,
because we cannot take repeats of $\theta$: it has a true value that is fixed and unknown; either $\theta \in A$ or $\theta \notin A$, and there is no repeatable experiment we can do. However, Bayesians, with their more flexible interpretation of probability, can make such statements; see MATH331.
We therefore need a way to make inference about $\theta$ despite not being able to make probability statements about it.
We are generally interested in making inference (loosely, learning) about the parameter $\theta$. We let $\theta_0$ denote the true value of the unknown parameter $\theta$.
An estimator of $\theta$ is a function, $\hat{\theta} = t(\boldsymbol{X})$, of the random sample $\boldsymbol{X}$; the particular value for the observed sample $\boldsymbol{x}$, $\hat{\theta} = t(\boldsymbol{x})$, is an estimate. In MATH235 we looked at various criteria for judging the quality of an estimator:
Unbiasedness: $\mathbb{E}[\hat{\theta}] = \theta$ for all $\theta$.
Consistency: for all $\varepsilon > 0$,
\[
\Pr\left( \left| \hat{\theta}_n - \theta \right| > \varepsilon \right) \to 0
\]
as $n \to \infty$, where $\hat{\theta}_n$ denotes the estimator based on a sample of size $n$;
and different techniques for constructing estimators (method-of-moments, maximum likelihood, …). As before, we will focus mainly on likelihood techniques.
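As a quick worked illustration (not in the original notes; it reuses the coin tossing setup from above), the sample proportion $\hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i$ satisfies both criteria as an estimator of $p$:
\[
\mathbb{E}[\hat{p}] = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}[X_i] = \frac{1}{n} \cdot np = p,
\]
so $\hat{p}$ is unbiased; and since $\mathrm{Var}(X_i) = p(1-p) < \infty$, the WLLN displayed earlier gives $\Pr(|\hat{p} - p| > \varepsilon) \to 0$ as $n \to \infty$ for all $\varepsilon > 0$, so $\hat{p}$ is also consistent.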