1 Modelling and Statistical Inference

The Likelihood Function

Suppose we have some data $x$, a realisation of some random variables $X$ that we assume follow some (joint) distribution or model $f(\cdot \mid \theta)$ for the data $x$. This is a fully general description: so far we are not assuming that the data are independent or identically distributed, and $\theta$ may be a vector of parameters.

The fully general definition of the likelihood function is any function $L(\theta)$ such that

$$L(\theta) \propto f(x \mid \theta),$$

viewed as a function of $\theta$. Importantly, this does not define a distribution for $\theta$, as $\theta$ is on the wrong side of the conditioning; $f(x \mid \theta)$ defines a distribution for the random variable $X$ for each fixed value of $\theta$.
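
As a quick illustration (the binomial model here is just a convenient choice, not part of the general definition), suppose a single observation $x$ is modelled as $X \sim \text{Binomial}(m, \theta)$ with $m$ known. Then

$$L(\theta) \propto f(x \mid \theta) = \binom{m}{x}\,\theta^{x}(1-\theta)^{m-x} \propto \theta^{x}(1-\theta)^{m-x}, \qquad \theta \in [0, 1],$$

read as a function of $\theta$ with $x$ held fixed.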

For much of this course, we will assume that $x$ consists of $n$ independent and identically distributed (IID) realisations, i.e. $x = (x_1, \dots, x_n)$ with each $x_i$ a realisation of the same random variable:

$$X_i \sim f(\cdot \mid \theta), \qquad i = 1, \dots, n.$$

In this special case, we can write the likelihood function as being proportional to the product of the densities of the observations:

$$L(\theta) \propto f(x \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta).$$

Note here $f(x \mid \theta)$ is being used to denote a joint density and $f(x_i \mid \theta)$ a marginal density.
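
As an illustrative special case (the exponential model is just one convenient choice), suppose $X_i \sim \text{Exp}(\theta)$ with density $f(x_i \mid \theta) = \theta e^{-\theta x_i}$ for $x_i > 0$. Then

$$L(\theta) \propto \prod_{i=1}^{n} \theta e^{-\theta x_i} = \theta^{n} e^{-\theta \sum_{i=1}^{n} x_i},$$

so the likelihood depends on the data only through $n$ and $\sum_i x_i$.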

Recall that the proportionality in the definition allows us to discard any multiplicative constants that do not involve $\theta$. The set of possible $\theta$ values is $\Theta$, with $\Theta$ called the parameter space. If $\theta \notin \Theta$ then $L(\theta) = 0$.
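
In the binomial illustration above, for instance, the coefficient $\binom{m}{x}$ is exactly such a constant and can be dropped, and the parameter space is $\Theta = [0, 1]$; any $\theta$ outside this interval is assigned $L(\theta) = 0$.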

It is often more useful to work with the log-likelihood function, defined by:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta),$$

with any multiplicative proportionality constant in $L(\theta)$ becoming an additive constant in $\ell(\theta)$.
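
As a rough sketch of how this looks in practice (the exponential model and the data values below are purely illustrative assumptions, not from these notes), the log-likelihood is just a sum of log-densities:

```python
import numpy as np

# Minimal sketch: log-likelihood for the illustrative IID exponential model
# f(x_i | theta) = theta * exp(-theta * x_i), so
# ell(theta) = sum_i log f(x_i | theta) = n*log(theta) - theta*sum(x_i).
# The observations below are hypothetical, for illustration only.
x = np.array([0.8, 1.3, 0.4, 2.1, 0.9])
n = len(x)

def loglik(theta):
    return n * np.log(theta) - theta * x.sum()

# Working on the log scale turns the product of densities into a sum,
# which is also numerically safer than multiplying many small values.
for theta in (0.5, 1.0, 1.5):
    print(f"ell({theta}) = {loglik(theta):.3f}")
```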

Note 1: Strictly, both the likelihood and the log-likelihood are functions of both $\theta$ and $x$, but we usually drop $x$ from the notation as the data do not change.

Note 2: Sometimes we are interested in how $L(\theta)$ and $\ell(\theta)$ change over different realisations of the random variables $X = (X_1, \dots, X_n)$. Then we use $L(\theta; X)$ and $\ell(\theta; X)$ to show this dependence, with these being random functions of $\theta$ as they vary with $X$.
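
To make Note 2 concrete, here is a small sketch (again assuming the illustrative exponential model from above): repeatedly simulating $X$ from the same model and evaluating $\ell(\theta; X)$ on a grid gives a different log-likelihood curve for each realisation.

```python
import numpy as np

# Sketch: ell(theta; X) as a random function of theta under the
# (assumed, illustrative) exponential model X_i ~ Exp(theta_true).
rng = np.random.default_rng(0)
theta_true, n = 1.0, 20
theta_grid = np.linspace(0.2, 3.0, 5)

for rep in range(3):                                   # three independent realisations of X
    X = rng.exponential(scale=1 / theta_true, size=n)  # numpy parameterises by scale = 1/rate
    ell = n * np.log(theta_grid) - theta_grid * X.sum()
    print(f"dataset {rep}: ell over grid =", np.round(ell, 2))
```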