Suppose we have some data $x = (x_1, \dots, x_n)$, a realisation of some random variables $X = (X_1, \dots, X_n)$ that we assume have some (joint) distribution or model $f(x; \theta)$ for the data. This is a fully general description: so far we are not assuming that the data are independent or identically distributed, and $\theta$ may be a vector of parameters.
The fully general definition of the likelihood function is any function $L(\theta)$ such that
\[
L(\theta) \propto f(x; \theta),
\]
viewed as a function of $\theta$. Importantly, this does not define a distribution for $\theta$, as $\theta$ is on the wrong side of the conditioning. It defines a distribution for the random variable $X$ for each fixed value of $\theta$.
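As a concrete illustration (an example added here, not part of the definition above), take a single Bernoulli observation with success probability $\theta \in [0, 1]$:
\[
f(x; \theta) = \theta^{x} (1 - \theta)^{1 - x}, \quad x \in \{0, 1\},
\qquad \text{so observing } x = 1 \text{ gives } L(\theta) \propto \theta.
\]
For each fixed $\theta$ this $f$ sums to one over $x \in \{0, 1\}$, but $L(\theta) = \theta$ does not integrate to one over $[0, 1]$: the likelihood is not a distribution for $\theta$.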
For much of this course, we will assume that $x = (x_1, \dots, x_n)$ consists of independent and identically distributed (IID) realisations, i.e. with each $x_i$ a realisation of the same random variable:
\[
X_1, \dots, X_n \overset{\text{IID}}{\sim} f(\,\cdot\,; \theta).
\]
In this special case, we can write the likelihood function as being proportional to the product of the densities of the observations:
\[
L(\theta) \propto \prod_{i=1}^{n} f(x_i; \theta).
\]
Note that here $f$ is being used to denote both a joint density and a marginal density.
Recall that the proportionality in the definition allows us to discard any multiplicative constants that do not involve $\theta$. The set of possible values of $\theta$ is $\Theta$, with $\Theta$ called the parameter space. If $\theta \notin \Theta$ then $L(\theta) = 0$.
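As a worked example (added for illustration, with the Poisson model an assumed choice), suppose $x_1, \dots, x_n$ are IID realisations from a $\text{Poisson}(\theta)$ distribution:
\[
L(\theta) = \prod_{i=1}^{n} \frac{e^{-\theta} \theta^{x_i}}{x_i!}
\propto e^{-n\theta} \, \theta^{\sum_{i=1}^{n} x_i},
\]
where the multiplicative constant $\prod_{i=1}^{n} 1/x_i!$ has been discarded because it does not involve $\theta$; here the parameter space is $\Theta = (0, \infty)$.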
It is often more useful to work with the log-likelihood function, defined by:
\[
\ell(\theta) = \log L(\theta),
\]
with multiplicative proportionality constants of $L(\theta)$ translated into an additive constant.
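Continuing the Poisson illustration above, taking logs turns the product into a sum:
\[
\ell(\theta) = \log L(\theta) = -n\theta + \Big(\sum_{i=1}^{n} x_i\Big) \log \theta + c,
\]
where the additive constant $c = -\sum_{i=1}^{n} \log x_i!$ comes from the multiplicative constant discarded earlier, and can itself be dropped.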
Note 1: Really both the likelihood and log-likelihood are functions of both $\theta$ and $x$, i.e. $L(\theta; x)$ and $\ell(\theta; x)$, but usually we drop $x$ as the data do not change.
Note 2: Sometimes we are interested in how $L$ and $\ell$ change over different realisations of the random variables $X_1, \dots, X_n$. Then we use $L(\theta; X)$ and $\ell(\theta; X)$ to show this dependence, with these being random functions of $\theta$ as they vary with $X$.
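A minimal numerical sketch of this point (assuming NumPy and SciPy; the Poisson model, sample size, and all names here are illustrative choices, not from the notes): each simulated realisation of $X$ produces a different log-likelihood curve, so $\ell(\theta; X)$ is a random function of $\theta$.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)


def log_lik(theta, x):
    # ell(theta; x): sum of log marginal densities, valid under the IID assumption
    return np.sum(poisson.logpmf(x, theta))


thetas = np.linspace(0.5, 6.0, 50)  # grid over the parameter space Theta

# Each draw of X = (X_1, ..., X_n) gives a different function of theta,
# illustrating that ell(theta; X) is a random function of theta.
for rep in range(3):
    x = rng.poisson(lam=3.0, size=20)  # one realisation of X
    curve = np.array([log_lik(t, x) for t in thetas])
    print(f"realisation {rep}: grid argmax of ell at theta ~ "
          f"{thetas[np.argmax(curve)]:.2f}, sample mean = {x.mean():.2f}")
```

For the Poisson model the maximiser of each curve is the sample mean, so the printed grid argmax should track $\bar{x}$ across realisations.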