Consider the AIDS data presented in Chapter 1. To write down the likelihood function we need to specify the pmf under the assumption of independent observations. We wish to fit the model:
where is the number of observed deaths in the quarter . We denote the pmf by of a single observation . Here is a parametric function rather than a parameter itself. Furthermore, as the value of can change between observations due to the linear relation to the time covariate, so we do not assume that the observations are identically distributed.
aids = read.table("aids.dat") attach(aids) plot(time, number) a = 0.340 ; b = 0.257 ; mu = exp(a + b * time) lines(time, mu)
This plot is presented in Section 1.3.2. The free parameters that determine the mean here are and .
Free parameters
There are several ways to specify which parameters are free and which are constrained. The moment parameter is always in one-to-one correspondence with the canonical parameter, e.g. . We may write the likelihood in terms of or in terms of as .
Baseline models
Taking the values of , , , to be free leads to the saturated model with free parameters. It is of theoretical interest as a baseline model. However, as it does not simplify the data, it is not usually a model to fit in its own right; unknowns for observations!
Setting the values , , to be equal gives a model with just free parameter, the null model.
Parsimony suggests a model with greater than 1 but less than free parameters. Here, constraining the moment parameters by , where leads to a model with two free parameters.
Exercise 4.37
The contribution to the likelihood of the th observation, , drawn from the Poisson distribution is one of the following:
Which?
Model formula
With the log-likelihood of given observations is:
This is a function on .
Notation
We can consider this log-likelihood as a function of the vector of means , and write
where the means are constrained by
spanned by the -dimensional vectors and . Recall that is the link function for the Poisson distribution.
This notation allows the log-likelihoods for each model to be simply expressed as
constrained by .
Here the null model is . The saturated model is . And a 3-dimensional model is .
ML estimates
The partial derivative of the log-likelihood for each free parameter and are:
Except in special cases, it is not possible to analytically solve the above equations to determine the maximum likelihood estimates and . We therefore need to use a numerical technique to find the maximum likelihood estimate of the parameters. We shall examine this numerical technique in the next chapter. In the meantime, we can use this algorithm in R via the glm() command.
The necessary arguments for the glm() command are:
A symbolic description of the model to be fitted. E.g. number ~ 1 + time
.
A description of the error distribution and link function to be used in the model. E.g. poisson(link="log")
A data frame containing the variables of the model. E.g. aids.
We can estimate the free parameters by:
AIDSModel <- glm(number~1+time, family=poisson(link="log"), data=aids) AIDSModel
This will return
Call: glm(formula = number ~ 1 + time, family = poisson(link = "log"), data = aids) Coefficients: (Intercept) time 0.3396 0.2565 Degrees of Freedom: 13 Total (i.e. Null); 12 Residual Null Deviance: 207.3 Residual Deviance: 29.65 AIC: 86.58
The ML estimates and maximise the Poisson regression model .