The parameter vector $\boldsymbol\beta$ of the GLM is estimated as a solution to the likelihood equations derived from the log-likelihood function, as follows. In the canonical form, for independent observations $y_1,\dots,y_n$, the log-likelihood is given by:
\[ \ell(\boldsymbol\beta) = \sum_{i=1}^{n}\left\{\frac{y_i\,\theta_i - b(\theta_i)}{a(\phi)} + c(y_i,\phi)\right\} = \sum_{i=1}^{n}\ell_i . \]
Next we take the first derivative of $\ell_i$ with respect to $\beta_j$, $\partial\ell_i/\partial\beta_j$. Note here that $\theta_i$ is a one-to-one function of $\mu_i$ with $\mu_i = b'(\theta_i)$, $\mu_i$ is a one-to-one function of $\eta_i$ through the link function $g(\mu_i) = \eta_i$, and finally $\eta_i = \mathbf{x}_i^\top\boldsymbol\beta$. Hence, by the chain rule:
\[ \frac{\partial\ell_i}{\partial\beta_j} = \frac{\partial\ell_i}{\partial\theta_i}\,\frac{\partial\theta_i}{\partial\mu_i}\,\frac{\partial\mu_i}{\partial\eta_i}\,\frac{\partial\eta_i}{\partial\beta_j} = \frac{(y_i-\mu_i)\,x_{ij}}{\mathrm{Var}(y_i)}\,\frac{\partial\mu_i}{\partial\eta_i}, \]
where
\[ \frac{\partial\ell_i}{\partial\theta_i} = \frac{y_i-\mu_i}{a(\phi)}, \qquad \frac{\partial\theta_i}{\partial\mu_i} = \frac{1}{b''(\theta_i)} = \frac{a(\phi)}{\mathrm{Var}(y_i)}, \qquad \frac{\partial\eta_i}{\partial\beta_j} = x_{ij}. \]
This leads to the likelihood equations:
\[ \sum_{i=1}^{n} \frac{(y_i-\mu_i)\,x_{ij}}{\mathrm{Var}(y_i)}\,\frac{\partial\mu_i}{\partial\eta_i} = 0, \qquad j = 1,\dots,p. \]
We denote the above likelihood equations in vector form by:
\[ \mathbf{u}(\boldsymbol\beta) = (u_1,\dots,u_p)^\top = \mathbf{0}, \]
where we define the weights:
\[ w_i = \frac{\left(\partial\mu_i/\partial\eta_i\right)^2}{\mathrm{Var}(y_i)}. \tag{5.2} \]
In the sequel, all sums are over $i$ from 1 to $n$, unless otherwise stated, and the subscript $i$ is omitted from the summands. The $j$-th component, $u_j$, of $\mathbf{u}(\boldsymbol\beta)$ is:
\[ u_j = \sum \frac{(y-\mu)\,x_j}{\mathrm{Var}(y)}\,\frac{\partial\mu}{\partial\eta} = \sum w\,x_j\,(y-\mu)\,\frac{\partial\eta}{\partial\mu}. \tag{5.3} \]
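As a concrete illustration (this specialization is ours, consistent with the Poisson case listed at the end of the section), under the canonical log link of the Poisson model the component (5.3) simplifies considerably:
\[ \mu = e^{\eta}, \qquad \mathrm{Var}(y) = \mu, \qquad \frac{\partial\mu}{\partial\eta} = \mu \quad\Longrightarrow\quad u_j = \sum \frac{(y-\mu)\,x_j}{\mu}\,\mu = \sum (y-\mu)\,x_j . \]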
and let the expectation of the negative Hessian matrix be:
\[ \mathcal{J}(\boldsymbol\beta) = E\left(-\frac{\partial^2\ell}{\partial\boldsymbol\beta\,\partial\boldsymbol\beta^\top}\right), \]
where both $\mathbf{u}$ and $\mathcal{J}$ are evaluated at the current estimate of $\boldsymbol\beta$. Then from (5.3), for $j,k = 1,\dots,p$,
\[ \mathcal{J}_{jk} = E\left(-\frac{\partial u_j}{\partial\beta_k}\right) = \sum \frac{x_j\,x_k}{\mathrm{Var}(y)}\left(\frac{\partial\mu}{\partial\eta}\right)^2 = \sum w\,x_j\,x_k. \]
Hence:
\[ \mathcal{J} = X^\top W X, \qquad W = \mathrm{diag}\{w_1,\dots,w_n\}. \tag{5.4} \]
To apply Fisher’s scoring method, note that the $j$-th component of $\mathcal{J}\boldsymbol\beta$ is:
\[ (\mathcal{J}\boldsymbol\beta)_j = \sum w\,x_j\,\eta, \tag{5.5} \]
where $\eta = \mathbf{x}^\top\boldsymbol\beta$ is the $i$-th linear predictor evaluated at the current estimate. Hence from (5.3) and (5.5):
\[ (\mathcal{J}\boldsymbol\beta + \mathbf{u})_j = \sum w\,x_j\left\{\eta + (y-\mu)\,\frac{\partial\eta}{\partial\mu}\right\} = (X^\top W\mathbf{z})_j, \tag{5.6} \]
where
\[ z = \eta + (y-\mu)\,\frac{\partial\eta}{\partial\mu}, \tag{5.7} \]
with all quantities ($\mu$, $\eta$, $w$ and $\partial\eta/\partial\mu$) evaluated at the current estimate $\boldsymbol\beta^{(t)}$. Consequently, from (5.1), (5.4) and (5.6):
\[ \boldsymbol\beta^{(t+1)} = \boldsymbol\beta^{(t)} + \mathcal{J}^{-1}\mathbf{u} = \mathcal{J}^{-1}\left(\mathcal{J}\boldsymbol\beta^{(t)} + \mathbf{u}\right) = (X^\top W X)^{-1} X^\top W\,\mathbf{z}, \]
where $W$ is the diagonal matrix with $i$-th diagonal entry $w_i$ of (5.2).
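The update $(X^\top W X)^{-1} X^\top W \mathbf{z}$ can be sketched numerically. Below is a minimal illustration for a Poisson GLM with canonical log link; the function name and the simulated data are ours, not from the text:

```python
import numpy as np

def irwls_poisson(X, y, n_iter=25, tol=1e-10):
    """IRWLS for a Poisson GLM with canonical log link.

    For the log link, d(mu)/d(eta) = mu and Var(y) = mu, so the
    weight (5.2) is w = mu and the working response (5.7) is
    z = eta + (y - mu)/mu.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        w = mu                        # (5.2): (dmu/deta)^2 / Var(y) = mu
        z = eta + (y - mu) / mu       # (5.7): working response
        # Update: solve (X^T W X) beta = X^T W z
        beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# Illustrative simulated data (not from the text)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))
beta_hat = irwls_poisson(X, y)
# For the canonical log link, the score is X^T (y - mu),
# which should vanish at the converged estimate.
score = X.T @ (y - np.exp(X @ beta_hat))
```

Because the log link is canonical, this iteration is also a Newton–Raphson iteration (see Remark 3 below) and typically converges in a handful of steps.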
Remark 1 on implementation: For implementing IRWLS, start with an initial $\boldsymbol\beta^{(0)}$ and first compute the linear predictor $\eta_i = \mathbf{x}_i^\top\boldsymbol\beta^{(0)}$. Then calculate $\mu_i = g^{-1}(\eta_i)$; however, often the initial $\mu_i$'s are taken as the $y_i$'s and one evaluates $\eta_i = g(y_i)$. (There are obvious problems with such a choice, for example, when some $y_i$'s are zero and one has to take the logarithm, as in the Poisson case.) Finally, $\mathbf{z}$ of (5.7) is evaluated and the iteration continues.
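The start-up step of Remark 1 might look as follows for the Poisson case; the flooring of zero counts at 0.5 is an illustrative choice of ours, not prescribed by the text:

```python
import numpy as np

# Initialization sketch for Poisson IRWLS, starting from mu_i^(0) = y_i.
y = np.array([0, 3, 1, 7, 0, 2], dtype=float)
mu0 = np.maximum(y, 0.5)         # guard against log(0) for zero counts
eta0 = np.log(mu0)               # eta^(0) = g(mu^(0)) with the log link
z0 = eta0 + (y - mu0) / mu0      # working response (5.7) at start-up
w0 = mu0                         # weights (5.2) at start-up
```

With `z0` and `w0` in hand, the first weighted least squares update can be computed without choosing $\boldsymbol\beta^{(0)}$ explicitly.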
Remark 2: Consider a multiple linear regression model with observed responses the $z_i$'s of (5.7), defined as:
\[ z_i = \mathbf{x}_i^\top\boldsymbol\beta + \varepsilon_i, \qquad \mathrm{Var}(\varepsilon_i) = 1/w_i. \tag{5.8} \]
Since $\mathrm{Var}(z_i) \approx \left(\partial\eta_i/\partial\mu_i\right)^2 \mathrm{Var}(y_i) = 1/w_i$, the updated estimate $(X^\top WX)^{-1}X^\top W\mathbf{z}$ is nothing but the weighted least squares estimate of $\boldsymbol\beta$ in model (5.8), with weights given by the $w_i$'s, where these weights and the $z_i$'s are calculated using the current value $\boldsymbol\beta^{(t)}$, since that is the best approximation at the current stage. The hypothetical model (5.8) can be motivated from a one-step Taylor approximation of $g(y_i)$ about $\mu_i$:
\[ z_i = g(y_i) \approx g(\mu_i) + (y_i-\mu_i)\,g'(\mu_i) = \eta_i + (y_i-\mu_i)\,\frac{\partial\eta_i}{\partial\mu_i}. \]
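The equivalence in Remark 2 can be checked directly: one scoring update equals the weighted least squares fit of model (5.8), computed here as ordinary least squares on $\sqrt{w}$-scaled data. The simulated data and current estimate are ours, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = rng.poisson(np.exp(0.2 + 0.4 * X[:, 1])).astype(float)

beta = np.array([0.1, 0.1])      # current estimate beta^(t)
eta = X @ beta
mu = np.exp(eta)
w = mu                            # Poisson/log-link weights (5.2)
z = eta + (y - mu) / mu           # working response (5.7)

# Scoring update: (X^T W X)^{-1} X^T W z
beta_irwls = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))

# Weighted least squares fit of model (5.8): ordinary least squares
# on rows scaled by sqrt(w)
sw = np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
```

The two estimates agree up to floating-point error, since both solve the same normal equations.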
Remark 3: From (5.4), $\mathcal{J}$ coincides with the negative Hessian itself (not merely its expectation) if $(\partial\mu/\partial\eta)/\mathrm{Var}(y)$ is a constant function of $\mu$. This happens under the canonical link function, and consequently Fisher's scoring method and the Newton–Raphson method for finding $\hat{\boldsymbol\beta}$ coincide, resulting in fast convergence. This is because, writing $u_j = \sum h(\mu)\,(y-\mu)\,x_j$ with $h(\mu) = (\partial\mu/\partial\eta)/\mathrm{Var}(y)$:
\[ \frac{\partial u_j}{\partial\beta_k} = \sum x_j\,x_k\,\frac{\partial\mu}{\partial\eta}\left\{h'(\mu)\,(y-\mu) - h(\mu)\right\}, \]
and for this to be free from $y$, $h(\mu)$ must be a constant, or:
\[ \frac{\partial\mu}{\partial\eta} = c\,\mathrm{Var}(y) \quad\text{for some constant } c. \]
In particular:
Simple linear regression – identity link, $\eta = \mu$: $\partial\mu/\partial\eta = 1$, $\mathrm{Var}(y) = \sigma^2$, so $c = 1/\sigma^2$;
Poisson regression – log link, $\eta = \log\mu$: $\partial\mu/\partial\eta = \mu = \mathrm{Var}(y)$, so $c = 1$;
Logistic regression – logit link, $\eta = \log\{\mu/(1-\mu)\}$: $\partial\mu/\partial\eta = \mu(1-\mu) = \mathrm{Var}(y)$, so $c = 1$.
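The identity $\partial\mu/\partial\eta = c\,\mathrm{Var}(y)$ for the canonical links above can be verified numerically; the central-difference check below is our illustration, not from the text:

```python
import numpy as np

# Check that under the canonical link, dmu/deta equals the variance
# function (c = 1), so the weight w is free of the data y.
eta = np.linspace(-2.0, 2.0, 9)
h = 1e-6  # central-difference step

# Poisson, log link: mu = exp(eta), variance function V(mu) = mu
mu_pois = np.exp(eta)
dmu_pois = (np.exp(eta + h) - np.exp(eta - h)) / (2 * h)

# Bernoulli, logit link: mu = e^eta/(1+e^eta), V(mu) = mu(1-mu)
mu_logit = 1.0 / (1.0 + np.exp(-eta))
dmu_logit = (1.0 / (1.0 + np.exp(-(eta + h)))
             - 1.0 / (1.0 + np.exp(-(eta - h)))) / (2 * h)
```

For the identity link the derivative is trivially the constant 1, which is $\mathrm{Var}(y)$ up to the factor $1/\sigma^2$.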