4 The EF likelihood and ML estimation 4.2 The likelihood for a GLM 5 IRWLS Algorithm

4.3 The deviance

We saw in the previous Chapter that, for $f$ in any EF with an unconstrained moment parameter $\mu$ and for a single observation, the MLE is

\displaystyle\hat{\mu}=y.

For a GLM with $n$ observations the largest number of parameters that we can have is $n$ , a parameter for each observation, corresponding to the saturated model. The largest likelihood value then occurs at

\displaystyle\hat{\mu}_{i}=y_{i}\quad i=1,2,\dots,n\quad\mbox{or }\hat{% \boldsymbol{\mu}}=\mathbf{y}.

and equals $\ell(\boldsymbol{\mu}=\mathbf{y})$ . The model exactly fits all the responses. This sets a baseline for comparison with more interesting sub-models.

For a GLM with $\beta\in\mathbb{R}^{p}$ , the parameter estimates $\hat{\beta}$ give corresponding fitted values $\hat{\boldsymbol{\mu}}$ . These fitted values are constrained to lie in a $p$ -dimensional subspace of $\mathbb{R}^{n}$ .

Definition 4.3.1.

The deviance, of the model is

\displaystyle dev(\mathcal{M})=2[\ell(\mathbf{y})-\ell(\hat{\boldsymbol{\mu}})% ],\quad g(\hat{\boldsymbol{\mu}})\in\mathcal{M}.

The closer a GLM fits the data, the more similar $\hat{\mu}_{i}$ is to $y_{i}$ and the closer $\ell(\hat{\boldsymbol{\mu}})$ is to $\ell(\mathbf{y})$ so the smaller the deviance. Since $\ell(\mathbf{y})$ is always greater than $\ell(\hat{\boldsymbol{\mu}})$ , the residual deviance is always non-negative.

Exercise 4.40
Find the deviance for a Poisson regression model.

Below is the output from the Poisson regression of the AIDS data:

Call:  glm(formula = number ~ 1 + time, family = poisson(link = "log"),
    data = aids)

Coefficients:
(Intercept)         time
     0.3396       0.2565

Degrees of Freedom: 13 Total (i.e. Null);  12 Residual
Null Deviance:      207.3
Residual Deviance: 29.65        AIC: 86.58

Here we see that the deviance for the proposed model $\mathcal{M}=\{\log\boldsymbol{\mu}=a\mathbf{1}+b\mathbf{t}\}$ has deviance 29.65. The deviance for the null model $\mathcal{M}=\{\log\boldsymbol{\mu}=a\mathbf{1}\}$ has deviance 207.3.

Exercise 4.41
Find the deviance for an Exponential regression model.

The null model

At the other extreme, the smallest number of parameters we can have is one, in which all responses have a common mean parameter $\mu_{i}=\mu$ . This simplest model is called the null or minimal model, determined by $\mathcal{M}=\{g(\boldsymbol{\mu})=\alpha\mathbf{1}\}=\mathrm{span}(\mathbf{1})$ . The deviance of this model $dev(\mathrm{span}(\mathbf{1}))$ is the total deviance to explain.

For the EF it usually turns out that

\displaystyle\hat{\mu}_{i}

\displaystyle=

\displaystyle\bar{y},\quad i=1,2,\dots,n.

The main exception is the binomial model where $\hat{\mu}$ is the overall proportion of successes, and one gets a weighted mean. For the normal multiple regression model with $\sigma^{2}=1$ , $dev(\mathrm{span}(\mathbf{1}))=\sum_{i=1}^{n}(y_{i}-\bar{y})^{2}$ . We often write $dev(\mathbf{1})$ .

4.3.1 Distribution of the deviance

The residual deviance of a fitted GLM with fitted means $\hat{\boldsymbol{\mu}}^{\mathcal{M}}$ maximised over $g({\boldsymbol{\mu}})=\boldsymbol{\eta}\in\mathcal{M}$ is

\displaystyle dev(\mathcal{M})=2\{\ell(\mathbf{y})-\ell(\hat{\boldsymbol{\mu}}% ^{\mathcal{M}})\}.

Under certain conditions which require that the sample size $n$ is large and the assumptions that generate the data from the model hold, the distribution of the deviance is

\displaystyle dev(\mathcal{M})\sim\chi^{2}_{df},

where the $df=n-p$ .

Normal linear models

The normal pdf is

f(y|\mu)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left\{-\frac{(y-\mu)^{2}}{2\sigma% ^{2}}\right\}.

The log-likelihood from $n$ independent observations on $\mathrm{N}(\mu_{i},\sigma^{2})$ is

\ell(\boldsymbol{\mu})=-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(y_{i}-{\mu_{i}})^{% 2}-n\frac{1}{2}\log(2\pi\sigma^{2}).

Exercise 4.42
Find the scaled deviance for a normal linear regression model with known variance $\sigma^{2}$ based on a sample of size $n$ and fitted values $\hat{\mu_{i}}$ .

For the birthweight data, we know the sex of each baby, so we might want to assess the effect of this on birthweight as well as age. A possible model for expected birthweight, $\mu$ , is

\displaystyle\mu_{i}

\displaystyle=

\displaystyle a+bx_{i}+cs_{i}\quad\mathrm{for}~{}~{}i=1,2,\dots,n,

where $s_{i}$ is 0 if female, and 1 if male, so that $c$ represents how much heavier male babies tend to be compared to female ones, after correcting for age.

The following code is used to fit this model in R.

birthweight = read.table("birthweight.dat")
MODEL <- glm( weight ~ 1 + age + sex, family=gaussian, data=birthweight)
MODEL

The output of the model gives:

Call:  glm(formula = weight ~ 1 + age + sex, family = gaussian,
   data = birthweight)

Coefficients:
(Intercept)          age         sexM
    -1773.3        120.9        163.0

Degrees of Freedom: 23 Total (i.e. Null);  21 Residual
Null Deviance:      1830000
Residual Deviance: 658800       AIC: 321.4

Note that in R, for the Gaussian family, the residual deviance output from the fitted glm reports the residual sum of squares rather than the scaled deviance calculated above. With an estimated residual variance of $\hat{\sigma}^{2}=31370$ , it follows that the residual deviance of this model is $dev(\mathcal{M})=658800/31370=21.00$ . Based on 21 degrees of freedom, then the critical value from $\chi^{2}_{21}(0.95)=32.67$ and so we can conclude that the model is adequately describing the variability seen in the data at the 5% significance level.

A plot of the fitted model is presented in the figure below. Why are there two fitted lines in the plot?

Figure 4.1: Link, Caption: Birthweight data with fitted lines of best fit.

For a given mother’s age, say 38, the expected birthweight for a female child is -1773.3 + 120.9*38 = 2820.9 g, but expected birthweight for a male child is -1773.3 + 120.9*38 + 163.0 = 2983.9 g. Therefore, the intercept of the line of best fit corresponding to male babies increase by 163.0 g above the fitted line for females.

What happens if a sex effect is omitted?

 MODEL2 <- glm(weight ~ 1 + age, family = gaussian, data = birthweight)
 MODEL2

This returns:

Call:  glm(formula = weight ~ 1 + age, family = gaussian,
    data = birthweight)

Coefficients:
(Intercept)          age
    -1485.0        115.5

Degrees of Freedom: 23 Total (i.e. Null);  22 Residual
Null Deviance:      1830000
Residual Deviance: 816100       AIC: 324.5

With an estimated residual variance of $\hat{\sigma}^{2}=37095$ , it follow that the residual deviance of this model is $dev(\mathcal{M})=816100/37095=22.00$ . Based on 22 degrees of freedom, then the critical value from $\chi^{2}_{22}(0.95)=33.92$ and so we can conclude that the model is adequately describing the variability seen in the data at the 5% significance level.

This obtains the same fit as the plot presented in Chapter 1. Note the MLE of the regression coefficient on age differs from above.