3 The exponential family

3.7 How Y changes with covariates x

There are three notations for GLMs: generic, index, and vector. We use all three, for instance, in the specification of the link function and the linear predictor.

Linear predictor: The explanatory variables influence the distribution of Y through a single linear function, the linear predictor, which can be described as:

\[
\eta = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \quad\text{(generic)}
\qquad\text{or}\qquad
\eta_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_p x_{pi} \quad\text{(index)}
\qquad\text{or}\qquad
\boldsymbol{\eta} = X\boldsymbol{\beta} \quad\text{(vector).}
\]
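As a minimal sketch (not from the text; the data, coefficients and names are invented), the vector and index forms give the same linear predictor:

import numpy as np

X = np.array([[1.0, 0.5],
              [2.0, 1.5],
              [0.5, 3.0],
              [1.5, 2.5]])      # n x p matrix of explanatory variables (hypothetical)
beta = np.array([0.4, -0.2])    # coefficients beta_1, ..., beta_p (hypothetical)

eta = X @ beta                  # vector form: eta = X beta

# index form, one observation at a time: eta_i = beta_1 x_{1i} + ... + beta_p x_{pi}
eta_index = np.array([sum(b * x for b, x in zip(beta, row)) for row in X])
assert np.allclose(eta, eta_index)
print(eta)                      # [ 0.3  0.5 -0.4  0.1]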

Link function: The mean μ_i of Y_i and the linear predictor are related by a smooth invertible function g(·), called the link function:

\[
g(\mu) = \eta, \qquad\text{or}\qquad g(\mu_i) = \eta_i, \qquad\text{or}\qquad g(\boldsymbol{\mu}) = \boldsymbol{\eta}.
\]

Unnumbered Figure: Link
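For concreteness, a sketch (assuming a logit link, not tied to any particular example in the text) of how a smooth invertible g maps between μ and η:

import numpy as np

def logit(mu):
    # link function g(mu) = log(mu / (1 - mu)), mapping (0, 1) to the whole real line
    return np.log(mu / (1 - mu))

def inv_logit(eta):
    # inverse link g^{-1}(eta) = 1 / (1 + exp(-eta)), mapping back into (0, 1)
    return 1 / (1 + np.exp(-eta))

mu = 0.7                                # a hypothetical mean
eta = logit(mu)                         # g(mu) = eta
assert np.isclose(inv_logit(eta), mu)   # invertibility: g^{-1}(g(mu)) = mu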

Linearity of the linear predictor

The linear predictor η contains continuous variables such as x, but also known functions of these, e.g. x^2, log(x), combined as linear combinations, e.g. αx + 2x^2 − β log(x). The technical definition of linearity in GLMs refers to the linearity of η in the parameters, rather than in the explanatory variables themselves.

The linear predictor may also contain discrete variables and, importantly, indicator variables. For example, to differentiate between three groups, red, white and purple, let

  • a_i = 1 if the i-th member is red, and 0 otherwise,

  • b_i = 1 if the i-th member is white, and 0 otherwise,

  • c_i = 1 if the i-th member is purple, and 0 otherwise.

Linear combinations of these indicator variables, α𝐚 + β𝐛 + γ𝐜, in the linear predictor indicate to which group each unit belongs, and hence the group's effect on the response. For example, the effect of being red adds α units to the linear predictor, as in the sketch below.
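A sketch (with invented group labels and effects) of how the indicator variables enter the linear predictor:

import numpy as np

groups = ["red", "white", "purple", "red", "white"]   # hypothetical group labels

a = np.array([1 if g == "red"    else 0 for g in groups])
b = np.array([1 if g == "white"  else 0 for g in groups])
c = np.array([1 if g == "purple" else 0 for g in groups])

alpha, beta, gamma = 1.0, -0.5, 2.0     # hypothetical group effects

# alpha*a + beta*b + gamma*c picks out exactly one group effect per unit
eta_group = alpha * a + beta * b + gamma * c
print(eta_group)                        # [ 1.  -0.5  2.   1.  -0.5]

Note that a_i + b_i + c_i = 1 for every unit, so in a model that also contains an intercept one of the three indicators is usually dropped to avoid collinearity.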

Causal interpretation

The key to any interpretation of the fitted model is the way in which the mean of the response, μ = 𝔼[Y], changes with changes in the explanatory variables. The standard interpretation is to calculate the change in the expected response given a change in the explanatory variable x_j, holding all other variables constant. Thus the partial derivative ∂μ/∂x_j is an important part of this relationship, which is mediated through the linear predictor and the link function.

Using the chain rule:

\[
\frac{\partial \mu}{\partial x_j}
= \frac{d\mu}{d\eta}\,\frac{\partial \eta}{\partial x_j}
= \left(\frac{dg(\mu)}{d\mu}\right)^{-1} \frac{\partial \eta}{\partial x_j}
= \frac{\beta_j}{g'(\mu)},
\]

where g′(μ) = dg/dμ and μ is a function of the coefficients β_1, …, β_p. If β_j = 0 then changes in x_j lead to no change in the expected response, as long as the other variables are held constant. When β_j is not zero, the actual change depends on the value of μ through the derivative of the link function.
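To make this concrete, a numerical sketch (with hypothetical β and x, using the log link for which g′(μ) = 1/μ, so the formula gives ∂μ/∂x_j = β_j μ):

import numpy as np

beta = np.array([0.5, -1.0])     # hypothetical coefficients
x = np.array([1.2, 0.3])         # a single observation
j = 0                            # consider changes in x_1

eta = x @ beta                   # linear predictor
mu = np.exp(eta)                 # log link: g(mu) = log(mu), so mu = exp(eta)

# chain rule result: d mu / d x_j = beta_j / g'(mu) = beta_j * mu for the log link
dmu_dxj = beta[j] * mu

# check against a finite-difference approximation
h = 1e-6
x_plus = x.copy(); x_plus[j] += h
dmu_fd = (np.exp(x_plus @ beta) - mu) / h
assert np.isclose(dmu_dxj, dmu_fd, rtol=1e-4)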

 
Exercise 3.35
The function that relates the canonical parameter to the mean parameter for the Bernoulli distribution is θ = logit(μ). Suppose the link function (which relates the linear predictor to the mean parameter) is assumed to be the logit:

η=logit(μ).

Given a single continuous covariate x, find dμ/dx and interpret.

 

 
Exercise 3.36
(Continuation.) Suppose instead that η = α + βx + γx^2. Find dμ/dx.

 

Link function in practice

The link function g(μ)=η relates the moment parameter μ=𝔼[Y] to the linear predictor η, which is a linear combination of the covariates.

The default link functions are the logit for the Bernoulli distribution and the log for the Poisson distribution. The default for the Gaussian distribution is the identity function.

There are practical and theoretical reasons for choosing these defaults. The theoretical reason is that these are the canonical links for the Bernoulli and Poisson exponential families (EFs). The practical reason is that these functions transform the mean so that the coefficients are easier to interpret. For instance, in the AIDS example the increases are multiplicative (hence the log link), while for birthweight they are additive (hence the linear, or identity, link).
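A small sketch (with invented coefficients) of this practical point: with the log link a unit increase in x changes the mean multiplicatively, while with the identity link it changes the mean additively.

import numpy as np

beta0, beta1 = 1.0, 0.3          # hypothetical intercept and slope
x = 2.0

# log link (Poisson default): a unit increase in x multiplies the mean by exp(beta1)
mu_log      = np.exp(beta0 + beta1 * x)
mu_log_plus = np.exp(beta0 + beta1 * (x + 1))
assert np.isclose(mu_log_plus / mu_log, np.exp(beta1))

# identity link (Gaussian default): a unit increase in x adds beta1 to the mean
mu_id      = beta0 + beta1 * x
mu_id_plus = beta0 + beta1 * (x + 1)
assert np.isclose(mu_id_plus - mu_id, beta1)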