4 Methods for Spatially Aggregated Data

4.1 Poisson regression modelling

Recall that in a homogeneous Poisson process the number of events in any region $A$ follows a Poisson distribution with mean $\lambda|A|$, where $|A|$ denotes the area of $A$ and $\lambda$ is the intensity, or mean number of events per unit area.
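To make this concrete, the short Python sketch below evaluates $P\{N(A)=k\}$ from the Poisson probability mass function with mean $\lambda|A|$. The function name `poisson_count_pmf` and the example values (intensity 10, area 0.5) are hypothetical choices for illustration.

```python
import math


def poisson_count_pmf(k: int, intensity: float, area: float) -> float:
    """P{N(A) = k} when N(A) ~ Poisson(intensity * area), i.e. a homogeneous
    Poisson process with the given intensity observed on a region of that area."""
    mu = intensity * area
    if mu == 0.0:
        return 1.0 if k == 0 else 0.0
    # Evaluate on the log scale so that large k or mu do not overflow.
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


# Hypothetical example: 10 events per unit area on a region of area 0.5 (mean 5).
p0 = poisson_count_pmf(0, 10.0, 0.5)  # same as math.exp(-5), about 0.0067
mean = sum(k * poisson_count_pmf(k, 10.0, 0.5) for k in range(200))  # about 5
print(p0, mean)
```

Working on the log scale via `math.lgamma` avoids computing large factorials directly.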

In an inhomogeneous Poisson process, the constant intensity $\lambda$ is replaced by a spatially varying intensity $\lambda(x)$, and the number of events in $A$ is Poisson-distributed with mean

$$\mu(A)=\int_{A}\lambda(x)\,\mathrm{d}x.$$
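When $\lambda(x)$ has no convenient antiderivative, $\mu(A)$ is usually evaluated numerically. The sketch below assumes a rectangular region and approximates the integral with a simple midpoint rule; the function `integrate_intensity`, the grid size `n` and the example intensities are illustrative assumptions, not part of the method described here.

```python
import math
from typing import Callable


def integrate_intensity(
    intensity: Callable[[float, float], float],
    x0: float, x1: float, y0: float, y1: float,
    n: int = 200,
) -> float:
    """Approximate mu(A), the integral of intensity over the rectangle
    A = [x0, x1] x [y0, y1], with an n-by-n midpoint rule."""
    hx = (x1 - x0) / n
    hy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx  # midpoint of the i-th column of sub-cells
        for j in range(n):
            y = y0 + (j + 0.5) * hy  # midpoint of the j-th row of sub-cells
            total += intensity(x, y)
    return total * hx * hy


# Hypothetical intensities on the unit square.
mu_const = integrate_intensity(lambda x, y: 50.0, 0.0, 1.0, 0.0, 1.0)  # homogeneous case: 50 * |A| = 50
mu_linear = integrate_intensity(lambda x, y: 100.0 * (x + y), 0.0, 1.0, 0.0, 1.0)  # exact value 100
mu_exp = integrate_intensity(lambda x, y: 100.0 * math.exp(x), 0.0, 1.0, 0.0, 1.0)  # exact value 100 * (e - 1)
print(mu_const, mu_linear, mu_exp)
```

The constant intensity recovers $\lambda|A|$, as in the homogeneous case. For the irregular regions of a real tessellation, a polygon-based integration rule would be needed instead of the rectangle used here.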

Figure 4.1: A realisation of a homogeneous Poisson process (black dots) superimposed onto a tessellation of the unit square (solid lines). By definition, the counts in the polygonal regions are independent and Poisson-distributed, each with mean equal to the area of the region multiplied by the intensity.
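The property stated in the caption can be checked by simulation. This Python sketch simulates a homogeneous Poisson process on the unit square by drawing the total count and then placing points uniformly, and counts events per cell. A regular 4-by-4 grid stands in for the irregular tessellation of Figure 4.1 only to keep the cell lookup simple; the function names, seed, intensity and number of replicates are hypothetical.

```python
import math
import random


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Draw from Poisson(mean) with Knuth's multiplication method
    (adequate for moderate means, below a few hundred)."""
    threshold = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_homogeneous_pp(intensity: float, rng: random.Random) -> list:
    """Homogeneous Poisson process on the unit square (area 1): draw the total
    count N ~ Poisson(intensity * 1), then place N points independently and uniformly."""
    n = sample_poisson(intensity * 1.0, rng)
    return [(rng.random(), rng.random()) for _ in range(n)]


def grid_counts(points: list, m: int) -> list:
    """Counts in the m * m cells of a regular grid partition of the unit square."""
    counts = [0] * (m * m)
    for x, y in points:
        counts[min(int(x * m), m - 1) * m + min(int(y * m), m - 1)] += 1
    return counts


rng = random.Random(1)
intensity, m, reps = 200.0, 4, 2000
cell_area = 1.0 / (m * m)

totals = [0] * (m * m)
for _ in range(reps):
    for idx, c in enumerate(grid_counts(simulate_homogeneous_pp(intensity, rng), m)):
        totals[idx] += c
mean_counts = [t / reps for t in totals]

print("intensity * cell area:", intensity * cell_area)  # 12.5
print("average count per cell:", [round(v, 2) for v in mean_counts])
```

Each cell's average count should be close to $\lambda|A_i| = 200 \times 1/16 = 12.5$.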

• Let $\{A_{i}: i=1,\dots,n\}$ be a partition of a study region $A$ into disjoint sub-regions.

• Let $Y_{i}$ denote the number of cases in $A_{i}$; that is, instead of a point pattern, we observe only a count in each of the $n$ disjoint regions.

• Suppose that cases arise from a Poisson process with intensity $\lambda(x)$.

• Then the $Y_{i}$ are mutually independent, with $Y_{i}\sim\text{Poisson}(\mu_{i})$, where

$$\mu_{i}=\int_{A_{i}}\lambda(x)\,\mathrm{d}x.$$
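The Python sketch below illustrates this result. It uses a hypothetical intensity $\lambda(x_1, x_2) = 100\exp(x_1)$ on the unit square and a 3-by-3 grid as the partition $\{A_i\}$. It simulates the process by Lewis-Shedler thinning, a standard simulation technique that the notes do not prescribe, and compares the average simulated count in each cell with $\mu_i$ computed by a midpoint rule. All function names, the seed and the number of replicates are illustrative assumptions.

```python
import math
import random


def intensity(x: float, y: float) -> float:
    # Hypothetical spatially varying intensity, increasing from west to east.
    return 100.0 * math.exp(x)


LAMBDA_MAX = 100.0 * math.e  # upper bound of intensity(x, y) on [0, 1]^2


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for moderate means (below a few hundred)."""
    threshold = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_by_thinning(rng: random.Random) -> list:
    """Lewis-Shedler thinning: simulate a homogeneous process with intensity
    LAMBDA_MAX, then keep each point with probability intensity(x, y) / LAMBDA_MAX."""
    kept = []
    for _ in range(sample_poisson(LAMBDA_MAX, rng)):
        x, y = rng.random(), rng.random()
        if rng.random() < intensity(x, y) / LAMBDA_MAX:
            kept.append((x, y))
    return kept


def cell_means(m: int, n_mid: int = 50) -> list:
    """mu_i = integral of intensity over each cell A_i of an m-by-m grid,
    via a midpoint rule on n_mid-by-n_mid sub-cells; cells ordered by (i, j)."""
    h = 1.0 / m
    mus = []
    for i in range(m):
        for j in range(m):
            s = 0.0
            for a in range(n_mid):
                for b in range(n_mid):
                    s += intensity((i + (a + 0.5) / n_mid) * h, (j + (b + 0.5) / n_mid) * h)
            mus.append(s * (h / n_mid) ** 2)
    return mus


def cell_counts(points: list, m: int) -> list:
    """Y_i = number of points in each cell, in the same order as cell_means."""
    counts = [0] * (m * m)
    for x, y in points:
        counts[min(int(x * m), m - 1) * m + min(int(y * m), m - 1)] += 1
    return counts


rng = random.Random(2024)
m, reps = 3, 1000
mus = cell_means(m)
totals = [0] * (m * m)
for _ in range(reps):
    for idx, c in enumerate(cell_counts(simulate_by_thinning(rng), m)):
        totals[idx] += c
mean_counts = [t / reps for t in totals]

print("sum of mu_i:", sum(mus))  # close to 100 * (e - 1), the integral over the whole square
for mu_i, ybar in zip(mus, mean_counts):
    print(f"mu_i = {mu_i:6.2f}   average Y_i = {ybar:6.2f}")
```

Because the cells are disjoint and cover the square, the $\mu_i$ add up to $\mu(A)$, which the first printed line checks numerically.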

4.1.1 Recap: Poisson regression modelling

The Poisson regression model is an example of a generalised linear model that takes as its starting point the model

$$Y_{i}\sim\text{Poisson}(\mu_{i})$$

and incorporates covariate information through a log-linear model

$$g(\mu_{i})=\log\mu_{i}=u_{i}^{\prime}\beta,$$

where $u_{i}$ is the vector of covariates for region $i$, $\beta$ is the corresponding vector of regression coefficients, and $g(\cdot)$ is called the link function. The log is the canonical link function for the Poisson model. Under this model, $\mu_{i}$ is both the mean and the variance of $Y_{i}$.
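The coefficients $\beta$ are estimated by maximum likelihood. As a hedged sketch of how that can be done without a statistics library, the Python below applies Newton-Raphson to the Poisson log-likelihood; with the canonical log link this is the same iteration as Fisher scoring (iteratively reweighted least squares). The helper names `solve` and `fit_poisson_glm`, the starting values and the toy data are assumptions made for illustration; in practice one would use an established GLM routine.

```python
import math


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson_glm(u, y, max_iter=50, tol=1e-10):
    """Maximum likelihood estimate of beta in log(mu_i) = u_i' beta via Newton-Raphson.
    Assumes the first column of u is an intercept (all ones) and that sum(y) > 0."""
    p = len(u[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # start at the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(sum(uk * bk for uk, bk in zip(ui, beta))) for ui in u]
        # Score vector U'(y - mu) and Fisher information U' diag(mu) U.
        score = [sum(ui[k] * (yi - mi) for ui, yi, mi in zip(u, y, mu)) for k in range(p)]
        info = [[sum(ui[k] * ui[j] * mi for ui, mi in zip(u, mu)) for j in range(p)] for k in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


x = [0, 1, 2, 3, 4, 5]
u = [[1.0, float(xi)] for xi in x]          # u_i' = (1, x_i): intercept plus one covariate
y = [math.exp(0.5 + 0.3 * xi) for xi in x]  # noise-free values lying exactly on the model
beta_hat = fit_poisson_glm(u, y)            # close to [0.5, 0.3]

counts = [1, 3, 2, 5, 4, 9]                 # hypothetical observed counts
beta_counts = fit_poisson_glm(u, counts)
mu_counts = [math.exp(beta_counts[0] + beta_counts[1] * xi) for xi in x]
print(beta_hat, beta_counts)
```

Because the log link is canonical, the score is $\sum_i u_i(y_i-\mu_i)$ and the observed and expected information coincide, which is why Newton-Raphson and Fisher scoring agree here.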

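Given fitted means $\mu_{i}$, the deviance is defined as

$$D=2\log\frac{L(\hat{\mu}_{i};y)}{L(\mu_{i};y)},$$

where $\hat{\mu}_{i}=y_{i}$ are the fitted values of the saturated model, so that $D\geq 0$,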

and it can be used to assess the fit of a candidate model and the significance of its parameters. Two models can also be compared using

$$D(\mu^{(1)}_{i},\mu^{(2)}_{i})=2\log\frac{L(\mu^{(1)}_{i};y)}{L(\mu^{(2)}_{i};y)}.$$

By Wilks's theorem, if model (2) is nested within model (1), then when model (2) holds this statistic is asymptotically $\chi^{2}_{q}$-distributed, where $q$ is the difference in the number of parameters between the two models. Goodness of fit is discussed in Section 4.3.
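The deviance and the likelihood ratio comparison can be sketched in a few lines of Python. For the Poisson model, the deviance against the saturated model ($\hat{\mu}_i = y_i$) has the closed form $2\sum_i[y_i\log(y_i/\mu_i)-(y_i-\mu_i)]$. For nested models, the difference of the two deviances equals $2\log L(\mu^{(1)};y)/L(\mu^{(2)};y)$. The function names, the power series used for the $\chi^2$ tail probability and the toy data are illustrative assumptions, and the tail series loses relative accuracy for very small p-values.

```python
import math


def poisson_deviance(y, mu):
    """Deviance of fitted means mu against the saturated model (mu_hat_i = y_i):
    D = 2 * sum_i [y_i log(y_i / mu_i) - (y_i - mu_i)], with 0 log 0 taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        total += (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
    return 2.0 * total


def chi2_sf(x, df):
    """Upper tail P(X > x) for X ~ chi-squared(df), computed as 1 - P(df/2, x/2),
    where P is the regularised lower incomplete gamma function (power series)."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


# Hypothetical counts in six regions split into two groups.
y = [2, 3, 1, 6, 8, 7]
group = [0, 0, 0, 1, 1, 1]

# Model (2): common mean. Its Poisson MLE is the overall mean.
mu_small = [sum(y) / len(y)] * len(y)
# Model (1): intercept plus group indicator. Its Poisson MLE is the group mean.
group_mean = {g: sum(yi for yi, gi in zip(y, group) if gi == g) / group.count(g) for g in set(group)}
mu_large = [group_mean[g] for g in group]

# 2 log L(mu^(1); y) / L(mu^(2); y) equals D(mu^(2)) - D(mu^(1)).
lr_stat = poisson_deviance(y, mu_small) - poisson_deviance(y, mu_large)
p_value = chi2_sf(lr_stat, df=1)  # q = 1 extra parameter in model (1)
print(f"LR statistic = {lr_stat:.3f}, p-value = {p_value:.4f}")
```

Computing the statistic as a difference of deviances means the $\log y_i!$ terms of the two likelihoods cancel and never need to be evaluated.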

In the context of spatially aggregated data, a relevant question to ask of the Poisson modelling approach is: does it correspond to any self-consistent model of the underlying spatial process of disease risk?