1 Lab Session 2 Comparing the clustering between cases and controls Bibliography

Intensity estimation

As seen in the notes, there are three main approaches for explaining spatial variation in risk: density ratio, binary regression and generalised additive model. Here we learn how to produce smooth intensity maps using R.

In the package spatstat, the main function for intensity estimation is density.ppp. When you apply this function to an object of class ppp, you can omit ”.ppp” from the call, i.e., instead of writing density.ppp(patt3,positive=TRUE), you would simply write:

den3 <- density(patt3, positive = TRUE)
den3

real-valued pixel image
128 x 128 pixel array (ny, nx)
enclosing rectangle: [0, 1] x [0, 2] units

Here, the option positive=TRUE forces the function to return positive values only for the density (otherwise, this function can return very small negative numbers where the intensity is close to zero). This retuns an object of class im, for which there exist plotting methods:

plot(den3)

Exercise 1.6.

What is the structure of the object den3?

For some (strange?) reason, the values of the intensity funtion are stored in this object in transpose (the element den3 $v$ ), so if we wanted to use the function image.plot from the package fields, then we’d have to transpose back first:

require(fields)
xvals <- den3$xcol  # the x-values
yvals <- den3$yrow  # the y-values
zvals <- t(den3$v)  # the estimated intensity, transposed

image.plot(xvals, yvals, zvals, asp = 1)

The function density.ppp fits a bivariate, isotropic, Gaussian kernel. The word “isotropic” in this context means that the variance matrix of the kernel functions is diagonal with equal elements i.e., a scalar multiple of the identity matrix, for example $\textstyle\left(\begin{array}[]{cc}2.2&0\\ 0&2.2\end{array}\right)$ .

The function density.ppp computes an estimate, $\hat{\lambda}$ of the true intensity $\lambda$ via,

\hat{\lambda}(s)=e(s)\sum_{i=1}^{n}K(s-s_{i}),

(1.1)

where $e(s)$ is a bias-correction for edge effects,

e(s)=1\big{/}\int_{W}K(s-x)\mathrm{d}x.

Equation 1.1 is an unbiased estimate of,

\lambda^{*}(s)=e(s)\int_{W}K(s-x)\lambda(x)\mathrm{d}x,

a smoothed version of the true intensity, see Baddeley (2010).

Notice that the call to density.ppp did not require the user to specify a bandwidth. This is because, according to the help file, spatstat, uses “ a simple rule of thumb that depends only on the size of the window”. There are two inbuilt funtions for computing an appropriate bandwidth using cross validation: bw.diggle and bw.ppl.

Exercise 1.7.

Look at the help files ?bw.diggle and ?bw.ppl to find out more about how these functions choose an appropriate bandwith. Now use the help file for density.ppp to compute a smoothed estimate of the intensity using the cross-validation methods bw.diggle and bw.ppl. Store the intensity estimates in objects den3a and den3b respectively. Which cross-validation method gives the better estimate of the intensity in this case?

To see what bandwidth was selected by each of the routines, we extract the sigma attribute of each of these objects:

attr(den3a, "sigma")

[1] 0.06090998

attr(den3b, "sigma")

[1] 0.2264535

Exercise 1.8.

Using the Leukaemia data, produce a plots of the intensity ratio of cases to controls using the default bandwidth selected, and the bandwidths selected by bw.diggle and bw.ppl. Describe what you observe, note that you might have to use the zlim argument to image.plot in order to obtain a sensible looking $z$ -axis scale. Give a critique of the results.

Exercise 1.9.

Now try producing plots of

\hat{p}(x)=\frac{\hat{\lambda}(x)}{\hat{\lambda}(x)+\hat{\lambda}_{0}(x)}

for each of the different bandwidths. Use zlim=c(0,1) to get these on the $[0,1]$ probability scale.