1 Lab Session 2 Testing for spatial clustering Intensity estimation

Comparing the clustering between cases and controls

In the above examples, we have seen how to compare a given dataset against CSR. We can also compare two datasets to each other and assess whether the two are significantly different. Case control studies are epidemiological studies, in which careful selection of the control group needs to be made in order to be able to address the scientific question of interest.

The test statistic used in this case is the difference between the $K$ functions estimated using the cases and the controls,

D(s)=K_{\text{cases}}(s)-K_{\text{controls}}(s).

Under the null hypothesis of no spatial clustering, both the cases and the controls come from the same population. We can thus pull all the data together, randomly relabel them as cases or controls, then recalculate $K(s)$ and $D(s)$ . Repeating this a number of times, we can find empirical upper and lower limits for $D(s)$ . T

As an example, we will use the point pattern patt4 (the cases) and we will compare this pattern to a second realisation of the same process, patt5 (the controls).

cases <- patt4
contr <- rpoispp(lambda = lambda4, lmax = 80, win = win4)

kcases <- Kest(cases)
r <- kcases$r
kcontr <- Kest(contr, r = r)

Note that r is the fixed vector of radii that we will use to compute the $K$ -functions below. We can now compute and plot $D$ .

D <- kcases$iso - kcontr$iso
plot(r, D, xlab = "r", ylab = "D(r)", type = "l")

To get a confidence band for $D$ , we again use 200 simulations and proceed in a similar manner to above:

Dsim <- c()  # we’ll store our result in this object
x <- c(cases$x, contr$x)
y <- c(cases$y, contr$y)
cc <- c(rep("case", cases$n), rep("control", contr$n))
for (i in 1:nsims) {
    cc <- sample(cc)
    simcases <- ppp(x = x[cc == "case"], y = y[cc ==
        "case"], window = win4)
    simcontr <- ppp(x = x[cc == "control"], y = y[cc ==
        "control"], window = win4)
    dsim <- Kest(simcases, r = r, correction = "isotropic")$iso -
        Kest(simcontr, r = r, correction = "isotropic")$iso
    Dsim <- cbind(Dsim, dsim)
}

qts <- apply(Dsim, 1, quantile, probs = c(0.025, 0.975))

Plotting the result:

plot(NULL, xlab = "r", ylab = "Estimated K function",
    xlim = range(r), ylim = range(Dsim))
polygon(c(r, rev(r)), c(qts[1, ], rev(qts[2, ])), col = "lightgrey",
    border = NA)
lines(r, D)
abline(h = 0, col = "red", lty = "dashed")

Our simulated cases and controls are from the same inhomogeneous Poisson process model. We do not expect either group to be more clustered than the other, hence we expect most of $D(s)$ to be within the envelope.

We now generate a very simple example of a clustered process.

x <- patt4$x[1:100]  # take the first 100 points of patt4
y <- patt4$y[1:100]
x <- c(x, rnorm(400, x, 0.05))  # tag on some additional points
y <- c(y, rnorm(400, y, 0.05))
clust <- ppp(x = x, y = y, window = win4)  # note some points will be excluded

plot(clust)

Exercise 1.4.

Draw the estimate of $D(s)$ and confidence envelope comparing clust as the cases with contr as the controls.

Exercise 1.5.

Repeat the exercise for the leukaemia data.