3 Case-Control Methods

3.2 Spatial clustering

  • most epidemiological case-maps show apparent clustering because cases occur most often in areas of high population

  • if this is the only source of spatial clustering, the same should be true of the control-map

  • more formally:

    • under the null hypothesis of no spatial clustering, the cases and the controls are independent random samples from the same underlying population at risk

    • under this hypothesis, $K_1(s) = K_0(s)$

  • hence, consider $D(s) = K_1(s) - K_0(s)$

Diggle and Chetwynd (1991) propose a test of spatial clustering using the statistic

$$D = \int_0^{s_0} v(s)^{-1/2} \, \hat{D}(s) \, ds$$

where $v(s)$ is the variance of $\hat{D}(s) = \hat{K}_1(s) - \hat{K}_0(s)$ under random permutation of the case-control labels, and $s_0$ is the upper limit of the range of distances considered. Significance is assessed either by a Normal approximation or, for an exact Monte Carlo test, by simulation from the randomisation distribution under the null. Thus, whilst the statistic is motivated by the theory of stationary point processes, the inference is design-based.
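The Monte Carlo version of the test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the K-function estimator `k_hat` omits edge correction, v(s) is estimated empirically from the pooled observed and simulated curves rather than from a closed-form randomisation variance, the integral is approximated by a Riemann sum on an evenly spaced grid of distances, and all function and argument names are hypothetical.

```python
import math
import random
import statistics


def k_hat(points, s_grid, area):
    """Naive K-function estimate on a grid of distances (no edge correction)."""
    n = len(points)
    counts = [0] * len(s_grid)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            for k, s in enumerate(s_grid):
                if d <= s:
                    counts[k] += 2  # each unordered pair contributes twice
    return [area * c / (n * (n - 1)) for c in counts]


def d_hat(points, labels, s_grid, area):
    """Difference between the case (label 1) and control (label 0) K estimates."""
    cases = [p for p, lab in zip(points, labels) if lab == 1]
    controls = [p for p, lab in zip(points, labels) if lab == 0]
    k1 = k_hat(cases, s_grid, area)
    k0 = k_hat(controls, s_grid, area)
    return [a - b for a, b in zip(k1, k0)]


def diggle_chetwynd_test(points, labels, s_grid, area=1.0, n_sim=99, seed=0):
    """Return (observed D, one-sided Monte Carlo p-value) under random labelling."""
    rng = random.Random(seed)
    curves = [d_hat(points, labels, s_grid, area)]  # observed curve first
    perm = list(labels)
    for _ in range(n_sim):
        rng.shuffle(perm)  # random re-labelling of cases and controls
        curves.append(d_hat(points, perm, s_grid, area))
    # Assumption: v(s) is estimated from all n_sim + 1 curves, which keeps
    # the observed and simulated statistics exchangeable under the null.
    v = [statistics.pvariance(col) for col in zip(*curves)]
    ds = s_grid[1] - s_grid[0]  # assumes an evenly spaced grid
    stats = [
        sum(d / math.sqrt(vk) for d, vk in zip(curve, v) if vk > 0) * ds
        for curve in curves
    ]
    observed = stats[0]
    n_ge = sum(1 for t in stats if t >= observed)  # counts the observed value too
    return observed, n_ge / (n_sim + 1)
```

Calling `diggle_chetwynd_test(points, labels, s_grid)` with `points` as a list of (x, y) pairs and `labels` as a list of 0/1 values returns the observed statistic and its Monte Carlo p-value. Large positive values of D point towards clustering of cases over and above that of the controls, which is why the p-value is one-sided.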

There is an extensive literature on alternative tests for spatial clustering. The Diggle-Chetwynd test does not claim to be the “best” in any sense. For a discussion of other approaches, see Cuzick and Edwards (1990).

Example 3.1.

Childhood leukaemia in Humberside, from Cuzick and Edwards (1990).

Figure 3.1: Childhood leukaemia in Humberside. Left: red dots are residential locations of all known cases of childhood leukaemia in Humberside, England, over the period 1974-82. Right: red dots are residential locations of a random sample from the birth register over the same area and time-period. The clustering of the red dots has a structure similar to that of the urban population in this area, with most cases being observed in the city of Hull and other clusters corresponding (mainly) to the locations of other villages and towns. Note that, in general, it is visually very difficult to compare the locations of cases and controls using two separate plots as in this figure (unless there is a very obvious pattern), and even a single plot with cases and controls overlaid is often difficult to interpret. Formal statistical procedures are therefore very important for quantifying differences.
Figure 3.2: Estimate $\hat{D}(s)$ for the leukaemia data (solid line), with plus and minus two standard errors under random labelling (dashed lines). $\hat{D}(s)$ stays mostly within these limits, leaving them only over one or two short ranges of $s$. Although $\hat{D}(s) > 0$ for much of the range of $s$ considered, we would not take this as evidence for a difference in clustering between cases and controls, because the variation is within what we would expect under the null hypothesis (i.e. random labelling).
  • A Monte Carlo test using the test statistic D with 99 simulated random labellings gave a p-value of 0.14.

  • The Normal approximation gave a standard Normal deviate of Z=1.21, corresponding to a one-sided p-value of 0.11.
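The two p-values quoted above can be checked with a few lines of arithmetic. The sketch below uses hypothetical helper names. The rank of 14 is an inference from the reported figures (a Monte Carlo p-value of 0.14 from 99 simulations corresponds to the observed D ranking 14th largest among the 100 values), not something stated in the source.

```python
import math


def one_sided_normal_p(z):
    """Upper-tail probability P(Z >= z) for a standard Normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def monte_carlo_p(rank_from_top, n_sim):
    """p-value when the observed statistic is the rank_from_top-th largest
    among the n_sim simulated values plus the observed one."""
    return rank_from_top / (n_sim + 1)


print(round(one_sided_normal_p(1.21), 2))  # 0.11
print(monte_carlo_p(14, 99))               # 0.14
```

The two approaches agree closely here, and neither suggests significant spatial clustering at conventional levels.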

3.2.1 Adaptation to matched case-control data

Chetwynd et al. (2001) consider the adaptation of the above method to individually matched case-control data.

  • for a test of clustering, a Monte Carlo test based on D is still available, comparing the observed value of D with simulated values under random re-labellings within matched case-control sets

  • for estimation, modifications are necessary because:

    • the randomisation variance of D^(s) changes

    • more fundamentally, in a k-to-1 matched case-control study, $E[D] \neq 0$ under the null hypothesis of no spatial clustering.
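The re-labelling step used by the Monte Carlo test in the matched setting can be sketched as follows. This is an illustration under stated assumptions rather than the method as implemented by Chetwynd et al.: each matched set is taken to contain exactly one case together with its matched controls, every member of a set is taken to be equally likely to carry the case label under the null hypothesis, and the function and argument names are hypothetical.

```python
import random


def relabel_within_sets(set_ids, rng):
    """Randomly re-assign the single case label within each matched set.

    set_ids[i] identifies the matched set of individual i. The result is a
    list of 0/1 labels with exactly one case (1) per matched set.
    """
    members = {}
    for idx, sid in enumerate(set_ids):
        members.setdefault(sid, []).append(idx)
    labels = [0] * len(set_ids)
    for idxs in members.values():
        labels[rng.choice(idxs)] = 1  # one member of the set becomes the case
    return labels


# Three matched sets, each with one case and two controls.
set_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(relabel_within_sets(set_ids, random.Random(0)))
```

Unlike an unrestricted shuffle of the labels, this keeps the matched structure intact in every simulated data set, so the simulated values of D reflect the same design as the observed one.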

Example 3.2.

Childhood diabetes in Yorkshire, England

Figure 3.3: Locations of cases of childhood diabetes (solid dots) and matched controls (crosses) across Yorkshire, West Yorkshire and Humberside. The large cluster of points in the bottom-left area of the plot corresponds to cases around the Leeds/Huddersfield area; the next largest cluster, at the bottom right, is in the city of Hull. Other clusters appear around the city of York and the town of Harrogate, as well as the seaside towns of Bridlington, Scarborough and Whitby.
  • matched case-control study

  • two controls per case, matched by age, sex and FHSA (Family Health Services Authority)

Note that one of the matching variables (FHSA) is, by definition, spatially structured, hence the matched design masks some of the underlying spatial variation. Using a random sample of controls within each FHSA may have been preferable.