5 Discrete Spatial Variation

5.1 Discrete spatial variation

In models of discrete spatial variation, the geographical space under study is regarded as a fixed set of spatial sampling units, typically defined by a partitioning of a continuous region into politically defined sub-regions.

In the following example, the region is the north of England and the sub-regions are counties.

Figure 5.1: County boundaries in the north of England.
  • Models for discrete spatial variation are usually defined in terms of their so-called full conditional distributions, incorporating notions of “local” dependence between spatial units.

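  • More formally, if $\{Y_i : i = 1, \ldots, n\}$ denotes a set of outcome variables associated with each of the $n$ spatial units, the model is specified by the $n$ univariate distributions

    $[Y_i \mid Y_1, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n], \quad i = 1, \ldots, n,$

    where $(Y_1, \ldots, Y_n)$ jointly follow some multivariate distribution.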

  • Note that a mutually consistent specification of the full conditionals involves non-obvious constraints on the allowable forms of distribution, which are set out in the celebrated Hammersley-Clifford Theorem (Besag (1974)).

  • Typically, simplifying assumptions are made so that only a few of the $n-1$ variables in the conditioning set play any part. This requires a neighbourhood structure to be defined. For example, we may assume that, conditional on all other counties, the outcome in a county depends only on the counties adjacent to it, ignoring the rest.

  • The following shows one example of how this might be done.

    Figure 5.2: Counties in the north of England, illustrating one possible definition of a neighbourhood for the county of Yorkshire (the largest county pictured, above the centre of the plot): namely, all counties that share a boundary with Yorkshire. The black dots mark the centroid of each county. Anticlockwise from the top, the counties surrounding Yorkshire are Durham, Cumbria, Lancashire, West Yorkshire, East Riding of Yorkshire and Cleveland; the solid lines joining Yorkshire to these counties have been added to make clear which polygons share a border. Note that in practice, shapefiles like this can be unreliable: boundaries that appear to touch by eye do not necessarily touch exactly, and these subtler features are revealed only by zooming in substantially.
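As a minimal sketch of how a neighbourhood structure can be stored, the Python below records the shared-boundary pairs for Yorkshire listed in the caption of Figure 5.2 as a symmetric adjacency dictionary. The function name `build_adjacency` is hypothetical, and only Yorkshire's six neighbours are encoded; in practice the pairs would be computed from the boundary polygons in a shapefile, subject to the caveats noted in the caption.

```python
# Hypothetical sketch: encode the Yorkshire neighbourhood from Figure 5.2 as
# shared-boundary pairs. Only Yorkshire's six neighbours (from the caption) are
# listed; adjacencies among the other counties are deliberately left out.
YORKSHIRE_NEIGHBOURS = [
    "Durham", "Cumbria", "Lancashire",
    "West Yorkshire", "East Riding of Yorkshire", "Cleveland",
]


def build_adjacency(pairs):
    """Return a dict mapping each unit to the set of units it shares a boundary with."""
    adj = {}
    for a, b in pairs:
        if a == b:
            raise ValueError(f"a unit cannot be its own neighbour: {a!r}")
        # Sharing a boundary is symmetric, so record the pair in both directions.
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


pairs = [("Yorkshire", county) for county in YORKSHIRE_NEIGHBOURS]
adj = build_adjacency(pairs)

print(sorted(adj["Yorkshire"]))  # Yorkshire's neighbourhood
print(len(adj["Yorkshire"]))     # n_i, the number of neighbours of Yorkshire
print(adj["Cumbria"])            # symmetry: Cumbria lists Yorkshire as a neighbour
```

Storing neighbours as sets keeps the lookup "is $j$ a neighbour of $i$?" cheap and makes the symmetry of the relation easy to check.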
Example 5.1.

Lip cancer in Scotland; this example was originally analysed in Clayton and Kaldor (1987), with further comment and analysis in Clayton and Bernardinelli (1992) and in Breslow and Clayton (1993a).

  • spatial units are the counties of Scotland;

  • the response from each county $i$ is $Y_i$, the total number of cases during the years 1975-1980 inclusive;

  • let $R_i$ denote the risk for county $i$, and $N_i$ the size of the population at risk;

  • a natural model to fit to the data is that

    $Y_i \mid R_i \sim \mathrm{Poisson}(N_i R_i)$;

  • an available covariate is $x_i$, the percentage of the population of county $i$ who are engaged in agriculture, fishing or forestry;

  • stronger predictors of lip cancer would be tobacco and alcohol consumption, but these are not available
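The Poisson model can be made concrete with a short likelihood computation. The Python sketch below uses hypothetical function names and made-up counts (not the Scottish lip cancer data). It evaluates the log-likelihood of $Y_i \mid R_i \sim \mathrm{Poisson}(N_i R_i)$ and, for the special case where every county shares one common risk $R$, uses the closed-form maximum likelihood estimate $\hat{R} = \sum_i Y_i / \sum_i N_i$, obtained by setting $\sum_i Y_i / R - \sum_i N_i = 0$.

```python
import math


def poisson_loglik(y, n, r):
    """Log-likelihood of counts y under Y_i | R_i ~ Poisson(N_i * R_i).

    y: observed counts; n: populations at risk N_i; r: risks R_i (all > 0).
    """
    if not (len(y) == len(n) == len(r)):
        raise ValueError("y, n and r must have the same length")
    total = 0.0
    for yi, ni, ri in zip(y, n, r):
        mu = ni * ri  # Poisson mean N_i * R_i
        # log P(Y = y) = y*log(mu) - mu - log(y!), with log(y!) = lgamma(y + 1)
        total += yi * math.log(mu) - mu - math.lgamma(yi + 1)
    return total


def common_risk_mle(y, n):
    """MLE of a single risk R shared by all units: total cases / total population."""
    return sum(y) / sum(n)


# Made-up numbers for illustration only (not the Scottish lip cancer data).
y = [5, 0, 12]
n = [40_000, 15_000, 90_000]
r_hat = common_risk_mle(y, n)
print(r_hat, poisson_loglik(y, n, [r_hat] * len(y)))
```

The common-risk fit ignores both the covariate and any spatial structure; the model that follows replaces the single $R$ with a log-linear predictor.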

Figure 5.3: The counties of Scotland. We will analyse the Scottish lip cancer data in the labs.

To model residual spatial variation in risk, after adjusting for the available covariate, we assume that

$\log R_i = \alpha + \beta x_i + S_i,$

where the $S_i$ are spatially correlated random effects that follow a discrete spatial variation model in which:

  • two counties are neighbours if they share a common boundary

  • the full conditional distribution of $S_i$ depends only on the values $S_j$ for the neighbours $j$ of county $i$;

  • $S_i \mid \text{neighbours} \sim \mathrm{N}(m_i, v_i)$, where

    $m_i$ = mean of the $S_j$ over the counties $j$ that are neighbours of county $i$;

    $v_i = \sigma^2 / n_i$, where $n_i$ = number of neighbours of county $i$.
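The full conditional mean and variance above are simple to compute once the neighbourhood is stored as an adjacency structure. The following Python sketch assumes a hypothetical four-unit map (A, B, C, D in a row) and a hypothetical function `car_full_conditional`; it returns $m_i$ and $v_i$ for one unit given the current values of the $S_j$.

```python
def car_full_conditional(i, s, adj, sigma2):
    """Mean m_i and variance v_i of S_i | neighbours in the CAR model above.

    s: dict unit -> current value of S_j
    adj: dict unit -> set of neighbouring units (assumed symmetric)
    sigma2: the variance parameter sigma^2
    """
    nbrs = adj[i]
    n_i = len(nbrs)
    if n_i == 0:
        # e.g. an island with no shared boundary: m_i and v_i are undefined
        raise ValueError(f"unit {i!r} has no neighbours")
    m_i = sum(s[j] for j in nbrs) / n_i  # mean of the neighbouring S_j
    v_i = sigma2 / n_i                   # shrinks as the number of neighbours grows
    return m_i, v_i


# Hypothetical toy map: four units in a row, A - B - C - D.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
s = {"A": 0.2, "B": -0.1, "C": 0.4, "D": 0.0}

m_B, v_B = car_full_conditional("B", s, adj, sigma2=1.0)
print(m_B, v_B)
```

A unit with no neighbours, such as an island, has $n_i = 0$, so its full conditional is not defined by this rule; the sketch simply raises an error, and how to treat such units is a modelling decision.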

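Note that this specification (an example of a conditional autoregressive (CAR) model) corresponds to an improper joint distribution for $(S_1, \ldots, S_n)$, with joint pdf

$f(s_1, \ldots, s_n) \propto \exp\left\{ -\sum_{i \sim j} \frac{(s_i - s_j)^2}{2\sigma^2} \right\},$

where $i \sim j$ indicates that counties $i$ and $j$ are neighbours, and the sum runs over all such pairs, each pair counted once.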
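To check numerically that this joint density agrees with the full conditionals listed earlier, the Python sketch below (hypothetical function names, a hypothetical four-unit map) evaluates the unnormalised log density, counting each neighbouring pair once. It checks two things: adding the same constant to every $s_i$ leaves the density unchanged, which is why it cannot be normalised; and, with the other values held fixed, $\log f$ changes with $s_i$ exactly as the log density of $\mathrm{N}(m_i, \sigma^2/n_i)$ does.

```python
import math


def icar_log_density(s, adj, sigma2):
    """Unnormalised log f(s) = -sum over pairs i~j of (s_i - s_j)^2 / (2 sigma^2).

    s: dict unit -> value; adj: symmetric dict unit -> set of neighbours.
    Each unordered pair {i, j} is counted once (requires orderable unit labels).
    """
    total = 0.0
    for i, nbrs in adj.items():
        for j in nbrs:
            if i < j:  # count each unordered neighbour pair once
                total += (s[i] - s[j]) ** 2
    return -total / (2.0 * sigma2)


# Hypothetical toy map: four units in a row, A - B - C - D.
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
s = {"A": 0.2, "B": -0.1, "C": 0.4, "D": 0.0}
sigma2 = 1.0

# 1. Impropriety: shifting every s_i by the same constant leaves f unchanged,
#    so f cannot integrate to 1 over R^n.
shifted = {k: v + 5.0 for k, v in s.items()}
print(icar_log_density(s, adj, sigma2), icar_log_density(shifted, adj, sigma2))


# 2. Implied full conditional: as a function of s_i alone, log f matches
#    the log density of N(m_i, sigma^2 / n_i), up to an additive constant.
def log_ratio_from_joint(i, a, b):
    """log f(..., s_i = a, ...) - log f(..., s_i = b, ...), others held fixed."""
    return (icar_log_density({**s, i: a}, adj, sigma2)
            - icar_log_density({**s, i: b}, adj, sigma2))


def normal_log_ratio(a, b, m, v):
    """log N(a; m, v) - log N(b; m, v)."""
    return ((b - m) ** 2 - (a - m) ** 2) / (2.0 * v)


m_B = (s["A"] + s["C"]) / len(adj["B"])  # neighbour mean for unit B
v_B = sigma2 / len(adj["B"])             # sigma^2 / n_B
print(log_ratio_from_joint("B", 1.0, -0.5), normal_log_ratio(1.0, -0.5, m_B, v_B))
```

Counting each pair twice (summing over ordered pairs) would double the exponent and give conditional variance $\sigma^2/(2n_i)$ instead of $\sigma^2/n_i$, so the convention for the sum matters.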