Our working model for spatial variation in risk is that:
cases form a Poisson process with intensity $\lambda_1(x) = \rho(x)\,\lambda_0(x)$
controls form a second, independent, Poisson process with intensity $\lambda_2(x) = c\,\lambda_0(x)$
where $\lambda_0(x)$ is the intensity of the population at risk,
$c$ is determined by the number of controls in the design, and
$\rho(x)$ represents spatial variation in risk.
It follows that, conditional on both case and control locations:
case/control labels are determined by a series of independent Bernoulli trials with success probabilities $p(x) = \lambda_1(x)/\{\lambda_1(x) + \lambda_2(x)\} = \rho(x)/\{\rho(x) + c\}$
spatial variation in risk is estimable up to a constant of proportionality, since $\rho(x) \propto p(x)/\{1 - p(x)\}$.
We revisit this topic in Section 6.
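To make the model concrete, the following is a minimal R sketch (using the notation above, with illustrative choices of $\lambda_0$, $\rho$ and $c = 1$) that simulates cases and controls by thinning a homogeneous Poisson process on the unit square, and then labels the pooled locations by the equivalent Bernoulli trials:

    set.seed(1)
    lambda0 <- function(x, y) 100 * dnorm(x, 0.5, 0.3) * dnorm(y, 0.5, 0.3)  # population intensity (assumed)
    rho     <- function(x, y) exp(-4 * ((x - 0.7)^2 + (y - 0.3)^2))          # risk surface (assumed)
    sim.pp <- function(intensity, lmax) {      # inhomogeneous Poisson process by thinning
      n  <- rpois(1, lmax)                     # homogeneous process with rate lmax on [0,1]^2
      xy <- cbind(runif(n), runif(n))
      keep <- runif(n) < intensity(xy[, 1], xy[, 2]) / lmax
      xy[keep, , drop = FALSE]
    }
    cases    <- sim.pp(function(x, y) rho(x, y) * lambda0(x, y), lmax = 200)  # lambda1 = rho * lambda0
    controls <- sim.pp(lambda0, lmax = 200)                                   # lambda2 = lambda0, i.e. c = 1
    xy <- rbind(cases, controls)
    p  <- rho(xy[, 1], xy[, 2]) / (rho(xy[, 1], xy[, 2]) + 1)   # p(x) = rho(x)/{rho(x) + c} with c = 1
    labels <- rbinom(nrow(xy), 1, p)           # equivalent case/control labelling, given the locations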
Kelsall and Diggle (1998) consider three approaches to nonparametric estimation of $\rho(x)$:
density ratios, based on kernel estimation of the case and control densities;
nonparametric binary regression, based on kernel smoothing of the case/control labels;
generalized additive models.
We now look at each of these methods in detail.
This section could arguably be renamed “intensity ratios”, as the main idea is to look at the quotient of nonparametric estimates of the intensity of the process describing the cases and the process describing the controls. However, the theoretical framework for the methods in this section derives from nonparametric density estimation (i.e. the estimation of probability density functions), so we stick with the term “density ratios”.
This method first uses kernel smoothing for separate estimation of the case and control densities, $f(x)$ and $g(x)$ say, then uses these estimates in the expression $\rho(x) \propto f(x)/g(x)$ to estimate $\rho(x)$.
The basic problem addressed by the kernel smoothing method is to estimate an unknown probability density function from an observed sample of data. This problem is very familiar if one is prepared to assume a parametric form for the density function, $f(x; \theta)$ say, since then all that is required is to estimate the parameter vector $\theta$, for example using maximum likelihood estimation.
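As an illustration, a minimal R sketch of a parametric fit under an assumed Gaussian form (the data and variable names are illustrative, not from the notes):

    set.seed(1)
    xs <- rnorm(200, mean = 5, sd = 2)            # simulated sample
    mu.hat    <- mean(xs)                         # MLE of the mean
    sigma.hat <- sqrt(mean((xs - mu.hat)^2))      # MLE of the sd (divisor n, not n - 1)
    curve(dnorm(x, mean = mu.hat, sd = sigma.hat), from = -1, to = 11)  # fitted parametric density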
It often occurs, especially in estimating unknown intensity functions, that it is not practical to assume a particular parametric form. In that case, non-parametric or semi-parametric methods are required.
Consider a sample $x_1, \dots, x_n$ from an unknown univariate density function $f(x)$. A simple but rather unsatisfactory method is to treat the sample as a discrete probability distribution, with mass $1/n$ on each $x_i$, resulting in the empirical distribution function:
$$\hat F(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \le x).$$
A natural improvement is based on histogram methods. Note that the bandwidth (i.e. width of each bar) has an impact on the appearance of the estimate of $f(x)$.
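The following R sketch shows both of these simple estimates on simulated data; the bin counts are illustrative choices:

    set.seed(2)
    xs <- rnorm(100)
    plot(ecdf(xs))                            # empirical distribution function
    hist(xs, breaks = 5,  freq = FALSE)       # few, wide bars: over-smoothed
    hist(xs, breaks = 40, freq = FALSE)       # many, narrow bars: under-smoothed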
Kernel smoothing improves further on these simple estimates by treating the underlying distribution as having a continuous, smooth density function. Specifically, we estimate $f(x)$ by
$$\hat f(x) = \frac{1}{nh} \sum_{i=1}^{n} k\left(\frac{x - x_i}{h}\right).$$
Here, $k(u)$ is a symmetric (even) function, non-increasing in $|u|$, known as the kernel, and $h > 0$ is a scalar which determines the amount of smoothing. Each data-point $x_i$ contributes to the estimate of $f(x)$ at each possible value of $x$, with the largest contributions coming from data-points close to $x$.
Possible choices for $k(\cdot)$ include the standard Gaussian density function, the uniform density function on a specified interval, and the Epanechnikov kernel:
$$k(u) = \frac{3}{4}(1 - u^2), \qquad |u| \le 1,$$
and $k(u) = 0$ otherwise.
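A minimal R sketch of the kernel estimator with a Gaussian kernel, compared with the built-in density() function (whose bw argument plays the role of $h$ when the kernel is Gaussian); the function name kde1d is illustrative:

    kde1d <- function(x0, x, h, kernel = dnorm) {
      # f.hat at each point of x0: (1/(n h)) * sum_i k((x0 - x_i)/h)
      sapply(x0, function(x0j) mean(kernel((x0j - x) / h)) / h)
    }
    set.seed(3)
    x  <- rnorm(100)
    xg <- seq(-4, 4, length.out = 200)
    f.hat <- kde1d(xg, x, h = 0.4)
    f.ref <- density(x, bw = 0.4, kernel = "gaussian", from = -4, to = 4, n = 200)$y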
Another way of looking at the effect of changing $h$ appears in the graphs in Figure 3.5, for which the data are a random sample from the standard Gaussian distribution.
One method for bandwidth determination is least-squares cross-validation, see Wand and Jones (1994). We would like to choose $h$ to minimise the integrated mean square error
$$\mathrm{MISE}(h) = \mathrm{E}\left[\int \{\hat f(x) - f(x)\}^2 \, dx\right].$$
We cannot do this directly, as $f$ is unknown, but instead we can estimate the part of this function that depends on $h$,
$$\mathrm{E}\left[\int \hat f(x)^2 \, dx - 2 \int \hat f(x) f(x) \, dx\right],$$
by
$$\mathrm{CV}(h) = \int \hat f(x)^2 \, dx - \frac{2}{n} \sum_{i=1}^{n} \hat f_{-i}(x_i),$$
where
$$\hat f_{-i}(x) = \frac{1}{(n-1)h} \sum_{j \ne i} k\left(\frac{x - x_j}{h}\right)$$
is the so-called ‘jackknife’ estimate of $f(x)$, obtained using all except the $i$th data-point. Then choose $h$ to minimise $\mathrm{CV}(h)$. Stone’s theorem (Theorem 3.1 below) guarantees (under certain conditions) that, in large samples, the estimate obtained using cross-validation converges to the value that minimises $\mathrm{MISE}(h)$. Further technical details are provided by Scott and Terrell (1987).
Stone’s Theorem. (NOT EXAMINABLE) Suppose $f$ is bounded. Let $\hat f_h$ denote the kernel estimator with bandwidth $h$ and let $\hat h$ denote the bandwidth chosen by cross-validation. Then
$$\frac{\int \{\hat f_{\hat h}(x) - f(x)\}^2 \, dx}{\min_h \int \{\hat f_h(x) - f(x)\}^2 \, dx} \longrightarrow 1$$
almost surely.
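In practice, $\mathrm{CV}(h)$ can be evaluated over a grid of candidate bandwidths. The R sketch below does this for a Gaussian kernel, for which $\int \hat f(x)^2 \, dx$ has a closed form; lscv is an illustrative name, and R's built-in bw.ucv() implements the same least-squares criterion for comparison:

    lscv <- function(h, x) {
      # CV(h) = int f.hat^2 dx - (2/n) * sum_i f.hat_{-i}(x_i), Gaussian kernel
      n <- length(x)
      d <- outer(x, x, "-")                              # pairwise differences
      int.f2 <- sum(dnorm(d, sd = sqrt(2) * h)) / n^2    # closed form for int f.hat(x)^2 dx
      loo <- (rowSums(dnorm(d, sd = h)) - dnorm(0, sd = h)) / (n - 1)  # f.hat_{-i}(x_i)
      int.f2 - 2 * mean(loo)
    }
    set.seed(4)
    x <- rnorm(100)
    h.grid <- seq(0.05, 1, by = 0.01)
    h.cv  <- h.grid[which.min(sapply(h.grid, lscv, x = x))]
    h.ucv <- bw.ucv(x)                                   # built-in least-squares CV bandwidth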
In a spatial setting, the method is similar. The $x_i$ are now case or control locations (treated separately), and $k(\cdot)$ is a circularly symmetric bivariate density function. The same value of $h$ is usually used to estimate both the case density, $f(x)$, and the control density, $g(x)$.
Since $\rho(x) \propto f(x)/g(x)$, we have
$$\hat\rho(x) \propto \hat f(x)/\hat g(x).$$
An estimate of the probability that a given individual at location $x$ is a case is
$$\hat p(x) = \frac{n \hat f(x)}{n \hat f(x) + m \hat g(x)},$$
where $n$ and $m$ denote the numbers of cases and controls respectively.
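A sketch of the density-ratio estimate using the kde2d() function from the R library MASS; the simulated data, bandwidth and grid are illustrative, and kde2d's h argument is treated here simply as a common smoothing constant for cases and controls:

    library(MASS)
    set.seed(5)
    case.xy <- cbind(rbeta(150, 3, 3), rbeta(150, 3, 3))   # cases clustered towards the centre
    ctl.xy  <- cbind(runif(450), runif(450))                # controls roughly uniform
    lims <- c(0, 1, 0, 1)                                   # assumed unit-square study region
    f.hat <- kde2d(case.xy[, 1], case.xy[, 2], h = 0.3, n = 100, lims = lims)
    g.hat <- kde2d(ctl.xy[, 1],  ctl.xy[, 2],  h = 0.3, n = 100, lims = lims)
    rho.hat <- f.hat$z / g.hat$z                            # proportional to rho(x)
    image(f.hat$x, f.hat$y, log(rho.hat), xlab = "x", ylab = "y")  # estimated log relative risk surface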
The second approach, nonparametric binary regression, works directly with the case/control labels. For cases $x_1, \dots, x_n$ and controls $x_{n+1}, \dots, x_{n+m}$, let $y_i = 1$ or $y_i = 0$ for a case or a control, respectively, at location $x_i$.
Define weights
$$w_i(x) = \frac{k\{(x - x_i)/h\}}{\sum_{j=1}^{n+m} k\{(x - x_j)/h\}}.$$
The kernel estimator of $p(x)$ is
$$\hat p(x) = \sum_{i=1}^{n+m} w_i(x) \, y_i,$$
and so
$$\hat\rho(x) \propto \frac{\hat p(x)}{1 - \hat p(x)}.$$
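A minimal R sketch of this estimator with Gaussian kernel weights; the function name phat and the simulated locations are illustrative:

    phat <- function(x0, xy, y, h) {
      # xy: (n + m) x 2 matrix of locations; y: 0/1 labels; x0: length-2 target location
      d2 <- (xy[, 1] - x0[1])^2 + (xy[, 2] - x0[2])^2
      w  <- exp(-d2 / (2 * h^2))               # unnormalised Gaussian kernel weights
      sum(w * y) / sum(w)                      # sum_i w_i(x0) y_i
    }
    set.seed(6)
    xy <- rbind(cbind(rbeta(150, 3, 3), rbeta(150, 3, 3)),  # cases
                cbind(runif(450), runif(450)))              # controls
    y  <- rep(c(1, 0), c(150, 450))
    p0 <- phat(c(0.5, 0.5), xy, y, h = 0.1)
    odds0 <- p0 / (1 - p0)                     # proportional to rho at (0.5, 0.5)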
As in the kernel smoothing method, the amount of smoothing is determined primarily by the bandwidth, $h$. Kelsall and Diggle (1998) recommend cross-validation to choose $h$, similar to the cross-validation method described above. For binary regression, with no explanatory variables, this is defined as follows:
for each $i$, let $\hat p_{-i}(x_i)$ be the estimate of $p(x_i)$ obtained using all of the data except $(x_i, y_i)$;
choose $h$ to maximise the cross-validated likelihood, or equivalently minimise $\mathrm{CV}(h)$ (minus the cross-validated log-likelihood), defined as
$$\mathrm{CV}(h) = -\sum_{i=1}^{n+m} \left[ y_i \log \hat p_{-i}(x_i) + (1 - y_i) \log\{1 - \hat p_{-i}(x_i)\} \right].$$
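Continuing the binary regression sketch above (same xy and y), the cross-validated criterion can be computed by zeroing each point's own weight; cv.binary is an illustrative name:

    cv.binary <- function(h, xy, y) {
      d2 <- as.matrix(dist(xy))^2
      w  <- exp(-d2 / (2 * h^2))
      diag(w) <- 0                               # leave (x_i, y_i) out of its own estimate
      p.loo <- as.vector(w %*% y) / rowSums(w)   # p.hat_{-i}(x_i)
      p.loo <- pmin(pmax(p.loo, 1e-10), 1 - 1e-10)
      -sum(y * log(p.loo) + (1 - y) * log(1 - p.loo))
    }
    h.grid <- seq(0.02, 0.3, by = 0.01)
    h.cv <- h.grid[which.min(sapply(h.grid, cv.binary, xy = xy, y = y))]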
Jarner et al. (2002) develop the modifications needed to analyse matched case-control data. Although in general a matched design is not recommended when spatial variation is of scientific interest, if a matched design is used then the method of analysis must respect it.
If explanatory variables are included, the generalized additive model (below) can be used, which uses a similar idea by incorporating a cross-validated likelihood step in order to estimate residual spatial variation in risk after estimating the effects of the covariates on risk.
Lung and stomach cancers in Walsall (from Kelsall and Diggle (1998)).
It can be seen that there are some minor differences between the patterns of spatial variation for the two diseases, but they are broadly similar.
Generalized additive models (GAMs) are an extension of generalized linear models (GLMs) that allow a more flexible functional relationship between covariates and the outcome of interest. In GLMs, this relationship is typically assumed to be linear or, if not, must be specified in advance. In much the same way as the kernel smoothing methods described above, GAMs allow the form of this relationship to be estimated as part of the model-fitting procedure.
The model assumption is
$$\log\left\{\frac{p(x)}{1 - p(x)}\right\} = z(x)'\beta + g(x),$$
where $z(x)$ is a vector of known risk factors for an individual at location $x$ and $g(\cdot)$ is a function that models smooth residual spatial variation.
In a spatial setting, $x$ is a geographical location, but terms equivalent to $g(\cdot)$ could also be included to investigate the nature of the effects of measured covariates, both in spatial and non-spatial regression analyses.
We have assumed a logit link, $\log\{p/(1 - p)\}$, canonical for the Bernoulli distribution. As for GLMs, other link functions could also be specified, and the linear term $z(x)'\beta$ could be replaced by a sum of smooth functions of the individual covariates for greater generality.
We have $Y_i \sim \mathrm{Bernoulli}\{p(x_i)\}$, independently, and the log-likelihood is of the form
$$\ell(\beta, g) = \sum_{i=1}^{n+m} \left[ y_i \log p(x_i) + (1 - y_i) \log\{1 - p(x_i)\} \right].$$
Details of the fitting algorithm are available in Hastie and Tibshirani (1990), and include a kernel smoothing step for $g(\cdot)$ within iteratively weighted least squares. The method is implemented in the R library mgcv.
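A sketch of such a fit using mgcv (which, strictly, represents the smooth term by penalized regression splines rather than by a kernel smoother); the data (xy, y) continue the binary regression sketch above, and smoke is a hypothetical binary covariate:

    library(mgcv)
    set.seed(7)
    dat <- data.frame(y = y, x1 = xy[, 1], x2 = xy[, 2],
                      smoke = rbinom(length(y), 1, 0.3))        # hypothetical covariate
    fit <- gam(y ~ smoke + s(x1, x2), family = binomial, data = dat)
    summary(fit)                                                # covariate effect and smooth term
    vis.gam(fit, view = c("x1", "x2"), plot.type = "contour")   # fitted smooth spatial surface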