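Let $\{Y(s): s \in \mathbb{R}^2\}$ be a collection of real-valued random variables indexed by location $s$ in the plane. We say that $Y$ is a spatial Gaussian process if, for any finite collection of points $s_1, \ldots, s_n$ on the plane, the joint distribution of the random vector $(Y(s_1), \ldots, Y(s_n))^\top$ is multivariate Gaussian.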
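Let $Y$ be a Gaussian process. If, in addition,
$$\mathbb{E}[Y(s)] = \mu \quad \text{for all } s$$
and
$$\mathrm{Cov}[Y(s), Y(s')] = C(s - s') \quad \text{for all } s, s',$$
so that the covariance depends only on the separation $s - s'$, then $Y$ is called a weakly stationary, or second-order stationary, Gaussian process. In this chapter we will refer to $Y$ simply as a stationary Gaussian process. We will also take the covariance to depend on $s - s'$ only through the distance between the two points (isotropy), and write
$$\mathrm{Cov}[Y(s), Y(s')] = \sigma^2 \rho(u),$$
where $u = \|s - s'\|$ and $\sigma^2 = \mathrm{Var}[Y(s)]$. The function $\rho$ is called the correlation function.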
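The definition only involves finite collections of points. A realisation at $n$ locations can therefore be drawn directly from a multivariate Gaussian with covariance matrix entries $\sigma^2\rho(\|s_i - s_j\|)$. The Python/NumPy sketch below does this using a Cholesky factor. The function names, the random locations in the unit square and the choice $\rho(u) = \exp(-u/0.2)$ are illustrative assumptions, not part of the notes.

```python
import numpy as np

def covariance_matrix(coords, sigma2, rho):
    """Return the n x n matrix with entries sigma2 * rho(||s_i - s_j||)."""
    diff = coords[:, None, :] - coords[None, :, :]   # (n, n, 2) pairwise differences
    u = np.sqrt((diff ** 2).sum(axis=-1))            # (n, n) pairwise distances
    return sigma2 * rho(u)

def sample_stationary_gp(coords, mu, sigma2, rho, rng):
    """Draw one realisation of (Y(s_1), ..., Y(s_n)) from its multivariate Gaussian law."""
    n = coords.shape[0]
    cov = covariance_matrix(coords, sigma2, rho)
    # A tiny diagonal jitter guards against round-off making cov numerically singular.
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(n))
    # If z ~ N(0, I) then mu + L z ~ N(mu, L L^T) = N(mu, cov).
    return mu + L @ rng.standard_normal(n)

rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 1.0, size=(50, 2))   # 50 random points in the unit square
rho = lambda u: np.exp(-u / 0.2)               # illustrative correlation function
y = sample_stationary_gp(coords, mu=2.0, sigma2=1.0, rho=rho, rng=rng)
```

The Cholesky route is used because it is the standard way to turn independent standard normals into a correlated Gaussian vector. For the few hundred points typical of a geostatistical dataset, the $O(n^3)$ cost is not a concern.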
An example correlation function is the exponential correlation function
$$\rho(u) = \exp(-u/\phi), \qquad u \ge 0, \ \phi > 0.$$
The parameter $\phi$ controls how quickly the spatial dependence between two points drops off as the distance $u$ between them increases. A large value of $\phi$ represents long-range correlation, while a very small $\phi$ means that even points close together are nearly independent.
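To see the effect of the scale parameter numerically, here is a minimal sketch that evaluates $\rho(u) = \exp(-u/\phi)$ at a few distances for a small and a large $\phi$. The function name and the particular values of $u$ and $\phi$ are illustrative choices.

```python
import numpy as np

def exponential_correlation(u, phi):
    """rho(u) = exp(-u / phi) for distance(s) u >= 0 and scale parameter phi > 0."""
    return np.exp(-np.asarray(u, dtype=float) / phi)

u = np.array([0.0, 0.1, 0.5, 1.0])
for phi in (0.05, 1.0):
    print(f"phi = {phi}:", np.round(exponential_correlation(u, phi), 3))
```

With $\phi = 1$ the correlations are about 1, 0.905, 0.607 and 0.368. With $\phi = 0.05$ the correlation is already down to about 0.135 at $u = 0.1$ and is effectively zero by $u = 0.5$. Since $e^{-3} \approx 0.05$, the correlation is negligible beyond a distance of roughly $3\phi$.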
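A typical model for measurements $Y_i$ at locations $s_i$, $i = 1, \ldots, n$, given covariate information $d(s_i)$ at each $s_i$, is
$$Y_i = d(s_i)^\top \beta + S(s_i) + Z_i, \tag{7.1}$$
$$S \text{ is a stationary Gaussian process with mean } 0, \text{ variance } \sigma^2 \text{ and correlation function } \rho(u; \phi), \tag{7.2}$$
where $Z_i \overset{\text{iid}}{\sim} \mathrm{N}(0, \tau^2)$, independently of $S$. Models of this kind are often called geostatistical models (Cressie, 1991). The name is an indirect reference to their historical development in connection with spatial prediction problems in the mining industry.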
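The sketch below simulates data from (7.1)–(7.2) with the exponential correlation function. It draws $S$ at the data locations, adds the linear trend $d(s_i)^\top\beta$, and adds iid $\mathrm{N}(0, \tau^2)$ noise. The covariates (an intercept plus the east coordinate), the parameter values and the function name are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_geostatistical(coords, D, beta, sigma2, phi, tau2, rng):
    """Simulate Y_i = d(s_i)^T beta + S(s_i) + Z_i with exponential correlation.

    D is the (n, p) matrix whose i-th row is d(s_i); returns (Y, S)."""
    n = coords.shape[0]
    u = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov_S = sigma2 * np.exp(-u / phi)                   # Cov[S(s_i), S(s_j)]
    S = np.linalg.cholesky(cov_S + 1e-10 * np.eye(n)) @ rng.standard_normal(n)
    Z = rng.normal(0.0, np.sqrt(tau2), size=n)          # iid measurement error
    return D @ beta + S + Z, S

rng = np.random.default_rng(42)
coords = rng.uniform(0.0, 1.0, size=(100, 2))
D = np.column_stack([np.ones(100), coords[:, 0]])       # hypothetical covariates: intercept + easting
y, S = simulate_geostatistical(coords, D, beta=np.array([1.0, 0.5]),
                               sigma2=1.0, phi=0.2, tau2=0.1, rng=rng)
```

Under this model the marginal distribution of the data vector is $Y \sim \mathrm{MVN}(D\beta, \sigma^2 R(\phi) + \tau^2 I)$. Both likelihood-based fitting and kriging work from this distribution.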
A typical geostatistical problem is to use the data $Y_1, \ldots, Y_n$ from locations $s_1, \ldots, s_n$ to predict either $S(s)$ itself or a functional such as the average value of $S$ over a region $A$,
$$\bar{S}(A) = \frac{1}{|A|} \int_A S(s)\, ds,$$
where $|A|$ is the area of region $A$. Prediction locations $s^*$ are often taken on a regular grid over the observation window of interest, which allows us to produce raster images of the prediction surface.
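On a regular grid, the integral defining $\bar{S}(A)$ can be approximated by averaging the surface over the grid cells whose centres fall inside $A$ (a midpoint rule). The sketch below assumes the surface values are already available at the grid points, for example as kriging predictions. It checks the approximation on the known surface $S(s) = x + 2y$ over $A = [0, 0.5]^2$, whose exact average is $0.25 + 2 \times 0.25 = 0.75$. The function names and the test surface are illustrative assumptions.

```python
import numpy as np

def regular_grid(xmin, xmax, ymin, ymax, nx, ny):
    """Cell centres of an nx-by-ny regular grid covering the window, as an (nx*ny, 2) array."""
    dx, dy = (xmax - xmin) / nx, (ymax - ymin) / ny
    xs = xmin + dx * (np.arange(nx) + 0.5)
    ys = ymin + dy * (np.arange(ny) + 0.5)
    gx, gy = np.meshgrid(xs, ys)          # each of shape (ny, nx)
    return np.column_stack([gx.ravel(), gy.ravel()])

def areal_average(values, grid, in_region):
    """Approximate |A|^{-1} * integral over A of S(s) ds by the mean over cells inside A."""
    mask = in_region(grid)
    return values[mask].mean()

grid = regular_grid(0.0, 1.0, 0.0, 1.0, 100, 100)
surface = grid[:, 0] + 2.0 * grid[:, 1]               # known test surface S(s) = x + 2y
in_A = lambda g: (g[:, 0] < 0.5) & (g[:, 1] < 0.5)    # region A = [0, 0.5] x [0, 0.5]
print(areal_average(surface, grid, in_A))             # close to the exact value 0.75
```

Because the grid is built row by row in $y$, `surface.reshape(100, 100)` gives the raster image of the surface, with rows indexed by $y$ and columns by $x$.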
The dataset camg in the geoR package (having loaded the package, simply type data(camg) at the console) is an example of a geostatistical dataset. Among other things, it contains the concentration of magnesium in a set of soil samples taken over a geographical region. Figure 7.1 shows the magnesium content of the soil (0-20 cm below the surface) measured at various locations in a field.
Figure 7.1 is an example of a typical geostatistical dataset: we observe a quantity of interest (and possibly some covariates) at a set of locations. A key property of geostatistical data is that the locations are fixed by design and are not informative about the underlying spatial process. It is, however, possible to perform inference when the locations themselves are modelled by a stochastic process (Diggle et al., 2010; Taylor et al., 2015, 2018, 2019, 2020).
How do we go about fitting a geostatistical model? There are three main methods:
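Use maximum likelihood to obtain estimates of $\beta$, $\sigma^2$, $\phi$ and $\tau^2$.
Use the variogram (see below) to obtain estimates of $\beta$, $\sigma^2$, $\phi$ and $\tau^2$.
Use Bayesian methods to obtain estimates of $\beta$, $\sigma^2$, $\phi$ and $\tau^2$: write down a prior $\pi(\beta, \sigma^2, \phi, \tau^2)$ for these parameters and make inferential statements from the posterior, $\pi(\beta, \sigma^2, \phi, \tau^2 \mid Y)$. We do not cover Bayesian estimation for geostatistical models here.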
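The first two approaches can be sketched in a few lines, assuming the exponential correlation function.

For maximum likelihood, the model implies $Y \sim \mathrm{MVN}(D\beta, V)$ with $V = \sigma^2 R(\phi) + \tau^2 I$. Its log-likelihood is therefore
$$-\tfrac{1}{2}\{n \log 2\pi + \log|V| + (Y - D\beta)^\top V^{-1} (Y - D\beta)\},$$
evaluated below through a Cholesky factor of $V$.

For the variogram, the classical binned estimator takes half the average squared difference between pairs of (detrended) observations whose separation falls in each distance bin. Under the model this should resemble $\tau^2 + \sigma^2\{1 - \rho(u)\}$.

The function names and the placeholder data are hypothetical. This is a sketch, not the geoR implementation.

```python
import numpy as np

def pairwise_distances(coords):
    """(n, n) matrix of Euclidean distances between the rows of coords."""
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

def log_likelihood(y, D, coords, beta, sigma2, phi, tau2):
    """Log-likelihood of Y ~ MVN(D beta, sigma2 * R(phi) + tau2 * I), R_ij = exp(-u_ij / phi)."""
    n = len(y)
    V = sigma2 * np.exp(-pairwise_distances(coords) / phi) + tau2 * np.eye(n)
    L = np.linalg.cholesky(V)                    # V = L L^T
    r = np.linalg.solve(L, y - D @ beta)         # r^T r = (y - D beta)^T V^{-1} (y - D beta)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))    # log |V|
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + r @ r)

def empirical_variogram(resid, coords, bin_edges):
    """Classical binned estimator: half the mean squared difference of pairs in each distance bin."""
    i, j = np.triu_indices(len(resid), k=1)      # each pair (i < j) counted once
    u = pairwise_distances(coords)[i, j]
    sq = (resid[i] - resid[j]) ** 2
    centres, gamma = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (u >= lo) & (u < hi)
        if in_bin.any():                          # skip empty bins
            centres.append(0.5 * (lo + hi))
            gamma.append(0.5 * sq[in_bin].mean())
    return np.array(centres), np.array(gamma)

# Placeholder data for illustration only (not the camg dataset).
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(80, 2))
D = np.ones((80, 1))                              # intercept-only mean
y = 1.0 + rng.normal(size=80)
ll = log_likelihood(y, D, coords, np.array([1.0]), sigma2=0.5, phi=0.2, tau2=0.5)
centres, gamma = empirical_variogram(y - y.mean(), coords, np.linspace(0.0, 0.7, 8))
```

To obtain maximum likelihood estimates, one would pass the negative of `log_likelihood` to a numerical optimiser such as `scipy.optimize.minimize`, typically working with $\log\sigma^2$, $\log\phi$ and $\log\tau^2$ to keep them positive. For the variogram approach, one would instead fit $\tau^2 + \sigma^2\{1 - \exp(-u/\phi)\}$ to the binned values, for example by least squares.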
Having obtained estimates of the parameters, we can predict the process over the spatial region of interest (i.e. at locations where we have no data). If our model includes covariates $d(s)$ at each spatial location $s$, we need to know the values of the covariates at each prediction location $s^*$ in order to predict $Y(s^*)$, although $S(s^*)$ can still be predicted without them. The process of forming predictions of $S(s)$ (or $\bar{S}(A)$) is known as kriging.
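As an illustration, the sketch below computes the plug-in (simple) kriging predictor, treating the estimated parameters as known and assuming the exponential correlation function. Under (7.1)–(7.2), $S(s^*)$ and $Y$ are jointly Gaussian. The predictive mean is $c^\top V^{-1}(Y - D\beta)$ and the predictive variance is $\sigma^2 - c^\top V^{-1} c$, where $c_i = \sigma^2\rho(\|s^* - s_i\|)$ and $V = \sigma^2 R(\phi) + \tau^2 I$. The function name, the prediction grid and the placeholder data are hypothetical. This is not the geoR implementation, and it ignores parameter uncertainty.

```python
import numpy as np

def krige_S(y, D, coords, pred_coords, beta, sigma2, phi, tau2):
    """Plug-in predictive mean and variance of S(s*) given Y at each row of pred_coords."""
    n = len(y)
    dist_obs = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    dist_cross = np.linalg.norm(pred_coords[:, None, :] - coords[None, :, :], axis=-1)
    V = sigma2 * np.exp(-dist_obs / phi) + tau2 * np.eye(n)   # Var(Y)
    C = sigma2 * np.exp(-dist_cross / phi)                    # Cov(S(s*), Y), shape (m, n)
    mean = C @ np.linalg.solve(V, y - D @ beta)
    # Row-wise quadratic forms c_k^T V^{-1} c_k for every prediction location k.
    var = sigma2 - np.sum(C * np.linalg.solve(V, C.T).T, axis=1)
    return mean, var

# Placeholder data for illustration only (not the camg dataset).
rng = np.random.default_rng(7)
coords = rng.uniform(0.0, 1.0, size=(60, 2))
D = np.ones((60, 1))                                          # intercept-only mean
y = 1.0 + rng.normal(size=60)
gx, gy = np.meshgrid(np.linspace(0.0, 1.0, 25), np.linspace(0.0, 1.0, 25))
pred = np.column_stack([gx.ravel(), gy.ravel()])
mean, var = krige_S(y, D, coords, pred, beta=np.array([1.0]), sigma2=0.8, phi=0.2, tau2=0.2)
surface = mean.reshape(gx.shape)                              # 25 x 25 raster of predicted S
```

Predicting $Y(s^*)$ instead would add $d(s^*)^\top\beta$ to `mean`, which is why the covariates are needed at the prediction locations. Far from all data, the predictive mean of $S$ shrinks to 0 and its variance rises to $\sigma^2$.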