Using (7.1) as the model of interest, our ultimate goal is to predict the process at locations where we do not have data. In the mining industry (the origin of geostatistics), this concept has obvious utility: based on core samples extracted from a set of locations, where should we set up our mine (a costly and time-intensive operation) so it will be most productive? In environmental epidemiology, we are often concerned with estimating the burden or risk of disease: similar concepts can be applied.
Spatial prediction (or kriging) via variogram estimation (as opposed to maximum likelihood estimation) usually involves at least four stages: (i) produce initial estimates of $\beta$ using ordinary least squares (OLS) regression; (ii) produce estimates of $\sigma^2$, $\phi$ and $\tau^2$; (iii) re-estimate $\beta$ using estimated generalised least squares (EGLS); (iv) produce kriged estimates of the spatial process. Step (iii) is optional, but if we are interested in making inference about the parameters $\beta$, then we need to take account of the fact that the observations are not independent (independence is assumed by OLS) before presenting the results of our analysis. Steps (ii) and (iii) can also be iterated.
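Step (iii) can be sketched in a few lines. This is a minimal illustration rather than the method used in these notes: it assumes the observation covariance is exponential with a nugget, $\Sigma = \sigma^2 \exp(-U/\phi) + \tau^2 I$ with $U$ the matrix of pairwise distances, and the names `exp_cov_matrix`, `ols`, `egls`, `D` and `coords`, together with all numerical values, are hypothetical.

```python
import numpy as np

def exp_cov_matrix(coords, sigma2, phi, tau2):
    """Assumed covariance of the observations: sigma2*exp(-u/phi) plus tau2 on the diagonal."""
    diff = coords[:, None, :] - coords[None, :, :]
    u = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise distances
    return sigma2 * np.exp(-u / phi) + tau2 * np.eye(len(coords))

def ols(D, y):
    """Ordinary least squares: treats the observations as independent."""
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return beta

def egls(D, y, Sigma):
    """beta = (D' Sigma^-1 D)^-1 D' Sigma^-1 y, with its covariance matrix."""
    Si_D = np.linalg.solve(Sigma, D)               # Sigma^-1 D
    Si_y = np.linalg.solve(Sigma, y)               # Sigma^-1 y
    cov_beta = np.linalg.inv(D.T @ Si_D)
    beta = cov_beta @ (D.T @ Si_y)
    return beta, cov_beta

# Simulated data with made-up parameter values (not taken from the notes).
rng = np.random.default_rng(0)
n = 50
coords = rng.uniform(0, 1000, size=(n, 2))
D = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
Sigma = exp_cov_matrix(coords, sigma2=2.0, phi=300.0, tau2=0.5)
y = D @ np.array([1.0, 2.0]) + np.linalg.cholesky(Sigma) @ rng.normal(size=n)

beta_ols = ols(D, y)
# In practice Sigma would be built from the estimates produced in step (ii).
beta_egls, cov_egls = egls(D, y, Sigma)
print(beta_ols, beta_egls, np.sqrt(np.diag(cov_egls)))
```

When `Sigma` is the identity matrix the two estimators coincide, which is a convenient sanity check; the standard errors from `cov_egls` are the ones that respect the spatial dependence.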
One of the main ingredients for producing spatial predictions is an estimate of the surface and its variability at locations where we do not necessarily have data. To obtain these estimates, we first estimate the second-order properties of the spatial process. The variogram is an exploratory tool for doing this; it is also used in longitudinal data analysis.
The variogram is defined as:
$$V(x, x') = \tfrac{1}{2}\,\mathrm{Var}\{Y(x) - Y(x')\}.$$
For a stationary process, this quantity (known as the semivariance) can be estimated as
$$v_{ij} = \tfrac{1}{2}\left\{(y_i - d_i^\top\hat\beta) - (y_j - d_j^\top\hat\beta)\right\}^2$$
for each distance $u_{ij} = \|x_i - x_j\|$, where $d_i$ is the covariate data for the $i$th individual and $\hat\beta$ has been computed using OLS (for example). The resulting point estimates, i.e. $v_{ij}$, computed using all possible pairs $(i, j)$, can be plotted against $u_{ij}$ as a ‘variogram cloud’ (Figure 7.2, left plot), or averaged for similar values of $u_{ij}$ to produce a ‘binned variogram’ (Figure 7.2, right plot). In this example, we have not used covariates in computing the variogram shown in Figure 7.2.
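The variogram cloud and binned variogram described above can be computed along the following lines. This is a sketch under stated assumptions: the inputs are the OLS residuals and the site coordinates, the semivariance for a pair is half the squared difference of the two residuals, and the function names, the toy data and the bin edges are all hypothetical.

```python
import numpy as np

def variogram_cloud(coords, resid):
    """Distances u_ij and semivariances 0.5*(r_i - r_j)^2 for all pairs i < j."""
    i, j = np.triu_indices(len(resid), k=1)
    u = np.sqrt(((coords[i] - coords[j]) ** 2).sum(axis=1))
    v = 0.5 * (resid[i] - resid[j]) ** 2
    return u, v

def binned_variogram(u, v, edges):
    """Average the cloud within distance bins [edges[b], edges[b+1]); empty bins give NaN."""
    which = np.digitize(u, edges) - 1      # bin index per pair; out-of-range pairs are ignored
    nbins = len(edges) - 1
    mids = 0.5 * (edges[:-1] + edges[1:])
    means = np.full(nbins, np.nan)
    for b in range(nbins):
        in_bin = which == b
        if in_bin.any():
            means[b] = v[in_bin].mean()
    return mids, means

# Toy example: three sites on a line, residuals after removing the mean.
coords = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
resid = np.array([1.0, 3.0, 2.0])
u, v = variogram_cloud(coords, resid)
mids, means = binned_variogram(u, v, edges=np.array([0.0, 7.5, 15.0]))
print(u, v)         # distances 5, 10, 5 and semivariances 2.0, 0.5, 0.5
print(mids, means)  # bin midpoints 3.75, 11.25 and bin means 1.25, 0.5
```

Plotting `v` against `u` gives the cloud, and plotting `means` against `mids` gives the binned version. The choice of bin edges is a judgement call: wide bins smooth more but hide short-range structure.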
The variogram helps to provide plausible initial estimates of the parameters of the process, as it can be shown that
$$V(u) = \tau^2 + \sigma^2\{1 - \rho(u)\},$$
where $\rho(u)$ is the correlation function.
Fitting the stationary Gaussian model with exponential correlation function using ordinary least squares regression, we obtain:
$\sigma^2$ is known as the variance parameter, $\phi$ as the range parameter, $\tau^2$ as the measurement error parameter or nugget effect, and $\tau^2 + \sigma^2$ as the sill.
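A least-squares fit of an exponential variogram to binned values can be sketched as follows. The sketch assumes the parameterisation $V(u) = \tau^2 + \sigma^2\{1 - \exp(-u/\phi)\}$, and uses a simple scheme chosen for illustration, not necessarily the procedure behind the estimates in these notes: for each candidate $\phi$ on a grid the model is linear in $(\tau^2, \sigma^2)$, so those two are found by ordinary least squares and the $\phi$ with the smallest residual sum of squares is kept. All names and numbers are hypothetical.

```python
import numpy as np

def exp_variogram(u, tau2, sigma2, phi):
    """Assumed theoretical variogram: nugget plus sigma2*(1 - exp(-u/phi))."""
    return tau2 + sigma2 * (1.0 - np.exp(-u / phi))

def fit_exp_variogram(u, v, phi_grid):
    """Grid search over phi; linear least squares for (tau2, sigma2) at each phi.

    No positivity constraints are imposed, so the output should be checked.
    """
    best = None
    for phi in phi_grid:
        X = np.column_stack([np.ones_like(u), 1.0 - np.exp(-u / phi)])
        coef, *_ = np.linalg.lstsq(X, v, rcond=None)
        rss = float(((v - X @ coef) ** 2).sum())
        if best is None or rss < best[0]:
            best = (rss, coef[0], coef[1], phi)
    _, tau2, sigma2, phi = best
    return tau2, sigma2, phi

# Noise-free synthetic "binned variogram" generated from made-up parameters.
u = np.linspace(50.0, 1500.0, 30)
v = exp_variogram(u, tau2=0.5, sigma2=2.0, phi=300.0)
tau2_hat, sigma2_hat, phi_hat = fit_exp_variogram(u, v, np.arange(50.0, 1001.0, 10.0))
print(tau2_hat, sigma2_hat, phi_hat, tau2_hat + sigma2_hat)  # nugget, variance, range, sill
```

With real binned variograms the fit is noisy, particularly at large distances where few pairs contribute, so weighting the bins by their pair counts is a common refinement.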
The figure below shows the shape of the fitted covariance function, with parameters set at the ordinary least squares estimates.
Exam Question 2014
Suppose that it is desired to fit the following model to a set of geostatistical data
$$Y_i = d_i^\top\beta + S(x_i) + Z_i, \qquad Z_i \sim \mathrm{N}(0, \tau^2) \text{ independently},$$
where $Y_i$ are measurements at locations $x_i$; $d_i$ is a vector of covariate information at each $x_i$; and $S(x_i)$ is the value of a zero-mean second order stationary spatial Gaussian process at $x_i$. Suppose $S$ has an exponential correlation function with variance parameter $\sigma^2$ and spatial decay parameter $\phi$. Figure 7.4 shows the variogram used to produce the estimates $\hat\sigma^2$, $\hat\phi$ and $\hat\tau^2$. Use this plot to suggest estimates for $\sigma^2$, $\phi$ and $\tau^2$, stating any results you use.
Solution:
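The variogram helps to provide plausible initial estimates of the parameters of the process, as it can be shown that $V(u) = \tau^2 + \sigma^2\{1 - \rho(u)\}$, where $\rho(u)$ is the correlation function. Since $\rho(u)$ tends to zero as $u \to \infty$, the sill is an estimate of $\tau^2 + \sigma^2$, and since $\rho(0) = 1$, the intercept is an estimate of $\tau^2$. Hence, from the plot, $\hat\tau^2$ is read off as the intercept and $\hat\sigma^2$ as the sill minus the intercept. For $\phi$, choose a point on the line and note the values of $u$ and $V(u)$. With the exponential correlation function $\rho(u) = \exp(-u/\phi)$, rearranging the above gives
$$\phi = -\frac{u}{\log\left\{1 - \dfrac{V(u) - \tau^2}{\sigma^2}\right\}},$$
and substituting the values read from the plot gives the estimate of $\phi$ (the exact answer is 293).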
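The back-calculation of the range parameter from a single point on the fitted line can be checked numerically. The sketch below assumes the exponential parameterisation $\rho(u) = \exp(-u/\phi)$; the intercept, sill and chosen point are made-up values for illustration and are not read from Figure 7.4.

```python
import math

def phi_from_point(u, v, tau2, sigma2):
    """Solve v = tau2 + sigma2*(1 - exp(-u/phi)) for phi."""
    frac = (v - tau2) / sigma2            # equals 1 - rho(u)
    if not 0.0 < frac < 1.0:
        raise ValueError("point must lie strictly between the intercept and the sill")
    return -u / math.log(1.0 - frac)

# Hypothetical readings from a variogram plot.
intercept = 1.0                           # estimate of tau^2
sill = 3.0                                # estimate of tau^2 + sigma^2
sigma2 = sill - intercept                 # estimate of sigma^2

# A point on the line, generated here from a known phi so the answer can be verified.
phi_true = 250.0
u_pt = 200.0
v_pt = intercept + sigma2 * (1.0 - math.exp(-u_pt / phi_true))

phi_hat = phi_from_point(u_pt, v_pt, intercept, sigma2)
print(phi_hat)
```

Points close to the sill make the logarithm very sensitive to small reading errors, so a point on the steep, early part of the curve tends to give a more stable estimate.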