1 Lab Session 4 Lip cancer in Scotland Overlay Operations, Mixing spatstat and sp Operations

A Geostatistical Model

This section demonstrates how to fit the geostatistical model to the magnesium soil data example discussed in lectures. The data for this example comes from the geoR package; we will use the gstat package to analyse them. Run the following lines if necessary (try typing library(geoR) first, for instance).

install.packages("geoR", dependencies = TRUE)
install.packages("gstat", dependencies = TRUE)

Then load them into R:

library(geoR)

Error in library(geoR): there is no package called ’geoR’

library(gstat)

Next we read in the data

data(camg)

Type head(camg) to look at the top few rows of the data. We can convert this object into a SpatialPointsDataFrame object (from the package sp) as follows:

coordinates(camg) = ~east + north

Error in coordinates(camg) = ~east + north: object ’camg’ not found

The SpatialPointsDataFrame is another example of an S4 class of objects, it contains both information on the locations of points as well as having a data.frame containing covariate information for each point. We will examine the magnesium concentration in the 0-20cm soil layer, mg020. You can plot the locations of the data, as well as specific attributes of the data as follows:

plot(camg)

Error in plot(camg): object ’camg’ not found

spplot(camg, "mg020")

Error in spplot(camg, "mg020"): object ’camg’ not found

Let $Y_{i}$ denote the magnesium concentration at location $x_{i}$ and $S$ be a spatial Gaussian process with Exponential correlation function. We aim to fit the following two models to the $\{Y_{i}\}$ :

	$\displaystyle Y_{i}$	$\displaystyle\sim$	$\displaystyle\mathrm{N}[S(x_{i}),\tau^{2}],$		(1.1)
	$\displaystyle Y_{i}$	$\displaystyle\sim$	$\displaystyle\mathrm{N}[Z(x_{i})\beta+S(x_{i}),\tau^{2}],$		(1.2)

Model (1.1) is the simplest of the two; in this model, we model the magnesium concentration directly as a noisy realisation of the spatial Gaussian process, $S$ . In model (1.2), the inclusion of spatially referenced covariate effects , $Z(x_{i})\beta$ , represents the explained variation in the prediction surface, and the interpretation of $S$ in this case is the unexplained variation.

The first step is to estimate the variogram for these data:

mgvg <- variogram(mg020 ~ 1, data = camg)

Error in variogram.formula(mg020 ~ 1, data = camg): object ’camg’ not found

and plot it:

plot(mgvg)

Error in plot(mgvg): object ’mgvg’ not found

Now next find the MLEs for the parameters in the stationary Gaussian model:

mgvgfit <- fit.variogram(mgvg, model = vgm(model = "Exp",
    psill = 30, range = 200, nugget = 10))

Error in fit.variogram(mgvg, model = vgm(model = "Exp", psill = 30, range = 200, : object ’mgvg’ not found

mgvgfit

Error in eval(expr, envir, enclos): object ’mgvgfit’ not found

and add them to the plot

plot(mgvg, mgvgfit)

Error in plot(mgvg, mgvgfit): object ’mgvg’ not found

The help file for the vgm function above does not make clear the exact role of the arguments when they are used inside the function fit.variogram, however according to Bivand et al. (2013) these are initial values for the variogram fit. In this case, my choice of psill=30, range=200, nugget=10 was based on inspection of the empirical variogram, but the initial values don’t appear to be too sensitive in this case.

Exercise 1.4.

Try checking with several different initial values, has the procedure achieved convergence?

We are now in a position to be able to do the kriging, but before we begin, we’ll construct an output grid onto which we will predict the surface. This will be a regular grid covering the observation window, and this will be stored using an sp object of class SpatialPixels.

bb <- bbox(camg)  # bounding box

Error in bbox(camg): object ’camg’ not found

delta <- 20
xseq <- seq(bb[1, 1], bb[1, 2], by = delta)

Error in seq(bb[1, 1], bb[1, 2], by = delta): object ’bb’ not found

yseq <- seq(bb[2, 1], bb[2, 2], by = delta)

Error in seq(bb[2, 1], bb[2, 2], by = delta): object ’bb’ not found

predictgrid <- SpatialPixels(SpatialPoints(expand.grid(xseq,
    yseq)))

Error in expand.grid(xseq, yseq): object ’xseq’ not found

We perform the kriging step as follows:

mgkrige = krige(mg020 ~ 1, camg, predictgrid, model = mgvgfit)

Error in krige(mg020 ~ 1, camg, predictgrid, model = mgvgfit): object ’camg’ not found

This produces an object of class SpatialPixelsDataFrame, which contains the predictions and marginal prediction variances.

spplot(mgkrige, "var1.pred")

Error in spplot(mgkrige, "var1.pred"): object ’mgkrige’ not found

spplot(mgkrige, "var1.var")

Error in spplot(mgkrige, "var1.var"): object ’mgkrige’ not found

These predictions are marginally (in fact they are jointly) Gaussian, so we can also produce maps of cell-wise exceedances, for example showing $\mathbb{P}[\texttt{mg020}>30]$ or $\mathbb{P}[\texttt{mg020}>30]>0.8$ . Such maps are more useful to scientists than maps of the prediction surface $S(x)$

Exercise 1.5.

Produce a map illustrating $\mathbb{P}[\texttt{mg020}>30]>0.8$ .

Note that although we have been using an exponential covariance model to estimate a parametric form for the variogram, the methods illustrated above make no mention of the common parameters for this model, $\sigma^{2}$ and $\phi$ . Recall from the notes that the variogram can be written as:

V(u)=\tau^{2}+\sigma^{2}\{1-\rho(u)\},

where $\sigma^{2}$ is the variance parameter, $\phi$ is the range parameter, $\tau^{2}$ as the measurement error parameter or nugget effect, and $\sigma^{2}+\tau^{2}$ as the sill. So we can get a handle on these parameters from the object mgvgfit.

mgvgfit$psill

Error in eval(expr, envir, enclos): object ’mgvgfit’ not found

mgvgfit$range

Error in eval(expr, envir, enclos): object ’mgvgfit’ not found

tau2 <- mgvgfit$psill[1]

Error in eval(expr, envir, enclos): object ’mgvgfit’ not found

sigma2 <- mgvgfit$psill[2]

Error in eval(expr, envir, enclos): object ’mgvgfit’ not found

phi <- mgvgfit$range[2]

Error in eval(expr, envir, enclos): object ’mgvgfit’ not found

And plotting just to confirm that this is indeed correct:

x <- seq(0, 500, length.out = 1000)
y <- tau2 + sigma2 * (1 - exp(-x/phi))

Error in eval(expr, envir, enclos): object ’tau2’ not found

plot(mgvg$dist, mgvg$gamma, ylim = c(0, 45), xlim = c(0,
    450))

Error in plot(mgvg$dist, mgvg$gamma, ylim = c(0, 45), xlim = c(0, 450)): object ’mgvg’ not found

lines(x, y, col = "red")

Error in xy.coords(x, y): ’x’ and ’y’ lengths differ

Exercise 1.6.

Complete the following exercises.

1.

Plot the predicted surface $S(x)$ using different parameter values. How does this change according to $\sigma^{2}$ and $\phi$ ?
2.

Repeat the analysis above, including elevation and region (in the camg data-set) as covariates in the model. To do this, you will need to re-estimate the variogram using the variogram function and then re-estimate the parameters of $S$ using fit.variogram.