In this project you will model the spatial distribution of a heavy metal of your choice in the state of Texas, USA. To load the data for this project into R simply type:
library(sp)
library(spatstat)
library(rgdal)
library(gstat)
load("CHIC465.RData")
The R data file contains two objects:
A SpatialPointsDataFrame giving the heavy metal content in parts per million at each sample location. Note that these samples were primarily taken from stream sediment (wet and dry), but a good number were also taken from soil with only a few taken from other media. These data are also in projection coordinates EPSG:32614.
A shapefile of the state of Texas in projection coordinates EPSG:32614.
Details of the variables in the data are available here: http://mrdata.usgs.gov/geochem/about.php and there are a set of frequently asked questions here: http://mrdata.usgs.gov/metadata/ofr-2004-1001.faq.html.
Choose one of the heavy metals from: chromium, copper, nickel, zinc, arsenic, mercury or lead and perform the following tasks:
For your chosen heavy metal, perform a literature search on the potential health risks associated with that metal. You should consider the following questions: What are the health risks associated with your chosen metal? What evidence is there in the academic literature that supports a link between exposure to your chosen metal and disease? Is there any guidance on maximum exposure levels? What are the typical sources of exposure to the metal?
Write 1000 ± 20% words discussing what you have found out about this. [50% of Marks]
Using the geostatistical techniques you have learned on the course, produce a map of the concentration of your chosen metal over Texas. You should produce variograms, and compare the results from several different potential models for the correlation function. For the points on your prediction grid, you can calculate exceedance probabilities, that is, the probability that the concentration at a point exceeds some threshold. Produce maps of exceedance probabilities (there is an example in the fourth workshop) for an appropriate threshold, preferably justified by your literature search. See the hint below on kriging on the log-scale.
Using figures and tables where appropriate, write a discussion of your model-building strategy and spatial predictions. If possible, link this discussion to your findings in part 1. [50% of Marks]
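As a rough illustration of the variogram comparison asked for in the second task, the sketch below fits three candidate models to an empirical variogram of log-concentration and compares their weighted sums of squared errors. It uses the meuse zinc data shipped with sp purely as a stand-in, because the object names inside CHIC465.RData are not listed here. The starting values passed to vgm are guesses for that stand-in data, not recommendations for Texas.

```r
library(sp)
library(gstat)

# Stand-in data (assumption): zinc in the meuse floodplain, shipped with sp.
data(meuse)
coordinates(meuse) <- ~x + y

# Empirical variogram of log-concentration with a constant mean.
v <- variogram(log(zinc) ~ 1, meuse)

# Fit several candidate models from the same rough starting values.
models <- c("Exp", "Sph", "Gau")
fits <- lapply(models, function(m) {
  fit.variogram(v, vgm(psill = 0.6, model = m, range = 900, nugget = 0.05))
})
names(fits) <- models

# Weighted sum of squared errors of each fit (smaller is a closer fit).
sse <- sapply(fits, function(f) attr(f, "SSErr"))
print(sse)

# Visual check of one fitted model against the empirical variogram.
plot(v, fits[["Sph"]])
```

The sum of squared errors is only one way to compare fits. Plotting each fitted model over the empirical variogram is usually just as informative.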
This exercise may be easier for some metals than others.
As a starting point, you might consider looking at information from the Environmental Protection Agency’s website: http://www.epa.gov/ (which can be searched using Google). Having found out some background on your chosen metal, proceed to a formal database search (e.g. Web of Science/Medline/Google Scholar). Make sure there is enough literature on your chosen metal before you continue; it might also be worthwhile doing some exploratory analyses of the data to see what choice of metal might make an interesting report (a metal that has a more interesting spatial pattern over Texas may make for a more interesting discussion).
Since concentration is a positive-valued quantity, you should krige the log of concentration instead. The exceedance probabilities can also be calculated on this scale; the procedure is as follows: (i) log the concentration at each point; (ii) use kriging to produce predictions of the log-concentrations on your prediction grid; (iii) use the standard errors of these predictions to compute the upper tail probability that the concentration exceeds your chosen threshold at each point (remember to convert your threshold onto the log scale).
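Steps (i) to (iii) can be sketched as follows, again with meuse and meuse.grid from sp as stand-ins for the Texas data and prediction grid. The threshold of 500 ppm is an arbitrary placeholder, not a guideline value, and the spherical model with its starting values is just one possible choice. The calculation assumes that the prediction error on the log scale is approximately Gaussian, with mean equal to the kriging prediction and variance equal to the kriging variance.

```r
library(sp)
library(gstat)

# Stand-in data and prediction grid (assumption): meuse, shipped with sp.
data(meuse)
coordinates(meuse) <- ~x + y
data(meuse.grid)
coordinates(meuse.grid) <- ~x + y
gridded(meuse.grid) <- TRUE

# (i) Work with log-concentration: empirical variogram and a fitted model.
v <- variogram(log(zinc) ~ 1, meuse)
fit <- fit.variogram(v, vgm(psill = 0.6, model = "Sph", range = 900, nugget = 0.05))

# (ii) Ordinary kriging of log-concentration onto the grid.
# var1.pred holds the predictions, var1.var the kriging variances.
kr <- krige(log(zinc) ~ 1, meuse, meuse.grid, model = fit)

# (iii) Upper tail probability that concentration exceeds the threshold,
# with the threshold converted to the log scale.
threshold <- 500  # placeholder value in ppm
kr$exceed <- pnorm(log(threshold),
                   mean = kr$var1.pred,
                   sd = sqrt(kr$var1.var),
                   lower.tail = FALSE)

# Map of exceedance probabilities.
spplot(kr, "exceed", main = "P(concentration > threshold)")
```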
The grid you use in the spatial prediction step must be in the same geographic projection as the data, i.e. in EPSG:32614, a Universal Transverse Mercator coordinate reference system. To give your SpatialPoints prediction grid, predgrid, say, this projection, you will require the rgdal package; then proj4string(predgrid) <- CRS(proj4string(state)) should do the trick. Note that this assignment only labels the grid with the projection and does not transform any coordinates, so the grid coordinates must already be expressed in EPSG:32614.
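One way to build such a grid is sketched below. Because the contents of the Texas object are not shown here, the code first makes a hypothetical rectangular state polygon in UTM zone 14N (the proj4 form of EPSG:32614); with the real data you would skip that step and use the loaded object. The 50 km spacing is arbitrary, and the grid reuses the outline's CRS object directly, which has the same effect as the proj4string assignment described above.

```r
library(sp)

# proj4 definition of WGS 84 / UTM zone 14N (EPSG:32614).
utm14 <- CRS("+proj=utm +zone=14 +datum=WGS84 +units=m +no_defs")

# Hypothetical stand-in for the state outline: a rectangle, in metres.
outline <- cbind(c(2e5, 8e5, 8e5, 2e5, 2e5),
                 c(3e6, 3e6, 4e6, 4e6, 3e6))
state <- SpatialPolygons(list(Polygons(list(Polygon(outline)), ID = "state")),
                         proj4string = utm14)

# Regular lattice of points covering the bounding box, 50 km apart.
bb <- bbox(state)
xy <- expand.grid(x = seq(bb[1, 1], bb[1, 2], by = 50000),
                  y = seq(bb[2, 1], bb[2, 2], by = 50000))

# Build the grid with the same CRS as the outline.
predgrid <- SpatialPoints(as.matrix(xy), proj4string = state@proj4string)

# Keep only the grid points that fall inside the outline.
inside <- !is.na(over(predgrid, state))
predgrid <- predgrid[inside, ]

plot(state)
points(predgrid, pch = ".")
```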
Don’t worry if you find that the environment in Texas is perfectly safe!
Your report must be written in LaTeX and it must be appropriately referenced, preferably using BibTeX.
Your report must not exceed 5 pages including tables and figures, but excluding references.
It should be written in 12 point font on A4 paper with 1 inch margins. Note that the easiest way to get the margins is to put \usepackage[margin=1in]{geometry} in your preamble.
Include all your R code in an appendix (which does not contribute to the page count).
You must compile and submit your document as a .pdf file.
A 10% penalty will be applied to reports not conforming to the above criteria; they will not be marked beyond the 5th page.
The deadline for this assignment is Monday 6th April at 10:30 am.
The data for this project were obtained from the National Geochemical Database of the United States of America: http://mrdata.usgs.gov/geochem/.
U.S. Geological Survey, 2004, The National Geochemical Survey - database and documentation: U.S. Geological Survey Open-File Report 2004-1001, U.S. Geological Survey, Reston VA.