Spatial uncertainty, many caveats and very few conclusions
![almost black image](https://cisweb.lancaster.ac.uk:443/img/cwip/cisweb.lancaster.ac.uk/EventsMedia/uncertainty-637246259544984925.jpg?mode=crop&width=874&height=289&center=0.50%2c0.50)
“Experience shows that neither the scientific community nor decision makers have a good record at understanding uncertainty associated with predictions. Such understanding is necessary because the decision making process is best served when uncertainty is communicated as precisely as possible, but no more precisely than warranted.” (Pielke Jr., 2001)
The previous blog by Kate Wright (https://www.lancaster.ac.uk/data-science-of-the-natural-environment/blogs/covid-19-who-should-we-trust) provided definitions and an overview of uncertainty, its sources and its impact (please read it if you have missed it). The difficulty in estimating uncertainty lies in its subjectivity, and hence in providing objective measures of it. As Brown and Heuvelink stated in 2008, “uncertainty is an expression of confidence about what we know, both as individuals and communities of scientists, and is, therefore, subjective.” It therefore reflects a lack of complete knowledge about the process, the quality of the data and the choice of the inferential and estimation methods (model structure, parameters, etc.). Because of this subjective nature, low uncertainty in the estimates often does not imply low uncertainty in the process and its modelling (Isaaks and Srivastava, 1989). However, acquiring more data and improving our understanding of the process system and dynamics are key to reducing the uncertainty in all the process and model components; although the Heisenberg-type uncertainty principle of spectral analysis teaches us that achieving a high degree of accuracy in all the model components while minimizing bias is impossible, since bias and variance are antagonistic (Priestley, 1981).
Spatially, uncertainty depends on the number and proximity of the samples, their spatial arrangement, and the nature of the phenomenon under study (obviously, “if we are dealing with an extremely smooth and well-behaved variable, our estimates are going to be more reliable than if we are dealing with a very erratic variable” (Isaaks and Srivastava, 1989)). This is because the attribute or process under study may vary from one location to another due to endogenous and exogenous factors. Sources of spatial uncertainty include the availability and quality of data, imperfect detection and classification, data confidentiality, geocoding rate, support or area, border effects and scale (through aggregation, disaggregation, dominant patterns and process controls) (Atkinson and Graham, 2006). Spatial uncertainty can be heterogeneous in space and time due to various changes or disturbances: environmental (including anthropogenic), climate trend, climate seasonality, social and political, demographic and, in the case of diseases, epidemiological (behaviour, transmission, pathogen/vector adaptation, response, mutations). Spatial heterogeneities introduce bias in the inference, validation and prediction steps of any model.
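To make the dependence of uncertainty on sample spacing and on the “roughness” of the variable a little more concrete, here is a minimal, self-contained Python sketch (not taken from any of the cited works) that computes an empirical semivariogram with plain numpy; all names, the toy surface and the number of bins are purely illustrative.

```python
import numpy as np

def empirical_semivariogram(coords, values, n_bins=10):
    """Illustrative empirical semivariogram: half the mean squared
    difference between pairs of observations, binned by separation
    distance. High semivariance at short lags indicates an erratic
    variable and hence greater interpolation uncertainty."""
    # pairwise separation distances and squared value differences
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)          # keep each pair once
    dists, sqdiffs = d[iu], sq[iu]

    edges = np.linspace(0, dists.max(), n_bins + 1)
    lags, gamma = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (dists >= lo) & (dists < hi)
        if mask.any():
            lags.append(dists[mask].mean())
            gamma.append(0.5 * sqdiffs[mask].mean())
    return np.array(lags), np.array(gamma)

# toy example: a fairly smooth surface sampled at random locations
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))
values = np.sin(coords[:, 0] / 20) + 0.1 * rng.standard_normal(200)
lags, gamma = empirical_semivariogram(coords, values)
```

Re-running the toy example with a noisier (more erratic) surface would push the semivariance up at all lags, which is exactly the Isaaks and Srivastava point about well-behaved versus erratic variables.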
Historically, the main approaches to estimating uncertainty have been probabilistic and non-probabilistic methods, such as comparison between inputs and outputs (by the linear correlation coefficient, partial correlation coefficients and standardized regression coefficients; their non-linear and monotonic variants and non-linear, non-monotonic alternatives; mean error, mean absolute error, etc.), estimation intervals (interquartile differences, confidence intervals and credible intervals), posterior probabilities, bootstrapping and permutations, Monte Carlo methods (bootstrapping fails when there is no empirical distribution to sample from or when spatial heterogeneities are present – see below), classification indices (sensitivity, specificity, weighted kappa, expert opinion, ROC and AUC), error variance (the standard output in classical geostatistical analyses), sensitivity analysis, cross-validation, entropy, sandwich methods, robustness indices (e.g. the signal-to-noise ratio and the ranked probability skill score), scenario comparison and beta-PERT probability distribution methods.
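As an illustration of one of the simpler tools in that list, the sketch below computes a percentile bootstrap confidence interval for a mean in Python; the function name, the resampling settings and the skewed toy data are all illustrative assumptions, not part of any cited method.

```python
import numpy as np

def bootstrap_ci(sample, stat=np.mean, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample with replacement,
    recompute the statistic, and read off the empirical quantiles."""
    rng = np.random.default_rng(seed)
    boot = np.array([
        stat(rng.choice(sample, size=len(sample), replace=True))
        for _ in range(n_boot)
    ])
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(1)
obs = rng.lognormal(mean=1.0, sigma=0.5, size=80)   # skewed toy data
lo, hi = bootstrap_ci(obs)
print(f"mean = {obs.mean():.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")
```

The same machinery gives Monte Carlo estimation intervals whenever the statistic can be recomputed cheaply, which is why these resampling methods appear so often alongside the analytical ones above.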
Uncertainty about the precise value of an attribute at a particular point in space and time may be quantified with a marginal probability density function for that attribute; in the case of two or more attributes occupying the same space-time point, a joint probability density function may be employed. Because each resample is drawn independently of the previous ones and all observations have the same probability of being selected, ordinary bootstrapping is not appropriate in the presence of spatial heterogeneity (Whitcher, Tuch et al., 2008). More difficult, and partly unresolved, is the quantification of uncertainty for discrete or categorical data affected by spatial autocorrelation. In this case indicator kriging, conditional probability networks, Bayesian maximum entropy, Markov random fields and marked point processes are the most advantageous techniques (Brown and Heuvelink, 2008).
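The failure of the ordinary bootstrap under dependence can be shown with a small numerical experiment. The sketch below (my own illustration, not the wild bootstrap used by Whitcher and colleagues) compares a naive i.i.d. bootstrap with a moving-block bootstrap on an autocorrelated AR(1) series standing in for spatially dependent data; all parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# AR(1) series: a stand-in for spatially/serially autocorrelated data
n, phi = 500, 0.8
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

def naive_boot_se(x, n_boot=2000):
    """i.i.d. bootstrap: treats observations as exchangeable."""
    means = [rng.choice(x, size=len(x), replace=True).mean()
             for _ in range(n_boot)]
    return np.std(means)

def block_boot_se(x, block=25, n_boot=2000):
    """Moving-block bootstrap: resamples contiguous blocks so that
    short-range dependence is preserved within each block."""
    starts = np.arange(len(x) - block + 1)
    n_blocks = int(np.ceil(len(x) / block))
    means = []
    for _ in range(n_boot):
        idx = rng.choice(starts, size=n_blocks, replace=True)
        resample = np.concatenate([x[s:s + block] for s in idx])[:len(x)]
        means.append(resample.mean())
    return np.std(means)

print(f"naive bootstrap SE: {naive_boot_se(x):.3f}")   # typically too small
print(f"block bootstrap SE: {block_boot_se(x):.3f}")   # accounts for dependence
```

Because the naive resampler destroys the correlation structure, it understates the standard error of the mean; any honest uncertainty statement for spatially heterogeneous data has to respect that structure in some way.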
From all of the above, it is clear that uncertainty also depends on us and our choices, not only on the data. In fact, errors in the data can be minimized by different means: for example, using optimal interpolation techniques and an appropriate sampling density; removing or checking for outliers, subgrouping and removing systematic bias; adopting appropriate classifications; and improving model calibration by reducing errors in model parameters.
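One common way of checking such choices is the cross-validation mentioned above. As a hedged sketch (the interpolator, the power parameters and the toy data are all my own illustrative assumptions), the code below runs leave-one-out cross-validation of a simple inverse-distance-weighting interpolator to compare candidate parameter values.

```python
import numpy as np

def idw_predict(coords, values, target, power=2.0):
    """Inverse-distance-weighted prediction at a single target location."""
    d = np.linalg.norm(coords - target, axis=1)
    if np.any(d == 0):
        return values[np.argmin(d)]
    w = 1.0 / d ** power
    return np.sum(w * values) / np.sum(w)

def loo_rmse(coords, values, power=2.0):
    """Leave-one-out cross-validation: hold out each sample in turn,
    predict it from the rest, and summarise the prediction errors."""
    errs = []
    for i in range(len(values)):
        mask = np.arange(len(values)) != i
        pred = idw_predict(coords[mask], values[mask], coords[i], power)
        errs.append(pred - values[i])
    return np.sqrt(np.mean(np.square(errs)))

rng = np.random.default_rng(3)
coords = rng.uniform(0, 100, size=(150, 2))
values = np.cos(coords[:, 1] / 15) + 0.2 * rng.standard_normal(150)

# compare candidate power parameters: a crude form of model calibration
for p in (1.0, 2.0, 4.0):
    print(f"power={p}: LOO RMSE = {loo_rmse(coords, values, p):.3f}")
```

The parameter with the lowest held-out error is the better-calibrated choice for these data, which is exactly the sense in which our modelling decisions, and not just the data, shape the final uncertainty.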
Luckily, methods exist to estimate uncertainty (deterministically or not), although none of them can take into account all the uncertainties arising from the process and model components, and it is even more difficult to maintain identifiability among the uncertainties of the individual components. Only complete knowledge of the process and its variations can successfully minimize the uncertainty. Bayesian models allow uncertainty to be represented and controlled explicitly, and metascientifically could provide a way to account for our own sphere of belief or bias, although this sounds like another Skolem's paradox (https://plato.stanford.edu/entries/paradox-skolem/).
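To show, in the simplest possible terms, how a Bayesian model makes both the prior belief and the remaining uncertainty explicit, here is a minimal conjugate normal-normal sketch; the prior, the assumed measurement variance and the toy data are illustrative assumptions, not a recommendation for any particular spatial model.

```python
import numpy as np
from scipy import stats

# conjugate normal-normal update: known observation variance, normal prior
prior_mean, prior_var = 0.0, 4.0      # our "sphere of belief" before seeing data
obs_var = 1.0                         # assumed known measurement variance

rng = np.random.default_rng(4)
data = rng.normal(loc=1.5, scale=np.sqrt(obs_var), size=25)

# standard conjugate update for the mean of a normal with known variance
post_var = 1.0 / (1.0 / prior_var + len(data) / obs_var)
post_mean = post_var * (prior_mean / prior_var + data.sum() / obs_var)

# a 95% credible interval: the Bayesian expression of remaining uncertainty
lo, hi = stats.norm.ppf([0.025, 0.975], loc=post_mean, scale=np.sqrt(post_var))
print(f"posterior mean = {post_mean:.2f}, "
      f"95% credible interval = ({lo:.2f}, {hi:.2f})")
```

The prior enters the answer explicitly, so our subjective starting point is at least visible and open to criticism, even if, as noted above, that does not make it disappear.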
Cited literature.
Atkinson, P. M. and A. J. Graham (2006). "Issues of scale and uncertainty in the global remote sensing of disease." Advances in Parasitology 62: 79-118.
Brown, J. D. and G. B. M. Heuvelink (2008). On the identification of uncertainties in spatial data and their quantification with probability distribution functions. In: J. P. Wilson and A. S. Fotheringham (eds), The Handbook of Geographic Information Science. Malden, MA, Blackwell Pub.: 94-107.
Isaaks, E. H. and R. M. Srivastava (1989). Applied geostatistics. New York, Oxford University Press.
Pielke Jr., R. A. (2001). Models in ecosystem science. 9th Cary Conference, Millbrook, NY, Princeton University Press.
Priestley, M. B. (1981). Spectral analysis and time series. London; New York, Academic Press.
Whitcher, B., D. S. Tuch, J. J. Wisco, A. G. Sorensen and L. Q. Wang (2008). "Using the wild bootstrap to quantify uncertainty in diffusion tensor imaging." Human Brain Mapping 29(3): 346-362.