Following our first successful workshop in 2023, we have identified the following areas of particular interest to focus on in 2024. They are intended to cover topics of broad interest to environmental and ecological statisticians and quantitative scientists, and may form the starting point for useful collaborations.
Statistics within the data science pipeline
Statisticians are increasingly working within an end-to-end data science pipeline that may involve complex tools at each stage, from data access to visualisation. Do statisticians and scientists need to work with other data science professionals to analyse environmental and ecological data effectively, and how can effective partnerships be supported?
High dimensional data
Environmental and ecological data, particularly those derived from modern technologies such as DNA sequencing or remote sensors, can have a large number of potentially informative dimensions. A challenge for statisticians and scientists is to extract meaningful signals from such high-dimensional data. Have new statistical approaches increased our ability to model these data? Where is more research needed?
AI and statistics
The use of AI in research and teaching has expanded hugely in the last few years, particularly with the development of large language models. How can environmental and ecological statisticians benefit from advances in AI? Where can AI learn from statistics?
Communicating statistics effectively
Communicating statistical concepts well is increasingly important and challenging as analyses become more complex. How can we learn from other disciplines to communicate effectively with stakeholders and the public? Is co-design important to enable impactful communication, and how can it be integrated into quantitative research?
Philipp Boersch-Supan (British Trust for Ornithology)
For decades our understanding of bird (and other wildlife) populations has been grounded in data collected by human observers. Now, new technologies such as acoustic recorders and computer vision systems offer additional ways of observing and recording wildlife, with the potential to expand monitoring across space and time in ways that are difficult or impossible to achieve with human observers. However, the way that sensors perceive birds is in many ways fundamentally different from human perception. I will explore the seemingly simple questions "How many birds are there?" and "How high do birds fly?", which are at the heart of understanding how bird populations fare in the Anthropocene and how birds interact with human-made infrastructure. Answering these questions requires accounting for imperfect sampling and/or measurement error, which are features of both human and sensor-based observation. Carefully constructed observation models are a key requirement for drawing sound inferences from such data, and for meaningful integration of data across different observation approaches. I will present recent work on such models in the context of population trend estimation and environmental impact assessments for offshore wind farms, and outline research priorities for improving statistical frameworks for data arising from complex and imperfect observation processes.
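One widely used example of an observation model for counts is the binomial N-mixture model (Royle 2004): the true number of birds N at a site is latent, and on each repeat survey every bird present is detected independently with probability p. The sketch below, assuming that model with a Poisson distribution for N and a closed population across visits, sums over N to give one site's log-likelihood. It illustrates the general idea of separating the ecological process from the observation process; it is not the models presented in the talk, and the function name and the truncation bound `n_max` are our own choices.

```python
import math

def nmixture_site_loglik(counts, lam, p, n_max=200):
    """Log-likelihood of repeated counts at one site under a binomial N-mixture model.

    counts: counts from repeat visits to the same site (closed population assumed)
    lam:    mean abundance, with latent N ~ Poisson(lam), lam > 0
    p:      per-individual detection probability on each visit
    n_max:  upper limit for the sum over latent N (hypothetical default)
    """
    total = 0.0
    # The latent abundance must be at least the largest count observed.
    for n in range(max(counts), n_max + 1):
        # Poisson probability of N = n, computed on the log scale to avoid overflow.
        prior = math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))
        # Given N = n, each visit's count is Binomial(n, p), independently.
        like = 1.0
        for y in counts:
            like *= math.comb(n, y) * p**y * (1 - p) ** (n - y)
        total += prior * like
    return math.log(total)

# Example: three visits to one site.
print(nmixture_site_loglik([3, 5, 4], lam=8.0, p=0.5))
```

With p = 1 the model collapses to a plain Poisson count, which is a useful sanity check; maximising the sum of such terms over sites with respect to lam and p gives abundance estimates corrected for imperfect detection.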
Amanda Trask (Institute of Zoology, Zoological Society of London)
Many threatened species rely on some component of ex-situ management, ranging from Extinct in the Wild species that are entirely under human care to species in which some individuals are taken into ex-situ management at key life stages before being re-released. Further, with increasing anthropogenic pressures on the natural world, the need for ex-situ management is likely to increase. However, many ex-situ populations may have low viability, potentially compromising their ability to fulfil their conservation objectives, such as acting as insurance against species decline in the wild or as a source for releases back to the wild. Conservation breeding institutions often keep extensive records of the animals in their collections, such as studbook data, health records and necropsy reports. Analyses of these data promise insights into the underlying causes of low viability, but they can still be challenging because of missing data (e.g. when animals are transferred between institutions), inconsistent recording, or records that cover only some life stages. Here, I will use work on the Extinct in the Wild sihek (Guam kingfisher, Todiramphus cinnamominus) as a case study of why understanding the underlying causes of low viability is important for informing conservation strategies, and of how collaboration between conservation biologists and statisticians can make better use of available data to aid species recovery efforts.
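As an illustration of how one of these gaps might be handled in a survival analysis of studbook records, the sketch below computes a Kaplan-Meier survival curve in which each animal's record ends either in death (event = 1) or in censoring (event = 0), for example at transfer to another institution where follow-up is lost. Treating transfers as censoring is our assumption and is only valid if transfer is unrelated to an animal's survival prospects; the function name and data layout are hypothetical, and this is not the analysis used for the sihek.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate with right-censoring.

    durations[k]: time from entry (e.g. hatching) to death or censoring
    events[k]:    1 if the record ends in death, 0 if censored (e.g. transfer)
    Returns [(time, survival probability just after time)] at each death time.
    """
    records = sorted(zip(durations, events))
    at_risk = len(records)
    surv = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = leaving = 0
        # Group tied times; animals censored at t still count as at risk at t.
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

# Example: four animals, one transferred out (censored) at age 2.
print(kaplan_meier([1, 2, 2, 3], [1, 1, 0, 1]))
```

Simply dropping transferred animals, or treating transfer as survival to the end of the study, would bias the curve in opposite directions; censoring uses the time each animal was actually observed.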
Jon Barry (CEFAS)
This talk addresses the fundamental problem that counts of categories (e.g. species) produced by machine learning (ML) or other classification algorithms can be biased in the presence of classification error. We develop and apply a Bayesian model that accounts for errors in classification algorithms used for environmental monitoring, and illustrate our methods with data from an ML algorithm used to identify zooplankton in the Celtic Sea and English Channel, U.K.
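The source of the bias can be written down directly. If C[i][j] is the probability that an object of true class i is labelled j, the expected labelled proportions are q_j = sum_i pi_i C[i][j], so raw label counts estimate q rather than the true proportions pi. The sketch below recovers pi by maximum likelihood using the EM algorithm, assuming C is known (for example, estimated from a hand-checked validation set). This is a simpler, non-Bayesian stand-in chosen for illustration, not the authors' Bayesian model, which can also carry uncertainty in C through to the estimates; the function and argument names are hypothetical.

```python
def correct_class_proportions(label_counts, confusion, n_iter=10_000, tol=1e-12):
    """EM estimate of true class proportions from misclassified label counts.

    label_counts[j]: number of objects the classifier assigned to class j
    confusion[i][j]: P(classifier says j | true class is i); each row sums to 1
    Assumes every observed label has nonzero probability under some class.
    """
    k = len(label_counts)
    total = sum(label_counts)
    pi = [1.0 / k] * k  # start from uniform true-class proportions
    for _ in range(n_iter):
        expected = [0.0] * k
        for j, n_j in enumerate(label_counts):
            # E-step: posterior P(true class i | label j) under the current pi.
            weights = [pi[i] * confusion[i][j] for i in range(k)]
            norm = sum(weights)
            for i in range(k):
                expected[i] += n_j * weights[i] / norm
        # M-step: expected true-class counts divided by the total count.
        new_pi = [v / total for v in expected]
        converged = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi

# Example: two classes; the classifier confuses class 1 as class 0 20% of the time.
print(correct_class_proportions([690, 310], [[0.9, 0.1], [0.2, 0.8]]))
```

Here the raw labels suggest 69% of class 0, whereas the corrected estimate is close to 70%; with stronger or asymmetric confusion the gap can be much larger.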
Lynne Seymour (Department of Statistics, University of Georgia, US)
We use the Morse filtration from topological data analysis to develop the seahorse and horse-Z plots for investigating atmospheric waves, as defined by geopotential heights. The seahorse plot summarises wave features over a set of fixed latitudes, while the horse-Z plot assesses the waves and their heights against a historical record. We present case studies focused on extreme heat events to show how these tools might be used.
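The construction of the plots is not detailed here, but the closely related idea of a sublevel-set filtration can be sketched for a single latitude. Sweep a threshold upward through the geopotential-height values around the (periodic) latitude circle and track when troughs first appear as separate components and when they merge. The resulting (birth, death) pairs form a 0-dimensional persistence diagram in which long-lived pairs correspond to prominent troughs. This is our own illustration of the general technique, not the seahorse or horse-Z construction; negating the input tracks ridges instead.

```python
import math

def persistence_0d_circular(values):
    """0-dimensional persistence pairs of the sublevel-set filtration of a
    periodic 1D profile (e.g. heights around one latitude circle).

    Returns sorted (birth, death) pairs; the global minimum never dies (inf).
    """
    if not values:
        return []
    n = len(values)
    parent = list(range(n))
    birth = [None] * n      # birth value of the component rooted at each index
    active = [False] * n    # whether a point is already below the threshold

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    # Sweep the threshold upward by visiting points in increasing value order.
    for i in sorted(range(n), key=lambda k: values[k]):
        active[i] = True
        birth[i] = values[i]
        for j in ((i - 1) % n, (i + 1) % n):  # neighbours wrap around the circle
            if not active[j]:
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # Elder rule: the younger component (higher birth value) dies here.
            young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
            if values[i] > birth[young]:  # skip zero-persistence pairs
                pairs.append((birth[young], values[i]))
            parent[young] = old
    pairs.append((birth[find(0)], math.inf))
    return sorted(pairs)

# Example: a profile with troughs at 0, 1 and 2 separated by crests of height 5.
print(persistence_0d_circular([0, 5, 1, 5, 2, 5]))
```

Each finite pair's persistence (death minus birth) measures trough depth relative to the crest that separates it from a deeper trough, which is one way to summarise wave amplitude without fitting a parametric wave model.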
Abdollah Jalilian (LU)
We present a space-time modelling approach to understanding the dynamics of clinically confirmed malaria cases across the zones of Ethiopia. More specifically, a negative binomial regression model with fixed effects for environmental and climate variables and temporally and spatially structured random effects is used to describe variation in the data. This modelling approach reveals that the residual seasonality, unexplained by the environmental and climate covariates, shows zone-specific patterns. A cluster analysis then groups the zones of Ethiopia into four clusters based on the similarity of their seasonal behaviour.
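Written out, one generic form of such a model is shown below. The notation is ours, and the exact structure of the random effects used in the talk may differ. Here y_{it} is the case count in zone i and time period t, and the negative binomial is parameterised by its mean and an overdispersion parameter κ:

```latex
\begin{aligned}
y_{it} &\sim \operatorname{NegBin}(\mu_{it}, \kappa), \qquad
  \operatorname{Var}(y_{it}) = \mu_{it} + \mu_{it}^{2}/\kappa, \\
\log \mu_{it} &= \beta_0 + \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + \gamma_t + s_i,
\end{aligned}
```

where x_{it} holds the environmental and climate covariates, γ_t is a temporally structured random effect (for example a random walk or seasonal term) and s_i is a spatially structured random effect (for example an intrinsic conditional autoregressive term over neighbouring zones). As κ grows the variance approaches the mean, recovering Poisson regression.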
Claudie Beaulieu (University of California Santa Cruz)
Quantifying global climate change and its impacts on ecosystems is challenging because of the complexity and limitations of environmental data. In marine ecosystems, detecting climate change impacts on ocean chlorophyll-a (CHL), a proxy for primary productivity, is hindered by the shortness of the record and the long timescale of memory within the ocean. As a result, time-series analysis of satellite chlorophyll is still inconclusive about the sign of change in some regions. Here I show how exploiting both temporal and spatial dependence in the available data through a Bayesian hierarchical space-time model reveals the full uncertainty in chlorophyll trends and highlights regions undergoing significant change. We further optimize the model by partitioning the ocean into regions defined by dynamic optical classes. Optimal trend detection is especially important for essential climate variables with limited record lengths, such as satellite CHL. More generally, this work provides a framework for quantifying trends and their associated uncertainty in climate change studies.
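Why short records and ocean memory matter can be seen from the single-series approximation commonly quoted from Weatherhead et al. (1998) for the number of years needed to detect a linear trend: n* ≈ [3.3 σ_N / |ω_0| · sqrt((1 + φ)/(1 − φ))]^(2/3), where ω_0 is the trend per year and σ_N and φ are the standard deviation and lag-1 autocorrelation of the deseasonalised noise. The sketch below simply evaluates this formula. It is a point-wise, single-series calculation quoted for context, not the Bayesian space-time model described in the talk, and the function name is hypothetical.

```python
import math

def years_to_detect(trend_per_year, noise_sd, lag1_autocorr):
    """Approximate years needed to detect a linear trend in AR(1) noise,
    using the commonly quoted Weatherhead et al. (1998) approximation.

    trend_per_year: trend magnitude omega_0 (units per year)
    noise_sd:       standard deviation sigma_N of the noise
    lag1_autocorr:  lag-1 autocorrelation phi of the noise, in (-1, 1)
    """
    phi = lag1_autocorr
    if not -1 < phi < 1:
        raise ValueError("lag-1 autocorrelation must lie in (-1, 1)")
    ratio = 3.3 * noise_sd / abs(trend_per_year) * math.sqrt((1 + phi) / (1 - phi))
    return ratio ** (2 / 3)

# Stronger memory (larger phi) lengthens the record needed for detection.
print(years_to_detect(0.01, 0.1, 0.0), years_to_detect(0.01, 0.1, 0.6))
```

With φ = 0.6 the square-root factor equals 2, so the required record length grows by 2^(2/3), roughly 1.6 times; pooling information across space, as a hierarchical model does, is one way to offset this.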
Rachel McCrea (LU)
Survival probabilities are likely to depend on the characteristics of individuals within a population. To model this individual heterogeneity, covariates are often collected when undertaking a capture-recapture study. These covariates may be constant over the entire study period, for example sex or birth weight, or may vary over time, e.g. weight, length or fitness.
The challenge with including individual time-varying covariates in a standard capture-recapture model is that they are only recorded when an individual is captured, so they are missing whenever an individual is not captured. In addition, on some occasions when an individual is captured the covariate may not be recorded, resulting in further missing observations.
There are a number of ways to tackle this challenge. In this presentation I will introduce the trinomial capture-recapture model of Catchpole et al. (2008), which offers a straightforward way of incorporating individual time-varying covariates. I will then present a multistate version of this conditional likelihood approach and demonstrate its performance using simulation. Finally, I will present some results from analysing sihek data (introduced by Amanda earlier in the workshop) using this new multistate model.
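The missing-data pattern described above is easy to see in code. As a simplified illustration of the kind of data summary a conditional approach can work with (our own sketch, not the trinomial model of Catchpole et al. (2008) itself), the function below keeps only the occasions on which the covariate was actually recorded and pairs each one with whether the animal was recaptured on the next occasion. The function name and data layout are hypothetical.

```python
def known_covariate_intervals(history, covariate):
    """Pair each occasion t where the covariate was recorded with whether the
    individual was recaptured at t + 1.

    history[t]:   1 if captured on occasion t, else 0
    covariate[t]: covariate value if recorded on occasion t, else None
    Returns a list of (t, covariate value, recaptured at t + 1).
    """
    intervals = []
    for t in range(len(history) - 1):
        # The covariate can only be known on capture occasions, and may be
        # missing even then; skip any occasion where it is unavailable.
        if history[t] == 1 and covariate[t] is not None:
            intervals.append((t, covariate[t], history[t + 1] == 1))
    return intervals

# Example: captured on occasions 0, 1, 3 and 4; weight not recorded on occasion 3.
print(known_covariate_intervals([1, 1, 0, 1, 1], [2.1, 2.3, None, None, 2.9]))
```

In a standard Cormack-Jolly-Seber formulation, an animal known to be alive at t is recaptured at t + 1 with probability φ_t p_{t+1}, so survival can be modelled as a function of the recorded covariate x_t without imputing values on the occasions where it is missing.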
April Shengjie Zhou (LU)
The reserve site selection (RSS) problem aims to select a subset of sites from a set of potential locations to assemble a reserve that achieves conservation goals. This problem is typically formulated as a mathematical optimisation problem. However, the use of simulation optimisation (SO, also known as simulation-based optimisation) in RSS is very limited, because evaluating a large number of potential solutions requires extensive simulation, making it computationally challenging. This talk presents two approaches to mitigating the computational intensity of solving the RSS problem using SO, offering different perspectives on designing methods to reduce computational effort.
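To see where the computational cost comes from, consider a toy version of the problem in which each candidate site holds each species with some probability, and a reserve is scored by the expected number of species present in at least one of its sites. The sketch below estimates that score by simulation and builds a reserve greedily. One generic way to cut the cost, used here, is sample average approximation: draw the random scenarios once and reuse them for every candidate evaluation, rather than re-simulating each time. This is our own illustration with hypothetical names and occurrence probabilities, not either of the two approaches presented in the talk; in this toy case the expectation even has a closed form, so simulation only becomes essential for richer models (e.g. with population dynamics).

```python
import random

def draw_scenarios(occ_prob, n_reps, seed=0):
    """Simulate presence/absence of every species at every site, n_reps times.

    occ_prob[site][species] is a (hypothetical) occurrence probability.
    """
    rng = random.Random(seed)
    return [[[rng.random() < p for p in site] for site in occ_prob]
            for _ in range(n_reps)]

def saa_objective(reserve, scenarios):
    """Sample-average estimate of the expected number of species present in
    at least one reserve site, reusing the same scenarios for every reserve."""
    total = 0
    for presence in scenarios:
        n_species = len(presence[0])
        total += sum(1 for s in range(n_species)
                     if any(presence[site][s] for site in reserve))
    return total / len(scenarios)

def greedy_reserve(occ_prob, budget, n_reps=500, seed=0):
    """Add, one at a time, the site that most increases the SAA objective."""
    scenarios = draw_scenarios(occ_prob, n_reps, seed)  # simulated only once
    reserve = []
    for _ in range(budget):
        candidates = [i for i in range(len(occ_prob)) if i not in reserve]
        best = max(candidates,
                   key=lambda i: saa_objective(reserve + [i], scenarios))
        reserve.append(best)
    return reserve

probs = [[0.9, 0.9, 0.9], [0.1, 0.1, 0.1], [0.0, 0.0, 0.95]]
print(greedy_reserve(probs, budget=2))
```

Reusing one set of scenarios also means that differences between candidate reserves are not blurred by independent simulation noise, a common-random-numbers effect that makes comparisons more stable for a given simulation budget.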
Maria Zari (HR Wallingford)
It is common practice in flood risk analysis for the design of mitigation schemes to use simplified, deterministic design events that rely on a single storm. However, recent analysis has highlighted that flood events over the last 10-15 years have arisen from clusters of storms and/or atmospheric rivers. These relatively frequent events give rise to complex flooding mechanisms, with multi-peaked hydrographs, that can last for several weeks. Standard design-event techniques do not always capture the complex reality of these flood events. This presentation will outline a multivariate extreme value statistical model that aims to investigate the impacts of these events and to capture the spatial and temporal dependence of extreme river flows on multiple tributaries and sea levels within the Humber Estuary on the North East coast of England.
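A common first diagnostic for dependence between extremes, for example between peak flows on two tributaries or between river flow and sea level, is the empirical tail-dependence coefficient chi(u) = P(F_Y(Y) > u | F_X(X) > u), estimated after transforming each margin to the uniform scale by ranks. Values near 1 at high u suggest that the two variables tend to be extreme together; values near 0 suggest that they rarely are. This sketch is our illustration of that standard diagnostic, not the multivariate model in the talk, and the function name is hypothetical.

```python
def empirical_chi(x, y, u):
    """Empirical tail-dependence coefficient chi(u) for paired series x, y.

    Each margin is rank-transformed to (0, 1); chi(u) is the fraction of
    occasions when x exceeds its u-quantile on which y also exceeds its
    u-quantile. Ties are broken by order of appearance, a simplification.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired")
    n = len(x)

    def to_uniform(v):
        order = sorted(range(n), key=lambda i: v[i])
        ranks = [0.0] * n
        for r, i in enumerate(order, start=1):
            ranks[i] = r / (n + 1)
        return ranks

    ux, uy = to_uniform(x), to_uniform(y)
    x_extreme = [i for i in range(n) if ux[i] > u]
    if not x_extreme:
        raise ValueError("no exceedances of u; choose a lower threshold")
    joint = sum(1 for i in x_extreme if uy[i] > u)
    return joint / len(x_extreme)

# Example: perfectly dependent vs perfectly opposed series.
x = list(range(100))
print(empirical_chi(x, [2 * v for v in x], 0.9), empirical_chi(x, [99 - v for v in x], 0.9))
```

Plotting chi(u) against u shows whether dependence persists or weakens as events become more extreme; shifting one series by a few time steps before the call gives a crude view of dependence across time, such as a surge arriving a day after a river peak.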
Pete Henrys (UKCEH)
In this talk I will present an overview of the key considerations when designing environmental monitoring schemes and the challenges that can be encountered. This will lead on to an exploration of some of the limitations of existing designs and where they may be inefficient. I will then introduce the concept of adaptive sampling, highlighting examples of both empirical adaptive approaches and more novel model-based adaptive methods. The four key stages will be set out, and I will then walk through an example of adaptive sampling in practice. I will end by reviewing the key challenges and barriers to uptake and suggesting potential ways forward.
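As a concrete example of an empirical adaptive design, the sketch below implements adaptive cluster sampling (Thompson, 1990) on a grid. It starts from an initial sample of cells and, whenever a sampled cell meets a condition (here, a count at or above a threshold), adds that cell's four neighbours, repeating until no newly added cell qualifies. Whether this matches the empirical approaches in the talk is our assumption, and the function and argument names are hypothetical.

```python
from collections import deque

def adaptive_cluster_sample(grid, initial, threshold):
    """Adaptive cluster sampling on a 2D grid of counts.

    grid[r][c]: value observed if cell (r, c) is visited
    initial:    list of (row, col) cells in the initial sample
    threshold:  a visited cell with value >= threshold triggers its neighbours
    Returns the set of all visited cells.
    """
    rows, cols = len(grid), len(grid[0])
    sampled = set()
    queue = deque(initial)
    while queue:
        r, c = queue.popleft()
        if (r, c) in sampled:
            continue
        sampled.add((r, c))
        if grid[r][c] >= threshold:
            # Condition met: add the four edge-adjacent neighbours.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in sampled:
                    queue.append((nr, nc))
    return sampled

grid = [[0, 0, 0, 0],
        [0, 5, 3, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
print(sorted(adaptive_cluster_sample(grid, [(1, 1), (3, 3)], threshold=1)))
```

Because cells near high counts are more likely to be included, the simple mean of the sampled values is biased; adaptive cluster sampling is normally paired with modified estimators such as the modified Hansen-Hurwitz or Horvitz-Thompson estimators.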
Fiona Seaton (UKCEH)
Species distribution modelling is a widely used tool for understanding and predicting biodiversity change, and recent work has emphasised the importance of understanding how species distributions change over both time and space. Spatio-temporal models require large amounts of data spread over time and space, and as such are clear candidates to benefit from model-based integration of different data sources. However, spatio-temporal models are highly computationally intensive, and integrating different data sources can make this approach even less feasible for ecologists. Here we demonstrate how the R-INLA methodology can be used for model-based data integration in spatio-temporally explicit modelling of species distribution change. We demonstrate that this method can be applied to both point and areal data with two contrasting case studies: one using the SPDE approach to model spatio-temporal change in the Gatekeeper butterfly (Pyronia tithonus) across Great Britain, and the other using a spatio-temporal areal model to describe change in caddisfly (Trichoptera) populations across the River Thames catchment. We show that in the caddisfly case study, integrating different data sources led to a greater understanding of the change in abundance across the River Thames, both seasonally and over 5 years of data. However, in the butterfly case study, moving to a spatio-temporal context exacerbated differences between the data sources and yielded no greater ecological insight into change in the Gatekeeper population. Our work provides a computationally feasible framework for spatio-temporally explicit integration of data within species distribution models (SDMs) and demonstrates both the potential benefits and the challenges of applying this methodology to real ecological data.
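At the core of model-based data integration is a joint likelihood in which each data source has its own observation model but all sources share one latent intensity. The sketch below writes this for two common cases: count data (Poisson, with a log-effort offset) and presence-absence data (Bernoulli with a complementary log-log link, which follows if individuals at a site of known area arise from a Poisson process). In R-INLA the shared latent field would be a spatio-temporal random effect estimated jointly with the other parameters; here the linear predictor values are simply given as inputs. All names, including the source-specific intercept `alpha_pa`, are hypothetical.

```python
import math

def integrated_loglik(eta_counts, counts, log_effort,
                      eta_pa, presence, log_area, alpha_pa=0.0):
    """Joint log-likelihood of two data sources sharing a latent log-intensity.

    eta_counts[k]: shared latent log-intensity at count-survey location k
    counts[k]:     observed count, modelled as Poisson(exp(eta + log_effort))
    eta_pa[m]:     shared latent log-intensity at presence-absence site m
    presence[m]:   1 if the species was detected at site m, else 0
    log_area[m]:   log of the area surveyed at site m
    alpha_pa:      source-specific intercept (e.g. differences in detectability)
    """
    ll = 0.0
    for eta, y, off in zip(eta_counts, counts, log_effort):
        mu = math.exp(eta + off)
        ll += y * math.log(mu) - mu - math.lgamma(y + 1)  # Poisson log-pmf
    for eta, z, off in zip(eta_pa, presence, log_area):
        # cloglog link: P(at least one individual) = 1 - exp(-expected number)
        p = 1.0 - math.exp(-math.exp(eta + off + alpha_pa))
        ll += math.log(p) if z else math.log1p(-p)
    return ll

# Example: one count survey and two presence-absence sites.
print(integrated_loglik([0.2], [2], [0.0], [0.2, -1.0], [1, 0], [0.0, 0.0]))
```

Because both sums depend on the same latent values, each data source informs the shared intensity; a source-specific intercept lets the sources differ in overall detectability without forcing them to agree exactly, which is where tensions such as those seen in the butterfly case study can surface.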