Data Science of the Natural Environment and epidemic modelling?

The Royal Society have coordinated a call for assistance from the scientific modelling community, to support epidemic modellers working on the COVID-19 pandemic. DSNE is a project with modelling at its heart, so we have of course volunteered our services. Several of DSNE’s skills and techniques are central to both environmental data science and epidemic modelling:
Spatio-temporal statistical methods, for making sense of data that are observed with both a location and a time stamp. In an environmental context, this might be temperatures measured across an ice sheet over a period of days. In the epidemics context, illnesses are recorded at both a location and a time. Understanding the underlying process driving such observations is critical to revealing what is actually going on.
Gaussian process emulation of complex models. Both environmental and epidemic modelling make extensive use of simulations of the world. As well as traditional differential-equation-based models, we use stochastic (random) simulators, and agent-based models where each individual in a population is simulated independently. The full simulations can be incredibly time-consuming to run. By building an emulator, a quick-to-run predictor of the simulator's outputs for any particular input parameters, we can save a lot of computational effort (and hence time) while exploring plausible future outcomes.
Combining observations with large models. Directly combining climate models with observations is central to many of the challenges we face in DSNE, just as in epidemic modelling. We even have similar challenges of messy and missing data, selecting between (or combining) multiple different models, and multiple data sources.
Decision-making under uncertainty. Of course, all the epidemic modelling in the world is only useful if the outputs are used to help guide decision-making. All of the inference efforts we make in DSNE are guided by whether they might be useful for making decisions. We may want to use reinforcement learning, Bayesian optimisation, or other machine learning strategies to find optimal decisions for our emulated models, then refine them in the full simulator, before finally deploying them in the real world. And of course we then need to monitor whether the world responds in the way we expect given our model predictions, and adjust our strategy accordingly.
Sharing the scientific process. In both our usual domain of work, and in pandemic modelling, it is important to effectively collaborate. Effectively sharing data, methods and computing code is critical to the collective scientific process. The virtual labs DSNE are developing to facilitate this in environmental data science should be extremely useful in collective efforts to model the epidemic, minimising the learning curve for people adopting unfamiliar data science methods.
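To make the emulation idea above a little more concrete, here is a minimal sketch in Python using only NumPy. It is an illustration under stated assumptions, not the code DSNE uses: `slow_simulator` is a hypothetical stand-in for an expensive model with a single input parameter, and the squared-exponential kernel, its length scale of 0.2 and the small jitter term are arbitrary choices made for this toy example.

```python
import numpy as np


def slow_simulator(beta):
    """Hypothetical stand-in for an expensive simulator (assumption):
    a smooth response to a single input parameter beta in [0, 1]."""
    return np.tanh(3.0 * (beta - 0.5))


def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def fit_emulator(x_train, y_train, jitter=1e-6):
    """Condition a zero-mean Gaussian process on a handful of simulator runs."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    return x_train, L, alpha


def predict(emulator, x_new):
    """Return the emulator's predictive mean and variance at new inputs."""
    x_train, L, alpha = emulator
    K_star = rbf_kernel(x_new, x_train)
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = rbf_kernel(x_new, x_new).diagonal() - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative rounding errors


# A small design of "expensive" runs, then many cheap emulator predictions.
x_train = np.linspace(0.0, 1.0, 8)
y_train = slow_simulator(x_train)
emulator = fit_emulator(x_train, y_train)

x_new = np.linspace(0.0, 1.0, 101)
mean, var = predict(emulator, x_new)
max_error = np.max(np.abs(mean - slow_simulator(x_new)))
```

Once fitted, `predict` costs a few small matrix operations rather than a full simulator run, and the predictive variance gives a rough indication of where additional simulator runs would be most informative.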
Overall, while producing our response to the call, I was struck by quite how much overlap there is between the challenges of environmental data science and the challenges faced by the epidemic modellers. I hope we can be useful if called upon to help.
Disclaimer
The opinions expressed by our bloggers and those providing comments are personal, and may not necessarily reflect the opinions of Lancaster University. Responsibility for the accuracy of any of the information contained within blog posts belongs to the blogger.