MARS PhD Projects
MARS is offering an exciting range of potential PhD projects in each of our four application areas as well as projects with methodological approaches that have multidisciplinary applications. The list is not prescriptive but gives an indication of what a MARS PhD project in a particular area could cover.
Please contact your prospective supervisor before you apply.
Apply to MARS

Cyber Security
Lead supervisor: Prof Bill Oxbury
A growing concern in cyber security is the potential for cyber attacks on AI systems [1], [2]. In this project, we’re specifically concerned with adversarial and data attacks on machine learning models at the core of such systems.
If we think of such a model as a function f(x; W), where x is the model input (e.g. pixel values in an image) and W denotes the model parameters (e.g. weights of a neural network), then we are interested in fooling attacks (instabilities of f(x; W) under perturbations of x) and poisoning attacks (instabilities under perturbations of W).
The aim of this project is to explore these perturbations both theoretically and experimentally. In applications, both x and W occupy very low-dimensional manifolds in their respective (vector) spaces. (For example, real images are extremely rare in the space of all pixel values.) So an analysis might fall into two parts: first, understanding the stability of f(x; W) in the full space; and second, restricting to the data submanifolds and understanding what can be said about them and about the behaviour of f(x; W) on them.
The project will explore these questions, first by exploratory observation of the x, W landscapes for families of model architectures and under suitable metrics. Based on these observations, the aim will be to draw theoretical conclusions and to prove bounds on vulnerability – for example as f(x; W) evolves in the course of a training regime.
[1] National Cyber Security Centre. Principles for the Security of Machine Learning. https://www.ncsc.gov.uk/collection/machine-learning-principles, 2022.
[2] Kui Ren, Tianhang Zheng, Zhan Qin, and Xue Liu. Adversarial attacks and defenses in deep learning. Engineering, 6(3):346–360, 2020.
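As a toy illustration of a fooling attack (our own construction, not part of the project description), consider a logistic model f(x; W) = sigmoid(W·x): a small gradient-sign perturbation of the input x, in the style of the fast gradient sign method, reliably moves the output away from its current value.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(x, W):
    """A minimal model f(x; W): logistic regression on an 8-dim input."""
    return sigmoid(W @ x)

rng = np.random.default_rng(0)
W = rng.normal(size=8)   # model parameters
x = rng.normal(size=8)   # a clean input

# Gradient of the output with respect to the input x
p = f(x, W)
grad_x = p * (1 - p) * W

# FGSM-style fooling perturbation: a small step along the sign of the
# gradient, chosen to push the output away from its current value.
eps = 0.5
direction = -1.0 if p > 0.5 else 1.0
x_adv = x + direction * eps * np.sign(grad_x)

print(p, f(x_adv, W))  # the perturbed input changes the model output
```

The same idea applies to poisoning attacks by perturbing W instead of x; for deep networks the gradient would come from automatic differentiation rather than this closed form.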
Engineering
Lead supervisor: Prof Anthony Nixon
Many molecules, such as proteins, can flexibly change the shapes that determine their chemical properties. While AlphaFold predicted protein folding better than other methods [1], its underlying model remained a black box. To explain molecular properties mathematically, AI predictions should take as input invariant descriptors preserved under the following new equivalence. A flexible isomorphism of molecular graphs in 3-dimensional Euclidean space is a continuous transformation of vertex positions that maintains all edge lengths and some (but not all) angles between pre-determined edges. Flexible isomorphism captures the realistic flexibility of polymer chains or more complicated proteins.

Inspired by recent developments in Geometric Data Science [2,3], the project aims to develop complete invariants of graphs that (1) can be efficiently inverted to a unique molecular structure up to flexible isomorphism, (2) are stable under perturbations of atoms in a suitable metric, and (3) are computable in time polynomial in the number of vertices. These invariants will enable justified molecular design by active learning or navigation in the space of new invariants, which parametrise all graph classes under flexible isomorphism.
[1] J. Jumper et al. Highly accurate protein structure prediction with AlphaFold. Nature 596 (7873), 583-589, 2021.
[2] D. Widdowson, V. Kurlin. Recognizing rigid patterns of unlabeled point clouds by complete and continuous isometry invariants with no false negatives and no false positives. CVPR 2023 (Computer Vision and Pattern Recognition), 1275-1284.
[3] O. Anosova et al. Complete and bi-continuous invariant of protein backbones under rigid motion. https://arxiv.org/abs/2410.08203
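A much simpler classical invariant illustrates the requirements above: the sorted multiset of pairwise distances of a point cloud is preserved under any rigid motion and is stable under small perturbations, although (unlike the complete invariants targeted by this project) it does not in general determine the structure uniquely. A minimal numpy sketch, entirely illustrative:

```python
import numpy as np

def distance_invariant(points):
    """Sorted vector of all pairwise Euclidean distances: a simple
    isometry invariant of a labelled-free point cloud."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    iu = np.triu_indices(len(points), k=1)
    return np.sort(d[iu])

rng = np.random.default_rng(1)
P = rng.normal(size=(6, 3))              # 6 'atoms' in 3D

# Apply a random rigid motion: orthogonal matrix Q plus a translation
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
moved = P @ Q.T + rng.normal(size=3)

inv_P = distance_invariant(P)
inv_moved = distance_invariant(moved)
print(np.max(np.abs(inv_P - inv_moved)))  # ~0: invariant under rigid motion
```

Flexible isomorphism preserves only edge lengths and selected angles, so the project's invariants must be strictly coarser than full isometry invariants while remaining complete, continuous, and polynomial-time computable.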
Lead supervisor: Dr Maciej Buze (contact mars@lancaster.ac.uk)
Modern high-throughput molecular and atomistic simulations rely on sophisticated optimisation tools to effectively explore severely non-convex energy landscapes. Bifurcation-theory-based techniques such as numerical continuation and deflation are well-developed mathematical approaches already used in a range of nonlinear settings (see e.g. [1], [2]), but remain underutilised in atomistic and molecular simulations [3].

Numerical continuation is a set of powerful methods for tracking solution paths of systems of nonlinear equations as parameters vary. It has recently been successfully applied to the study of crack propagation at the atomistic scale [4] and should be equally useful in a wide range of atomistic setups, as recently prototyped in studying dislocation nucleation in surface step systems in copper and vacancy migration in two-species atomistic systems.

Deflation techniques were originally developed for finding distinct roots of polynomials: if we know that a polynomial p has a root at x0, then it is beneficial to seek other roots by considering the deflated function q(x) = p(x) / (x − x0) instead. In the last 10 years, these ideas have been successfully translated to the context of finding distinct solutions of PDEs [2]; there the deflation happens through a deflation operator, e.g. M(u) = Id / ‖u − u0‖, where u0 is a known solution to the PDE. Some preliminary work [5] indicates that such approaches will prove very powerful in the atomistic modelling of materials, provided that we exploit the structure of the system when deriving deflation operators M. One example is the idea of only taking into account input from the region of the domain that is of most interest (e.g. close to a defect core).

In this project, on the theoretical side, we will derive and analyse atomistic deflation operators and study their suitability for exploring the potential energy surfaces of atomistic systems. On the practical side, we will work on algorithmic development of numerical continuation and deflation techniques as a robust wrapper around the go-to open-source molecular dynamics software, LAMMPS [6], and on applying them to a set of real-world test cases.
[1] E. L. Allgower and K. Georg. Numerical Continuation Methods. Springer-Verlag (1990).
[2] P. Farrell, A. Birkisson, and S. Funke. Deflation techniques for finding distinct solutions of nonlinear partial differential equations. SIAM Journal on Scientific Computing 37.4 (2015).
[3] S. Bagchi, I. Baghishov, M.Buze et al. New Mathematics for the Exascale: Applications to Materials Science - White Paper, Institute for Pure and Applied Mathematics, UCLA (2023).
[4] M. Buze and J.R. Kermode. Numerical-continuation-enhanced flexible boundary condition scheme applied to mode-i and mode-iii fracture. Phys. Rev. E, 103:033002 (2021).
[5] M. Noack and S. Funke. Hybrid genetic deflated newton method for global optimisation. Journal of Computational and Applied Mathematics 325, pp. 97–112 (2017).
[6] A.P. Thompson, et al. LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales. Computer Physics Communications 271, 108171 (2022).
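The polynomial case of deflation can be sketched in a few lines: Newton's method applied to the deflated function q(x) = p(x)/(x − x0) avoids the already-known root x0 and converges to a different one from the same starting guess. This toy example (our own, with an illustrative quadratic) only demonstrates the mechanism; the atomistic setting replaces p by a high-dimensional force field and the scalar deflation by an operator M.

```python
import numpy as np

def newton(g, dg, x, tol=1e-12, maxit=100):
    """Plain Newton iteration for a scalar equation g(x) = 0."""
    for _ in range(maxit):
        x = x - g(x) / dg(x)
        if abs(g(x)) < tol:
            return x
    return x

# p(x) = (x - 1)(x - 3) = x^2 - 4x + 3, with roots at 1 and 3
p = lambda x: x**2 - 4*x + 3
dp = lambda x: 2*x - 4

x0 = newton(p, dp, 0.0)           # from x = 0, Newton finds the root at 1

# Deflate: q(x) = p(x) / (x - x0), differentiated by the quotient rule
q = lambda x: p(x) / (x - x0)
dq = lambda x: (dp(x) * (x - x0) - p(x)) / (x - x0)**2

x1 = newton(q, dq, 0.0)           # the same starting guess now finds 3
print(x0, x1)
```

The deflated iteration cannot reconverge to x0 because q blows up there; in the PDE/atomistic context the operator M(u) plays exactly this role around a known solution u0.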
Lead supervisor: Dr Maciej Buze (contact mars@lancaster.ac.uk)
Minimisation diagrams, such as Voronoi tessellations, play a pivotal role across science and engineering, with applications ranging from modelling biological and chemical structures to mesh generation for finite element methods [1] and nearest-neighbour searches [2]. They have strong theoretical ties to optimal transport [3] and can be fitted to data using logistic regression, which underpins much of modern machine learning methodology [4]. In this project, we will advance the theory and computational techniques for minimisation diagrams. The theoretical focus includes exploring links with unbalanced optimal transport theory. On the computational side, building upon prior work [5], we will develop and implement GPU-accelerated algorithms using PyTorch, with specific emphasis on spatially adaptable discretizations and gradual introduction of anisotropy. This project is in collaboration with Tata Steel, and we will apply the developed algorithms to simulate steel microstructure, bringing theory directly to practical, industrial applications.
[1] F. Aurenhammer. Voronoi diagrams — a survey of a fundamental geometric data structure. ACM Computing Surveys, 23(3), 345-405 (1991).
[2] S. Har-Peled and N. Kumar. Approximating minimization diagrams and generalized proximity search. SIAM Journal on Computing, 44(4), 944-974 (2015).
[3] Q. Merigot and B. Thibert. Optimal transport: discretization and algorithms. Handbook of Numerical Analysis, Vol. 22, pp. 133-212 (2021), Elsevier.
[4] K.P. Murphy. Machine learning: A probabilistic perspective. MIT press (2012).
[5] M. Buze, J. Feydy, S.M. Roper, K. Sedighiani, D.P. Bourne, Anisotropic power diagrams for polycrystal modelling: Efficient generation of curved grains via optimal transport, Computational Materials Science, Volume 245 (2024).
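To make the object concrete, here is a minimal (CPU, numpy) sketch of a minimisation diagram: each grid point is assigned to the seed minimising |x − s_i|² − w_i, which is a power diagram; with all weights w_i = 0 it reduces to the Voronoi tessellation. The seeds and weights are arbitrary illustrative values, and the project's GPU/PyTorch algorithms would replace this brute-force evaluation.

```python
import numpy as np

rng = np.random.default_rng(2)
seeds = rng.random((5, 2))        # 5 seed points in the unit square
weights = rng.random(5) * 0.05    # power-diagram weights

# Evaluate the diagram on a pixel grid
n = 64
xs = np.linspace(0, 1, n)
X, Y = np.meshgrid(xs, xs)
grid = np.stack([X.ravel(), Y.ravel()], axis=1)

# Cost of assigning each pixel to each seed: |x - s_i|^2 - w_i
cost = ((grid[:, None, :] - seeds[None, :, :])**2).sum(-1) - weights[None, :]
labels = cost.argmin(axis=1).reshape(n, n)   # the minimisation diagram

print(np.unique(labels))   # the cells partition the grid among the seeds
```

In polycrystal modelling, each cell stands for a grain; fitting the weights (and, with anisotropy, per-seed metrics) to target grain volumes is where the optimal-transport connection enters.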
Lead supervisor: Dr Henry Moss
This project aims to pioneer interactive ML optimization techniques to tackle the intricate design challenges inherent in developing sustainable fusion energy solutions and accelerate progress in nuclear fusion research.
We will depart from traditional reactor design processes and instead follow the paradigm of Bayesian sequential design, which has emerged as an effective ML-driven strategy for high-cost design problems over complex search spaces, through methods such as Bayesian optimization and active learning. Indeed, the potential benefit of such AI-based exploration for optimal fusion reactor components has already been demonstrated in initial work by scientists at the UK Atomic Energy Authority, where existing Bayesian optimization methodology was able to successfully optimise a simplified 8-dimensional parameterization of the Toroidal Field (TF) coils. However, extending this proof-of-concept work to a level where it can accelerate progress in nuclear fusion research requires novel methodology in key areas such as 1) high-dimensional and multi-fidelity optimization, leveraging different levels of computational simulation; 2) optimization over manifolds, to allow direct optimization of 3D structures like TF coils; and 3) ML-guided human interaction, devising methods for effective collaboration between ML algorithms and expert scientists even when objectives are poorly defined or not fully known before optimization begins.
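The core Bayesian optimization loop can be sketched compactly: a Gaussian-process surrogate is fitted to the evaluations so far, an acquisition function (here expected improvement) trades off exploration and exploitation, and its maximiser is evaluated next. Everything below (the 1D toy objective, RBF kernel, grid search) is our own illustrative choice, standing in for expensive reactor simulations over far richer search spaces.

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, ls=0.2):
    """RBF kernel between two 1D point sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

def gp_posterior(Xtr, ytr, Xte, noise=1e-6):
    """GP posterior mean and variance at test points, zero-mean prior."""
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = rbf(Xtr, Xte)
    mu = Ks.T @ np.linalg.solve(K, ytr)
    var = 1.0 - np.einsum('ij,ij->j', Ks, np.linalg.solve(K, Ks))
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, best):
    """EI acquisition for minimisation."""
    s = np.sqrt(var)
    z = (best - mu) / s
    Phi = 0.5 * (1.0 + np.array([erf(v / sqrt(2)) for v in z]))
    phi = np.exp(-0.5 * z**2) / sqrt(2 * pi)
    return s * (z * Phi + phi)

objective = lambda x: np.sin(3 * x) + x**2   # toy 'simulation' to minimise
grid = np.linspace(-2, 2, 401)               # candidate designs

X = np.array([-1.5, 0.0, 1.5])               # initial design
y = objective(X)
for _ in range(15):                          # sequential design loop
    mu, var = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, var, y.min()))]
    X, y = np.append(X, x_next), np.append(y, objective(x_next))

print(X[np.argmin(y)], y.min())   # best design found so far
```

The research directions above extend exactly this loop: replacing the 1D grid by high-dimensional or manifold-valued design spaces, mixing simulation fidelities in the surrogate, and letting expert feedback reshape the acquisition.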
Environment
Lead supervisor: Prof Bill Oxbury
AI (or machine learning) is now being explored by many researchers as a method to model flooding. However, standard AI tools learn only from data and do not incorporate known mathematical equations that govern physical systems. What is desirable is to incorporate known equations directly into the AI model structure, so that physical laws are ‘built in’ and not learned from data (which is clearly inefficient).
Such models also need to be applicable over a range of spatial scales (from, say ~1km to ~1000 km); dynamic, in the sense that they can represent temporal variability in key parameters such as river flows or flood footprints/depths; ‘explainable’ in a way that exposes climatic or landscape (e.g. soils, vegetation, urban cover) controls on those parameters; and amenable to uncertainty quantification.
This project will be part of a collaboration with expert partners in the hydrology and flood risk community, and will make use of real data for a UK river catchment, region, country or up to global scale.
A large class of models in hydrology is equivalent to small arrangements of linear or non-linear 'storage' ODEs. Others are based on spatially 1D, 2D or 3D PDEs. This project asks to what extent these equations can be embedded in AI model architectures. The project may also include development of data-only models and baseline comparison with ODE/PDE models; testing for physical laws (e.g. mass conservation, incompressibility); and analysis of the baseline training cost to 'discover' these laws.
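As a concrete example of such a 'storage' ODE, a single linear reservoir dS/dt = rain(t) − kS with discharge q = kS can be solved in a few lines; a unit like this, with k made learnable, is one candidate building block to embed inside an AI architecture. The rainfall pulse and parameter values below are purely illustrative.

```python
import numpy as np

k, dt = 0.2, 0.1                       # outflow rate (1/day), time step (days)
t = np.arange(0, 30, dt)
rain = np.where(t < 5, 1.0, 0.0)       # a 5-day rainfall pulse

# Forward-Euler integration of dS/dt = rain - k*S
S = np.zeros_like(t)
for i in range(len(t) - 1):
    S[i + 1] = S[i] + dt * (rain[i] - k * S[i])

discharge = k * S                      # reservoir outflow
print(discharge.max())                 # peak flow; recession decays like exp(-k t)
```

Note the mass balance built into the equation: total rainfall equals total discharge plus the water still in storage, which is exactly the kind of physical law a purely data-driven model would otherwise have to spend training effort rediscovering.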
The project will explore to what extent such ‘informed AI’ approaches improve upon traditional methods in performance and/or computational efficiency.
The potential impact from this research is wide and could include:
- Production of flood risk maps (as used by government to delineate planning constraints, or by insurers to assist in underwriting);
- Flood resilience investment planning under uncertain future climate scenarios;
- A real-time model applied to forecast floods and issue warnings;
- A large-scale catastrophe ('cat') model applied to assess the distribution of flood losses from individual events;
- Support for the interpretation and analysis of existing, competing models and proposed benchmarks.
Lead supervisor: Prof David Leslie
In classical optimization problems, we are typically given a fixed set of objective functions and constraints, and tasked with finding the optimal configuration of a system. One widely used method for this is Bayesian optimization, which has gained significant popularity for optimizing the training parameters of AI algorithms. However, in many real-world scenarios, the objective is not just to find a single optimal configuration, but rather to learn an optimal mapping from environmental conditions to system configurations, i.e. to identify a look-up table of optimal configurations.
Consider a wind turbine, for example: it constantly measures environmental conditions and adjusts the configuration of its blades in response. Since these conditions are always fluctuating, it’s impractical to re-optimize from scratch each time the environment changes. Instead, we need to pre-learn a mapping that adapts the configuration based on varying conditions—a challenge known as profile optimization [1] or contextual optimization [2].
The main output from this project will be a novel ML-guided method for efficiently learning optimal look-up-tables, exploring and extending approaches from the Bayesian optimization literature and exploring links with the closely related field of reinforcement learning.
[1] Ginsbourger, David, et al. "Bayesian adaptive reconstruction of profile optima and optimizers." SIAM/ASA Journal on Uncertainty Quantification 2.1 (2014): 490-510.
[2] Char, Ian, et al. "Offline contextual Bayesian optimization." Advances in Neural Information Processing Systems 32 (2019).
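A stripped-down sketch of the look-up-table idea, using the wind-turbine picture: for each environmental context (wind speed) we seek the configuration (blade pitch) maximising a toy objective. Here we brute-force each context independently on a grid; the hypothetical power curve is our own invention, and the point of the project is to replace this with sample-efficient Bayesian methods that share information across neighbouring contexts.

```python
import numpy as np

def power(pitch, wind):
    """Hypothetical power curve: the best pitch shifts linearly with wind."""
    return -(pitch - 0.1 * wind)**2 + wind

winds = np.linspace(4, 12, 9)          # environmental contexts (m/s)
pitches = np.linspace(0, 2, 201)       # candidate configurations

# Naive profile optimization: solve each context separately
lookup = {w: pitches[np.argmax(power(pitches, w))] for w in winds}
print(lookup[8.0])                     # optimal pitch at 8 m/s is 0.1 * 8 = 0.8
```

Each entry here costs a full grid of expensive evaluations; a contextual Bayesian approach would instead model power jointly over (pitch, wind) and spend far fewer evaluations to recover the whole profile.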
Health
Lead supervisor: Prof Chris Jewell
Several important disease outbreaks over the last 20 years have been caused by diseases in wildlife spilling over into human and livestock host populations. Examples include SARS and MERS in the 2000s, and recently highly pathogenic avian influenza H5N1 in domestic poultry. Whilst the disease transmission process in the host population is relatively well understood, the spatiotemporal nature of disease prevalence in wildlife is often unclear. This project aims to develop machine-learning approaches (such as Gaussian processes and neural networks) to quantify spatiotemporal uncertainty in wildlife disease prevalence, informed by sparse observational data on wildlife populations and by dynamical modelling of the disease process in the host population. The main output from this project will be a hybrid dynamical model capable of informing disease control policy for managing such diseases and predicting the likely efficacy of disease interventions.
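A toy sketch of the hybrid idea: a host-population SIR model whose force of infection is driven both by within-host transmission and by an external wildlife prevalence w(t). In the project, w(t) would be a spatiotemporal ML model (e.g. a Gaussian process) learned from sparse data; here it is just a fixed seasonal curve, and all parameter values are illustrative.

```python
import numpy as np

beta, gamma, spill = 0.3, 0.1, 0.002   # transmission, recovery, spillover rates
dt, T = 0.1, 200
steps = int(T / dt)

S, I, R = 0.999, 0.001, 0.0            # host population fractions
I_path = []
for n in range(steps):
    # Stand-in wildlife prevalence: a smooth seasonal signal in [0, 1]
    w = 0.5 * (1 + np.sin(2 * np.pi * n * dt / 100))
    new_inf = (beta * I + spill * w) * S   # within-host + spillover infection
    dS, dI, dR = -new_inf, new_inf - gamma * I, gamma * I
    S, I, R = S + dt * dS, I + dt * dI, R + dt * dR
    I_path.append(I)

print(max(I_path))   # epidemic peak driven jointly by spillover and transmission
```

The interesting inferential questions begin where this sketch stops: quantifying uncertainty in w(t) from sparse wildlife surveillance and propagating it through the host dynamics to intervention decisions.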
Lead supervisor: Dr Alice Peng (contact mars@lancaster.ac.uk)
Astrocytomas are tumours that develop in the central nervous system (CNS), which comprises the brain and the spinal cord [1]. Because of the critical location of these tumours, treatment necessitates the greatest care. Commonly, surgeons carefully (partly) remove the tumour. In the first stage, when the tumour is relatively small, the neurosurgeon can safely remove it without damaging surrounding CNS tissue. At a later stage, however, when the tumour is large, it can only be removed partially [2]. Recently, it has been shown experimentally that partial removal, or even taking a biopsy of a tumour, triggers so-called 'wound healing reactions' that lead to regrowth and densification of the tumour [3]. In some clinical cases, biopsies are known even to trigger metastasis [4], which can lead to the death of the patient.

In this research, we will develop an agent-based, semi-stochastic mathematical model that aims to quantify the impact of partial tumour resection on the regrowth and densification of tumours, as well as on metastasis. Our research aims to find optimal resection and biopsy strategies that minimise the likelihood of metastasis, tumour regrowth and densification. In this way, future treatments and biopsies can hopefully be directed so that there is a lower risk of metastasis under the premise of minimal damage to surrounding tissue. Functional losses for the patient would then be marginal, and the odds of patient survival should increase. Bica et al. (2017) [5] propose a continuum-based model that takes into account the anisotropic nature of tissue. We will additionally use mathematical (homogenisation) and possibly statistical techniques to bridge the continuum and agent-based modelling scales.
[1] Hanft, Simon, and Paul C. McCormick, eds. Tumors of the spinal canal. Springer, 2021.
[2] Abdulhaq, Abdulaziz Saud. Astrocytoma classification, presentation and management. EC Microbiology 15 (2019): 01-08.
[3] S. Weil, et al. "Tumor microtubes convey resistance to surgical lesions and chemotherapy in gliomas." Neuro-oncology 19.10 (2017): 1316-1326.
[4] Kameyama, Hiroyasu, et al. Needle biopsy accelerates pro-metastatic changes and systemic dissemination in breast cancer: Implications for mortality by surgery delay. Cell Reports Medicine 4.12 (2023).
[5] Bica, Ion, Thomas Hillen, and Kevin J. Painter. Aggregation of biological particles under radial directional guidance. Journal of Theoretical Biology 427 (2017): 77-89.
Lead supervisor: Dr Alice Peng (contact mars@lancaster.ac.uk)
Burns and other skin traumas occur at various intensities, varying in the depth and area of skin affected and in the involvement of the different skin layers. Worldwide, an estimated 6 million patients need hospitalisation for burns annually. In most hospitalised populations with severe burn injuries, the mortality rate is between 1.4% and 18%, with a maximum of 34% [1]. Generally speaking, wound healing is a complicated process by which the skin repairs itself after injury [2], and severe wounds often have multiple layers of the skin (badly) injured. As a result, many mechanisms are involved, and many of them are not yet clearly understood. In clinical practice, the same optical diagnosis (which is also subjective) often does not lead to the same healing results. In other words, the healing process is highly patient-specific, and mathematical modelling can help improve diagnosis and predict the healing process, taking the patient's (skin) characteristics into account, thanks to the flexibility of adjusting input parameters.

Over the years, most mathematical models developed for wound healing consider either only one layer of the skin or only part of the healing process [3], which leads to incomplete and less accurate predictions. In this project, we aim to develop an agent-based model that contains both the epidermis and the dermis. The new model will therefore be able to describe both shallow wounds (confined to the epidermis) and deep wounds (which damage both epidermis and dermis). Once the baseline model is built, one can either upscale the model to a PDE system or improve its computational efficiency (e.g. via scientific machine learning), computational cost being the main drawback of agent-based models.
[1] N. Brusselaers, S. Monstrey, D. Vogelaers, E. Hoste, and S. Blot. Severe burn injury in europe: a systematic review of the incidence, etiology, morbidity, and mortality. Critical Care, 14(5):R188, 2010.
[2] A. Jakovija and T. Chtanova. Skin immunity in wound healing and cancer. Frontiers in Immunology, 14, 2023.
[3] Q. Peng and F. Vermolen. Agent-based modelling and parameter sensitivity analysis with a finite-element method for skin contraction. Biomechanics and Modeling in Mechanobiology, pages 1–27, 2020.
Lead supervisor: Dr Alice Peng (contact mars@lancaster.ac.uk)
Traction force microscopy (TFM) is a technique for measuring the traction forces generated by a moving cell on a substrate or surface [1]. In its most basic setting, cells are placed on an elastic substrate that contains fluorescent beads. When the cell moves, it applies a force to the substrate, producing a displacement that can be measured by observing the positions of the fluorescent beads. TFM has many applications; for example, traction forces can be used as a proxy to estimate the invasive potential of a cancer cell line. The aim of this project is to extend current methodologies for analysing TFM data through the use of better mechanical models; vice versa, the data are expected to improve the models describing mechanical cell-ECM interactions. As the project focuses on single cells, agent-based modelling is favoured. Peng et al. [2] proposed a phenomenological model in which the cell geometry is split into finite line segments by nodal points. In this way, the model benefits from the flexibility of representing any geometry, which is relevant to the forces generated by the cell. Instead of a smooth force field, point forces are used in this setting, which allows more accurate estimates of the force. A suitable candidate for the project will have a strong background in applied mathematics, coding skills, and an interest in inverse problems or statistical inference.
[1] A. K. Denisin, H. Kim, I. H. Riedel-Kruse, and B. L. Pruitt. Field guide to traction force microscopy. Cellular and Molecular Bioengineering, 17(2): 87–106, Apr. 2024. ISSN 1865-5033. doi: 10.1007/s12195-024-00801-6.
[2] Q. Peng, F. J. Vermolen, and D. Weihs. A formalism for modelling traction forces and cell shape evolution during cell migration in various biomedical processes. Biomechanics and Modeling in Mechanobiology, Apr. 2021. ISSN 1617-7940. doi: 10.1007/s10237-021-01456-2
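The inverse-problem flavour of TFM can be shown in a toy 1D analogue (entirely illustrative, with a smoothing kernel standing in for the elastic Green's function): bead displacements are linear in the traction forces, u = G f, and point forces are recovered from noisy displacements by Tikhonov-regularised least squares.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = np.linspace(0, 1, n)

# Exponential smoothing kernel as a proxy for the substrate's elastic response
G = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.05)

f_true = np.zeros(n)
f_true[30], f_true[70] = 1.0, -0.5          # two point forces

u = G @ f_true + 0.01 * rng.normal(size=n)  # noisy bead displacements

# Tikhonov-regularised inversion: minimise ||G f - u||^2 + lam ||f||^2
lam = 1e-2
f_rec = np.linalg.solve(G.T @ G + lam * np.eye(n), G.T @ u)

print(f_rec[30], f_rec[70])  # peaks recovered near the true point forces
```

The ill-posedness controlled by `lam` here is exactly why better mechanical models matter: a forward operator closer to the true cell-substrate mechanics sharpens what the regularised inversion can recover.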
Lead supervisor: Dr Alice Peng (contact mars@lancaster.ac.uk)
Cell polarisation is a complex process that involves a huge number of biochemical reactions, mainly directed by the Rho GTPase proteins [2]. In order to investigate key biological properties, [3] proposed a minimal 2-equation reaction-diffusion model, and Cusseddu et al. [1] studied a 3D bulk-surface extension of this model, in which cytosolic and cell-membrane interactions can be explicitly considered. In mathematical terms, the model is minimal: a single bulk equation and a single surface equation. However, how biologically relevant are these assumptions? In this project, we aim to investigate the biological implications of this model and possible mathematical extensions, by applying it to various migrating cell phenotypes, such as breast cancer cells and fibroblasts. With the help of the model, we expect to estimate relevant parameters and study its applicability to cell migration or cell division. Furthermore, the model has already been built in three dimensions, while the experimental results are mostly in two dimensions. The challenge thus lies in model validation and in connecting the simulation results to the experimental results.
[1] D. Cusseddu, L. Edelstein-Keshet, J. Mackenzie, S. Portet, and A. Madzvamuse. A coupled bulk-surface model for cell polarisation. Journal of Theoretical Biology, 481:119–135, November 2019. ISSN 0022-5193. doi: 10.1016/j.jtbi.2018.09.008.
[2] R. G. Hodge and A. J. Ridley. Regulating Rho GTPases and their regulators. Nature Reviews Molecular Cell Biology, 17(8):496–510, 2016.
[3] Y. Mori, A. Jilkine, and L. Edelstein-Keshet. Wave-pinning and cell polarity from a bistable reaction-diffusion system. Biophysical journal, 94(9):3684–3697, 2008.
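A 1D finite-difference sketch of the wave-pinning model of [3] conveys the mechanism: active (a) and inactive (b) forms of a GTPase, with slow and fast diffusion respectively and a bistable exchange term, conserve total mass while a localised stimulus pins into a stable polarised pattern. Parameter values follow the illustrative regime of [3]; the discretisation choices are ours.

```python
import numpy as np

n, L = 100, 10.0
dx = L / n
dt = 2e-4
Da, Db = 0.1, 10.0                    # slow membrane / fast cytosol diffusion
k0, gamma, K, delta = 0.067, 1.0, 1.0, 1.0

x = np.linspace(0, L, n)
a = np.where(x < 2, 2.0, 0.0)         # localised stimulus at the left end
b = np.full(n, 2.0)

def lap(u):
    """Second difference with zero-flux boundary conditions."""
    up = np.pad(u, 1, mode='edge')
    return (up[:-2] - 2 * u + up[2:]) / dx**2

mass0 = (a + b).sum() * dx
for _ in range(25000):                # integrate to t = 5 by forward Euler
    # Bistable activation/inactivation exchange (Hill-type feedback)
    f = b * (k0 + gamma * a**2 / (K**2 + a**2)) - delta * a
    a = a + dt * (Da * lap(a) + f)
    b = b + dt * (Db * lap(b) - f)

print(a[0], a[-1])   # high plateau at the stimulated end: polarisation
```

The project's bulk-surface version replaces this 1D caricature by a surface equation for a on the cell membrane coupled to a bulk equation for b in the cytosol, which is where the validation against 2D experimental data becomes the real challenge.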
Multidisciplinary application
Lead supervisor: Dr Henry Moss
Machine learning (ML)-guided design has sparked transformative change in experimental methodologies across scientific fields, from discovering novel biological sequences to engineering innovative structures and even developing superior ML algorithms. Generative AI methods, particularly diffusion models, have recently emerged as powerful tools for creating complex, highly structured outputs such as images, molecules, and point clouds. Consequently, these models hold tremendous potential for accelerating resource-intensive design processes.
Despite their promise, integrating generative AI into ML-guided design loops such as Bayesian optimization and active learning is a significant challenge, due to the inherent difficulty of updating pre-trained generative models with newly arriving data. In these settings, the goal is to fine-tune the generation process iteratively, producing designs that increasingly align with target objectives and design constraints. This PhD will explore new techniques for fine-tuning conditional sampling, enabling diffusion models to be more effectively integrated into optimization pipelines.
Lead supervisor: Prof Anthony Nixon
Estimating the covariance matrix of a multivariate normal distribution is a fundamental task in statistical machine learning, in both the theoretical and practical sense. This project focuses on the case where the number of datapoints is much smaller than the number of variables. This “small data” setting arises in applications ranging from neuroscience (where the number of variables is very large) to archaeology (where the amount of data may be fixed for all time), and it requires sophisticated mathematics to make the problem well-posed.
In this project, you will study models based on applying linear constraints to the inverse covariance matrix, and how different choices of the constraints affect the statistical behaviour of the maximum likelihood estimator. The project draws together techniques from geometric and combinatorial rigidity and those from statistical machine learning, in particular dimension reduction using projections. In collaboration with practitioners, the methods of the project will be validated on real data. A background including at least two of machine learning, (algebraic) geometry, and combinatorics is very desirable.
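The small-data difficulty is easy to exhibit numerically: with p variables and n < p samples, the sample covariance is rank-deficient, so the precision (inverse covariance) matrix does not exist without extra structure. The snippet below (our own illustration) shows the failure and one crude fix, shrinkage toward the identity; the project studies far richer linear constraints imposed directly on the precision matrix.

```python
import numpy as np

rng = np.random.default_rng(5)
p, n = 50, 20                      # more variables than datapoints
X = rng.normal(size=(n, p))        # samples from N(0, I_p)

S = np.cov(X, rowvar=False)        # sample covariance: rank <= n - 1 < p
print(np.linalg.matrix_rank(S))    # rank-deficient, hence not invertible

# Shrinkage toward the identity restores positive definiteness
alpha = 0.5
S_shrunk = (1 - alpha) * S + alpha * np.eye(p)
precision = np.linalg.inv(S_shrunk)   # a well-defined precision estimate
```

Shrinkage treats all directions identically; constraint-based approaches instead encode known structure (e.g. conditional independences, giving zeros in the precision matrix), which is where the rigidity-theoretic questions about well-posedness of the maximum likelihood estimator arise.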