Data-Centric Engineering

Physics & Astronomy

Below you will find Data-Centric Engineering projects offered by supervisors within the School of Physics & Astronomy

This is not an exhaustive list. If you have your own research idea, or if you are a prospective PDS candidate, please return to the main DCE Research page for further guidance, or contact us at  

Data-Centric Molecular Dynamics Simulation of Supercritical Fluids for Process Engineering

We are used to the three basic states of matter: solid, liquid and gas. Liquids and gases are separated by a boiling line that ends at the critical point, where, according to current understanding, all differences between gases and liquids disappear. The supercritical state is thought to be hot and homogeneous. Nevertheless, the properties of supercritical matter remain largely unknown, despite the increasing use of supercritical fluids in industrial processes and environmental applications.

We have recently proposed that a new line, the Frenkel line, exists in the supercritical state and separates two physically distinct states. Importantly, system properties at the line can be expressed in terms of fundamental physical constants. This possibility received wide press coverage, including in Physics World and The Economist, and has earned our work the 2020 Physics Breakthrough prize.

In this project, we will use massively parallel molecular dynamics simulations and new data analysis methods to conduct the first very wide survey of the supercritical state of matter. The resulting extensive dataset will reveal the key properties of the Frenkel line and give new structure to the phase diagram of matter.
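As a toy illustration of the kind of simulation involved, the sketch below runs a tiny Lennard-Jones molecular dynamics run (velocity Verlet, minimum-image periodic boundaries) at several reduced temperatures spanning a notional supercritical region and records the mean potential energy at each state point. All parameters, system sizes and the state-point scan are illustrative assumptions, not taken from the project; a real survey would use massively parallel codes and far larger systems.

```python
import numpy as np

def lj_forces(pos, box):
    """Lennard-Jones forces and potential with minimum-image convention (epsilon = sigma = 1)."""
    n = len(pos)
    f = np.zeros_like(pos)
    pot = 0.0
    for i in range(n - 1):
        d = pos[i + 1:] - pos[i]
        d -= box * np.round(d / box)                 # minimum image
        r2 = (d ** 2).sum(axis=1)
        inv6 = 1.0 / r2 ** 3
        pot += np.sum(4 * inv6 * (inv6 - 1))
        fij = (24 * inv6 * (2 * inv6 - 1) / r2)[:, None] * d
        f[i + 1:] += fij                             # force on particle j from i
        f[i] -= fij.sum(axis=0)                      # Newton's third law
    return f, pot

def run_md(temp, n=27, rho=0.5, steps=400, dt=0.002, seed=0):
    """Short velocity-Verlet run at a reduced temperature; returns mean potential energy per particle."""
    rng = np.random.default_rng(seed)
    box = (n / rho) ** (1 / 3)
    side = round(n ** (1 / 3))
    g = np.linspace(0, box, side, endpoint=False)    # simple cubic lattice start
    pos = np.array(np.meshgrid(g, g, g)).reshape(3, -1).T[:n].astype(float)
    vel = rng.normal(0, np.sqrt(temp), (n, 3))
    vel -= vel.mean(axis=0)                          # remove centre-of-mass drift
    f, pot = lj_forces(pos, box)
    energies = []
    for _ in range(steps):
        vel += 0.5 * dt * f
        pos = (pos + dt * vel) % box
        f, pot = lj_forces(pos, box)
        vel += 0.5 * dt * f
        energies.append(pot / n)
    return np.mean(energies[steps // 2:])            # average over the second half

# Scan temperatures across a notional supercritical region at fixed density
results = {t: run_md(t) for t in (1.5, 3.0, 6.0)}
for t, u in results.items():
    print(f"T* = {t}: <U>/N = {u:.3f}")
```

A production survey would add a force cutoff, a thermostat, and proper equilibration; the point here is only the structure of a state-point scan.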

Experimentally, the Frenkel line corresponds to the maximal extracting and dissolving abilities of supercritical fluids, which are used to process and extract chemicals in the pharmaceutical, food (e.g. making decaffeinated coffee by supercritical CO2 extraction) and chemical industries, as well as to break down harmful substances and wastes using supercritical water. A fundamental understanding of the Frenkel line on the phase diagram will therefore enhance existing processes and help develop new, more efficient routes in industrial and environmental applications.

Supervisors: Prof Kostya Trachenko & Dr Anthony Phillips

Development of novel detector response inversion techniques to search for new physics at the LHC

The ATLAS detector at the Large Hadron Collider is a sophisticated experimental apparatus collecting proton-proton collision data at the highest energies achievable. The experiment aims to find new physics that could explain some of the biggest questions in modern physics: Why is our universe made of matter and not anti-matter? What is the nature of Dark Matter?

The data are collected using multiple subdetectors that record and select collisions across over 100,000,000 electronic channels. The response of the detector is accurately simulated using data-intensive techniques and then used to correct the data for biases and miscalibrations. This detector response inversion allows us to measure the true underlying physics observables.

In this project we will apply novel data-centric techniques to the inversion problem which will include machine learning classification and applications of Gaussian Processes. The methods will be developed to propagate all measurement uncertainties and will be applied in the search for Lepton Flavour Violation.
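As a minimal illustration of the inversion (unfolding) problem, the sketch below is not the ATLAS procedure itself: it builds a toy response matrix that smears a "true" spectrum into a reconstructed one, then recovers the truth with simple Tikhonov regularisation. The spectrum shape, smearing width and regularisation strength are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "true" spectrum: a falling background plus a narrow signal peak
bins = np.arange(10)
truth = 1000 * np.exp(-0.3 * bins) + 200 * np.exp(-0.5 * (bins - 6) ** 2)

# Toy detector response: each true bin migrates into neighbouring reconstructed bins
R = np.exp(-0.5 * (bins[:, None] - bins[None, :]) ** 2 / 0.8)
R /= R.sum(axis=0, keepdims=True)      # columns normalised: migration probabilities

observed = rng.poisson(R @ truth)      # simulated detector-level counts

# Naive inversion amplifies statistical fluctuations...
naive = np.linalg.solve(R, observed.astype(float))

# ...so regularise (Tikhonov): minimise |R t - d|^2 + tau |L t|^2
# with L a second-difference (curvature) penalty operator.
tau = 1.0
L = np.diff(np.eye(10), n=2, axis=0)
unfolded = np.linalg.solve(R.T @ R + tau * L.T @ L, R.T @ observed)

print("truth   :", truth.round(0))
print("unfolded:", unfolded.round(0))
```

Production unfolding additionally propagates the full covariance of the measurement, which is where the machine learning and Gaussian process techniques of this project come in.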

Keywords: Inverse problems, machine learning, unfolding, particle physics, LHC, CERN, lepton flavour violation

Supervisor: Dr Eram Rizvi

GPU-accelerated high-dimensional inference for next-generation radio interferometers

Large arrays of radio telescopes such as the Square Kilometre Array (SKA) will be able to detect and image the neutral hydrogen gas around the first stars and galaxies for the very first time. To do this, however, they must be calibrated extremely precisely: to better than one part in a million. This is an enormous challenge given the complexity of the telescopes and the massive volume of data they generate. In this project, we will develop and tune a distributed, extremely high-dimensional statistical inference pipeline with GPU acceleration, capable of calibrating millions of parameters to this precision. Because the raw data are far too voluminous to store, the pipeline must also retain the important statistical information in the data in near real time. It will be trialled on data from the HERA and MeerKAT telescopes, which are precursors to the SKA.
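One way to retain statistical information without storing the raw data, sketched here under the simplifying (and purely illustrative) assumption of a linear Gaussian calibration model, is to accumulate sufficient statistics chunk by chunk as the data stream past, discarding each chunk afterwards:

```python
import numpy as np

rng = np.random.default_rng(2)

n_params = 50                   # stand-in for the millions of calibration parameters
true_gains = rng.normal(1.0, 0.01, n_params)

# For a linear Gaussian model y = A g + noise, the sufficient statistics are
# A^T A and A^T y. These are all we keep; each raw chunk is discarded after use.
AtA = np.zeros((n_params, n_params))
Aty = np.zeros(n_params)

for _ in range(100):                           # stream of data chunks
    A = rng.normal(size=(1000, n_params))      # design matrix for this chunk
    y = A @ true_gains + rng.normal(0, 0.1, 1000)
    AtA += A.T @ A                             # update statistics, then drop A, y
    Aty += A.T @ y

estimate = np.linalg.solve(AtA, Aty)           # maximum-likelihood calibration
print("max |error|:", np.abs(estimate - true_gains).max())
```

At SKA scale the accumulation and solve would be distributed across GPUs and the model would be far richer, but the principle, compressing the stream into a fixed-size statistical summary, is the same.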

Keywords: Bayesian inference, astronomy, Big Data, GPUs, distributed computing

Supervisors: Dr Phil Bull

Novel Machine Learning approaches for problems with parametrized signals at the LHC

In many situations in high-energy physics we are confronted with the challenge of optimally discriminating a spectrum of hypothetical signals against a known background spectrum. These signals typically depend on a model parameter, e.g. the particle mass, which modifies the distributions of the measured features. Traditional approaches struggle with the trade-off between high performance and high interpolation power. Recent novel approaches, such as parametrized neural networks or regression techniques, promise to mitigate this predicament. The student would investigate the development of such approaches in the context of the search for heavy gauge and Higgs bosons in the ATLAS experiment, using LHC data.
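The core idea of a parametrized classifier can be sketched with a toy NumPy network that receives the model parameter (here a "mass") as an extra input feature, is trained on several mass hypotheses at once, and can then be evaluated at masses it never saw in training. The data, architecture and training details below are all illustrative assumptions, not the project's actual setup.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_data(masses, n_per):
    """Toy events: signal is a Gaussian peak at the mass hypothesis, background is flat."""
    X, y = [], []
    for m in masses:
        sig = rng.normal(m, 0.5, n_per)
        bkg = rng.uniform(0, 10, n_per)
        feat = np.concatenate([sig, bkg])
        X.append(np.column_stack([feat, np.full(2 * n_per, m)]))   # (feature, mass)
        y.append(np.concatenate([np.ones(n_per), np.zeros(n_per)]))
    return np.vstack(X), np.concatenate(y)

# Train on several mass hypotheses simultaneously
X, y = make_data([2.0, 3.0, 4.0, 5.0, 7.0, 8.0], 1000)
mu, sd = X.mean(axis=0), X.std(axis=0)

# Tiny one-hidden-layer network, trained by full-batch gradient descent
W1 = rng.normal(0, 0.5, (2, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, 16);      b2 = 0.0

def forward(Z):
    h = np.tanh(((Z - mu) / sd) @ W1 + b1)
    return 1 / (1 + np.exp(-(h @ W2 + b2))), h

lr = 0.5
for _ in range(4000):
    p, h = forward(X)
    g = (p - y) / len(y)                       # d(cross-entropy)/d(logit)
    gh = np.outer(g, W2) * (1 - h ** 2)        # backprop through tanh layer
    W2 -= lr * (h.T @ g); b2 -= lr * g.sum()
    W1 -= lr * ((X - mu) / sd).T @ gh; b1 -= lr * gh.sum(axis=0)

# Evaluate at a mass never seen in training: the network interpolates
Xt, yt = make_data([6.0], 1000)
pt, _ = forward(Xt)
acc = ((pt > 0.5) == yt).mean()
print(f"accuracy at unseen mass 6.0: {acc:.2f}")
```

The single parametrized network replaces a bank of per-mass classifiers, which is exactly the interpolation property the project would study with realistic LHC features.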

Supervisor: Dr Ulla Blumenschein

Machine Learning approaches to structure analysis of magic size quantum dots

Magic size clusters (MSCs) are small (around 1 nm) inorganic nanoparticles that can be synthesised with atomic precision. MSCs exhibit several unique properties (e.g. reversible isomerisation) not found in any other inorganic systems and hold great promise for atomic-scale control of quantum dots, with electronic and optical properties engineered precisely for applications, delivering tuneable single-photon sources of ultra-narrow bandwidth. However, their exact atomic and electronic structure, while essential for gaining insight into their electronic and optical properties, is yet to be established in many cases.

The goal of this project is to use machine learning techniques to construct a database of computer-generated cluster structures with their corresponding x-ray absorption signals. We will then use the database and the key relevant structural descriptors to recover the accurate atomic structure of MSCs from experimental data using recently implemented machine learning tools (e.g. Keras and HEP software solutions).
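A toy sketch of the database-matching idea is shown below. The forward model (a Gaussian "absorption edge" whose position shifts with a single bond-length descriptor) and all numbers are invented for illustration; the project would use real simulated x-ray absorption spectra, many structural descriptors, and trained neural networks in place of the brute-force lookup.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy database of simulated spectra: the descriptor is a mean bond length and the
# spectrum is a Gaussian feature whose position encodes it (hypothetical model).
energy = np.linspace(0.0, 10.0, 128)

def simulate(bond):
    """Hypothetical forward model: spectrum as a function of the descriptor."""
    return np.exp(-0.5 * (energy - 2.0 * bond) ** 2)

database_bonds = np.linspace(2.2, 2.6, 400)
database = np.array([simulate(b) for b in database_bonds])

# "Experimental" spectrum: an unknown structure plus measurement noise
true_bond = 2.437
measured = simulate(true_bond) + rng.normal(0, 0.01, len(energy))

# Recover the structure by best least-squares match against the database;
# at scale, a trained network would replace this exhaustive lookup.
best = np.argmin(((database - measured) ** 2).sum(axis=1))
print(f"recovered bond length: {database_bonds[best]:.3f} (true {true_bond})")
```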

Keywords: nanomaterials, quantum dots, energy materials, x-ray absorption, machine learning, artificial neural networks

Supervisor: Dr Andrei Sapelkin

Developing explainable artificial intelligence for scientific discovery

Real-world problems and scientific exploration using machine learning methods often require an understanding of why predictions are made. This can be challenging with many modern machine learning algorithms, which may be highly abstract or extremely complicated, including, for example, support vector machines and modern deep learning algorithms.

An understanding of how an input data set translates into the predicted outcomes can highlight room for improvement in models, biases in the data set, under- and over-training, confounding examples and other pathologies that may only become apparent long after a model has been trained and is being applied to data. These are all key issues in developing robust, usable models, which in turn lead to robust predictions, whether for real-world applications or for scientific discovery.
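As a concrete example of one simple explainability technique, the sketch below applies permutation feature importance to a toy black-box model: each feature is shuffled in turn and the resulting drop in accuracy measures how much the model relies on it. The data and model are invented for illustration; the project would develop far more sophisticated methods.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy setup: only the first two of five input features actually matter
n = 5000
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

def model(Z):
    """Stand-in for any trained black-box classifier."""
    return (Z[:, 0] + 0.5 * Z[:, 1] > 0).astype(int)

baseline = (model(X) == y).mean()

# Permutation importance: shuffle one feature at a time and measure the
# accuracy drop; a large drop means the model relies on that feature.
drops = []
for j in range(5):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    drops.append(baseline - (model(Xp) == y).mean())
    print(f"feature {j}: accuracy drop {drops[-1]:.3f}")
```

In a particle physics search, such attributions help verify that a classifier exploits genuine physics features rather than detector artefacts or simulation biases.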

The need for explainable AI methods transcends many uses of machine learning. This project will develop novel explainable AI methods and apply them to scientific data. It is a cross-disciplinary doctoral project between the Schools of Electronic Engineering and Computer Science and of Physics and Astronomy. The scientific data used for this thesis will come from the CERN Large Hadron Collider, where tiny signals of new types of fundamental particle are being sought in large data sets with significant backgrounds.

Explainable AI methods will play a significant role in aiding work toward scientific discovery and, through the development of new computer science methods, in changing the way that particle physics approaches this problem.

Supervisor: Prof Adrian Bevan

Using machine learning to identify novel molecular materials with chosen properties

Electrically active materials – semiconductors, dielectrics, ferroelectrics, piezoelectrics, and pyroelectrics – are central both to our modern digital world and to future technologies that will be needed for sustainable development. Materials that encapsulate molecular ions, such as the “molecular perovskites”, offer tremendous promise, in some cases exceeding the performance of industry-standard ceramics; but the multidimensional landscape of potential materials in this family is so vast that ad hoc synthetic or computational exploration is hopelessly inefficient.

In this project, we will instead explore this landscape using machine-learning techniques. We aim to identify entirely new materials in the perovskite family with electrical properties that could equal or exceed those of inorganic perovskites. 
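A minimal sketch of the surrogate-model screening idea follows, assuming an invented two-dimensional descriptor space and a made-up property function: a kernel ridge regression surrogate is trained on a small "measured" subset and then used to rank the remaining candidates, so that expensive synthesis or simulation is focused on the most promising materials.

```python
import numpy as np

rng = np.random.default_rng(6)

def property_of(x):
    """Stand-in for an expensive simulation or synthesis measurement (invented)."""
    return np.sin(3 * x[:, 0]) + x[:, 1] ** 2

# Hypothetical descriptor space: each candidate material is a vector of simple
# compositional descriptors (ionic radii, tolerance factors, etc.)
candidates = rng.uniform(-1, 1, (2000, 2))
measured_idx = rng.choice(2000, 60, replace=False)   # small "measured" subset
X_train = candidates[measured_idx]
y_train = property_of(X_train)

def rbf(A, B, ls=0.3):
    """Squared-exponential kernel between two sets of descriptor vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

# Kernel ridge regression surrogate trained on the measured subset
K = rbf(X_train, X_train)
alpha = np.linalg.solve(K + 1e-6 * np.eye(len(K)), y_train)
pred = rbf(candidates, X_train) @ alpha

# Rank all candidates by predicted property and pick the best few to measure next
ranking = np.argsort(pred)[::-1]
print("top candidate descriptors:", candidates[ranking[:3]])
```

In the real project the descriptor space would be far higher-dimensional and the "measurements" would be first-principles calculations or syntheses, typically inside an active-learning loop.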

Supervisor: Dr Anthony Phillips

Improved model selection using Bayes factors

The Bayes factor for fits of models to data is the gold-standard criterion for selecting the underlying model that best describes a data set. However, it is computationally very intensive to evaluate and so is rarely used. For 30 years a formula has been known that makes the Bayes factor easy to compute for least-squares fits. The formula is not well known, nor is it routinely exploited to guide model selection and improve parameter estimation. Instead, it has very often been presented merely as part of the derivation of the Bayesian Information Criterion (BIC), an increasingly popular approximation to the Bayes factor.

In this project, we will demonstrate its value in collaboration with colleagues who have experimental data sets requiring stringent model selection (for example, spectra with multiple overlapping peaks). We will exploit it in maximum-likelihood fitting in a fully data-centric approach. We will also seek ways to apply it in fields such as epidemiology, where models are not analytic but are defined, for example, by a set of differential equations. The outcome of the project, we hope, will be the routine use of Bayes factors in all fields where models are fitted to data.
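The BIC mentioned above is easy to compute for least-squares fits. As a toy illustration (the data and models are invented), the sketch below fits polynomials of increasing order to synthetic data and compares their BIC values; the difference in BIC between two models approximates minus twice the log of their Bayes factor, so the model with the lowest BIC is preferred.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic data generated by a quadratic model with Gaussian noise
n = 200
x = np.linspace(-1, 1, n)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(0, 0.2, n)

# For Gaussian errors and k free parameters, a least-squares fit gives
#   BIC = n ln(RSS / n) + k ln n
bics = []
for order in range(5):
    coef = np.polyfit(x, y, order)
    rss = np.sum((np.polyval(coef, x) - y) ** 2)
    bic = n * np.log(rss / n) + (order + 1) * np.log(n)
    bics.append(bic)
    print(f"order {order}: BIC = {bic:.1f}")

print("selected order:", int(np.argmin(bics)))
```

The penalty term k ln n is what stops the higher-order polynomials, which always reduce the residual sum of squares slightly, from being selected over the true quadratic.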

Supervisor: Prof David Dunstan

Ultra-thin silicon detectors

The Scholar will design and test prototype ultra-thin curved silicon modules, with potential applications in low-mass detectors for X-ray diffraction, sensors for imaging, and nuclear security. Building on the existing Zero support Mass Detector (ZMD) programme, which currently has a PhD student and a postdoctoral researcher developing preliminary designs, the Scholar will advance the technology by working through the practical steps of prototyping and by studying the performance of this novel technology.

The stresses in the material, as well as its thermal properties, are areas that will require both detailed data-centric analysis and data-intensive computational simulation. The Scholar will also characterise the performance of the devices in order to evaluate their suitability for applied-science use and for nuclear security applications.

Supervisor: Dr Seth Zenz

Development of organic semiconductor radiation detectors

This project will develop organic semiconductor radiation detectors using a triumvirate of simulation, fabrication and testing as a feedback loop: device performance is predicted by simulation, the device is fabricated and tested, and the test data are analysed and compared back to the predictions in order to understand the underlying fundamental physics, which in turn feeds back into the simulation.

Supervisor: Prof Adrian Bevan