Project opportunities for September 2023 start will appear here soon.
Below is a list of previous projects that will be covered by our PhD students who start in September 2022. Each project will be led by one student, with academic and industry supervision. Click on each project title to see a summary.
E1-1: Next generation Text Mining in Drug Discovery
Extracting interesting and non-trivial patterns from text documents is the next-generation wave of knowledge discovery in biochemical sciences. Free text resident in biomedical literature contains a wealth of information about small molecules and their targets that is not currently stored in biochemical knowledgebases. This information can be exploited to identify and build specific signatures for drug-gene associations, chemical and biological toxicity and even adverse drug effects.
Recent advances in embedding methods have shown promising results for several biomedical and clinical tasks. Text classification performed on biomedical records poses specific challenges, including dataset imbalance, misspellings, abbreviations and semantic ambiguity. Current state-of-the-art approaches apply deep learning to the task, mainly convolutional neural networks (CNNs), recurrent neural networks (RNNs), bi-directional long short-term memory (Bi-LSTM) networks, and BERT (Devlin et al., 2019; Wolf et al., 20).
In this project you will contribute towards Exscientia’s existing text mining platform by optimising named entity recognition (NER) procedures and applying novel machine learning strategies to generate your own semantic lexicon. You will have access to expertise across Discovery and AI technology teams to advise and support you during your stay at Exscientia. You will have the chance to evaluate your success in one of our in-house text-mining competitions.
Supervisors: Dan Crowther, Head of Target Analysis at Exscientia; Massimo Poesio, Professor of Computational Linguistics, QMUL
E1-2: Multi-omics systems biology modelling of patient networks in age-related diseases
Bringing the right drug to the right patient at the right time is one of the biggest global challenges in pharmaceutical drug research. To truly understand the mechanics of disease and its target opportunities, detailed knowledge capture and representation is essential. Previous efforts to capture drug-gene-disease interactions have been realised in large-scale omics initiatives such as the LINCS Connectivity MAP project and genome-scale knowledge graphs. However, these efforts produce high-quality reference biochemical networks that do not capture patient-specific modalities. The genetic profile of an individual is a key determinant of disease and patient heterogeneity that is missing from these network structures.
In this project you will develop expertise in modelling multi-omics datasets at multiple scales. Your focus will be to introduce genetic and multi-omics repository data from sources such as UK Biobank or Genomics England into our existing systems representing key areas of age-related disease that are of interest to Exscientia. Using state-of-the-art systems biology and AI-based methods, you will create models of disease-relevant processes. You will evaluate their utility in stratifying patient groups, and in identifying the most appropriate topological sites for targeted therapeutic intervention. You will work closely with the Target Analysis and Discovery teams at Exscientia and benefit significantly from their training and expertise in both AI and discovery platform technologies. Within QMUL, you will work within Professor Damian Smedley’s team, which has expertise in investigating the genetic cause of disease in the 100,000 Genomes Project. As part of the Monarch Initiative (monarchinitiative.org), the team uses ontologies to model gene-phenotype associations in individual patients, reference human diseases, and model organisms, shedding light on genes with no prior human data (IMPC; mousephenotype.org). You will leverage this to improve knowledge representation in the networks.
Supervisors: Anna Lobley, Senior Biological Data Scientist at Exscientia; Damian Smedley, Professor in Computational Genomics, WHRI
E1-3: AI detection of druggable features from high content imaging data
The value of imaging data to drug discovery and disease research is now being realised. Several large repositories storing high- and low-density bio-image data are now publicly available. Two such big-data resources that are of great interest to the biopharmaceutical industry are the Cancer Imaging Consortium, Broad Institute JUMP Consortium cellular perturbations archive, which contains more than 1 billion cells with 140,000 perturbations, and the DeepCell normal tissues imaging atlas. These resources can be data-mined and modelled to predict the effect of chemical and genetic perturbations on cells.
In morphological profiling, quantitative data are extracted from microscopy images of cells to identify biologically relevant similarities and differences. Rich feature data (including measures of size, shape, texture and intensity) produce profiles suitable for the detection of subtle phenotypes and cell lineage information.
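As a rough illustration of what a morphological profile can look like, the sketch below computes three simple per-cell features (area, mean intensity, and how much of its bounding box the cell fills) from a tiny labelled segmentation mask. The function name `morphological_profiles`, the feature names, and the toy data are all hypothetical, chosen for illustration only; real pipelines extract far richer feature sets from actual microscopy images.

```python
from collections import defaultdict


def morphological_profiles(labels, intensity):
    """Compute simple per-cell features from a labelled mask.

    labels:    2D list of ints, 0 = background, k > 0 = pixels of cell k
    intensity: 2D list of numbers with the same shape as labels
    Returns {cell_id: {"area", "mean_intensity", "extent"}}.
    """
    # Group pixel coordinates by the cell they belong to.
    pixels = defaultdict(list)
    for r, row in enumerate(labels):
        for c, cell_id in enumerate(row):
            if cell_id != 0:
                pixels[cell_id].append((r, c))

    profiles = {}
    for cell_id, coords in pixels.items():
        rows = [r for r, _ in coords]
        cols = [c for _, c in coords]
        area = len(coords)  # size feature: pixel count
        bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
        mean_intensity = sum(intensity[r][c] for r, c in coords) / area
        profiles[cell_id] = {
            "area": area,
            "mean_intensity": mean_intensity,
            "extent": area / bbox_area,  # crude shape feature
        }
    return profiles


# Toy example (made-up data): two cells in a 3 x 4 image.
toy_labels = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 2, 2],
]
toy_intensity = [
    [10, 10, 0, 0],
    [10, 10, 0, 4],
    [0, 0, 8, 6],
]
toy_profiles = morphological_profiles(toy_labels, toy_intensity)
```

Each cell's dictionary of features can then be compared across cells or treatment conditions to look for similarities and differences.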
In this project you will be responsible for extracting valuable biomarker features from morphological images in normal and perturbed cellular states. You will investigate the correlation of these features with other ‘omics modalities and phenotypes. You will develop expertise in the field of deep learning to interpret the effects of perturbations on cells.
Supervisors: Anna Lobley, Senior Biological Data Scientist at Exscientia; Greg Slabaugh, Director of the Digital Environment Research Institute (DERI) & Prof of Computer Vision and AI
M1-1: Prioritising drug targets in non-canonical pathways from multi-omic data using AI-based approaches
There exists a large and rapidly growing body of publicly available datasets from studies that monitor cellular response to perturbation using methods such as bulk and single-cell transcriptomics, LC-MS/MS proteomics and phosphoproteomics, metabolomics and epigenetics. While individual studies are useful in their own right, large-scale meta-analysis of data across these studies promises more significant breakthroughs, such as the discovery of valuable new drug target leads.
The aim of this project is to develop and apply AI methodologies to identify potential drug targets from large collections of aggregated omics data. Specifically, you will explore the use of supervised machine learning approaches to learn patterns of behaviour exhibited by proven drug targets and use the results of this work to build predictors capable of ranking other molecules according to their drug target potential. Ultimately, the aim will be to apply your newly developed AI approach to in-house data held by MSD.
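The idea of learning from proven targets and then ranking other molecules can be sketched with a very small example. The code below is only an illustration under stated assumptions: it fits a plain logistic regression by gradient descent on two made-up numeric features per molecule, then sorts unlabelled candidates by predicted probability. The feature values, candidate names, and function names are hypothetical, and the project itself leaves the choice of AI method open.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example under the current weights.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_candidates(w, b, candidates):
    """Return (name, score) pairs sorted from most to least target-like."""
    scored = [
        (name, sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b))
        for name, x in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up training data: 1 = proven drug target, 0 = not a target.
X_train = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7],
           [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
y_train = [1, 1, 1, 0, 0, 0]

# Hypothetical unlabelled molecules to prioritise.
candidates = {
    "candidate_A": [0.85, 0.9],
    "candidate_B": [0.15, 0.1],
    "candidate_C": [0.5, 0.6],
}

w, b = train_logistic(X_train, y_train)
ranking = rank_candidates(w, b, candidates)
```

In practice the features would be derived from aggregated omics data, and a library implementation with proper validation would replace this hand-written training loop.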
The project offers extensive freedom in terms of the AI methods used and the data sets studied, so it provides an excellent opportunity to gain hands-on experience of the latest machine learning algorithms, and to evaluate the value of different types of omics data.
We are seeking a highly motivated student who is passionate about contributing to biological knowledge through the application of AI to large biomolecular data sets. The ideal candidate will have a grounding in both molecular biology and data science – this could be through a Master's degree in a subject such as bioinformatics, or alternatively you may have a first-class degree in computer science followed by bioscience experience, or vice versa. You will be confident in coding in Python, with experience of data wrangling, statistics and machine learning.
Supervisors: Dr Victor Neduva, Senior Principal Scientist in Genomics and Biomarkers Group at MSD; Prof Conrad Bessant, Professor of Bioinformatics, QMUL
M1-2: Using AI to investigate and modulate gene function in patient populations and biobanks
Drug target identification and validation is a critical foundation in drug discovery that has been shown to be extensively informed by genetic data. This MSD collaborative PhD will focus on using AI for the interpretation of genetic changes that disrupt the function of protein-coding genes. By linking variants to phenotype it is possible to identify both benign and damaging impacts from human knockouts with immediate implications for drug discovery, including the identification of safer targets or drug repositioning opportunities, or the early identification of potential risks for drug discovery. The project will feed from a number of global collaborative projects linking genetic variation to human phenotypes and disease risk, with a particular focus on genotyping and exome data from Genes and Health, a large-scale study in consanguineous South Asian individuals living in the UK. Genes and Health is a unique research programme working with the South Asian communities in East London, Bradford and Manchester, which have some of the highest rates of heart disease, diabetes, and poor health in the UK. Uniquely the study involves the whole community, linking across primary and secondary care. This presents an opportunity to explore a wide range of phenotypes, including broad measures of health and healthy aging, ultimately leading to new ways of improving health for communities in the UK and worldwide.
Applicants should demonstrate a passion for using AI with health data to deliver new therapies to patients. Applicants should hold a Master's or undergraduate degree with equivalent experience in AI and machine learning, bioscience, mathematics, physics, computer science or a related field. Good understanding of the application of statistics in AI and proficiency in at least one scientific programming language are essential. Familiarity with major machine learning frameworks such as scikit-learn or TensorFlow, for building machine learning models and architectures, is also sought.
Supervisors: Dr Wei Wei, Senior Scientist at MSD; Dr Victor Neduva, Senior Principal Scientist in Genomics and Biomarkers Group at MSD; Prof Mike Barnes, Professor of Bioinformatics and Director of the Centre for Translational Bioinformatics, WHRI
H1-1: Structural bioinformatics and chemogenomics approaches to navigate GPCR-ligand interaction space
This project will focus on the development and application of artificial intelligence (AI) augmented structural bioinformatics and chemogenomics methods to guide structure-based drug discovery (SBDD) for G Protein Coupled Receptors (GPCRs), the largest family of cell signaling transmembrane proteins. The tremendous progress in GPCR structural biology has provided new opportunities for structural chemogenomics approaches to identify relationships between the chemical and structural properties of GPCR ligands and their receptor binding sites. This project will explore novel ways to integrate GPCR structural, sequence, mutation, and ligand chemistry and pharmacology data and descriptors into AI-augmented structural chemogenomics/bioinformatics models and workflows to navigate structural GPCR-ligand interaction space for computer-aided drug design (CADD).
You will use a variety of computational chemistry and structural bioinformatics techniques for, amongst others, GPCR-ligand binding mode prediction, virtual GPCR ligand screening, and GPCR binding site comparison applications. These techniques include protein binding site druggability analysis, structural protein-ligand interaction fingerprint analysis (using empirical and physics-based descriptors), protein homology modeling, molecular docking, and enhanced molecular dynamics (MD) simulation approaches. You will work on the development, evaluation, and optimization of novel approaches augmented with experimental GPCR structural biology, chemical, and pharmacological data and with state-of-the-art AI and ML techniques, including deep generative models, convolutional neural network models, and reinforcement learning.
We are seeking candidates with a keen interest in applying computational tools to address complex chemical and biological questions. Applicants should hold a Master's degree in Chemistry or a related discipline, and should be familiar with at least one scientific programming language. Experience in computational chemistry, cheminformatics, and/or computer-aided drug discovery is desirable. The position requires excellent interpersonal, oral and written communication skills in a collaborative, interdisciplinary biotech drug discovery environment.
Note: A computer science diploma without knowledge of chemistry will be insufficient to efficiently perform this PhD project.
Supervisors: Francesca Deflorian – Principal Scientist at Sosei Heptares; Chris de Graaf – Director, Head of Computational Chemistry at Sosei Heptares; Peter McCormick – Reader in Pharmacology, The William Harvey Research Institute, QMUL
H1-2: Cheminformatics and Machine Learning approaches for GPCR Computer-Aided Drug Design
This project will focus on the development and application of cheminformatics, machine learning (ML), and Artificial Intelligence (AI) approaches for Computer-Aided Drug Design (CADD) for G Protein Coupled Receptors (GPCRs), the largest family of cell signalling transmembrane proteins that can be modulated by a plethora of chemical compounds. This project will use the information available in many heterogeneous types of protein-ligand interaction data to develop models that enable the design of efficacious therapeutic compounds targeting GPCRs. The project will make extensive use of public bioactivity data drawn from both the literature and patents, as well as experimentally determined structures of GPCR-ligand complexes.
You will use a variety of computational chemistry and cheminformatics techniques such as similarity assessment using 2D and 3D approaches, de novo design, quantitative structure-activity relationships for property prediction, molecular interaction fields, and protein-ligand docking. You will work on the development, evaluation, and optimization of novel approaches augmented with experimental GPCR structural biology, chemical, and pharmacological data and with state-of-the-art AI and ML techniques, including deep generative models, convolutional neural network models, and reinforcement learning. The ultimate goal will be techniques and approaches that can be applied to GPCR drug discovery projects as part of the Design-Make-Test-Analyse cycle.
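One common form of 2D similarity assessment is the Tanimoto coefficient between molecular fingerprints: the number of features two molecules share divided by the number of features present in either. The sketch below is a minimal illustration, assuming each fingerprint is represented as a set of "on" bit indices; the ligand names and bit values are invented, and in practice the fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def most_similar(query_fp, library, top_n=3):
    """Rank library compounds by Tanimoto similarity to the query."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up fingerprints for a hypothetical ligand library.
library = {
    "ligand_1": {1, 2, 3, 4},
    "ligand_2": {1, 2, 7},
    "ligand_3": {8, 9},
}
query = {1, 2, 3, 5}

hits = most_similar(query, library)
```

Here `hits` lists the library compounds from most to least similar to the query, which is the basic operation behind similarity-based virtual screening.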
Supervisors: Noel O’Boyle – Principal Scientist at Sosei Heptares; Chris de Graaf – Director, Head of Computational Chemistry at Sosei Heptares; Arianna Fornilli – Senior Lecturer in Computational Organic Chemistry, School of Physical and Chemical Sciences, QMUL