School of Biological and Chemical Sciences

Data science & machine learning for genomic analysis

Supervisor: Dr Yannick Wurm

Project description

Interested in supercharging the productivity of genome biologists?

The first generation of molecular-genetic research focused on traditional model organisms including mouse, yeast, zebrafish, Drosophila, and C. elegans. Genetic research increasingly uses diverse organisms that are much more relevant models for specific questions. For example, some such emerging organisms exhibit unique phenotypes, including 100-fold intra-specific variation in lifespan or resistance to harsh environmental conditions. Others represent novel animal models for disease or development, provide crucial ecosystem services, are vectors of understudied diseases, or are key to food security because they are crops, pollinators, or crop pests.

Multiple challenges exist when working with such “emerging” model organisms, in particular because their genomes are of lower quality than those that have received detailed attention from tens of thousands of researchers.

Here, we will develop a bioinformatics tool that facilitates analysis and visualisation of genomic (or other -omic) data from previously understudied species. This tool will be designed pragmatically, taking into account what researchers need in order to answer biological questions effectively. For this, we will incorporate best practices in software engineering and data science and recent technological innovations in genomic analysis, and we will build on existing work, including existing analysis libraries and advanced statistical and machine-learning techniques. The tool aims to extract significant value from large-scale datasets that would otherwise require laborious case-by-case engineering efforts to connect. Summary data will be returned to the user as visualisations, statistics and tables in a manner that facilitates interpretation.

We will package our work in a manner that makes it accessible to biologists working with published or unpublished genomic data; we will build on our extensive previous success, including with the SequenceServer software (http://sequenceserver.com). Overall, our approach will substantially improve the ability of genome biologists to generate meaningful biological insight when working with new organisms.

Eligibility and applying

Applicants can refer to the minimum entry requirements and English language requirements for our PhD programmes on our entry requirements page.

For more specific advice on experience/skills required for the project, and for any other enquiries about the project, please contact Dr Yannick Wurm (y.wurm@qmul.ac.uk). 

Before submitting a formal online application, it is recommended that you contact Dr Wurm by email to express your interest in the project, also including your CV and information on how you intend to fund your studies.
