School of Electronic Engineering and Computer Science

Miss Ilaria Manco


Room Number: Engineering, Eng 403


Project Title: 

Deep learning and multi-modal models for the music industry


This PhD will investigate the use and development of advanced multi-modal machine learning models with music industry applications, leveraging the large amount of data available in the modern digital music ecosystem. The adoption of data-driven and machine learning based methods, often using deep learning, is already significant in many areas of the music industry, such as music identification, music discovery, personalisation of fan experience and catalogue management. However, some challenges remain open. Many approaches are built upon limited and sometimes simplified representations of music. For instance, audio genre classification of global music collections is typically done using a single flat taxonomy, thereby disregarding hierarchy and discrepancies between local territories. Moreover, typical machine learning models that have music industry applications tend to rely on a single type of data. For example, recommendation engines rely on music consumption data, and automatic music tagging systems rely solely on audio. There is now evidence to suggest that multi-modal models are a promising avenue for further development. In particular, by connecting data sources of different natures, we expect multi-modal models to help better understand, extract and analyse the structure and trends present in large, often unstructured, datasets. The PhD research will investigate: 1) How can multi-modal models help learn more complete, more relevant and more effective representations? 2) How can such models help extract more and/or better knowledge from large amounts of data? A proposed approach is the investigation of multi-modal machine learning models, with a particular interest in deep learning and in approaches in which audio is one of the modalities. Examples of potential applications are: using multi-modal models to learn or discover latent structure in unstructured data (e.g. a hierarchical genre/sub-genre classifier), or detecting leading indicators of trends (e.g. identifying the emergence of a new "sound", genre or influencers).

C4DM theme affiliation:

Music Informatics, Machine Listening


Machine Learning (Postgraduate)

The aim of the module is to give students an understanding of machine learning methods, including pattern recognition, clustering and neural networks, and to allow them to apply such methods in a range of areas.


Research Interests:

  • Multimodal Deep Learning
  • Music Information Retrieval