School of Electronic Engineering and Computer Science

David Südholt




Project title: Machine Learning of Physical Models for Voice Synthesis  

Industry partner: Nemisindo 

C4DM theme affiliation: Audio Engineering, Sound Synthesis 

 Abstract: Synthesizing the sound of the human voice on a computer has long been a subject of research. While its most prominent applications are text-to-speech (TTS) systems, various successful singing voice synthesizers have demonstrated the value of voice synthesis within the field of digital music. Approaches to voice synthesis can generally be classified into three categories: 1. spectral modeling techniques, 2. physical modeling techniques, and 3. machine learning (ML) techniques. 

This project proposes to investigate the following questions:  

  • Can we combine ML methods with physical modeling, predicting control parameters for physical models of voice production such that the synthesis quality is on par with synthesis based on deep neural networks (DNNs)? 
  • Can ML-based methods of parameter estimation help us gain insight into the expressive limits of a given physical model? For example, can current physical models of voice production be used successfully to generate vocalizations outside of speech and conventional singing techniques (screams, growls, whispers, etc.)? 

