DERI Seminar with Antoine Yang from Google DeepMind
When: Thursday, March 21, 2024, 11:00 AM - 12:00 PM
Where: Zoom
Speaker: Antoine Yang, Research Scientist at Google DeepMind
Zoom link: https://qmul-ac-uk.zoom.us/j/81148100921
Title: Learning to describe multi-event videos from Web supervision
Abstract: This talk presents several new contributions to long video description. First, we present Vid2Seq, a single-stage multi-modal model for dense event captioning. Vid2Seq can be pretrained effectively at scale on unlabeled narrated videos, using transcribed speech as pseudo-supervision. Second, we release VidChapters-7M, a dataset of 817K videos segmented by online users into 7M chapters in total. We study the task of video chapter generation and show that pretraining Vid2Seq for video chapter generation on VidChapters-7M significantly improves performance on dense event captioning.
Bio: Antoine Yang is a Research Scientist at Google DeepMind, working on the multi-modal capabilities of Gemini. He completed his PhD at Inria Paris in 2023 and received a double MEng from École Polytechnique and ENS Paris-Saclay in 2020.