Dr. Marc Roth and Dr. Søren Riis from the School of Electronic Engineering and Computer Science have emerged as winners in a groundbreaking AI competition organised by Scale AI and the Center for AI Safety.
The contest was part of the "Humanity’s Last Exam" (HLE) Project, an initiative designed to push artificial intelligence to its limits by challenging it with expert-level questions.
AI systems are typically evaluated on benchmark questions that assess their intelligence and performance. However, as AI models have rapidly advanced, existing benchmarks have become too easy and no longer differentiate between high-performing systems. The HLE project aimed to change this by curating a new benchmark set of exceptionally difficult questions, so challenging that any AI capable of solving them would possess expert, research-level knowledge across multiple academic disciplines.
The competition attracted nearly 1,000 international researchers and experts, who submitted questions spanning over 100 subjects. Submissions went through a rigorous three-stage selection process.
Out of 70,000 submitted questions, only 3,000 made it into the final benchmark, with the top 50 declared as winners and each earning a prize. Dr. Marc Roth and Dr. Søren Riis, the sole participants from Queen Mary University of London, were among the winners, with one of Roth's questions featured in the paper accompanying the publication of the HLE benchmark set.
Their success highlights the growing need for advanced AI evaluation and the crucial role of human expertise in shaping the future of artificial intelligence.
A preprint of the paper is available online.