EECS researchers win prestigious AI Benchmark competition

Published: 5 February 2025

The contest was part of the "Humanity’s Last Exam" (HLE) Project, an initiative designed to push artificial intelligence to its limits by challenging it with expert-level questions.

AI systems are typically evaluated on benchmark questions that measure their performance. However, as AI models have rapidly advanced, existing benchmarks have become too easy to differentiate between high-performing systems. The HLE project aimed to change this by curating a new benchmark of exceptionally difficult questions, so challenging that any AI capable of solving them would possess research-expert-level knowledge across multiple academic disciplines.

The competition attracted nearly 1,000 international researchers and experts, who submitted questions spanning over 100 subjects. The rigorous selection process involved three stages:

AI Evaluation: five of the best AI models attempted each question. If all failed, the question advanced.
Expert Review: experts refined and assessed the questions and answers.
Final Selection: a panel of experts and organisers made the final call.
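
The first stage above can be pictured as a simple filter. The sketch below is only an illustration. The article does not say how answers were graded, so the exact-match comparison is an assumption, and the names `advances_to_review`, `normalise` and the stub models are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical type: a "model" is any function that maps a question to an answer.
Model = Callable[[str], str]


def normalise(answer: str) -> str:
    """Trim whitespace and lower-case an answer before comparing (assumed grading rule)."""
    return answer.strip().lower()


def advances_to_review(question: str, reference: str, models: Sequence[Model]) -> bool:
    """Return True only if every model in the panel gets the question wrong."""
    return all(normalise(model(question)) != normalise(reference) for model in models)


# Stub models standing in for the real AI systems (illustrative only).
def always_42(question: str) -> str:
    return "42"


def always_unknown(question: str) -> str:
    return "unknown"


panel = [always_42, always_unknown]

# Both stubs miss the reference answer, so this question would move on to expert review.
hard = advances_to_review("Hypothetical hard question", "17", panel)

# One stub matches the reference answer, so this question would be filtered out.
easy = advances_to_review("Hypothetical easy question", "42", panel)

print(hard, easy)
```

In the process described above the panel would hold five models rather than two, and a question that passes this filter would still face expert review and final selection.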

Out of 70,000 submitted questions, only 3,000 made it into the final benchmark, and the top 50 were declared winners, each earning a prize. Dr. Marc Roth and Dr. Søren Riis, the sole participants from Queen Mary University of London, were among the winners. One of Roth's questions is featured in the paper accompanying the publication of the HLE benchmark set.

Their success highlights the growing need for advanced AI evaluation and the crucial role of human expertise in shaping the future of artificial intelligence.

A preprint of the paper is available.
