A demanding new benchmark for frontier AI systems has been released by the nonprofit Center for AI Safety (CAIS) and Scale AI, a company that offers data labeling and AI development services.
The benchmark, known as Humanity’s Last Exam, consists of crowdsourced questions covering topics in the natural sciences, mathematics, and the humanities. The questions are presented in a variety of formats, including some with visuals and diagrams, to make the assessment more difficult.
According to an initial evaluation, no publicly available flagship AI system scored higher than 10% on Humanity’s Last Exam.
CAIS and Scale AI say they plan to open the benchmark to the research community so that researchers can “dig deeper into the variations” and evaluate new AI models.
SOURCE: TechCrunch