<https://theconversation.com/ai-is-failing-humanitys-last-exam-so-what-does-that-mean-for-machine-intelligence-274620>
"How do you translate ancient Palmyrene script from a Roman tombstone? How many
paired tendons are supported by a specific sesamoid bone in a hummingbird? Can
you identify closed syllables in Biblical Hebrew based on the latest
scholarship on Tiberian pronunciation traditions?

These are some of the questions in “Humanity’s Last Exam”, a new benchmark
introduced in a study published this week in Nature. The collection of 2,500
questions is specifically designed to probe the outer limits of what today’s
artificial intelligence (AI) systems cannot do.

The benchmark represents a global collaboration of nearly 1,000 international
experts across a range of academic fields. These academics and researchers
contributed questions at the frontier of human knowledge. The problems required
graduate-level expertise in mathematics, physics, chemistry, biology, computer
science and the humanities. Importantly, every question was tested against
leading AI models before inclusion. If an AI could answer it correctly at the
time the test was designed, the question was rejected.

This process explains why the initial results looked so different from other
benchmarks. While AI chatbots score above 90% on popular tests, when
Humanity’s Last Exam was first released in early 2025, leading models
struggled badly. GPT-4o managed just 2.7% accuracy. Claude 3.5 Sonnet scored
4.1%. Even OpenAI’s most powerful model, o1, achieved only 8%.

The low scores were the point. The benchmark was constructed to measure what
remained beyond AI’s grasp. And while some commentators have suggested that
benchmarks like Humanity’s Last Exam chart a path toward artificial general
intelligence, or even superintelligence – that is, AI systems capable of
performing any task at human or superhuman levels – we believe this is wrong
for three reasons."
Cheers,
*** Xanni ***
--
mailto:xanni@xanadu.net Andrew Pam
http://xanadu.com.au/ Chief Scientist, Xanadu
https://glasswings.com.au/ Partner, Glass Wings
https://sericyb.com.au/ Manager, Serious Cybernetics