AI isn’t very good at history, new paper finds

AI might excel at certain tasks like coding or generating a podcast. But it struggles to pass a high-level history exam, a new paper has found.

A team of researchers has created a new benchmark to test three top large language models (LLMs) — OpenAI’s GPT-4, Meta’s Llama, and Google’s Gemini — on historical questions. The benchmark, Hist-LLM, tests the correctness of answers according to the Seshat Global History Databank, a vast database of historical knowledge named after the ancient Egyptian goddess of wisdom. 

The results, which were presented last month at the high-profile AI conference NeurIPS, were disappointing, according to researchers affiliated with the Complexity Science Hub (CSH), a research institute based in Austria. The best-performing…

Source link

Leave a Comment