While artificial intelligence excels at tasks like coding and podcast generation, it struggles to answer expert-level history questions accurately, according to a study.
“LLMs, while impressive, still lack the depth required for advanced history,” said Maria del Rio-Chanona, a co-author of the paper and associate professor at University College London.
For instance, GPT-4 incorrectly stated that scale armor was present in ancient Egypt during a specific time period; in reality, the technology appeared there only 1,500 years later.
Similarly, the model falsely claimed that ancient Egypt had a professional standing army during a particular period, an error likely driven by the prevalence of information about standing armies in other ancient empires, such as Persia.
OpenAI’s GPT-4 and Meta’s Llama models also performed worse on questions about regions such as sub-Saharan Africa, suggesting that their training data underrepresents those regions.