How outdated information hides in LLM token generation probabilities and creates logical inconsistencies

The internet usually has the correct answer somewhere, but it’s also full of conflicting and outdated information. How do large language models (LLMs) such as ChatGPT, trained on internet-scale data, handle cases where there’s conflicting or outdated information? (Hint: it’s not always the most recent answer as of the knowledge cutoff date; think about what LLMs are actually trained to do.)

In this article, I’m going to briefly cover some of the basics so we can think this through from first principles, and then have a peek at the token generation probabilities, working our way from GPT-2 through to the most recent 4o series of models (a sketch of how to inspect these probabilities follows below). We’ll then explore the very strange behaviour that arises when an LLM has learned both the correct and the outdated information, believing both to be simultaneously true yet also contradictory. I’m going to use the height of a mountain as a running example of something that should be consistent but isn’t. If you don’t care as much as I do about the height of some mountain you’ve never heard of (shame on you), keep in mind that these principles also apply to things like the recommended dosage of medications, or, if you use AI code assistants, things like which library function parameters are required or deprecated and network timeout behaviour on different platforms.
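
To make “peeking at token generation probabilities” concrete, here is a minimal sketch of how one might inspect the next-token probability distribution for GPT-2 using the Hugging Face transformers library. The prompt and the choice of mountain are illustrative assumptions on my part, not necessarily the exact setup used later in the article.

```python
# Minimal sketch: inspect GPT-2's next-token probabilities for a prompt.
# Assumes the Hugging Face `transformers` and `torch` packages are installed;
# the prompt below is only an illustrative example.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The height of Mount Everest is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # logits shape: (batch=1, sequence_length, vocab_size)
    logits = model(**inputs).logits

# Probability distribution over the token that would come next
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Print the top candidate continuations and their probabilities
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r:>12}  {prob.item():.4f}")
```

The same idea carries over to hosted models: APIs that expose log probabilities (for example, a `logprobs`-style option) let you see when a model is splitting its probability mass between an old figure and a newer one rather than committing to a single answer.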

You may have seen ChatGPT use a phrase such as “as of my knowledge cutoff” (try searching that phrase on Google Scholar). However, knowledge cutoffs are not as simple as they seem: crawls of the internet (or whatever other sources LLM creators use; OpenAI doesn’t say exactly what goes in) don’t just contain the most recent information as of that date. They also contain a lot of old and duplicated information from the past.
