Large Language Models do Gradient Descent at runtime.

submitted by
Style Pass
2023-03-17 04:30:10

In Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers, researchers investigate how large language models (LLMs) can perform tasks such as classification from examples given as the "context".

The way this works: you give the model a few example sentences paired with their labels (e.g. "{This is a positive sentence}, {sentiment: positive}") as the context, then give it an unlabeled sentence, and it outputs the sentiment.
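To make the setup concrete, here is a minimal sketch of how such a few-shot prompt is assembled. The example sentences, the label format, and the variable names are illustrative assumptions, not taken from the paper:

```python
# Hypothetical demonstration pairs (sentence, label) used as the context.
examples = [
    ("This is a positive sentence", "positive"),
    ("This movie was a waste of time", "negative"),
]
query = "I really enjoyed the soundtrack"

# Format each demonstration as "{sentence}, {sentiment: label}", one per line.
prompt = "\n".join(
    f"{{{text}}}, {{sentiment: {label}}}" for text, label in examples
)
# Append the unlabeled query; the model is expected to complete the label.
prompt += f"\n{{{query}}}, {{sentiment:"

print(prompt)
```

The model never sees these exact demonstrations during training; it infers the classification task purely from the pattern in the context.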

For those who believe LLMs are just "stochastic parrots" that merely regurgitate what they have read, this behavior should be impossible. Proponents of this view typically object that "it has seen those examples somewhere before", but this is easily disproved by making up your own examples.

In this paper the researchers probe the transformer's internal state before and after the context examples are loaded, and show that the updated state resembles that of a model that has been fine-tuned on those examples(!)
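The core of the argument is a "dual form": with attention linearized (softmax dropped), the contribution of the in-context demonstrations to the attention output is mathematically identical to applying an implicit weight update ΔW to the query, much like a gradient-descent step on a linear layer. The sketch below checks that identity numerically; the dimensions and random data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
q = rng.normal(size=d)            # attention query for the test token
X_demo = rng.normal(size=(3, d))  # hidden states of the demonstration tokens
W_V = rng.normal(size=(d, d))     # value projection
W_K = rng.normal(size=(d, d))     # key projection

V = X_demo @ W_V.T                # values of the demonstrations, shape (3, d)
K = X_demo @ W_K.T                # keys of the demonstrations, shape (3, d)

# Linear (unnormalized) attention of the query over the demonstrations:
# out = sum_i (k_i . q) * v_i
attn_out = V.T @ (K @ q)

# Dual form: the same output as a weight update applied to q, where
# dW is a sum of outer products value_i (x) key_i -- the same shape as
# the accumulated outer-product updates in a gradient-descent step.
dW = sum(np.outer(v, k) for v, k in zip(V, K))
dual_out = dW @ q

assert np.allclose(attn_out, dual_out)
```

So loading demonstrations into the context and fine-tuning the weights on them act on the query through updates of the same algebraic form, which is why the probed states end up looking similar.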
