Scaling up self-attention inference

Current transformer models do not keep improving by interacting with users or the world; they cannot retrieve past experiences or learn from them directly. One can introduce a separate memory module to keep track of past interactions, similar to what OpenAI is doing, but such textual memories cannot capture the full detail of those experiences.

We aim for a model that can retrieve all of its past experiences when needed. Such a model could keep improving itself, learning from its mistakes much like a human does, and could carry out reasoning and planning over problems that require thousands of steps, spanning days or months. I believe long-context inference is one of the cornerstones of AGI.

With a long context window, we can elegantly solve the RAG (Retrieval-Augmented Generation) problem: there is no longer a need for a separate pipeline to retrieve relevant content for the model, since all relevant information can be encoded in a single input sequence.
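To make that contrast concrete, here is a minimal, hypothetical sketch. The document list, the toy word-overlap scorer, and the helper retrieve_top_k are placeholders invented for illustration; a real RAG pipeline would use embeddings and a vector index.

```python
# Hypothetical sketch: RAG-style retrieval vs. long-context prompting.
# The documents and the relevance scoring below are illustrative placeholders.

documents = ["doc about topic A ...", "doc about topic B ...", "doc about topic C ..."]
question = "What does the source say about topic B?"

# RAG: a separate pipeline must decide which documents fit in a short context.
def retrieve_top_k(query, docs, k=1):
    # Toy relevance score: word overlap with the query.
    overlap = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

rag_prompt = "\n\n".join(retrieve_top_k(question, documents)) + "\n\nQ: " + question

# Long context: skip retrieval and encode everything in one input sequence.
long_context_prompt = "\n\n".join(documents) + "\n\nQ: " + question
```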

Google DeepMind is investing heavily in this direction: their Gemini 1.5 models support a 2M-token context window, and the startup magic.dev recently announced work on 100M-token context windows.
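To get a feel for why scaling self-attention inference to those lengths is hard, here is a back-of-envelope sketch of the key/value cache that standard attention must hold during decoding. Every model dimension below (layer count, KV heads, head size, fp16 storage) is an assumption chosen for illustration, not the configuration of any model mentioned above.

```python
# Back-of-envelope KV-cache size for long-context self-attention inference.
# All model dimensions are illustrative assumptions.

def kv_cache_bytes(context_len, num_layers=48, num_kv_heads=8,
                   head_dim=128, bytes_per_value=2):
    """Memory needed to cache keys and values for one sequence.

    The factor of 2 accounts for storing both K and V;
    bytes_per_value=2 assumes fp16/bf16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * context_len

for tokens in (128_000, 2_000_000, 100_000_000):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>11,} tokens -> ~{gib:,.0f} GiB of KV cache")
```

Under these assumptions the cache works out to roughly 384 KiB per token, so a 2M-token context already needs on the order of 700 GiB per sequence, and 100M tokens pushes into the tens of terabytes; that is the core scaling problem these efforts have to address.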
