
The Transformer architecture has been a major component in the success of Large Language Models (LLMs). It underpins nearly all LLMs in use today, from open-source models like Mistral to closed-source models like ChatGPT.

To further improve LLMs, new architectures are being developed that might even outperform the Transformer architecture. One of these is Mamba, a State Space Model.

Mamba was proposed in the paper Mamba: Linear-Time Sequence Modeling with Selective State Spaces. You can find its official implementation and model checkpoints in its repository.
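If you want to try the architecture yourself, the repository provides a Python package, mamba_ssm. The following is a minimal sketch, assuming mamba_ssm and PyTorch are installed and a CUDA GPU is available (the selective-scan kernel is CUDA-only); the hyperparameter values are illustrative, not tuned:

```python
import torch
from mamba_ssm import Mamba  # package from the state-spaces/mamba repository

# Dummy input of shape (batch, sequence length, model dimension)
batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")

# A single standalone Mamba block
model = Mamba(
    d_model=dim,  # model dimension
    d_state=16,   # SSM state expansion factor
    d_conv=4,     # local convolution width
    expand=2,     # block expansion factor
).to("cuda")

y = model(x)  # output has the same shape as the input
assert y.shape == x.shape
```

A full language model stacks many of these blocks with normalization and an embedding/output layer, much like Transformer blocks are stacked; this snippet only exercises one block in isolation.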


In this post, I will introduce the field of State Space Models in the context of language modeling and explore its concepts one by one to develop an intuition for the field. Then, we will cover how Mamba might challenge the Transformer architecture.
