
Expanding Knowledge in Large Language Models


In this series, we’ll break down the process of training models with millions of parameters — scaled just right for laptop GPUs or Colab notebooks.

Our goal is straightforward: to understand how these LLMs behave and explore the tweaks and techniques that can shape their responses. Rather than using massive, pre-trained models, we’re focusing on building manageable models, giving you a hands-on way to learn what really goes on under the hood when you train, fine-tune, or implement retrieval-augmented generation (RAG). Let’s demystify the process together.

We are not after multi-billion-parameter models, just small ones that can produce coherent output. The inspiration for this was Andrej Karpathy training a 110M-parameter LLM on TinyStories, and so we thought: if a model that small can produce coherent output, it can help us understand LLMs in a nuanced way that is sadly not possible with large models.
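To make the scale concrete, here is a minimal back-of-the-envelope sketch of how standard transformer dimensions land a model in the tens of millions of parameters rather than billions. The configuration values are illustrative, not the exact settings of Karpathy's model or of the models in this series.

```python
# Rough parameter-count estimate for a decoder-only transformer.
# All config values below are illustrative assumptions, not the
# settings of any specific model discussed in this series.

def estimate_params(vocab_size: int, d_model: int, n_layers: int, d_ff: int) -> int:
    embeddings = vocab_size * d_model      # token embedding table
    attention = 4 * d_model * d_model      # Q, K, V and output projections
    feedforward = 2 * d_model * d_ff       # MLP up- and down-projections
    per_layer = attention + feedforward
    return embeddings + n_layers * per_layer

# A "laptop-scale" configuration: tens of millions of parameters.
total = estimate_params(vocab_size=32_000, d_model=512, n_layers=8, d_ff=2048)
print(f"{total / 1e6:.1f}M parameters")  # roughly 42M
```

Layer norms and biases are ignored here; they contribute well under a percent of the total. Scaling `d_model` and `n_layers` up to GPT-2-small-like values pushes the count into the 100M range, which is still trainable on a single consumer GPU or a Colab notebook.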

The problem sounds deceptively easy, and that is exactly where we get to see different model behaviors. We are not training a production-grade model; we are experimenting to understand what is possible and how to work around the problems that come up along the way.
