Like much of the world, I am trying to understand the implications of LLMs and where this is all going. I started forming hypotheses, but quickly realized I couldn’t be confident in them without understanding how everything actually works. So, I set out to learn the fundamentals. This post is a synthesis of that learning, and I’m sharing it in case you find it useful too.
This is a semi-technical primer. It doesn’t go into the linear algebra or the code, but it doesn’t shy away from the key concepts either.
You won’t walk away ready to build GPT-5 or land a $100M compensation package from Zuck, but you should come away with a solid conceptual understanding of how the models behind tools like ChatGPT are trained, structured, and deployed. That, in turn, should help you form more thoughtful perspectives on the implications of this technology.