A ChatGPT clone, in 3000 bytes of C, backed by GPT-2

Submitted by Style Pass, 2024-12-12 05:30:05

by Nicholas Carlini 2023-04-02

This program is a dependency-free implementation of GPT-2. It loads the weight matrix and BPE file out of the original TensorFlow files, tokenizes the input with a simple byte-pair encoder, implements a basic linear algebra package with matrix math operations, defines the transformer architecture, performs transformer inference, and un-tokenizes the output with the BPE decoder. All in ~3000 bytes of C.
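To make the tokenization step concrete, here is a minimal sketch of byte-pair merging in ordinary, un-golfed C. It is not the code from the 3000-byte program. The names `Merge`, `merge_rank`, and `bpe_encode`, the fixed buffer sizes, and the three-rule merge table in the usage note are all assumptions made up for illustration. The real GPT-2 tokenizer also pre-splits text with a regular expression and remaps raw bytes before merging, and this sketch skips both.

```c
#include <assert.h>
#include <string.h>

#define MAX_TOK 64   /* max symbols per word (arbitrary limit for this sketch) */
#define MAX_LEN 32   /* max bytes per symbol, including the terminator */

/* One merge rule: "left" followed by "right" becomes "leftright".
   Rules earlier in the table have higher priority (lower rank). */
typedef struct { const char *left, *right; } Merge;

/* Rank of the pair (a, b) in the merge table, or -1 if there is no such rule. */
static int merge_rank(const Merge *m, int n_merges, const char *a, const char *b) {
    for (int i = 0; i < n_merges; i++)
        if (strcmp(m[i].left, a) == 0 && strcmp(m[i].right, b) == 0) return i;
    return -1;
}

/* Split `word` into single bytes, then repeatedly merge the adjacent pair
   with the lowest rank until no rule applies.
   Returns the number of symbols left in `sym`. */
int bpe_encode(const char *word, const Merge *m, int n_merges,
               char sym[MAX_TOK][MAX_LEN]) {
    int n = (int)strlen(word);
    assert(n <= MAX_TOK);
    for (int i = 0; i < n; i++) { sym[i][0] = word[i]; sym[i][1] = '\0'; }

    for (;;) {
        int best = -1, pos = -1;
        for (int i = 0; i + 1 < n; i++) {
            int r = merge_rank(m, n_merges, sym[i], sym[i + 1]);
            if (r >= 0 && (best < 0 || r < best)) { best = r; pos = i; }
        }
        if (best < 0) break;                      /* nothing left to merge */

        assert(strlen(sym[pos]) + strlen(sym[pos + 1]) < MAX_LEN);
        strcat(sym[pos], sym[pos + 1]);           /* fuse the pair in place */
        for (int i = pos + 1; i + 1 < n; i++)     /* close the gap */
            strcpy(sym[i], sym[i + 1]);
        n--;
    }
    return n;
}
```

With the made-up merge table `l o`, `lo w`, `e r` (in that priority order), the word `lower` ends up as the two symbols `low` and `er`. A real encoder would then look each symbol up in the vocabulary to get its integer token id.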

It's optimized well enough that GPT-2 Small takes a few seconds per reply on any modern machine. To do this I've implemented KV caching and an efficient matrix multiplication algorithm, with optional OpenMP (OMP) parallelism.
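As a rough illustration of those two ideas, here is a sketch in plain C. It is not the golfed code itself. The `KVCache` struct and the functions `matmul`, `kv_init`, `kv_append`, and `kv_free` are hypothetical names, and the sketch assumes a single layer with no bias terms. It covers only the part that caching saves: each new token's key and value rows are computed once and stored, rather than recomputed for the whole context at every step. The attention softmax that would read from the cache is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* C[n x m] = A[n x k] * B[k x m], all row-major.
   The pragma is ignored unless the compiler gets an OpenMP flag such as -fopenmp. */
void matmul(const float *A, const float *B, float *C, int n, int k, int m) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        float *c = C + (size_t)i * m;
        memset(c, 0, (size_t)m * sizeof *c);
        for (int p = 0; p < k; p++) {             /* i-p-j order: rows of B and C */
            float a = A[(size_t)i * k + p];       /* are read contiguously */
            const float *b = B + (size_t)p * m;
            for (int j = 0; j < m; j++) c[j] += a * b[j];
        }
    }
}

/* Hypothetical cache for one attention layer:
   one key row and one value row per token seen so far. */
typedef struct {
    int d;      /* embedding width */
    int len;    /* tokens cached so far */
    int cap;    /* maximum context length */
    float *k;   /* cap x d keys */
    float *v;   /* cap x d values */
} KVCache;

/* Returns 0 on success, -1 if allocation failed. */
int kv_init(KVCache *c, int d, int cap) {
    c->d = d; c->len = 0; c->cap = cap;
    c->k = calloc((size_t)cap * d, sizeof *c->k);
    c->v = calloc((size_t)cap * d, sizeof *c->v);
    return (c->k && c->v) ? 0 : -1;
}

void kv_free(KVCache *c) { free(c->k); free(c->v); c->k = c->v = NULL; }

/* Project only the newest token x (1 x d) with Wk and Wv (each d x d) and
   append the results. Rows for earlier tokens are reused untouched. */
void kv_append(KVCache *c, const float *x, const float *Wk, const float *Wv) {
    assert(c->len < c->cap);
    matmul(x, Wk, c->k + (size_t)c->len * c->d, 1, c->d, c->d);
    matmul(x, Wv, c->v + (size_t)c->len * c->d, 1, c->d, c->d);
    c->len++;
}
```

Calling `kv_append` once per generated token leaves `c.k` and `c.v` holding the same rows a full `matmul` over the whole context would produce, while the per-step projection cost stays constant instead of growing with context length.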

You can then use this to create something like ChatGPT, just so long as you don't care about the quality of the output. (It's actually pretty terrible output, objectively speaking... But it does run.) There are a few quirks (especially with handling UTF-8 characters), and running the XL size model at long context length can require ~100GB of RAM. But if you're just typing ASCII using GPT-2 Small it should run just about anywhere.
