BLT: Byte Latent Transformer - by Grigory Sapunov


Title: Byte Latent Transformer: Patches Scale Better Than Tokens
Authors: Artidoro Pagnoni, Ram Pasunuru, Pedro Rodriguez, John Nguyen, Benjamin Muller, Margaret Li, Chunting Zhou, Lili Yu, Jason Weston, Luke Zettlemoyer, Gargi Ghosh, Mike Lewis, Ari Holtzman, Srinivasan Iyer
Paper: https://arxiv.org/abs/2412.09871
Code: https://github.com/facebookresearch/blt

The Byte Latent Transformer (BLT) presents an interesting approach to moving away from fixed-vocabulary tokenization and working at the byte level in LLMs. It dynamically splits the input stream into patches, placing patch boundaries based on next-byte entropy, and operates on these patches. If the data stream is simple and predictable, patches can be made longer; if things get complex, more compute is allocated by cutting a larger number of patches. This gives us dynamic compute allocation.
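To make the boundary rule concrete, here is a minimal sketch of entropy-based patching. It assumes a small byte-level language model that supplies a next-byte distribution at each position; the threshold value and the helper names (`patch_boundaries`, `make_patches`) are illustrative, not taken from the BLT codebase.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a next-byte distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def patch_boundaries(byte_seq, next_byte_probs, threshold=2.0):
    """Start a new patch wherever the small byte-level LM is 'surprised',
    i.e. the entropy of its prediction for the next byte exceeds a global
    threshold. `next_byte_probs[i]` is the model's distribution over the
    byte at position i given the preceding bytes."""
    boundaries = [0]
    for i in range(1, len(byte_seq)):
        if entropy(next_byte_probs[i]) > threshold:
            boundaries.append(i)
    return boundaries

def make_patches(byte_seq, boundaries):
    """Materialize patches from boundary indices."""
    ends = boundaries[1:] + [len(byte_seq)]
    return [byte_seq[s:e] for s, e in zip(boundaries, ends)]
```

With this rule, long runs of predictable bytes (low entropy) collapse into one long patch, while unpredictable regions produce many short patches and therefore receive more compute from the latent transformer.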

Tokenizers are a curious story: they sit alongside the beautiful, differentiable end-to-end training of transformers and disrupt that idyllic end-to-end picture. Tokenizers are also trained, but not through backpropagation; instead, a relatively simple algorithm selects the most frequently occurring symbol sequences in the language and builds a vocabulary of a predetermined size from them.
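As a reminder of how simple that training loop is, here is a toy BPE-style sketch: greedily merge the most frequent adjacent pair of symbols until the vocabulary reaches a target size. It is a bare-bones illustration of the frequency-driven idea, not the exact algorithm used by any particular tokenizer library.

```python
from collections import Counter

def most_frequent_pair(corpus):
    """Count adjacent symbol pairs across all sequences; return the top pair."""
    pairs = Counter()
    for seq in corpus:
        pairs.update(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(corpus, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = pair[0] + pair[1]
    new_corpus = []
    for seq in corpus:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        new_corpus.append(out)
    return new_corpus

def train_bpe(corpus, vocab_size):
    """Greedily merge frequent pairs until the vocabulary hits the target size."""
    vocab = {sym for seq in corpus for sym in seq}
    while len(vocab) < vocab_size:
        pair = most_frequent_pair(corpus)
        if pair is None:
            break
        corpus = merge_pair(corpus, pair)
        vocab.add(pair[0] + pair[1])
    return vocab
```

No gradients flow through this procedure: the vocabulary is fixed before the transformer ever sees the data, which is exactly the seam in the end-to-end picture that BLT tries to remove.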
