In this post, we describe our recent scaling of training data attribution (TDA) methods to LLM pretraining, and the curious phenomenon this uncovered: the examples that influence a model’s knowledge of a fact are often not the ones that directly express or imply it.
Large language models (LLMs) are trained on billions to trillions of words, but it remains largely unknown how these models leverage their training data to make predictions. One promising family of methods for understanding this process is training data attribution (TDA). These methods aim to identify the training examples that most influence specific model outputs. However, a major obstacle to advancing TDA research for LLMs has been the sheer scale of LLM pretraining, which is the first, longest, and arguably most important stage of LLM training.
In our paper, “Scalable Influence and Fact Tracing for Large Language Model Pretraining”, we demonstrate how training data attribution methods can scale to LLM pretraining. Relative to previous work on large models (e.g., here, here, and here), we were able to retrieve influential examples from over 30x more pretraining examples and for over 100x more queries. Here, we introduce our advances in scalable TDA methodology, which we call TrackStar.