A new active learning method for curating high-quality data that reduces training data requirements for fine-tuning LLMs by orders of magnitude.
Classifying unsafe ad content has proven an enticing problem space for leveraging large language models (LLMs). The inherent complexity involved in identifying policy-violating content demands solutions capable of deep contextual and cultural understanding, areas of relative strength for LLMs over traditional machine learning systems. But fine-tuning LLMs for such complex tasks requires high-fidelity training data that is difficult and expensive to curate at the necessary quality and scale. Standard data-intensive approaches to training are costly, especially given the need to handle concept drift as safety policies evolve or new types of unsafe ad content emerge. In the worst case, the model must be retrained on a completely new dataset. Reducing the amount of training data needed is therefore paramount.
With this in mind, we describe a new, scalable curation process for active learning that can drastically reduce the amount of training data needed for fine-tuning LLMs while significantly improving model alignment with human experts. The process can be applied to datasets of hundreds of billions of examples to iteratively identify the examples for which annotation would be most valuable and then use the resulting expert labels for fine-tuning.
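To make the loop concrete, the sketch below shows one curation round under a simple, assumed selection criterion: examples the current model scores closest to its decision boundary are routed to experts for labeling. The names `score_fn`, `annotate_fn`, and `fine_tune_fn` are illustrative placeholders, not the production system's API, and the toy stand-ins at the bottom exist only so the sketch runs end to end.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    score: float = 0.5  # current model's estimated P(violates policy)


def select_for_annotation(pool, k):
    """Pick the k pool examples scored closest to the decision
    boundary (0.5), where an expert label is assumed to be most
    informative for the next fine-tuning round."""
    return sorted(pool, key=lambda ex: abs(ex.score - 0.5))[:k]


def curation_round(pool, score_fn, annotate_fn, fine_tune_fn, k=100):
    """One active learning iteration: score the unlabeled pool,
    send the most ambiguous examples to human experts, then
    fine-tune on the resulting expert labels."""
    for ex in pool:
        ex.score = score_fn(ex.text)
    batch = select_for_annotation(pool, k)
    labeled = [(ex.text, annotate_fn(ex.text)) for ex in batch]
    fine_tune_fn(labeled)
    return labeled


# Toy stand-ins so the sketch runs; a real system would call an
# LLM scorer and a human annotation queue here.
pool = [Example(f"ad creative {i}") for i in range(1_000)]
labeled = curation_round(
    pool,
    score_fn=lambda text: random.random(),
    annotate_fn=lambda text: random.choice(["benign", "violating"]),
    fine_tune_fn=lambda labeled: None,  # placeholder fine-tuning step
    k=10,
)
```

In practice, such a loop would repeat until agreement between model and expert labels stops improving (an inter-annotator metric such as Cohen's kappa is one natural choice), so each round spends scarce expert effort only on the examples where the current model is least certain.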