Contents
1 Introduction
2 A brief explanation of open-source software development
3 Community-developed and continually-improved models
  3.1 Incremental and cheaply-communicable updates
  3.2 Merging models
  3.3 Vetting community contributions
  3.4 Versioning and backward compatibility
  3.5 Modularity and distribution
4 An example future
5 Conclusion
Over the past decade or so, it has become increasingly common to use transfer learning when tackling machine learning problems. In transfer learning, a model is first trained on a data-rich pre-training task before being fine-tuned through additional training on a downstream task of interest. The use of a pre-trained model (instead of starting from a model whose parameters were initialized randomly) tends to produce better performance from less labeled data on the downstream task. This benefit has made transfer learning a standard choice in many domains, and popular pre-trained models therefore see a staggering amount of use. For example, the BERT model on the Hugging Face Model Hub has been downloaded tens of millions of times, and loading a pre-trained image classification model in PyTorch is as simple as passing pretrained=True.
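The pre-train-then-fine-tune recipe described above can be illustrated with a minimal, self-contained sketch. Everything here is synthetic and illustrative (the tasks, the linear model, and the function names are assumptions for the sake of the example, not anything from a real library): we "pre-train" on a data-rich task, then fine-tune from those weights on a related task with little labeled data, comparing against training from a random (here, zero) initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(true_w, n, noise=0.1):
    # Synthetic linear regression task: y = X @ true_w + noise
    X = rng.normal(size=(n, 5))
    y = X @ true_w + noise * rng.normal(size=n)
    return X, y

def train(X, y, w, lr=0.05, steps=50):
    # Plain gradient descent on mean squared error
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

# "Pre-train" on a data-rich task A
w_true_a = rng.normal(size=5)
Xa, ya = make_task(w_true_a, n=1000)
w_pretrained = train(Xa, ya, np.zeros(5), steps=200)

# Downstream task B is related to task A (slightly shifted weights)
# and has far less labeled data
w_true_b = w_true_a + 0.1 * rng.normal(size=5)
Xb, yb = make_task(w_true_b, n=20)

# Fine-tune from the pre-trained weights vs. train from scratch,
# using the same small budget of update steps for both
w_finetuned = train(Xb, yb, w_pretrained, steps=10)
w_scratch = train(Xb, yb, np.zeros(5), steps=10)

print("fine-tuned MSE:", mse(Xb, yb, w_finetuned))
print("from-scratch MSE:", mse(Xb, yb, w_scratch))
```

In practice one would of course load real pre-trained weights rather than pre-train a toy model, e.g. via torchvision's resnet50(pretrained=True) mentioned above; the sketch only aims to show why starting from pre-trained parameters tends to need less downstream data and compute.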
The widespread use of pre-trained models is a clear endorsement of their utility. However, pre-training often involves training a large model on a large amount of data. This incurs substantial computational (and therefore financial) costs; for example, Lambda estimates that training the GPT-3 language model would cost around $4.6 million. As a result, most popular pre-trained models were created by small teams within large, resource-rich corporations, which means that the majority of the research community is excluded from participating in the design and creation of these valuable resources. To make matters worse, most pre-trained models are never updated: they are left as-released and reused until a better model comes along. There are many reasons why we might want to update a pre-trained model; for example, we might