A Call to Build Models Like We Build Open-Source Software

2021-12-09 10:30:06

1  Introduction
2  A brief explanation of open-source software development
3  Community-developed and continually-improved models
   3.1  Incremental and cheaply-communicable updates
   3.2  Merging models
   3.3  Vetting community contributions
   3.4  Versioning and backward compatibility
   3.5  Modularity and distribution
4  An example future
5  Conclusion

Over the past decade or so, it has become increasingly common to use transfer learning when tackling machine learning problems. In transfer learning, a model is first trained on a data-rich pre-training task before being fine-tuned through additional training on a downstream task of interest. Using a pre-trained model (instead of starting from a model whose parameters were initialized randomly) tends to produce better performance from less labeled data on the downstream task. These benefits have made transfer learning a standard choice in many domains, and popular pre-trained models therefore see a staggering amount of use. For example, the BERT model on the Hugging Face Model Hub has been downloaded tens of millions of times, and loading a pre-trained image classification model in PyTorch is as simple as passing pretrained=True.

The widespread use of pre-trained models is a clear endorsement of their utility. However, pre-training often involves training a large model on a large amount of data. This incurs substantial computational (and therefore financial) costs; for example, Lambda estimates that training the GPT-3 language model would cost around $4.6 million. As a result, most popular pre-trained models were created by small teams within large, resource-rich corporations, which means that the majority of the research community is excluded from participating in the design and creation of these valuable resources. To make matters worse, most pre-trained models are never updated — they are left as-released and reused until a better model comes along. There are many reasons why we might want to update a pre-trained model — for example, we might
