How to tell if you have trained your Model with enough data?

Deep Neural Networks (DNN) require a lot of training data. Even fine-tuning a model can require a lot. A LOT. So how can you know if you have used enough? For Computer Vision (CV) models, you can always look at the test error. But what about fine-tuning large, transformer models like BERT or GPT?

WeightWatcher is an open-source, diagnostic tool for evaluating the performance of (pre)-trained and fine-tuned Deep Neural Networks. It is based on state-of-the-art research into Why Deep Learning Works. Recently, it has been featured in Nature.

In the paper, we consider the example of GPT vs GPT2. GPT is an NLP Transformer model, developed by OpenAI, to generate fake text. When it was first developed, OpenAI released the GPT model, which had specifically been trained with a small data set, making it unusable for generating fake text. Later, they realized fake text is good business, and they released GPT2, which is just like GPT but trained with enough data to make it useful.

We can apply WeightWatcher to GPT and GPT2 and compare the results; we will see that the WeightWatcher log spectral norm and alpha (power-law) metrics immediately tell us that something is wrong with the GPT model. This is shown in Figure 6 of the paper.
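
To make this concrete, a comparison along these lines might look like the minimal sketch below. It assumes the weightwatcher and Hugging Face transformers packages are installed (e.g. via pip), and that the summary returned by weightwatcher exposes the metrics under keys such as "alpha" and "log_spectral_norm"; exact metric names can vary between weightwatcher versions.

```python
import weightwatcher as ww
from transformers import OpenAIGPTModel, GPT2Model

# Load the pretrained GPT and GPT-2 weights from the Hugging Face hub.
models = {
    "GPT":  OpenAIGPTModel.from_pretrained("openai-gpt"),
    "GPT2": GPT2Model.from_pretrained("gpt2"),
}

for name, model in models.items():
    watcher = ww.WeightWatcher(model=model)
    details = watcher.analyze()             # per-layer metrics (pandas DataFrame)
    summary = watcher.get_summary(details)  # metrics averaged over all layers

    # Data-starved models tend to show larger alpha and log spectral norm
    # values than their well-trained counterparts.
    # Note: key names may differ across weightwatcher versions.
    print(name,
          "alpha:", summary.get("alpha"),
          "log_spectral_norm:", summary.get("log_spectral_norm"))
```

Since the details DataFrame holds per-layer values, the same metrics can also be plotted layer by layer for both models, which is the kind of comparison shown in Figure 6.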
