Pattern Exploiting Training: You Don’t Need GPT-3's Size To Be Good at Few-Shot Learning


If you’ve been following the NLP field, you’ve probably noticed that the release of the Transformer architecture caused a paradigm shift in how large language models are built. Since then, progress in this area has been moving at such a blazing pace that it’s very challenging to keep up. People come up with more and more fun variations of the BERT name for their models (though CamemBERT is hard to beat), the number of parameters has long been measured in billions, and generative models are getting so good that they gain 100 million users in their first two months. The list goes on and on.

As of this writing, ChatGPT was released a couple of months ago, Bing Chat was opened for testing a few weeks ago, and Google plans to release Bard soon. The race of big language models is gaining momentum, and it doesn’t look like it will stop anytime soon.

More and more people are getting introduced to NLP, and in many cases, it’s happening through a service that’s based on a giant language model. You may have seen one of the many demos showing impressive capabilities in zero- and few-shot learning. Now, you may be thinking about solving your problem with a similar model and a small amount of data.
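If you haven’t seen it in action, few-shot learning with a generative model usually means packing a handful of labeled examples into the prompt itself and letting the model continue the pattern. Below is a minimal sketch of that idea using the Hugging Face transformers text-generation pipeline; the model name, the sentiment task, and the prompt are illustrative assumptions rather than anything specific to this article (and a small model like GPT-2 will do this far less reliably than the giant models in those demos).

```python
# Minimal sketch of few-shot prompting: a few labeled examples go directly
# into the prompt, followed by the new input we want the model to label.
# The model ("gpt2") and the sentiment examples are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Review: The plot was dull and predictable. Sentiment: negative\n"
    "Review: A beautiful film with a stellar cast. Sentiment: positive\n"
    "Review: I would not watch it again. Sentiment:"
)

# Greedy decoding, only a few new tokens: we just want the continuation
# of the pattern, ideally the word "negative".
output = generator(prompt, max_new_tokens=3, do_sample=False)
print(output[0]["generated_text"])
```

The point of the sketch is the mechanism, not the accuracy: no weights are updated, the "training data" lives entirely in the prompt, and the quality of the answer depends heavily on how large and well-trained the underlying model is.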
