
Fun and Dystopia With AI-Based Code Generation Using GPT-J-6B

2021-06-14

Since OpenAI will not open-source the 175-billion-parameter GPT-3 text generation model, others such as EleutherAI are developing their own by training not-quite-as-large Transformer-based models that still get impressive results.

The latest large language model is GPT-J, a 6-billion-parameter model by Aran Komatsuzaki and Ben Wang with an architecture roughly similar to GPT-3's. They provide a free web demo to try quick prompts, and a Google Colab notebook if you want to test many prompts. The model is so big it requires a TPU to generate text at a reasonable speed!
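
If you would rather run the model outside the web demo or the Colab notebook, here is a minimal sketch using Hugging Face transformers. It assumes the EleutherAI/gpt-j-6B checkpoint on the Hugging Face Hub and a GPU with enough memory for the roughly 12 GB of half-precision weights; it is not the official Mesh Transformer JAX / TPU setup, and on CPU generation will be painfully slow.

```python
# A minimal sketch, assuming the EleutherAI/gpt-j-6B checkpoint on the
# Hugging Face Hub and a GPU that can hold the half-precision weights (~12 GB).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.to("cuda").eval()

prompt = "The following is a Python function that reverses a string:\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Sample a completion; temperature controls how adventurous the output is.
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.8,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```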

Running GPT-J against the test prompts I had used with GPT-3 a year ago, it qualitatively performed worse than GPT-3 on most of them, unsurprisingly given its smaller size (but still better than GPT-2 1.5B!). The exception is code generation, where GPT-J performed very well and GPT-3 had performed very poorly.
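
To illustrate the kind of code-completion prompt involved (the specific function below is a hypothetical example, not one of the test prompts above), here is a sketch using the transformers text-generation pipeline: you hand the model a bare function signature and docstring and let it fill in the body.

```python
from transformers import pipeline

# Hypothetical code-completion prompt: a function signature and docstring,
# leaving the body for the model to fill in.
prompt = (
    "def is_palindrome(s):\n"
    '    """Return True if the string s reads the same forwards and backwards."""\n'
)

# Again assumes the EleutherAI/gpt-j-6B checkpoint; a low temperature keeps the
# completion close to straightforward, idiomatic code.
generator = pipeline("text-generation", model="EleutherAI/gpt-j-6B")
result = generator(prompt, max_new_tokens=64, do_sample=True, temperature=0.2)
print(result[0]["generated_text"])
```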

This behavior is likely due to GPT-J’s training set: it was trained on The Pile, which weights GitHub and Stack Overflow heavily, whereas the GPT-3 training set consists mostly of Common Crawl’s representation of typical internet content.
