Google’s Zero-Label Language Learning Achieves Results Competitive With Supervised Learning

While contemporary deep learning models continue to achieve outstanding results across a wide range of tasks, they are known to have huge data appetites. The emergence of large-scale pretrained language models such as OpenAI's GPT-3 has helped reduce the need for task-specific labelled data in natural language processing (NLP), as the models' learned contextualized text representations can be fine-tuned for downstream tasks with relatively small amounts of labelled data. More recently, these large language models have also shown they can handle unseen NLP tasks via few-shot inference, where a handful of demonstrations supplied directly in the prompt stand in for task-specific training.
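
To make "few-shot inference" concrete, the snippet below shows a hypothetical prompt of the kind such models are given; the task, examples, and wording are illustrative and not taken from the paper. No fine-tuning occurs: the model infers the task format from the in-context demonstrations and is expected to complete the final label.

```python
# Hypothetical few-shot prompt for sentiment classification. Feeding this
# string to a large pretrained language model and reading its next-token
# prediction yields a label ("positive" or "negative") with no
# task-specific training.
few_shot_prompt = """Review: The plot was predictable and the acting flat.
Sentiment: negative

Review: A stunning, heartfelt film from start to finish.
Sentiment: positive

Review: I lost interest halfway through.
Sentiment:"""
```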

Motivated by this development, a new Google AI study explores zero-label learning (training with synthetic data only) in NLP, proposing Unsupervised Data Generation (UDG), a novel procedure that leverages the few-shot capabilities of large language models to synthesize high-quality training examples without any human annotations.
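
The core recipe can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the paper's actual pipeline: a small open model (GPT-2 via the Hugging Face transformers library) stands in for the giant pretrained LM, and the sentiment task, label words, and prompt template are hypothetical. The key move is conditioning generation on a desired label so the model produces the matching input text, which is then collected as synthetic supervised data.

```python
# Rough sketch of zero-label data generation. Assumptions: GPT-2 as a
# stand-in LM; toy sentiment task; illustrative prompt template.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

labels = ["great", "terrible"]  # hypothetical sentiment label words
synthetic_examples = []

for label in labels:
    # Condition generation on the desired label: label -> input text,
    # the reverse of ordinary classification.
    prompt = f"Movie review ({label}): "
    outputs = generator(
        prompt,
        max_new_tokens=40,
        do_sample=True,
        num_return_sequences=2,
    )
    for out in outputs:
        text = out["generated_text"][len(prompt):].strip()
        synthetic_examples.append({"text": text, "label": label})

# synthetic_examples can now serve as ordinary supervised training data
# for a downstream classifier, with zero human-annotated labels involved.
print(synthetic_examples[0])
```

In the actual study a far larger model plays the generator role, and the resulting synthetic corpus is used to train task models whose results are compared against fully supervised baselines.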
