TextBoxGAN: Generate millions of text boxes

Submitted by Style Pass on 2021-07-09 12:00:06

Generative Adversarial Networks (GANs) have gained a lot of attention recently, largely thanks to the mesmerizing results of StyleGAN2. How do they work? How can we use them to generate readable text boxes from input words? Why is that useful? We are going to answer all of these questions.

Labeling data to train an Optical Character Recognition (OCR) network, commonly used to read text from natural images, is expensive. It involves cropping each text instance out of an image to get a text box, then manually transcribing the text it contains. Once you account for upper and lower case and special characters, the number of character classes becomes quite high. Hence, a large amount of data is needed for training.
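To get a feel for how quickly the class count grows, here is a minimal sketch that counts character classes with Python's standard `string` module. The alphabets below (printable ASCII letters, digits, and punctuation) are an assumption made for illustration; the character set of a real OCR model may differ.

```python
import string

# Assumed alphabets for illustration only; a real OCR model may use another set.
case_insensitive = string.ascii_lowercase + string.digits
case_sensitive = string.ascii_letters + string.digits
with_specials = case_sensitive + string.punctuation


def count_classes(alphabet: str) -> int:
    """Return the number of distinct characters the recognizer must tell apart."""
    return len(set(alphabet))


for name, alphabet in [
    ("lowercase + digits", case_insensitive),
    ("upper + lower + digits", case_sensitive),
    ("upper + lower + digits + specials", with_specials),
]:
    print(f"{name}: {count_classes(alphabet)} classes")
```

Under these assumptions, moving from a case-insensitive alphanumeric alphabet to full printable ASCII raises the class count from 36 to 94, and every class needs enough labeled examples.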

To solve this problem, a few synthetic datasets (SynthText, Synth90K, ...) provide millions of images. They render the text boxes with traditional methods (i.e. with no AI), and these methods rely on many random parameters such as the font, the size, or whether there is a border.
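The idea of randomized rendering parameters can be sketched as follows. This is not the actual pipeline of SynthText or Synth90K: the font names, value ranges, and the `RenderParams` and `sample_render_params` names are all hypothetical, and the sketch only samples parameters without drawing any image.

```python
import random
from dataclasses import dataclass

# Hypothetical font names, chosen only for illustration.
FONTS = ["serif-regular", "sans-bold", "mono-italic"]


@dataclass(frozen=True)
class RenderParams:
    """Random settings a traditional renderer might use for one text box."""
    word: str
    font: str
    font_size: int
    has_border: bool


def sample_render_params(word: str, rng: random.Random) -> RenderParams:
    """Draw one random set of rendering parameters for the given word."""
    return RenderParams(
        word=word,
        font=rng.choice(FONTS),         # font comes from a fixed, predefined list
        font_size=rng.randint(16, 64),  # assumed size range, in pixels
        has_border=rng.random() < 0.5,  # border on roughly half the samples
    )


rng = random.Random(0)  # seeded so the sampling is reproducible
samples = [sample_render_params(w, rng) for w in ["hello", "world"]]
for s in samples:
    print(s)
```

However many parameters are randomized, every sample still draws its font from a fixed list, so the variety of the rendered text is bounded by the fonts that were collected up front.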

Our model, TextBoxGAN, uses a GAN to generate text boxes from input words, offering a new way to create a synthetic dataset. Its main advantage over other synthetic datasets is that the generated text is not constrained by a predefined font.