Meet Recraft V3 - the only model in the world that can generate images with long text, as opposed to just one or a couple of words. Our team trained a new SOTA model from scratch and set a new standard for excellence in image generation. Over the past week, Recraft V3 competed on Hugging Face’s industry-leading Text-to-Image Model Leaderboard by Artificial Analysis, where it took first place with an ELO rating of 1172. Recraft's new model delivers higher quality than models from Midjourney, OpenAI, and all other major image generation companies.
If you've ever tried to generate text on an image, you’ve likely faced significant challenges. Of course, image generators are improving, but rendering anything longer than a few words on an image is still frustratingly hard. Before we dive into the technical details of our implementation, let’s discuss why this is so challenging.
Let’s look at an example generated by Recraft’s previous image generation model, Recraft 20B, with the prompt “a cat with a sign 'Recraft generates text amazingly good!' in its paws”: