
Reducing the Cost of Pre-training Stable Diffusion by 3.7x with Anyscale

Submitted by
Style Pass
2024-05-11 00:30:03

Stable Diffusion, one of the most popular open source models, is known for its ability to generate highly detailed and creative images from text prompts, making it a pivotal tool in AI-driven art and design. However, without solid training infrastructure and expertise, pre-training can take prohibitively long and incur unnecessarily high costs.

In this blog post, we introduce an advanced pre-training solution for Stable Diffusion v2 models, leveraging the power of Ray and the Anyscale Platform to enhance scalability and cost efficiency.

Stable Diffusion is a conditional generation model that generates high-quality images from textual prompts. Figure 1 illustrates its training pipeline:
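Conceptually, each training step noises a clean image latent according to a diffusion schedule and trains the model to predict the added noise. The following is a minimal NumPy sketch of that loss computation; the linear `predict_noise` stand-in and the schedule values are illustrative placeholders, not the actual conditional U-Net or Stable Diffusion's real hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_training_step(x0, t, alphas_cumprod, predict_noise):
    """One simplified denoising-diffusion training step.

    x0: clean image latent; t: timestep index;
    predict_noise: stand-in for the conditional denoising model.
    """
    eps = rng.standard_normal(x0.shape)                   # sampled Gaussian noise
    a_bar = alphas_cumprod[t]                             # cumulative noise schedule term
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1 - a_bar) * eps  # noised latent at step t
    eps_hat = predict_noise(x_t, t)                       # model's noise prediction
    return np.mean((eps_hat - eps) ** 2)                  # MSE loss on the noise

# Toy usage: a 1000-step linear schedule and a fake linear "model".
alphas = 1.0 - np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(alphas)
x0 = rng.standard_normal((4, 64, 64))                     # fake latent tensor
loss = diffusion_training_step(x0, 500, alphas_cumprod,
                               lambda x_t, t: 0.1 * x_t)
```

In the real pipeline, the latent comes from a VAE encoder and the noise predictor is a U-Net conditioned on the text-prompt embedding; this sketch only shows the shape of the objective being optimized at scale.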

Despite the impressive generation quality, baseline pre-trained Stable Diffusion models [1] might not always be suitable for commercial use. One concern involves the pre-training datasets, which could introduce potential biases [2] or contain illegal or copyrighted content [3, 4]. To ensure model consistency and fairness, and to avoid ethical and legal issues, many organizations choose to pre-train their own models on carefully curated datasets.

Baseline pre-training requires over 200,000 A100 GPU hours on billions of images [5]. This highlights the inherent challenges of pre-training due to its large-scale and computationally intensive nature. Three main factors can severely hinder training efficiency:
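To put that scale in dollar terms, a quick back-of-the-envelope calculation follows; the per-GPU-hour rate below is an illustrative assumption, not a figure from this post:

```python
# Rough cost estimate for baseline pre-training.
# NOTE: the hourly A100 rate is a hypothetical assumption for illustration.
baseline_gpu_hours = 200_000      # A100 GPU hours cited for baseline pre-training [5]
assumed_rate_usd = 2.0            # assumed $/A100-hour (cloud prices vary widely)
cost_reduction = 3.7              # reduction factor claimed in the title

baseline_cost = baseline_gpu_hours * assumed_rate_usd  # total baseline cost
reduced_cost = baseline_cost / cost_reduction          # cost after 3.7x reduction
savings = baseline_cost - reduced_cost                 # absolute savings
```

Under that assumed rate, a 3.7x reduction turns a roughly $400,000 run into one costing around $108,000; the exact numbers scale linearly with whatever rate your provider actually charges.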
