
Fractional GPUs & GPU sharing

Submitted by Style Pass, 2024-02-28 17:30:07

In an ideal world, it's best to run as much as you can on a single GPU. The main limitation here is the amount of memory available. Our production Stable Diffusion uses 13GB of VRAM when first loaded, and gets close to 20GB when creating two 1024x1024 images. We used to run this on a whole A100 40GB and just accepted the inefficiency, but now offer the perfect solution: Fractional GPUs.

We use NVIDIA's Multi-Instance GPU (MIG) technology to partition a single physical GPU into smaller, isolated GPU instances. MIG is relatively new, so within our GPU offering it is currently supported only on A100s (H100s coming soon).
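To make the sizing reasoning concrete, here is a minimal Python sketch that picks the smallest slice able to hold a workload's peak VRAM. It assumes the nominal slice sizes of NVIDIA's MIG profiles for an A100 40GB (5, 10, 20 and 40 GB); the function names, the headroom parameter and the 19.5 GB figure (standing in for "close to 20GB") are illustrative assumptions, not part of our platform's API, and not every slice size is necessarily offered.

```python
from typing import Optional

# Nominal MIG slice sizes in GB for an A100 40GB, taken from NVIDIA's
# MIG profile names (1g.5gb, 2g.10gb, 3g.20gb, 7g.40gb).
# Assumption: which of these a given platform exposes may differ.
A100_40GB_SLICES_GB = [5, 10, 20, 40]


def smallest_slice_gb(peak_vram_gb: float, headroom_gb: float = 0.0) -> Optional[int]:
    """Return the smallest nominal slice that fits peak usage plus headroom.

    Returns None when even the whole GPU is too small.
    """
    needed = peak_vram_gb + headroom_gb
    for size in A100_40GB_SLICES_GB:  # list is sorted ascending
        if size >= needed:
            return size
    return None


def utilisation(peak_vram_gb: float, slice_gb: float) -> float:
    """Fraction of the slice's nominal memory used at peak."""
    return peak_vram_gb / slice_gb


if __name__ == "__main__":
    peak = 19.5  # hypothetical stand-in for "close to 20GB"
    chosen = smallest_slice_gb(peak)
    print(f"smallest slice that fits: {chosen} GB")
    print(f"utilisation on a whole 40 GB GPU: {utilisation(peak, 40):.0%}")
    if chosen is not None:
        print(f"utilisation on the chosen slice: {utilisation(peak, chosen):.0%}")
```

The memory actually usable inside a slice can be slightly below its nominal size, so measure your peak usage and leave some headroom before committing to the tightest fit.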

Our goal is to give you access to the latest technology without needing to get into the weeds of how it works. To use Fractional GPUs, you just need to change the accelerator field in your pipeline.yaml. Here's an example for the 5GB A100 chunk:

Fractional GPUs are a great way to save costs on your project. They also have much shorter cold starts than a whole GPU, so we charge no low-volume premium and use completely flat, time-based pricing:
