
Metal FlashAttention 2.0: Pushing Forward On-Device Inference & Training on Apple Silicon

Submitted by
Style Pass
2025-01-07 18:30:04

Metal FlashAttention underpins Draw Things’ claim of fastest image generation inside the Apple ecosystem. It conserves system memory, it is fast, and it supports a wide array of devices, the oldest being the iPhone 12 from more than four years ago.
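The memory savings come from FlashAttention's core trick: instead of materializing the full N x N attention matrix, it streams keys and values in blocks and maintains a running softmax per query row. A minimal NumPy sketch of that tiling idea (illustrative only, not the Metal kernel; function names and block size are ours) looks like this:

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full N x N score matrix -- memory grows O(N^2).
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def flash_attention(Q, K, V, block=64):
    # Streams K/V in blocks, keeping a running (max, denominator, output)
    # per query row -- peak memory is O(N * block), never O(N^2).
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros((N, d))
    m = np.full(N, -np.inf)   # running row maximum (for numerical stability)
    l = np.zeros(N)           # running softmax denominator
    for j in range(0, K.shape[0], block):
        Kj, Vj = K[j:j + block], V[j:j + block]
        S = Q @ Kj.T * scale                 # only an N x block score tile
        m_new = np.maximum(m, S.max(axis=-1))
        correction = np.exp(m - m_new)       # rescale previous partial sums
        P = np.exp(S - m_new[:, None])
        l = l * correction + P.sum(axis=-1)
        O = O * correction[:, None] + P @ Vj
        m = m_new
    return O / l[:, None]
```

Both functions compute the same result; only the peak memory differs, which is why the kernel can run on devices as constrained as an iPhone.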

Back in September, Philip Turner and I released Draw Things with Metal FlashAttention 2.0. Since then, we’ve integrated not only the forward pass (useful for inference) but also the experimental backward pass (useful for training). Combined, these make Draw Things the only efficient application on macOS / iOS that supports both inference and fine-tuning of FLUX.1 [dev], an 11B-parameter, state-of-the-art image generation model. This major version upgrade delivers substantial performance gains.

Translating these gains into real-world numbers: on M3 / M4 devices, we see up to 20% faster inference for both FLUX.1 and SD3 / AuraFlow models. On older hardware, SD3 / AuraFlow models see similar improvements, while FLUX.1 models gain around 2%.

Compared to other implementations, FLUX.1 inside Draw Things is up to 25% faster per iteration than the mflux implementation on the M2 Ultra, and faster still end-to-end; it is up to 94% faster than ggml-based implementations (also known as the gguf format). SD3.5 Large inside Draw Things is up to 163% faster per iteration than the DiffusionKit implementation (on the M2 Ultra).
