GPU Memory Snapshots: Supercharging Sub-second Startup

submitted by
Style Pass
2025-07-31 16:30:06

At Modal, we’re obsessed with cold start latency. Earlier this year, we introduced memory snapshots to slash startup times by more than half. Today, we’re thrilled to announce the next evolution: GPU memory snapshots—bringing the same checkpoint/restore magic to GPU-accelerated workloads.

Our distributed file system uses a series of caches to keep the most popular files across Modal users directly in worker memory. For example, if one program imports torch, another program benefits because the torch files are already in the worker cache. This has a substantial impact on performance: reads are usually 3-5x faster than downloading the same files without a cache.
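To make the cache effect concrete, here is a minimal sketch of a shared in-memory cache on one worker. It is a toy model, not Modal's file system: the `WorkerFileCache` class, the `slow_download` function, and the LRU eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class WorkerFileCache:
    """Toy in-memory LRU cache shared by every program on one worker (hypothetical)."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity      # maximum number of files kept in memory
        self.fetch = fetch            # slow path, e.g. a network download
        self.entries = OrderedDict()  # path -> file bytes, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, path):
        if path in self.entries:
            # Cache hit: serve from memory and mark the file as recently used.
            self.entries.move_to_end(path)
            self.hits += 1
            return self.entries[path]
        # Cache miss: take the slow path, then remember the result.
        self.misses += 1
        data = self.fetch(path)
        self.entries[path] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used file
        return data


def slow_download(path):
    """Stand-in for fetching a file from remote storage (hypothetical)."""
    return f"contents of {path}".encode()


cache = WorkerFileCache(capacity=2, fetch=slow_download)
cache.read("torch/__init__.py")  # program A imports torch: miss, file is downloaded
cache.read("torch/__init__.py")  # program B imports torch: hit, served from memory
```

The second read never touches `slow_download`, which is the effect described above: one program's import warms the cache for the next.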

The lifecycle of a Modal Function has two main stages: container cold boot and running inputs. Cold boot most commonly means two things: downloading your program files and reading your program into memory.

Reading a program into memory and starting up a Function takes time—sometimes a lot of time! What if we could take the memory representation of your program and save it into an image? That could save time by skipping reading files and re-creating your program in memory on every cold boot.
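The idea can be sketched with an analogy in plain Python. This is not how Modal implements snapshots (a real memory snapshot captures the state of the whole process, not one pickled object); `build_state`, `cold_boot`, and the snapshot file name are hypothetical names used only to show the build-once, restore-afterwards pattern.

```python
import os
import pickle
import tempfile


def build_state():
    """Stand-in for slow startup work such as imports or loading weights (hypothetical)."""
    return {"vocab": {f"tok{i}": i for i in range(1000)}, "ready": True}


def save_snapshot(state, path):
    # Serialize the in-memory state once so later boots can skip rebuilding it.
    with open(path, "wb") as f:
        pickle.dump(state, f)


def load_snapshot(path):
    # Restore the previously saved state instead of re-running startup.
    with open(path, "rb") as f:
        return pickle.load(f)


def cold_boot(snapshot_path):
    """Return (state, mode), where mode is 'restored' or 'built'."""
    if os.path.exists(snapshot_path):
        return load_snapshot(snapshot_path), "restored"
    state = build_state()
    save_snapshot(state, snapshot_path)
    return state, "built"


snapshot_path = os.path.join(tempfile.mkdtemp(), "state.snapshot")
first_state, first_mode = cold_boot(snapshot_path)    # no snapshot yet: build and save
second_state, second_mode = cold_boot(snapshot_path)  # snapshot exists: restore
```

The first boot pays the full startup cost and writes the snapshot; every later boot restores the same state from it.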
