Checkpointing CUDA Applications with CRIU

submitted by
Style Pass
2025-08-08 10:00:06

Checkpoint and restore functionality for CUDA is exposed through a command-line utility called cuda-checkpoint. This utility can be used to transparently checkpoint and restore CUDA state within a running Linux process. Combine it with CRIU (Checkpoint/Restore in Userspace), an open-source checkpointing utility, to fully checkpoint CUDA applications.
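
As a rough illustration of how the two tools might be combined, the sketch below builds the command lines for one checkpoint/restore cycle and, optionally, runs them. It rests on several assumptions that are not stated above: that `cuda-checkpoint` accepts `--toggle --pid`, that the process was launched from a shell (hence CRIU's `--shell-job`), and that CRIU restores the process under its original PID. The helper names (`build_checkpoint_commands`, `build_restore_commands`, `run_all`) and the `demo` image directory are hypothetical.

```python
import subprocess
from typing import List


def build_checkpoint_commands(pid: int, images_dir: str) -> List[List[str]]:
    """Commands to checkpoint a CUDA process, in the order they should run."""
    return [
        # Assumed flags: toggle the CUDA state of the process so that it is
        # suspended before CRIU inspects the process.
        ["cuda-checkpoint", "--toggle", "--pid", str(pid)],
        # Dump the process tree rooted at `pid` into `images_dir`.
        ["criu", "dump", "--shell-job", "--images-dir", images_dir,
         "--tree", str(pid)],
    ]


def build_restore_commands(pid: int, images_dir: str) -> List[List[str]]:
    """Commands to restore the process and then resume its CUDA state."""
    return [
        # Recreate the process from the images and detach from it.
        ["criu", "restore", "--shell-job", "--restore-detached",
         "--images-dir", images_dir],
        # Assumption: the restored process keeps its original PID, so the
        # same PID is used to toggle CUDA back to the running state.
        ["cuda-checkpoint", "--toggle", "--pid", str(pid)],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops the sequence at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    example_pid = 1234  # hypothetical PID of a running CUDA application
    run_all(build_checkpoint_commands(example_pid, "demo"))
    run_all(build_restore_commands(example_pid, "demo"))
```

Keeping command construction separate from execution lets the sequence be reviewed in a dry run before anything touches a live process. Running the commands for real typically requires root privileges and both utilities on the `PATH`.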

Transparent, per-process checkpointing offers a middle ground between virtual machine checkpointing and application-driven checkpointing. Per-process checkpointing can be used in combination with containers to checkpoint the state of a complex application, which facilitates a range of use cases.

CRIU (Checkpoint/Restore in Userspace) is an open-source checkpointing utility for Linux, maintained outside of NVIDIA, which can checkpoint and restore process trees. 

CRIU exposes its functionality through a command-line program called criu and operates by checkpointing and restoring every kernel-mode resource associated with a process.
