Submitted by Style Pass, 2024-06-11 10:00:06

YaFSDP is up to 20% faster for pre-training LLMs and performs better under high memory pressure. It is designed to reduce the overhead of communication and memory operations.

Note that both examples require a Docker image, which can be built with the docker/build.sh script. The image is based on the NVIDIA PyTorch image with some patched 🤗 libraries. The patches for these libraries can be found in the patches folder.