At HotChips 2024, Tesla announced the open-sourcing of the Tesla Transport Protocol over Ethernet (TTPoE), represented on this GitHub repo. Tesla also

Search code, repositories, users, issues, pull requests...

submited by
Style Pass
2024-09-23 02:00:05

At HotChips 2024, Tesla announced the open-sourcing of the Tesla Transport Protocol over Ethernet (TTPoE), represented on this GitHub repo.

Tesla also announced joining the Ultra Ethernet Consortium (UEC) to share this protocol and work to standardize a new high-speed/low-latency fabric (be that TTPoE or otherwise) for AI/ML/Datacenters -- desiring a non-proprietary, low cost, distributed congestion control, standard EthernetII frame, and non-centralized interconnect protocol to commoditize and accelerate technical progress.

In TTPoE, just like TCP, dropped packets and replays are the acceptable default behavior, yet full transmission is guaranteed.

TTPoE's initial deployment was for the Tesla Dojo v1 project, where the protocol executed entirely in hardware and deployed to a very large multi-ExaFlops (fp16) supercomputer with over 10s of thousands of concurrent endpoints. This protocol does not need a CPU or OS to be involved in any way to link and execute.

If you came here to be impressed by something complex and clever, you won't be. The protocol is designed on basic fundamentals -- simple transport and to the point. Ethernet transport in essence is only intended to move data from point A to B and should be limited by physics -- not software execution time. Centralized congestion management of extremely large scale machines (just like the internet) is a fool's errand -- each endpoint should be resiliant and self-managing.

Leave a Comment