LakeSail is thrilled to unveil a preview release of Sail 0.2, our latest milestone in the journey to redefine distributed data processing. With a high-performance, Rust-based implementation, Sail 0.2 takes another bold step toward a unified solution for Big Data and AI workloads. Designed to remove the limitations of JVM-based frameworks and to elevate performance with Rust's inherent efficiency, Sail 0.2 builds on our commitment to supporting modern data infrastructure needs, spanning batch, streaming, and AI.
In an era where a single powerful machine can process substantial data volumes, a natural question arises: why distribute processing at all? Many applications require data handling far beyond what any one machine can achieve. Distributed processing enables scalability across multiple nodes, fault tolerance, and optimized resource allocation, all crucial for handling diverse and dynamic workloads. This approach supports efficient and resilient workflows, especially for businesses managing real-time or geographically dispersed data.
The architecture of Sail 0.2 is built on a separation between the control and data planes, allowing for fine-grained resource management and optimized data movement. For the control plane, we use gRPC alongside the actor model, forming a robust system for managing distributed workflows in our framework. The data plane leverages both gRPC and the Arrow IPC protocol, establishing an efficient pipeline for shuffle data within the cluster. Support for cloud storage APIs for remote shuffle data is planned for future releases.