This morning, we tore down the servers for our XetHub product after 658 days in production. It’s as good a time as any to reflect on what we learned.
From the start, our team’s mission was to “make collaboration on big data delightful,” and we launched a product to meet that challenge. Our initial offering targeted ML teams and was built around three basic tenets:
Git for ML collaboration: Much like Git does for code, once a file is stored, we should only need to store the changes between versions - even for binary files. Since developers already upload, download, and collaborate through Git-based tooling, we can simply extend Git to handle larger ML files with full reproducibility and provenance.
Flexible file support: ML data and models come in all shapes and sizes. By adopting file-system semantics in our architecture and supporting all file types, we can optimize per file type to improve performance.
Instant reads, faster writes: Using file-system mounts to stream data instead of always downloading it makes access nearly instant and saves hours of compute. Deduplicating changes (as Git does) makes iterative writes faster by avoiding re-uploads of data that hasn’t changed; a rough sketch of that idea follows below.
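
To make the deduplication idea concrete, here is a minimal sketch of content-defined chunking plus chunk-level storage, the mechanism that lets a second version of a large binary file cost only the bytes that actually changed. The chunker, rolling checksum, chunk sizes, and in-memory store are all hypothetical simplifications for illustration, not XetHub’s actual implementation.

```python
"""Illustrative sketch only: a toy content-defined chunker and dedup store,
not XetHub's production design."""
import hashlib
import os

CHUNK_MASK = (1 << 13) - 1   # boundary fires when low 13 bits are zero: ~8 KiB average chunks
MIN_CHUNK = 2048             # avoid pathologically small chunks


def chunk(data: bytes):
    """Content-defined chunking with a toy rolling checksum.

    Boundaries are derived from the bytes themselves rather than fixed
    offsets, so an edit in the middle of a file only perturbs the chunks
    around it; later chunks re-align with the previous version.
    """
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        if (rolling & CHUNK_MASK) == 0 and (i + 1 - start) >= MIN_CHUNK:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class DedupStore:
    """Stores each unique chunk exactly once, keyed by its content hash."""

    def __init__(self):
        self.blobs = {}  # chunk hash -> chunk bytes

    def put(self, data: bytes):
        """'Upload' one file version; returns (manifest, new bytes stored)."""
        manifest, new_bytes = [], 0
        for c in chunk(data):
            h = hashlib.sha256(c).hexdigest()
            if h not in self.blobs:      # unchanged chunks cost nothing
                self.blobs[h] = c
                new_bytes += len(c)
            manifest.append(h)
        return manifest, new_bytes

    def get(self, manifest):
        """Reassemble a file version from its list of chunk hashes."""
        return b"".join(self.blobs[h] for h in manifest)


if __name__ == "__main__":
    store = DedupStore()
    v1 = os.urandom(1_000_000)                     # a ~1 MB "binary" file
    v2 = v1[:500_000] + b"edited!" + v1[500_000:]  # small edit in the middle

    _, stored_v1 = store.put(v1)
    manifest_v2, stored_v2 = store.put(v2)
    assert store.get(manifest_v2) == v2
    print(f"v1 stored {stored_v1:,} bytes; v2 added only {stored_v2:,} new bytes")
```

Because boundaries come from content rather than byte offsets, inserting a few bytes in the middle of a large file invalidates only the chunk or two around the edit; every other chunk hashes to a value already in the store, so a new version uploads only its changes instead of re-transferring the whole file.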