Machine learning (ML) is a foundation underlying nearly every facet of Stripe’s global operations, optimizing everything from backend processing to

Shepherd: How Stripe adapted Chronon to scale ML feature development

submited by

Style Pass

2024-04-16 19:30:16

Machine learning (ML) is a foundation underlying nearly every facet of Stripe’s global operations, optimizing everything from backend processing to user interfaces. Applications of ML at Stripe add hundreds of millions of dollars to the internet economy each year, benefiting millions of businesses and customers worldwide. Developing and deploying ML models is a complex multistage process, and one of the hardest steps is feature engineering.

Before a feature—an input to an ML model—can be deployed into production, it typically goes through multiple iterations of ideation, prototyping, and evaluation. This is particularly challenging at Stripe’s scale, where features have to be identified among hundreds of terabytes of raw data. As an engineer on the ML Features team, my goal is to build infrastructure and tooling to streamline ML feature development. The ideal platform needs to power ML feature development across huge datasets while meeting strict latency and freshness requirements.

In 2022 we began a partnership with Airbnb to adapt and implement its platform, Chronon, as the foundation for Shepherd—our next-generation ML feature engineering platform—with a view to open sourcing it. We’ve already used it to build a new production model for fraud detection with over 200 features, and so far the Shepherd-enabled model has outperformed our previous model, blocking tens of millions of dollars of additional fraud per year. While our work building Shepherd was specific to Stripe, we are generalizing the approach by contributing optimizations and new functionality to Chronon that anyone can use.