No-data ML. No-data ML allows models to obtain… | by Ilia Zintchenko | Ntropy | May, 2021 | Medium

2021-05-21 09:00:07

Products and services powered by machine-learning models need training data, which is often obtained from customers. This creates a frustrating cycle for innovation: a good product needs a good model, which in turn needs lots of data, which comes from customers, who want a good product in the first place. There are, of course, creative ways to bootstrap a product into the market, such as buying data elsewhere, using heuristics, or starting with pre-trained models. Nevertheless, the data problem is all too familiar to any machine-learning team and is one of the key deterrents to using machine learning in a commercial setting. What if we could have “no-data ML”, where training data is obtained externally, in a scalable way, and machine-learning models are deployed to production from day one, without requiring any in-house labels?

In our previous post, we outlined our master plan for a data network. At scale, it will allow machine-learning models to query collections of datasets distributed across organizations, providing a way to grow the amount of data available to commercial machine-learning models over time and allowing anyone to deploy a model without bringing their own data. Below, we will discuss some of the components of this network and introduce our first product built on top of it.