MLaaS: Preventing API-Driven Model Theft With Variational Autoencoders

Machine Learning as a Service (MLaaS) commoditizes the fruits of expensive research and model training via APIs that give customers access to insights from the system. Though these transactions inevitably reveal something of the system's reasoning, the core model architecture, the weights that define the model's utility, and the specific training data that made it useful are jealously guarded, for several reasons.

Firstly, the framework is likely to have drawn on a number of free and open source (FOSS) code repositories, and potential rivals could trivially do likewise in pursuit of the same ends. Secondly, in many cases the weights represent 95% or more of the model's ability to interpret data better than rival models, and arguably constitute the core value of an expensive investment, both in research hours and in high-scale, well-resourced model training on industry-grade GPUs.

The mix of proprietary and public-facing data behind the model's training dataset is also a potentially incendiary matter. Where the data is 'original' work obtained through costly methods, an API user who can infer its structure or content through permitted requests could essentially reconstruct the value of that work: either by understanding the schema of the data, which allows for practical reproduction, or by reproducing the weights that orchestrate the data's features, which potentially allows an 'empty' but effective architecture to be rebuilt, into which subsequent material could be usefully processed.
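To make the threat concrete, the sketch below shows the kind of query-based extraction this describes: an attacker probes a black-box prediction API and distils its answers into a local surrogate model. It is a minimal illustration under stated assumptions, not any provider's real interface; the endpoint URL, the instances/predictions request schema, and the choice of scikit-learn's MLPClassifier as the surrogate are all hypothetical.

```python
# Hypothetical sketch only: the endpoint, payload format, and surrogate
# architecture below are illustrative assumptions, not a real MLaaS API.
import numpy as np
import requests
from sklearn.neural_network import MLPClassifier

API_URL = "https://api.example-mlaas.com/v1/predict"  # hypothetical endpoint

def query_victim(samples: np.ndarray) -> np.ndarray:
    """Send feature vectors to the victim API and return its predicted labels."""
    response = requests.post(API_URL, json={"instances": samples.tolist()})
    response.raise_for_status()
    return np.array(response.json()["predictions"])

# The attacker fabricates synthetic probe inputs spanning the input space;
# no access to the provider's training data is needed.
rng = np.random.default_rng(seed=0)
probes = rng.uniform(low=0.0, high=1.0, size=(5000, 20))

# Each permitted API response becomes free labelled training data.
stolen_labels = query_victim(probes)

# The surrogate recovers much of the victim's decision behaviour without
# ever seeing its architecture, weights, or original training set.
surrogate = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300)
surrogate.fit(probes, stolen_labels)
```

Real attacks choose probe inputs far more adaptively, to extract the maximum information from each query; that is precisely the kind of API traffic that theft-prevention measures aim to detect or degrade.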
