Model monitoring in large-scale applications can be challenging because a high volume of predictions has to be analyzed while keeping issue detection time low.
Production machine learning models should be monitored for data and model issues such as data anomalies and drift. I discuss why in my other blog post. The design and properties of a monitoring system depend mainly on the use case. There are also different ways to implement such a model monitoring system. In this article I discuss two main implementations, which I call the engineering approach and the data science approach. But first I need to introduce a few common concepts.
Model serving is usually implemented as a web server, which exposes a REST API for model inference/prediction. Typically, it loads and runs the model file using the same machine learning library the model was trained with, for instance TensorFlow or PyTorch.
In either case, the input and output data for model prediction/inference, e.g. the input image and the output probability, are usually saved to be processed later for model performance analysis, labeling, retraining, etc. This is part of the model development and improvement process.
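To make the serving and logging flow concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the `predict` function is a stand-in for a real TensorFlow or PyTorch model, and the `/predict` path, the `predictions.jsonl` log file, and the record fields are hypothetical names, not part of any particular serving framework.

```python
import json
import math
import time
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

LOG_PATH = "predictions.jsonl"  # hypothetical log location (JSON Lines file)


def predict(features):
    """Stand-in for a real model: mean of the features passed through a sigmoid."""
    score = sum(features) / (len(features) or 1)
    return 1.0 / (1.0 + math.exp(-score))


def predict_and_log(features, log_path=LOG_PATH):
    """Run inference and append the input/output pair to the log file."""
    record = {
        "id": str(uuid.uuid4()),    # lets labels be joined to this prediction later
        "timestamp": time.time(),   # needed for time-windowed monitoring
        "input": features,
        "output": predict(features),
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


class PredictHandler(BaseHTTPRequestHandler):
    """Handles POST /predict with a JSON body like {"features": [0.1, 0.2]}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        record = predict_and_log(payload["features"])
        body = json.dumps(
            {"id": record["id"], "probability": record["output"]}
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the server; blocks until interrupted."""
    HTTPServer(("localhost", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the endpoint, and each request appends one line to the log. A real service would more likely write these records to a message queue or a database than to a local file, but the shape of the record (identifier, timestamp, input, output) is what the later analysis steps rely on.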