11 Mar 2023

ML Model Telemetry

One requirement when deploying to production is generating logs and telemetry, which give a better understanding of how the software behaves in the production setting. That raises two questions: what do we need to log, and how do we store it?

Many tools, and even whole companies, provide logging and monitoring solutions. Now that ML has become more popular and integrating models into software products is common practice, the question is: how do we monitor ML models in production?

We can use existing DevOps tooling. Still, we should also account for the labeling activities handled in the human-in-the-loop process. For example, how many images did the model misclassify? How many times did a human intervene to fix a problem? Some of these questions can be answered by a different suite of tools, such as tagging and labeling systems, which require humans to check the outputs manually.
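As a rough illustration of the human-in-the-loop questions above, here is a minimal Python sketch that counts how often a reviewer corrected the model. The `ReviewedPrediction` record, its field names, and the `summarize_reviews` helper are all hypothetical. They are not taken from any particular labeling tool, and a real system would likely read these records from its own review queue or database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReviewedPrediction:
    """One model output checked by a human reviewer (hypothetical schema)."""
    item_id: str
    predicted_label: str
    # None means the reviewer accepted the model's prediction as-is.
    human_label: Optional[str] = None


def summarize_reviews(records: List[ReviewedPrediction]) -> dict:
    """Count reviewed items, human corrections, and the share that was corrected."""
    total = len(records)
    corrected = sum(
        1
        for r in records
        if r.human_label is not None and r.human_label != r.predicted_label
    )
    return {
        "reviewed": total,
        "human_corrections": corrected,
        # Guard against division by zero when nothing has been reviewed yet.
        "misclassification_rate": corrected / total if total else 0.0,
    }


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        ReviewedPrediction("img-1", "cat"),
        ReviewedPrediction("img-2", "dog", human_label="cat"),
        ReviewedPrediction("img-3", "dog"),
        ReviewedPrediction("img-4", "cat", human_label="cat"),
    ]
    print(summarize_reviews(sample))
```

A summary like this could be emitted periodically as a telemetry metric alongside the usual DevOps signals. Note that it only covers items a human actually reviewed, so it says nothing about errors nobody looked at.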

It is worth noting that these systems require infrastructure and expertise to maintain and manage. That is why, before implementing these types of solutions, you need to make sure your team has the skills to run them.

Reference: https://neptune.ai/blog/ml-model-monitoring-best-tools

