381. ML Model Monitoring

▮ ML-Specific Metrics

Here are the four main ML-specific metrics to monitor after you’ve deployed your model.

Fig. 1 – The four ML-specific metrics

Accuracy-related Metrics

You should always log and track any type of user feedback. If you’re at the phase where you are about to deploy your model, you should consider how you will collect that feedback. Even when a feedback log can’t be used to infer natural labels, it can still be used to detect changes in model performance.
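For illustration, here is a minimal sketch of what feedback logging could look like. Everything in it (the `log_feedback` helper, the click signal, the log file path) is a hypothetical example, not a prescribed setup.

```python
# A minimal sketch of feedback logging, assuming a hypothetical service
# where a "click" on a recommended item is the feedback signal.
import json
import time


def log_feedback(prediction_id: str, feedback: str, path: str = "feedback.log"):
    """Append one feedback event (e.g., click, dismiss, report) to a log file."""
    event = {
        "prediction_id": prediction_id,  # joins the feedback back to the prediction
        "feedback": feedback,
        "timestamp": time.time(),
    }
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")


# Example: a user clicked the item served by prediction "p-123".
log_feedback("p-123", "click")
```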

Predictions

Predictions are probably the most commonly monitored metric.

Predictions are not only easy to interpret and summarize; in most cases they are also low-dimensional, which makes them easier to monitor.

Assuming the function that maps inputs to outputs hasn’t changed, a shift in the distribution of the predictions strongly suggests a shift in the input distribution as well.
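One common way to detect such a shift is a two-sample statistical test between a reference window of predictions and the current window. The sketch below uses SciPy’s Kolmogorov-Smirnov test on synthetic scores; the window sizes and the p-value threshold are assumptions you would tune for your own system.

```python
# A minimal sketch of prediction-drift detection: compare a reference
# window of logged prediction scores against the current window with a
# two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference_preds = rng.normal(0.30, 0.1, size=5_000)  # stand-in for logged scores
current_preds = rng.normal(0.45, 0.1, size=5_000)    # a shifted window

result = ks_2samp(reference_preds, current_preds)
if result.pvalue < 0.01:  # the threshold is a tunable assumption, not a rule
    print(f"Possible prediction drift "
          f"(KS statistic={result.statistic:.3f}, p={result.pvalue:.2e})")
```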

Features

By monitoring features (for example, columns in tabular data), you can ensure that they follow the expected schema. Here are a few examples.

  1. Check whether the values of a feature fall within an expected range.
  2. Check whether the values of one feature are always smaller than the values of another feature.
  3. Check that a feature contains no NaN (not a number) values.

There are many open-source libraries for this kind of feature validation, such as Evidently and Deepchecks; a minimal sketch of the three checks above is shown below.
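For illustration, here is what those checks could look like in plain pandas. The column names (`age`, `min_price`, `max_price`) and ranges are hypothetical.

```python
# A minimal sketch of the three schema checks above, using plain pandas.
import pandas as pd

df = pd.DataFrame({
    "age": [23, 41, 35],
    "min_price": [10.0, 5.5, 8.0],
    "max_price": [15.0, 9.0, 12.5],
})

# 1. Values of a feature fall within an expected range.
assert df["age"].between(0, 120).all(), "age outside expected range"

# 2. One feature is always smaller than another.
assert (df["min_price"] <= df["max_price"]).all(), "min_price exceeds max_price"

# 3. No NaNs in the monitored features.
assert not df[["age", "min_price", "max_price"]].isna().any().any(), "NaNs found"
```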

Raw Inputs

Raw inputs refer to the data before it is processed for model inference. You can monitor this artifact to check whether the input itself has changed.
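One possible approach is to track coarse properties of the raw payloads (size, top-level keys) before any preprocessing, so you notice when upstream producers change their format. The sketch below assumes JSON payloads; the function name and example fields are hypothetical.

```python
# A minimal sketch of raw-input monitoring, assuming requests arrive as
# JSON strings: summarize coarse properties before any preprocessing.
import json
from collections import Counter


def raw_input_stats(payloads: list[str]) -> dict:
    """Summarize raw payloads: average size and counts of top-level keys."""
    key_counts = Counter()
    sizes = []
    for raw in payloads:
        sizes.append(len(raw))
        try:
            record = json.loads(raw)
            if isinstance(record, dict):
                key_counts.update(record.keys())
        except json.JSONDecodeError:
            key_counts.update(["<unparseable>"])
    return {"avg_size_bytes": sum(sizes) / len(sizes),
            "key_counts": dict(key_counts)}


print(raw_input_stats(['{"user_id": 1, "query": "shoes"}', '{"user_id": 2}']))
```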

However, compared to the other monitoring metrics, raw inputs can be hard to monitor because in most cases they do not follow a fixed schema.

Moreover, most ML workflow setups make it almost impossible for ML engineers to have direct access to raw inputs. Therefore, this monitoring process should be handled by the data platform team.