▮ What Causes ML Systems to Fail?
Here are the two main reasons why machine learning systems fail.
Reference: Designing Machine Learning Systems by Chip Huyen
▮ Software System Failures
In 2020, two ML engineers at Google looked through 96 cases of ML pipeline failures inside the company. They found that 60 of these 96 cases weren’t ML-specific failures but were instead caused by software system failures.
Addressing these failures requires ML engineers to have not only ML-related skills but also traditional software engineering skills. As the case study above suggests, ML engineers often spend much of their time on engineering work that has little to do with machine learning.
Here are a few examples of software system failures.
Ex1. Dependency Failure
When a software package your system relies on breaks, and your system breaks with it. This happens often when the dependency is maintained by a third party.
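As a defensive habit, you can pin and verify dependency versions so the system fails loudly at startup instead of silently changing behavior. Here is a minimal Python sketch; the package names and pinned versions are placeholders for whatever your pipeline actually uses.

```python
from importlib.metadata import version

# Hypothetical pins: replace with the versions your pipeline was tested against.
PINNED = {
    "numpy": "1.26.4",
    "scikit-learn": "1.4.2",
}

def check_dependencies() -> None:
    """Raise at startup if an installed package drifted from its tested pin."""
    for package, expected in PINNED.items():
        installed = version(package)
        if installed != expected:
            raise RuntimeError(
                f"{package} {installed} does not match tested version {expected}"
            )

check_dependencies()
```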
Ex2. Deployment Failure
When your system doesn’t work as expected due to a deployment error, such as accidentally deploying an older version of a model, or when your system doesn’t have the appropriate permissions to execute a task.
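One lightweight guard against deploying the wrong model is to check the artifact’s version before serving. The sketch below assumes a hypothetical metadata.json file stored next to the model with a "version" field; your serving stack will differ.

```python
import json
from pathlib import Path

# Both the path and the "version" metadata key are assumptions for this sketch.
MODEL_PATH = Path("models/ranker")
EXPECTED_VERSION = "2024-05-01"

def verify_model_version() -> None:
    """Refuse to serve if the deployed artifact isn't the expected version."""
    metadata = json.loads((MODEL_PATH / "metadata.json").read_text())
    deployed = metadata.get("version")
    if deployed != EXPECTED_VERSION:
        raise RuntimeError(
            f"Deployed model version {deployed!r}, expected {EXPECTED_VERSION!r}"
        )
```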
Ex3. Downtime Failure
When your system stops because the servers it relies on, such as instances hosted on AWS, go down.
Ex4. Hardware Failure
When your system doesn’t perform as expected due to a problem with the hardware the model is deployed on. For example, your model requires intense computation, and the CPU it runs on overheats and crashes.
▮ ML-Specific Failures
Even though most ML system failures have non-ML-specific causes, ML-specific failures can be much more dangerous because they are harder to detect and fix.
Here are a couple of examples.
Ex1. Production data differing from training data
This may be one of the stickiest causes of ML-specific failures. It happens when the underlying distribution of the training data differs from the underlying distribution of the data fed into the deployed model.
This “data distribution shift” is something every ML engineer inevitably faces, because data is never static.
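To make this concrete, here is a minimal sketch of shift detection on a single numeric feature using a two-sample Kolmogorov-Smirnov test; real monitoring would track many features and correct for multiple comparisons.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_shift(train_values: np.ndarray, prod_values: np.ndarray,
                 alpha: float = 0.01) -> bool:
    """Flag likely drift between training and production feature values."""
    result = ks_2samp(train_values, prod_values)
    return result.pvalue < alpha

# Toy demo: production values drawn from a shifted distribution.
rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)
prod = rng.normal(loc=0.5, scale=1.0, size=5_000)  # the mean has drifted
print(detect_shift(train, prod))  # True: the KS test picks up the shift
```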
Ex2. Degenerate feedback loops
First, the “feedback loop” we are considering here is the cycle of the model making predictions and receiving some kind of feedback based on those predictions.
A “degenerate” feedback loop can happen when the prediction itself influences the feedback. This means the current prediction influences the next prediction, which influences the one after that, and so on.
Music recommendation makes this easy to see. When an ML model ranks songs, users are more likely to listen to the songs at the top of the list, which makes the model even more confident in those recommendations. This is why songs that start out highly ranked tend to get even more popular, while new songs struggle to break into the top of the list.
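This rich-get-richer effect is easy to reproduce in a toy simulation. In the sketch below, every song is equally good, yet ranking by past clicks lets whichever songs get early clicks snowball; the position-bias numbers are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_songs, n_rounds = 10, 5_000
clicks = np.ones(n_songs)  # every song starts out equal

# Position bias: the chance of a click halves with every slot down the list.
attention = 0.5 ** np.arange(n_songs)
attention /= attention.sum()

for _ in range(n_rounds):
    ranking = np.argsort(-clicks)            # "model" ranks by past clicks
    slot = rng.choice(n_songs, p=attention)  # user clicks a slot by position
    clicks[ranking[slot]] += 1               # the song in that slot gets credit

# A handful of early leaders end up with almost all the clicks.
print(np.sort(clicks)[::-1].astype(int))
```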
How do we detect these feedback loops?
One way is to evaluate the model separately on data bucketed by some property, such as item popularity.
If there is a clear performance gap between buckets, there is a good chance the model suffers from degenerate feedback loops.
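As a sketch of that bucketing idea, assuming a hypothetical evaluation log with an item_popularity column and a hit column recording whether the recommendation was accepted:

```python
import pandas as pd

def accuracy_by_popularity(interactions: pd.DataFrame) -> pd.Series:
    """Mean hit rate per popularity bucket; a large gap hints at a loop."""
    buckets = pd.qcut(interactions["item_popularity"], q=4,
                      labels=["tail", "low", "mid", "head"])
    return interactions.groupby(buckets, observed=True)["hit"].mean()
```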
Once you’ve found there may be a risk of degenerate feedback loops, here are two ways of correcting them.
Ex1. Randomization
Instead of only showing the items highly ranked by the ML system, we can randomly insert lower-ranked items to discover the “true” item rankings.
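A common way to do this is epsilon-greedy serving: with a small probability, each slot shows a random candidate instead of the model’s top pick. A minimal sketch, where ranked is assumed to be the model’s full ordering of candidate song IDs:

```python
import random

def serve_list(ranked: list[str], k: int = 10, epsilon: float = 0.1) -> list[str]:
    """Fill k slots, occasionally exploring a random remaining candidate."""
    shown: list[str] = []
    candidates = list(ranked)
    for _ in range(k):
        if random.random() < epsilon:
            pick = random.choice(candidates)  # exploration: any remaining song
        else:
            pick = candidates[0]              # exploitation: model's top pick
        candidates.remove(pick)
        shown.append(pick)
    return shown
```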
Ex2. Considering positions
Although randomization adds diversity, it comes with the risk of losing the user’s interest. Another method is to use positional information.
If the position where an item is shown affects the outcome, why not add positional information as another input feature? If an item is chosen despite not being at the top of the list, that is a strong signal the user actually likes it. This also helps the model learn how much being at the top of a list influences how likely an item is to be picked.
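A minimal sketch of this idea: position is a feature during training, then fixed to a single value at inference so predictions reflect item appeal rather than placement. The column names, toy data, and the choice of logistic regression are all illustrative assumptions.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical interaction log: item score from an upstream model, the display
# position (1 = top of the list), and whether the user clicked.
log = pd.DataFrame({
    "item_score": [0.9, 0.7, 0.6, 0.8, 0.5, 0.4],
    "position":   [1,   2,   3,   1,   2,   3],
    "clicked":    [1,   0,   1,   1,   0,   0],
})

# Train with position as an explicit feature, so the model can separate
# "clicked because it was ranked first" from "clicked because it was liked".
model = LogisticRegression().fit(log[["item_score", "position"]], log["clicked"])

# At inference, hold position constant so scores reflect item appeal only.
query = pd.DataFrame({"item_score": [0.7, 0.6], "position": [1, 1]})
print(model.predict_proba(query)[:, 1])
```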