▮ After Deployment
After deploying our ML models, we want to continually update them so that they can adapt whenever the data distribution shifts.
This is why it is critical to be able to “continually learn”, that is, to set up infrastructure that allows developers to update a model whenever needed and deploy the update as soon as possible.
Today I’d like to share how you can execute continual learning and what its challenges are.
Reference: Designing Machine Learning Systems
▮ Executing Continual Learning
The Workflow
When updating a model, you should never deploy it before evaluating it. The figure below illustrates a simplified workflow for evaluating your model and deciding whether to deploy it.
- Create a replica of the current model
- Update the replica with new data
- Compare the replica's performance against the current model's
- Replace the current model with the replica, or discard the replica, depending on the results
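The four steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `LinearModel`, `mse`, and `continual_update_step` are hypothetical names, the model is a toy one-feature linear regressor trained with plain SGD, and "performance" means mean squared error on a held-out evaluation set. A real pipeline would substitute its own model, metrics, and deployment mechanism.

```python
import copy
from dataclasses import dataclass
from typing import List, Tuple

Example = Tuple[float, float]  # (feature x, target y)


@dataclass
class LinearModel:
    """Hypothetical toy model: y_hat = w * x + b."""
    w: float = 0.0
    b: float = 0.0

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, data: List[Example], lr: float = 0.05, epochs: int = 500) -> None:
        # Per-example SGD on squared error, starting from the current weights.
        for _ in range(epochs):
            for x, y in data:
                err = self.predict(x) - y
                self.w -= lr * err * x
                self.b -= lr * err


def mse(model: LinearModel, data: List[Example]) -> float:
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


def continual_update_step(
    current: LinearModel, new_data: List[Example], eval_data: List[Example]
) -> Tuple[LinearModel, bool]:
    """Return (model to serve, whether the replica was deployed)."""
    replica = copy.deepcopy(current)       # 1. replica of the current model
    replica.update(new_data)               # 2. update the replica with new data
    current_err = mse(current, eval_data)  # 3. compare on held-out data
    replica_err = mse(replica, eval_data)
    if replica_err < current_err:          # 4. replace the current model...
        return replica, True
    return current, False                  #    ...or discard the replica
```

Because the replica is a deep copy, the model currently serving traffic is never modified; if the update makes things worse, the replica is simply dropped.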
Stateful Training / Stateless Retraining
Ways of retraining an ML model fall into two categories: stateful training and stateless retraining.
Stateless Retraining
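In stateless retraining, the model is trained from scratch every time it is updated. Most companies adopt this approach.
Because every update starts from scratch, this approach requires developers to store all existing data so that it can be reused in every retraining run.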
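A minimal sketch of stateless retraining, assuming the same kind of toy one-feature linear model: the hypothetical `StatelessRetrainer` keeps every example it has ever seen and, on each update, fits a fresh model from zero-initialized weights on the full history. The class and function names are illustrative, not from any library.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (feature x, target y)


def train_from_scratch(
    data: List[Example], lr: float = 0.05, epochs: int = 500
) -> Tuple[float, float]:
    """Fit y_hat = w * x + b with per-example SGD, always from w = b = 0."""
    w, b = 0.0, 0.0  # fresh initialization: nothing is carried over
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


class StatelessRetrainer:
    """Hypothetical helper: stores every example and retrains on all of them."""

    def __init__(self) -> None:
        self.all_data: List[Example] = []  # grows with every update

    def update(self, new_data: List[Example]) -> Tuple[float, float]:
        self.all_data.extend(new_data)
        return train_from_scratch(self.all_data)
```

Note that `all_data` only ever grows: every retraining run touches the full history, which is where the storage and compute costs of this approach come from.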
Stateful Training
In stateful training, the model continues training on new data starting from its current weights instead of being retrained from scratch, which allows developers to update it with less data. Grubhub found that stateful training allows the model to converge faster and requires less compute.
Another advantage of this approach is that it can remove the need to store all existing data, since each update only needs the new data.
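In contrast to retraining from scratch, here is a minimal sketch of stateful training under the same toy-model assumption: the hypothetical `StatefulTrainer` persists only its weights and continues SGD from them on the new batch, which can then be discarded. When the data has shifted only slightly, a few epochs of warm-started training will typically land closer to the new optimum than the same number of epochs from zero weights.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (feature x, target y)


def sgd(w: float, b: float, data: List[Example], lr: float, epochs: int) -> Tuple[float, float]:
    """Per-example SGD on squared error for y_hat = w * x + b, starting from (w, b)."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


class StatefulTrainer:
    """Hypothetical helper: persists only the weights, never the training data."""

    def __init__(self, w: float = 0.0, b: float = 0.0) -> None:
        self.w, self.b = w, b

    def update(self, new_data: List[Example], lr: float = 0.05, epochs: int = 20) -> Tuple[float, float]:
        # Warm start: continue from the current weights using only the new batch.
        self.w, self.b = sgd(self.w, self.b, new_data, lr, epochs)
        # new_data is not kept anywhere, so the caller may discard it after this call.
        return self.w, self.b

    def mse(self, data: List[Example]) -> float:
        return sum((self.w * x + self.b - y) ** 2 for x, y in data) / len(data)
```

For example, a trainer already fitted to `y = 2x + 1` and then updated for 20 epochs on data from `y = 2.1x + 1` ends up with a lower error on that new data than a trainer given the same 20 epochs from zero weights.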
When taking this approach, there are two main types of model updates:
- Model iteration: updating the model by changing its architecture or adding new features.
- Data iteration: refreshing the model with new data without any change to its architecture.
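As of today, stateful training is mostly applied to data iteration, since changing the model architecture or adding new features still typically requires training the resulting model from scratch.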
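This rule can be expressed as a small weight-initialization policy. It is a simplified sketch: the weights are assumed to be a dictionary keyed by feature name (a hypothetical representation), and any change to the feature set is treated as model iteration that starts from scratch, while an unchanged feature set is treated as data iteration that warm-starts from the current weights.

```python
from typing import Dict, List

Weights = Dict[str, float]  # hypothetical representation: one weight per feature name


def is_data_iteration(old_weights: Weights, features: List[str]) -> bool:
    """True if the new model has the same features (and so the same parameters)."""
    return set(old_weights) == set(features)


def initial_weights(old_weights: Weights, features: List[str]) -> Weights:
    if is_data_iteration(old_weights, features):
        # Data iteration: continue from the current model's weights (stateful).
        return dict(old_weights)
    # Model iteration: the parameters changed, so this sketch starts from scratch.
    return {name: 0.0 for name in features}
```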
▮ Challenges For Continual Learning
Although continual learning has seen many successes, it still faces many challenges. Here are the two main ones.
Challenge 1: Getting Fresh Data
If you want to update your model every day, you need to acquire new data AND label it every day as well. In most cases, labeling becomes the bottleneck, which makes continual learning hard for tasks where it takes a long time to get feedback (that is, for the true label to become available).
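To see how feedback delay limits fresh data, here is a sketch of assembling a training batch from logged predictions. It assumes a click-prediction-style task in which a label either arrives within a fixed feedback window or, if nothing arrives before the window closes, the example is treated as negative; `LoggedPrediction` and `build_training_batch` are hypothetical names. Predictions still inside the window cannot be used yet, so the longer the window, the staler the data available for each update.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


@dataclass
class LoggedPrediction:
    """Hypothetical log record written when the model serves a prediction."""
    request_id: str
    features: Dict[str, float]
    served_at: datetime


def build_training_batch(
    predictions: List[LoggedPrediction],
    labels: Dict[str, int],  # request_id -> label received so far (e.g. 1 = click)
    now: datetime,
    feedback_window: timedelta,
) -> Tuple[List[Tuple[Dict[str, float], int]], List[LoggedPrediction]]:
    """Split logged predictions into labeled examples and ones still awaiting feedback."""
    ready, waiting = [], []
    for p in predictions:
        if p.request_id in labels:
            ready.append((p.features, labels[p.request_id]))
        elif now - p.served_at >= feedback_window:
            # Window closed with no feedback: assumed negative (task-specific choice).
            ready.append((p.features, 0))
        else:
            waiting.append(p)  # cannot be trained on yet
    return ready, waiting
```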
Challenge 2: Evaluation
The more frequently you update the model, the more opportunities there are for an update to push it in the wrong direction and cause the system to fail.
To avoid this, it is important to thoroughly evaluate each updated model to ensure its performance and safety before deploying it.
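One way to make this evaluation concrete is an automated gate that a candidate model must pass before it replaces the current one. The sketch below assumes accuracy-style metrics (higher is better) computed on the same held-out data for both models, keyed by `"overall"` plus hypothetical slice names; the candidate must not be worse overall and must not regress on any slice by more than `max_slice_drop`. The threshold and the choice of checks are illustrative assumptions, not a standard.

```python
from typing import Dict


def passes_deployment_gate(
    current_metrics: Dict[str, float],
    candidate_metrics: Dict[str, float],
    max_slice_drop: float = 0.01,
) -> bool:
    """Return True only if the candidate is safe to deploy under these checks."""
    # Check 1: the candidate must not be worse overall.
    if candidate_metrics.get("overall", float("-inf")) < current_metrics["overall"]:
        return False
    # Check 2: no slice may regress by more than the tolerance.
    for name, current_value in current_metrics.items():
        if name == "overall":
            continue
        # A slice missing from the candidate's metrics counts as a failure.
        candidate_value = candidate_metrics.get(name, float("-inf"))
        if current_value - candidate_value > max_slice_drop:
            return False
    return True
```

The per-slice check matters because an update can improve the overall metric while quietly degrading performance for a specific group of inputs.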
If you are interested, you can check out this blog post, where I discussed how to evaluate your model before production.