382. Continual Learning

▮ After Deployment

After deploying an ML model, we want to keep updating it so that it can adapt whenever the data distribution shifts.

This is why continual learning is critical: the infrastructure should let developers update a model whenever it is needed and deploy the update as quickly as possible.

Today I’d like to share how you can execute continual learning, along with the challenges it brings.

Reference: Designing Machine Learning Systems

▮ Executing Continual Learning

The Workflow

An updated model should never be deployed before it has been evaluated. The figure below illustrates a simplified workflow for evaluating the updated model and deciding whether to deploy it.

  1. Create a replica of the current model
  2. Update the replica with new data
  3. Compare performance
  4. Replace/Discard depending on the results
Fig.1 – Continual Learning Process
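
The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a production setup: `MeanModel`, `mean_abs_error`, and `continual_update` are hypothetical names made up for this example, and the "model" is a toy that predicts the mean of the targets it has seen.

```python
import copy
from dataclasses import dataclass


@dataclass
class MeanModel:
    """Toy regressor (hypothetical): predicts the mean of all targets seen so far."""
    total: float = 0.0
    count: int = 0

    def update(self, targets):
        for y in targets:
            self.total += y
            self.count += 1

    def predict(self):
        return self.total / self.count if self.count else 0.0


def mean_abs_error(model, targets):
    """Average absolute error of the model's single prediction on the targets."""
    return sum(abs(model.predict() - y) for y in targets) / len(targets)


def continual_update(current, new_targets, eval_targets):
    # 1. Create a replica of the current model.
    replica = copy.deepcopy(current)
    # 2. Update the replica with new data (the current model is untouched).
    replica.update(new_targets)
    # 3. Compare performance on the same held-out evaluation set.
    current_err = mean_abs_error(current, eval_targets)
    replica_err = mean_abs_error(replica, eval_targets)
    # 4. Replace the current model only if the replica is better; otherwise discard it.
    return replica if replica_err < current_err else current
```

Because the update is applied to a deep copy, the model that is serving traffic is never modified until the comparison in step 3 has been made.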

Stateful Training / Stateless Retraining

There are two ways to retrain an ML model: stateful training and stateless retraining.

Fig.2 – Stateful Training and Stateless Retraining

Stateless Retraining

Most companies adopt this approach, in which the model is trained from scratch each time it is updated.
Because every retraining run uses the full dataset, developers have to keep all existing data.

Stateful Training

Stateful training lets developers update a model with less data, because training continues from the previous checkpoint instead of starting over. Grubhub found that stateful training allows the model to converge faster and requires less compute.

Another advantage of this method is that it makes it possible to avoid storing all existing data altogether.
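
To make the difference concrete, here is a toy sketch that fits a one-parameter linear model with SGD. Everything in it is an assumption made for illustration (the `sgd_fit` helper, the synthetic data, the learning rate, and the epoch counts), and it says nothing about the numbers Grubhub actually observed.

```python
def sgd_fit(data, w=0.0, lr=0.01, epochs=50):
    """Fit y ≈ w * x with plain SGD on squared error, starting from weight w."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w


# Synthetic data (made up): the true slope drifts from 2.0 to 2.2.
old_data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
new_data = [(x, 2.2 * x) for x in (1.0, 2.0, 3.0)]

# Initial training; the learned weight plays the role of the saved checkpoint.
checkpoint = sgd_fit(old_data)

# Stateless retraining: start from scratch and train on old + new data.
w_stateless = sgd_fit(old_data + new_data, w=0.0, epochs=50)

# Stateful training: resume from the checkpoint and train on the new data only.
w_stateful = sgd_fit(new_data, w=checkpoint, epochs=5)
```

In this sketch the stateful run touches only the three new examples for a handful of epochs, while the stateless run needs the old data to still be available and revisits all of it.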

With this approach, there are two main types of model updates.

  • Model iteration
    Updating by changing the model architecture or adding new features.
  • Data iteration
    Updating by refreshing the model with new data without any change in model architecture.

As of today, stateful training is mostly used for the latter, data iteration.

▮ Challenges For Continual Learning

Although continual learning has had many successes, it still faces challenges. Here are the two main ones.

Challenge 1: Getting Fresh Data

If you want to update your model every day, you’ll need to be able to acquire new data AND label it every day as well. In most cases, this labeling process becomes the bottleneck. This means continual learning can be hard for tasks where feedback takes a long time to arrive.
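
As a rough illustration of why slow feedback hurts, the sketch below assigns labels to logged predictions from user feedback. It assumes a hypothetical setup in which a click counts as a positive label and a prediction with no click is labeled negative only after a fixed waiting window; `label_predictions` and its parameters are invented for this example.

```python
from datetime import datetime, timedelta


def label_predictions(predictions, clicked_ids, now, window):
    """Attach labels to logged predictions using later user feedback.

    predictions: list of (prediction_id, timestamp) pairs
    clicked_ids: set of prediction ids that received positive feedback
    Returns (labeled, pending), where labeled is a list of
    (prediction_id, label) and pending lists ids that cannot be labeled yet.
    """
    labeled, pending = [], []
    for pred_id, ts in predictions:
        if pred_id in clicked_ids:
            labeled.append((pred_id, 1))  # positive feedback arrived
        elif now - ts >= window:
            labeled.append((pred_id, 0))  # window closed without feedback
        else:
            pending.append(pred_id)       # too early to assign a label
    return labeled, pending


now = datetime(2024, 1, 2, 12, 0)
predictions = [
    ("a", now - timedelta(hours=30)),
    ("b", now - timedelta(hours=30)),
    ("c", now - timedelta(hours=1)),
]
labeled, pending = label_predictions(
    predictions, clicked_ids={"a"}, now=now, window=timedelta(hours=24)
)
```

The longer the waiting window has to be, the longer recent predictions sit in `pending`, and the less fresh data is available for each update.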

Challenge 2: Evaluation

The more frequently you update a model, the more chances there are for an update to go in the wrong direction and cause the system to fail.

To avoid this, it is important to thoroughly evaluate your model to ensure its performance and safety.

If you are interested, check out this blog post, where I discuss how to evaluate your model before production.