▮ THE 6 TIPS
There are many things to consider when selecting a machine learning model besides performance metrics: the amount of data it requires, the time it takes to train, its inference speed, and so on.
Any comparison of specific ML algorithms in this post would soon be outdated, so here are some tips for selecting a model without getting too deep into the specifics.
Reference: Designing Machine Learning Systems by Chip Huyen
▮ 1. Avoid the “State-of-the-Art” Trap
If there is a newer model, why not use THAT, right?
It sounds cool to be using “State-of-the-Art” technology.
However, we should always keep in mind that “State-of-the-Art” only means that a model performs better than previous models when trained and evaluated on a specific static dataset.
It does NOT mean that it would perform better with the current dataset available to you. Nor does it mean it can achieve the inference speed you are expecting.
It is important to keep up with new technologies, but it is more important to find the right solution for the specific problem you are facing. If a simple, traditional model can accomplish the task at a lower cost, you should probably use it.
▮ 2. Start Simple
Simple is best for almost anything. Even for your machine learning model.
Here are the benefits of using a simple model:
1. You can shorten the Train ⇒ Evaluate ⇒ Deploy ⇒ Monitor ⇒ Train iteration
2. You can gradually add complex components, which makes the model easier to understand and debug
3. You can use this model’s performance as a baseline for more complex models
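As a concrete illustration of point 3, the simplest possible baseline is often a majority-class predictor: any model you build afterwards has to beat it to justify its complexity. The sketch below (including the toy labels) is a hypothetical example, not from the book:

```python
from collections import Counter

def majority_baseline(y_train, y_test):
    """Predict the most common training label for every test sample
    and return the resulting accuracy on the test set."""
    majority = Counter(y_train).most_common(1)[0][0]
    correct = sum(1 for y in y_test if y == majority)
    return correct / len(y_test)

# Hypothetical labels for an imbalanced binary task.
y_train = [0, 0, 0, 0, 1, 0, 1, 0]
y_test  = [0, 1, 0, 0, 1]
print(majority_baseline(y_train, y_test))  # → 0.6
```

If a deep network only reaches 0.65 on a task where always predicting the majority class already gets 0.6, the extra complexity is buying you very little.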
▮ 3. Avoid Human Bias
When an ML engineer experiments with multiple models, they might spend more time on some models than on others, either accidentally or because of personal preference. Comparing the models is unfair if the experimentation time is unbalanced.
When comparing multiple architectures, it is important to experiment with the same setup.
For example, if you are going to run 10 experiments on model A, you should run 10 experiments on model B as well.
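One way to enforce this is to give every candidate the exact same budget programmatically: the same number of trials with the same random seeds. The sketch below is hypothetical (the model names and scoring lambdas are stand-ins for real training runs):

```python
import random

def run_trial(train_fn, seed):
    """Run one training trial with a fixed random seed (stand-in for a real run)."""
    random.seed(seed)
    return train_fn()

def compare_fairly(models, n_trials=10):
    """Give every model an identical budget: the same number of trials
    with the same seeds, so no model benefits from extra attention."""
    seeds = list(range(n_trials))
    return {name: [run_trial(fn, s) for s in seeds] for name, fn in models.items()}

# Hypothetical training functions that return a validation score.
models = {
    "model_A": lambda: 0.80 + 0.10 * random.random(),
    "model_B": lambda: 0.70 + 0.15 * random.random(),
}
scores = compare_fairly(models, n_trials=10)
```

Because the seeds are shared, rerunning the comparison reproduces the same numbers, which removes one more source of accidental bias.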
▮ 4. Performance “Now” vs. “Later”
The best-performing model “now” may not remain the best performer later on. Several months from now, more data may be available, and you might be able to train a neural network that produces better results.
You can evaluate this by plotting a “learning curve” (number of training samples on the X-axis, and loss, accuracy, etc. on the Y-axis). It can’t tell you exactly how much performance gain to expect, but it can tell you whether any gain is likely at all.
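The idea can be sketched in a few lines: train on progressively larger slices of the training set and evaluate each on the same held-out set. Everything here is a hypothetical toy (a 1-D nearest-centroid classifier and made-up data), just to show the shape of the computation:

```python
def nearest_centroid_accuracy(train, test):
    """Tiny 1-D nearest-centroid classifier (illustrative only)."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(x)
    centroids = {y: sum(xs) / len(xs) for y, xs in by_label.items()}
    correct = sum(
        1 for x, y in test
        if min(centroids, key=lambda c: abs(centroids[c] - x)) == y
    )
    return correct / len(test)

def learning_curve(train, test, sizes):
    """Accuracy on a fixed held-out set as the training set grows.
    A still-rising curve suggests more data would keep helping."""
    return [(n, nearest_centroid_accuracy(train[:n], test)) for n in sizes]

# Hypothetical 1-D data: class 0 near 0.0, class 1 near 1.0.
train = [(0.1, 0), (0.9, 1), (0.2, 0), (1.1, 1), (0.0, 0), (1.0, 1)]
test = [(0.15, 0), (0.85, 1), (0.3, 0), (1.2, 1)]
curve = learning_curve(train, test, sizes=[2, 4, 6])
```

In practice you would use something like scikit-learn's `learning_curve` utility with your real model and dataset instead of this toy classifier.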
▮ 5. Tradeoffs
When selecting a model, there would always be some sort of tradeoffs. Understanding and prioritizing those tradeoffs might help you find the right model for the project.
Ex. 1: Performance Metrics
One of the best-known tradeoffs is the false positive/false negative tradeoff: as the number of FPs decreases, the number of FNs increases, and vice versa.
An example task where FPs are more dangerous is fingerprint unlocking. You wouldn’t want unauthorized people to be accidentally classified as authorized, so you might choose a model (or decision threshold) that produces relatively few FPs.
On the other hand, an example task where FNs are more dangerous is COVID-19 screening. You wouldn’t want people who have COVID-19 to be classified as not having it, so you might choose a model that produces relatively few FNs.
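For a classifier that outputs a score, this tradeoff is controlled directly by the decision threshold. The scores and labels below are made up for illustration:

```python
def confusion_counts(scores, labels, threshold):
    """Count false positives and false negatives for a given threshold
    on model scores (probability of the positive class)."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn

# Hypothetical model scores and true labels.
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.5]
labels = [0,   0,   1,    1,   1,    0,   1,   0]

# High threshold (fingerprint unlock): few FPs, more FNs.
print(confusion_counts(scores, labels, 0.7))  # → (0, 2)
# Low threshold (COVID-19 screening): few FNs, more FPs.
print(confusion_counts(scores, labels, 0.3))  # → (2, 0)
```

The same model gives opposite error profiles at the two thresholds, which is exactly the tradeoff to prioritize per task.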
Ex. 2: Resources
A “State-of-the-Art” model might give you better results, but its inference latency may be so high that users stop using the product.
In that case, it might be better to avoid the “State-of-the-Art” model and use a simpler one instead.
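Latency is easy to measure before committing to a model. Tail latency (e.g. the 95th percentile) usually matters more to users than the average. The two "models" below are hypothetical stand-ins, one deliberately heavy:

```python
import time

def p95_latency_ms(predict, inputs):
    """Measure per-sample inference latency and report the 95th
    percentile in milliseconds."""
    times = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        times.append((time.perf_counter() - start) * 1000)
    times.sort()
    return times[int(0.95 * (len(times) - 1))]

# Hypothetical models: a cheap one and a deliberately heavy "SOTA" one.
simple = lambda x: x * 2
sota = lambda x: sum(i * x for i in range(50_000))

fast = p95_latency_ms(simple, range(100))
slow = p95_latency_ms(sota, range(100))
print(fast < slow)  # the simple model is far faster per prediction
```

Comparing these numbers against a latency budget (say, 100 ms per request) turns the tradeoff into a concrete go/no-go check.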
▮ 6. Understanding Model Assumptions
Every model makes some assumptions. It is important to understand those assumptions and check whether the data currently available to you satisfies them.
Ex. 1: Boundaries
All linear classifiers assume that decision boundaries are linear.
Ex. 2: Normally Distributed
Many statistical methods assume that the data is normally distributed.
Ex. 3: Prediction Assumptions
Every model assumes that it is possible to predict Y based on input X.
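Ex. 1 can be demonstrated in a few lines with a perceptron, the classic linear classifier: it perfectly fits AND (which is linearly separable) but can never perfectly fit XOR (which is not). This is a minimal from-scratch sketch, not production code:

```python
def perceptron_accuracy(data, epochs=50):
    """Train a simple perceptron (a linear classifier) and return its
    training accuracy; it reaches 1.0 only if the data is linearly separable."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            w[0] += (y - pred) * x1
            w[1] += (y - pred) * x2
            b += (y - pred)
    return sum(
        1 for (x1, x2), y in data
        if (1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == y
    ) / len(data)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(perceptron_accuracy(AND))  # → 1.0, AND is linearly separable
print(perceptron_accuracy(XOR))  # < 1.0, XOR violates the linearity assumption
```

No amount of extra training fixes the XOR case: the model's assumption, not its optimization, is what fails, which is exactly why checking assumptions against your data matters.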