▮ Error Analysis
Error analysis is the process of examining the dev set examples that your ML model misclassified to understand the underlying causes of the errors. This can help you decide what to prioritize and which direction the project should take.
Here are the procedures to apply error analysis.
▮ Evaluating Multiple Ideas
Let’s say your team is developing an ML model which classifies dogs. However, the classifier is not performing as expected. So the team came up with the following ideas for why this is happening.
- Some cats were recognized as dogs
- The images were blurry
- Some dogs were partially visible, such as being behind an object
To do manual error analysis, all you need to do is create a spreadsheet like the one below for a sample of misclassified dev set examples, with one row per example and one column per idea.
By looking at the table, you can tell that fixing idea 1 (cats being recognized as dogs) can eliminate at most 3% of the errors. However, fixing ideas 2 (images being blurry) and 3 (dogs being partially hidden) could eliminate almost all the errors. This tells you to focus on the latter two categories.
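The spreadsheet tally can also be done in a few lines of code. The following is a minimal sketch, not a prescribed tool: the `annotations` data is made up purely for illustration (it does not reproduce the percentages above), and the category names and the `category_shares` helper are hypothetical.

```python
from collections import Counter

# Hypothetical annotations: one entry per misclassified dev sample,
# holding every category (spreadsheet column) ticked for that sample.
annotations = [
    {"blurry"},
    {"partially_hidden"},
    {"blurry", "partially_hidden"},
    {"cat"},
    set(),  # none of the listed causes applies
]


def category_shares(rows):
    """Return the fraction of reviewed errors tagged with each category."""
    total = len(rows)
    if total == 0:
        return {}
    counts = Counter(tag for row in rows for tag in row)
    return {tag: n / total for tag, n in counts.items()}


shares = category_shares(annotations)
# Print categories from most to least frequent.
for tag, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {share:.0%} of reviewed errors")
```

Because one sample can belong to several categories, the shares may add up to more than 100%. Each share is an upper bound on how many errors fixing that category could remove.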
▮ Cleaning Up
During the analysis, you may find new factors that are affecting the results, such as mislabelling by human annotators.
If the fraction of such errors seems large, add another column to the spreadsheet and keep track of them as well. Whatever cleanup you apply, make sure to apply it to both the dev set and the test set so that both continue to be drawn from the same distribution.
▮ When You Have A Large Dev Set
In some cases, manually checking the dev set can take too much time due to the number of samples. What you can do is split your dev set into an EYEBALL DEV SET and a BLACKBOX DEV SET.
Eyeball Dev Set
Randomly select a small portion of the dev set and use this dataset to do error analysis.
The required size of this dataset depends on the task, so here are some guidelines.
- ~20 mistakes on eyeball dev set: Rough sense of the major error sources
- ~50 mistakes: Good sense of the major error sources
- ~100 mistakes: Very good sense of the major error sources
For example, if you want to find about 100 mistakes in this dataset and your model has a 3% error rate, you would need approximately 3,333 (100/0.03) samples.
The lower the error rate, the more data you’ll need.
If you have a small dev set, use all the samples as the eyeball dev set.
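The sizing rule above can be written as a small helper. This is a sketch under the assumption that the error rate is already known from an earlier evaluation. The function name `eyeball_size` and its parameters are hypothetical, and it rounds up, so it returns 3334 rather than the approximate 3,333.

```python
import math


def eyeball_size(target_mistakes, error_rate, dev_set_size):
    """Number of samples needed to expect `target_mistakes` errors,
    capped at the size of the whole dev set."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    needed = math.ceil(target_mistakes / error_rate)
    # A small dev set is used in full as the eyeball dev set.
    return min(needed, dev_set_size)


print(eyeball_size(100, 0.03, 10_000))  # 100 / 0.03, rounded up
print(eyeball_size(100, 0.03, 2_000))   # capped at the dev set size
```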
Blackbox Dev Set
The samples that weren’t selected for the eyeball dev set form the blackbox dev set. Avoid inspecting this dataset manually. Use it only to evaluate the model’s error rate automatically, which guards against overfitting to the eyeball dev set.
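One way to make the split is a seeded random shuffle. The sketch below assumes the dev set fits in memory as a list of samples. The `split_dev_set` helper and its `seed` parameter are hypothetical names chosen for illustration.

```python
import random


def split_dev_set(samples, eyeball_size, seed=0):
    """Randomly split samples into (eyeball, blackbox) dev sets."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(samples)   # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    return shuffled[:eyeball_size], shuffled[eyeball_size:]


# Stand-in sample IDs; a real dev set would hold images and labels.
eyeball, blackbox = split_dev_set(range(1000), eyeball_size=100)
print(len(eyeball), len(blackbox))
```

Keeping the seed fixed means the same samples stay in the blackbox set across runs, so they are never pulled into manual inspection by accident.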