▮ Image Classification
When doing image classification tasks, there are two main approaches: traditional classification and metric learning.
Here are the differences between the two approaches.
▮ Traditional Classification
The traditional classification approach classifies images by outputting a probability for each class and picking the class with the highest probability. This is usually done in the following steps.
- Extract Features
- Flatten to a 1-D array (also called an “embedding vector”)
- Output the probability of each class
- Use ArgMax to get the class with the highest probability as output.
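The steps above can be sketched in a few lines of NumPy. This is a minimal illustration only: the feature map, the linear head weights, and the two class names are made up, and a real model would produce the features with a trained CNN or transformer backbone.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability
    z = logits - logits.max()
    exp = np.exp(z)
    return exp / exp.sum()

# 1. Extract features (stand-in for a CNN/transformer backbone)
feature_map = np.array([[0.2, 1.1],
                        [0.7, 0.4]])

# 2. Flatten to a 1-D embedding vector
embedding = feature_map.flatten()  # shape (4,)

# 3. Output a probability for each class via a linear head + softmax
#    (weights here are invented for illustration)
weights = np.array([[ 0.5, -0.3,  0.8,  0.1],   # class 0: "cat"
                    [-0.2,  0.9,  0.4,  0.6]])  # class 1: "dog"
probs = softmax(weights @ embedding)

# 4. ArgMax picks the class with the highest probability
predicted_class = int(np.argmax(probs))
print(predicted_class, probs)
```

Swapping the backbone (convolutions vs. transformers) only changes how `embedding` is produced; the probability-and-argmax head stays the same.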
The first two steps (feature extraction and flattening) may differ slightly depending on the architecture, e.g. convolutions vs. transformers.
The figure below illustrates the steps when using convolutions for feature extraction.
▮ Metric Learning
The metric learning approach classifies images by embedding them into a “feature space” and assigning the class of the reference embedding whose distance to the image's embedding falls below a certain threshold. This usually takes the following steps.
- Extract Features
- Flatten to a 1-D array (also called an “embedding vector”)
- Compute the distance to other, previously embedded vectors
- Find the closest embedding
- Output the class that corresponds to that embedding vector
The figure below illustrates these steps, again using convolutions for feature extraction.
Compared to the traditional classification approach, this method has the following pros and cons.
Pros
- Only requires knowing which data are similar/dissimilar to the object you want to classify, so no specific class labels are needed
- Requires much less data
Cons
- Can take longer to train and predict due to its computational complexity
- Sensitive to noise, since classification relies on measured distances
- Limited scalability, because the model has to compare distances against every stored embedding
- Hyperparameters (such as the distance threshold) are difficult to tune