Winning an internal competition on imbalanced image classification. The competition took place between November 15th, 2021 and December 15th, 2021.
The competition is about imbalanced image classification. The dataset is a subset of the Google Quick, Draw! dataset: black and white drawings of 28x28 pixels.
For the competition, the dataset is restricted to 100 classes. The first class has 1,000 images, the second has 990, the third has 980, ..., and the last class has only 10 images! The test set is a balanced set of 100,000 images. The goal is to reach the best accuracy score on the test set.
The dataset for the competition can be found here.
If you want to test your model, the solution is here.
To encourage participants to make the most of their knowledge, some rules were added:
- People can group into teams of up to 3 people
- Pre-trained models for fine-tuning are forbidden
- Only the provided dataset can be used
- A team can submit at most 10 times during the competition
This competition was organized by Automatants, the AI student organization of CentraleSupélec, to promote deep learning through friendly competition. Many thanks to Thomas Lemercier for organizing this competition and to Loïc Mura for hosting it.
- Data Augmentation
- Architectures of CNNs:
  - Base CNN
  - Resnet
  - Mobilenet v2
- Regularization:
  - Weight Decay
  - Label smoothing
  - Dropout:
    - Classic dropout
    - Spatial dropout
  - Stochastic Depth
  - Early Stopping
  - Reduce LR on plateau
- Feature extractor + Classifier:
  - Mobilenet + LDA
  - Mobilenet + Gradient Boosting
- Semi-supervised training
- Few shot learning
- Ensemble:
  - Vote
  - Weighted vote
  - Meta Learner
  - Distillation
- Weighted loss
I mainly used basic data augmentation to limit the influence of class imbalance on training. Advanced data augmentation techniques such as CutMix, Random Erasing or mixup seemed less suited to this problem and harder to implement.
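For illustration, a basic pipeline in that spirit could look like the sketch below. I use PyTorch/torchvision here for illustration only; the exact transforms and magnitudes are assumptions, not the ones I actually used:

```python
# Sketch of a basic augmentation pipeline for 28x28 black and white drawings.
# The specific transforms and their magnitudes are illustrative assumptions.
import torchvision.transforms as T

augment = T.Compose([
    T.RandomRotation(10),                  # small random rotations
    T.RandomAffine(degrees=0,
                   translate=(0.1, 0.1),
                   scale=(0.9, 1.1)),      # small shifts and zooms
    T.ToTensor(),                          # PIL image -> tensor in [0, 1]
])
```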
Reference:
- The Effectiveness of Data Augmentation in Image Classification using Deep Learning (Dec 2017)
- mixup: Beyond Empirical Risk Minimization (Oct 2017)
- Random Erasing Data Augmentation (Aug 2017)
- CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (Aug 2019)
As I did not have very powerful hardware, I had to use small networks. Thus, I mainly used architectures built from Mobilenet v2 modules.
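For context, the core Mobilenet v2 module is the inverted residual block: a 1x1 expansion, a 3x3 depthwise convolution, then a linear 1x1 projection, with a skip connection when shapes match. A minimal PyTorch sketch (the expansion factor and layer sizes are assumptions):

```python
# Sketch of a MobileNetV2-style inverted residual block.
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        mid = in_ch * expand
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),      # 1x1 expansion
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, 3, stride, 1,
                      groups=mid, bias=False),          # 3x3 depthwise conv
            nn.BatchNorm2d(mid),
            nn.ReLU6(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),      # linear 1x1 projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```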
Reference:
- Deep Residual Learning for Image Recognition (Dec 2015)
- MobileNetV2: Inverted Residuals and Linear Bottlenecks (Jan 2018)
All my models were overfitting, so I tried many regularization techniques to limit that: weight decay, label smoothing and dropout.
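The first two are one-liners in most frameworks. A PyTorch sketch (the coefficient values are assumptions, not my tuned ones):

```python
# Sketch: weight decay and label smoothing in PyTorch.
# Coefficients are assumptions; `model` is assumed to be an nn.Module.
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss(label_smoothing=0.1)   # soften the one-hot targets
optimizer = optim.AdamW(model.parameters(), lr=1e-3,
                        weight_decay=1e-4)             # penalize large weights
```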
Reference:
- When Does Label Smoothing Help? (Jun 2020)
- Dropout: A Simple Way to Prevent Neural Networks from Overfitting (2014)
- Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift (Jan 2018)
- Deep Networks with Stochastic Depth (Jul 2016)
As more basic machine learning techniques are often more robust to overfitting and data imbalance, I tried to use features generated by a Mobilenet and classify them with other machine learning algorithms. I tried Linear Discriminant Analysis and Gradient Boosting, but the accuracies were nowhere near those of the regular Mobilenet. One explanation might be that the Mobilenet's own classifier is optimized jointly with the features, end to end, by continuously exploring the solution space; a separate classifier trained on frozen features might be unable to reach the same solution.
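For illustration, the feature extractor + classifier setup could look like the following sketch, where `backbone` stands for a trained Mobilenet with its classification head removed (all variable names are assumptions):

```python
# Sketch: classify frozen CNN features with LDA.
# `backbone`, `train_images`, `train_labels`, `test_feats` are assumed to exist.
import torch
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

backbone.eval()
with torch.no_grad():
    train_feats = backbone(train_images).flatten(1).cpu().numpy()

lda = LinearDiscriminantAnalysis()
lda.fit(train_feats, train_labels)   # fit on frozen deep features
test_preds = lda.predict(test_feats)
```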
Looking for more data is forbidden, and the training set only contains 50,500 images while the test set has 100,000. This situation is common in the real world, as labeling data is expensive. Semi-supervised learning takes advantage of this unlabeled data.
Noisy student learning consists in training a CNN with noise injected during training (dropout, data augmentation, ...). The trained CNN then generates pseudo-labels on the unlabeled data, and a threshold is applied to keep only the predictions with high confidence. Another CNN is then trained with the additional pseudo-labeled data.
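A sketch of the pseudo-labeling step (`teacher`, `unlabeled_images` and the 0.9 confidence threshold are assumptions):

```python
# Sketch of the pseudo-labeling step in Noisy student learning.
import torch
import torch.nn.functional as F

teacher.eval()
with torch.no_grad():
    probs = F.softmax(teacher(unlabeled_images), dim=1)
confidence, pseudo_labels = probs.max(dim=1)
keep = confidence > 0.9                    # keep only confident predictions
extra_images = unlabeled_images[keep]
extra_labels = pseudo_labels[keep]
# The student is then trained on the labeled data plus
# (extra_images, extra_labels), with dropout and augmentation as noise.
```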
Reference:
- Self-training with Noisy Student improves ImageNet classification (Nov 2019)
Few shot learning (FSL) addresses the challenge of classifying images with very little training data. I did not have enough time to make it work, but it would definitely be worth investigating. One way to use FSL would have been to take the top-K predictions from the Mobilenet, feed them to an FSL algorithm, and ensemble both sets of predictions.
As a last way to improve my models, I used ensemble methods to combine them. I started with a simple vote among my best models, and this constituted my best submission. I also tried to combine predictions with a meta-model, but that was no better.
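A simple majority vote can be written in a few lines; here is a PyTorch sketch (`models` stands for a list of trained networks and is an assumption):

```python
# Sketch: simple majority vote over an ensemble of trained models.
import torch

@torch.no_grad()
def ensemble_vote(models, images):
    # one predicted class per model: shape (n_models, batch_size)
    votes = torch.stack([m(images).argmax(dim=1) for m in models])
    # most frequent class per sample
    return votes.mode(dim=0).values
```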
To limit class imbalance, I could have used a weighted loss instead of augmenting my data until it was balanced. That would have saved some memory and time, but it was not really a limiting factor.
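For illustration, a weighted cross-entropy using the competition's known class counts could look like this sketch:

```python
# Sketch: weight each class inversely to its frequency in the training set.
# Class counts follow the competition's 1,000 / 990 / ... / 10 distribution.
import torch
import torch.nn as nn

counts = torch.tensor([1000 - 10 * i for i in range(100)], dtype=torch.float)
weights = counts.sum() / (len(counts) * counts)   # rare classes get larger weights
criterion = nn.CrossEntropyLoss(weight=weights)
```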
My best model was an ensemble of six regular Mobilenets trained with Noisy student learning.
I started learning about Machine Learning one year ago. How far can I go now? I wanted to put all my effort into improving my understanding of CNNs, even if it meant spending more time reading papers than fine-tuning.
Surprisingly, the architecture did not have a great influence on accuracy. However, using Mobilenet v2 was definitely a great choice: it gave a lightweight model which could be trained much faster than any other model.
Regularization was the key to the competition. As my models were always overfitting, regularization techniques helped me a lot to increase accuracy without overfitting.
Ensemble methods are powerful tools as well. If I had had more time, I would have tried to use distillation on my ensemble model, as it is much more satisfying to reach a good accuracy with smaller models.
I also had the chance to try more original methods such as semi-supervised training with Noisy student learning, Few shot learning, and LDA/Gradient Boosting on top of a deep feature extractor. I was torn between trying something new with a low chance of improving my score and searching for better hyperparameters. I am glad I chose to try as many different approaches as possible.