Brief problem statement:
Hi! This project was created for the bitgrit "Generative AI" competition (https://bitgrit.net/competition/18#). The task is to classify each photo as real (label '0') or artificially generated (label '1'). The datasets contain 20 × 20 × 3 (channels) images. Since the underlying data has already been transformed, it cannot be converted back to the original images. The objective of this challenge is to predict the 'labels' column from the supplied data.
Classifiers and techniques used
- PCA (Principal Component Analysis)
- Naive Bayes Classifier
- Softmax Classifier
- Support Vector Classifier
- Adaboost
- Gradient Boosting
- Random Forest
- Voting Classifier
- After loading the data, I preprocessed it by scaling all features to the range 0 to 1 using MinMaxScaler.
- Split the training data into test and train sets (1:4 ratio).
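A 1:4 test-to-train split corresponds to holding out 20% of the data. A minimal sketch with placeholder data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((100, 1200))        # placeholder features
y = rng.integers(0, 2, 100)        # placeholder 0/1 labels

# test_size=0.2 gives the 1:4 test-to-train ratio described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```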
- Three different classifiers were applied to the data, and their accuracy was evaluated.
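The comparison loop might look like the sketch below, assuming the three classifiers are those from the list above (Naive Bayes, softmax via logistic regression, and SVC) and using synthetic data in place of the real images:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.random((200, 50))
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # weakly separable toy labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

classifiers = {
    "Naive Bayes": GaussianNB(),
    # Logistic regression generalises to a softmax classifier for multi-class.
    "Softmax": LogisticRegression(max_iter=1000),
    "SVC": SVC(),
}
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, clf.predict(X_te))
print(scores)
```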
- Next, I applied Principal Component Analysis (PCA) to the data and evaluated its accuracy using the same classifiers. The outcome of this analysis is as follows:
- The Support Vector Classifier (SVC) gave the best results among the classifiers used. After tuning the number of components, a linear kernel with 208 components achieved a prediction accuracy of about 86.48%.
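The PCA + linear SVC combination can be expressed as a single pipeline. The 208-component setting comes from the tuning described above; the random data here is only a placeholder:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((300, 1200))        # placeholder for flattened 20x20x3 images
y = rng.integers(0, 2, 300)

# n_components=208 was chosen by tuning on the real data; it is arbitrary here.
model = make_pipeline(
    MinMaxScaler(),
    PCA(n_components=208),
    SVC(kernel="linear"),
)
model.fit(X, y)
print(model.predict(X[:5]))
```

Wrapping the steps in a pipeline ensures the scaler and PCA are fitted on the training fold only, which avoids leaking test-set statistics into the transform.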
- PCA + linear SVC performed well, with an F1 score of 0.7968.
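For reference, the F1 score is the harmonic mean of precision and recall; a toy computation with made-up labels:

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

# Precision = 4/5, recall = 4/5, so F1 = 2*P*R / (P + R) = 0.8
print(f1_score(y_true, y_pred))
```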
- Subsequently, I explored more advanced techniques such as bagging and boosting in an attempt to enhance the accuracy further.
- Initially, I employed the bagging technique, which resulted in an F1 score of 0.8559.
- Next, I experimented with gradient boosting. With lower learning rates the accuracy did not meet expectations, while higher learning rates caused overfitting: the validation set showed significantly poorer accuracy than the training set. Consequently, I decided not to include this approach in the final submission.
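The learning-rate sweep described above can be sketched as follows; the specific rates and toy data are illustrative, not the values used in the experiments:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((300, 50))
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # weakly separable toy labels
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for lr in (0.01, 0.1, 1.0):
    gb = GradientBoostingClassifier(
        learning_rate=lr, n_estimators=100, random_state=42
    )
    gb.fit(X_tr, y_tr)
    # A large gap between train and validation accuracy at high learning
    # rates is the overfitting symptom described above.
    results[lr] = (gb.score(X_tr, y_tr), gb.score(X_val, y_val))
    print(lr, results[lr])
```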
- Lastly, I employed AdaBoost with SVC as the base classifier, given its superior accuracy in the earlier attempts. Notably, this approach yielded the highest accuracy thus far, achieving an F1 score of 0.8757.