This project focused on predicting the category labels of images using a bag-of-words model built from SIFT features. Apart from the original project requirements, I did the following extra credit:
- Applied a Gaussian blur (sigma = 0.25) to each image before extracting and vectorizing features
- Implemented k = 10 nearest neighbors instead of the single nearest neighbor
- Experimented with many different vocabulary sizes (e.g. 10, 20, 50, 100, 200, 400, 1000, 10000) and reported performance
- etc.
I found this project to be a really cool application of the NLP concept of bag of words, which I hadn't realized could be applied to SIFT features and image interest points. Below are my results and optimal parameters. Algorithmically I mostly followed the instructions, deviating slightly by using k nearest neighbors instead of the single nearest neighbor and by applying a Gaussian filter to the images before processing them.
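To illustrate the k = 10 voting, here is a minimal sketch rather than the exact project code. It assumes VLFeat's vl_alldist2 is available; train_feats and test_feats are hypothetical matrices holding one feature vector per column, and train_labels is a cell array of category names for the training images.
k = 10;
D = vl_alldist2(single(train_feats), single(test_feats));   % (num train) x (num test) pairwise distances
[~, idx] = sort(D, 1, 'ascend');                            % rank training images by distance for each test image
[cats, ~, label_ids] = unique(train_labels);                % map category names to integer ids
votes = label_ids(idx(1:k, :));                             % ids of the k closest training images per test image
predicted_categories = cats(mode(votes, 1));                % majority vote decides the label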
Tiny Image: 0.194
Bag of SIFT: 0.423
Tiny Image: 0.127
Bag of SIFT: 0.661
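For context on the bag-of-SIFT numbers, each image is represented by a histogram counting how often each visual word (a cluster center computed over SIFT descriptors) appears in it. Below is a minimal sketch for one image, assuming VLFeat's vl_dsift and vl_alldist2, a precomputed 128 x vocab_size vocabulary matrix vocab, and a single-precision grayscale image im; the step size is purely illustrative.
[~, sift_feats] = vl_dsift(im, 'step', 10, 'fast');   % 128 x F dense SIFT descriptors
D = vl_alldist2(single(sift_feats), single(vocab));   % F x vocab_size distances to the visual words
[~, word] = min(D, [], 2);                            % nearest visual word for each descriptor
bag = histcounts(word, 1:(size(vocab, 2) + 1));       % count occurrences of each word
bag = bag / sum(bag);                                 % normalize so image size doesn't matter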
Below are the parameters I tweaked to achieve optimal results:
% Gaussian blur applied to each image before feature extraction (sigma = 0.25)
im = imgaussfilt(im, 0.25);
% SVM lambda (regularization parameter)
lambda = 0.0001;
% Accuracy by lambda value:
% 10       =>
% 1        =>
% 0.1      => 0.485
% 0.01     => 0.517
% 0.001    => 0.625
% 0.0001   => 0.622
% 0.00001  => 0.583
% 0.000001 => 0.544
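Lambda is the regularization weight handed to the linear SVM training. As a minimal sketch of where it enters, assuming VLFeat's vl_svmtrain and a one-vs-all setup (variable names are hypothetical; train_feats and test_feats hold one bag-of-SIFT vector per column, train_labels as before):
categories = unique(train_labels);
num_cats = numel(categories);
W = zeros(size(train_feats, 1), num_cats);
B = zeros(1, num_cats);
for i = 1:num_cats
    binary = 2 * strcmp(train_labels, categories{i}) - 1;                 % +1 for this class, -1 otherwise
    [W(:, i), B(i)] = vl_svmtrain(single(train_feats), binary, lambda);   % lambda = regularization weight
end
scores = bsxfun(@plus, W' * test_feats, B');   % one confidence per category per test image
[~, best] = max(scores, [], 1);
predicted_categories = categories(best);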
% Vocabulary size (see table below for confusion matrices)
vocab_size = 200;
% Accuracy (mean of diagonal of confusion matrix) by vocabulary size:
%    10 => 0.421
%    50 => 0.594
%   100 => 0.623
%   200 => 0.661
%   400 => 0.629
%  1000 => 0.652
% 10000 => 0.686
Final accuracy with these parameters (mean of diagonal of confusion matrix): 0.661
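For reference, vocab_size is the number of k-means cluster centers ("visual words") computed from dense SIFT descriptors sampled across the training images. Below is a minimal sketch of the vocabulary construction, assuming VLFeat's vl_dsift and vl_kmeans and a hypothetical image_paths cell array; the sampling step is illustrative.
all_sift = [];
for i = 1:numel(image_paths)
    im = im2single(imread(image_paths{i}));
    if size(im, 3) > 1
        im = rgb2gray(im);                            % vl_dsift expects a single-channel image
    end
    im = imgaussfilt(im, 0.25);                       % same Gaussian blur as above
    [~, feats] = vl_dsift(im, 'step', 20, 'fast');    % sample dense SIFT descriptors
    all_sift = [all_sift, single(feats)];             % accumulate descriptors from every image
end
vocab = vl_kmeans(all_sift, vocab_size);              % 128 x vocab_size cluster centers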