aayush-k/Scene-Recognition-BoWs

Project 4 / Scene Recognition with Bag of Words

This project focused on predicting the scene labels of images using bag-of-words representations of local image features. Beyond the original project requirements, I did the following for extra credit:

  1. Blurred each image with a Gaussian filter (sigma = 0.25) before extracting features
  2. Implemented k-nearest neighbors with k = 10 instead of a single nearest neighbor
  3. Experimented with many different vocabulary sizes and reported performance, e.g. 10, 20, 50, 100, 200, 400, 1000, 10000
  4. etc.

I found this project to be a really cool application of the NLP concept of Bag of Words, which I hadn't realized could be applied to SIFT features and image interest points. Below are my results and optimal parameters. Algorithmically, I mostly followed the instructions, with two changes: I used k nearest neighbors instead of the single nearest neighbor, and I applied a Gaussian filter to my images before processing them.
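To make the bag-of-words idea concrete, here is a minimal sketch of how one image's local descriptors could be turned into a histogram of visual words. The function name `bow_histogram` and its argument layout are assumptions for illustration, not the project's actual code, and the expression for the distances relies on implicit expansion (MATLAB R2016b or later).

```matlab
% Hypothetical helper: build a normalized bag-of-words histogram for one image.
% descriptors: D x N matrix, one local descriptor (e.g. SIFT) per column
% vocab:       D x K matrix, one visual word (cluster center) per column
function h = bow_histogram(descriptors, vocab)
    K = size(vocab, 2);
    % Squared Euclidean distances (K x N), using
    % ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a'*b
    d2 = sum(vocab.^2, 1)' + sum(descriptors.^2, 1) - 2 * (vocab' * descriptors);
    [~, nearest] = min(d2, [], 1);           % closest word for each descriptor
    h = accumarray(nearest(:), 1, [K 1])';   % 1 x K count of how often each word is used
    h = h / max(sum(h), 1);                  % normalize so the descriptor count does not matter
end
```

Each image then becomes a fixed-length vector of length K (the vocabulary size), which is what the classifiers operate on.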

Tiny Images/Bags of SIFT with Nearest Neighbor/1 vs all SVM

Nearest Neighbor:

Tiny Image: 0.194

Bag of SIFT: 0.423
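The k-nearest-neighbor variant can be sketched as a majority vote over the k closest training features. This is an illustrative sketch only: `knn_classify` is a hypothetical name, and it assumes the category names have already been mapped to numeric ids (for example with the third output of `unique`).

```matlab
% Hypothetical helper: k-nearest-neighbor classification by majority vote.
% train_feats:  N x D matrix, one training feature per row
% train_labels: N x 1 vector of numeric class ids
% test_feats:   M x D matrix, one test feature per row
function predicted = knn_classify(train_feats, train_labels, test_feats, k)
    M = size(test_feats, 1);
    predicted = zeros(M, 1);
    for i = 1:M
        diffs = train_feats - test_feats(i, :);   % N x D, implicit expansion
        d2 = sum(diffs.^2, 2);                    % squared Euclidean distances
        [~, order] = sort(d2, 'ascend');
        neighbors = train_labels(order(1:min(k, numel(order))));
        predicted(i) = mode(neighbors);           % on ties, mode picks the smallest id
    end
end
```

With `k = 1` this reduces to the plain nearest-neighbor classifier; the project used `k = 10`.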

1 vs all SVM:

Tiny Image: 0.127

Bag of SIFT: 0.661
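The 1 vs all scheme trains one binary linear classifier per category and assigns each test image to the most confident one. The sketch below shows that structure under stated assumptions: `one_vs_all` is a hypothetical name, and `train_binary` is a placeholder function handle for whatever binary trainer is used (the source does not say which; a handle to VLFeat's `vl_svmtrain` may fit this shape if VLFeat is installed).

```matlab
% Sketch of one-vs-all classification with linear classifiers.
% train_binary: HYPOTHETICAL handle, [w, b] = train_binary(X, y, lambda),
%               with X a D x N feature matrix and y a 1 x N vector in {-1, +1}
% X_train: D x N, ids_train: 1 x N numeric class ids, X_test: D x M
function predicted = one_vs_all(train_binary, X_train, ids_train, X_test, num_classes, lambda)
    D = size(X_train, 1);
    W = zeros(D, num_classes);
    B = zeros(1, num_classes);
    for c = 1:num_classes
        y = -ones(1, size(X_train, 2));
        y(ids_train == c) = 1;                    % current class vs everything else
        [W(:, c), B(c)] = train_binary(X_train, y, lambda);
    end
    scores = W' * X_test + B';                    % num_classes x M confidences
    [~, predicted] = max(scores, [], 1);          % most confident classifier wins
end
```

The `lambda` argument is simply passed through to the trainer, so the regularization values tried in the next section would plug in here.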

Parameter Optimization

Below are the parameters I tweaked to achieve optimal results:

%Gaussian Pyramid
im = imgaussfilt(im, 0.25);

%SVM Lambda value
lambda = 0.0001;
% 10 =>
% 1 =>
% 0.1 => 0.485
% 0.01 => 0.517
% 0.001 => 0.625
% 0.0001 => 0.622
% 0.00001 => 0.583
% 0.000001 => 0.544

%Vocabulary Size (See table below for confusion matrices)
vocab_size = 200;
% 10: Accuracy (mean of diagonal of confusion matrix) is 0.421
% 50: Accuracy (mean of diagonal of confusion matrix) is 0.594
% 100: Accuracy (mean of diagonal of confusion matrix) is 0.623
% 200: Accuracy (mean of diagonal of confusion matrix) is 0.661
% 400: Accuracy (mean of diagonal of confusion matrix) is 0.629
% 1000: Accuracy (mean of diagonal of confusion matrix) is 0.652
% 10000: Accuracy (mean of diagonal of confusion matrix) is 0.686
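
A sweep like the ones recorded in the comments above can be sketched as a loop over candidate values that keeps the best score. `sweep_parameter` and the `evaluate` handle are hypothetical names; `evaluate` stands in for a full train-and-test run that returns a single accuracy.

```matlab
% Hypothetical helper: evaluate each candidate value and report the best one.
% evaluate:   function handle mapping one parameter value to a scalar accuracy
% candidates: vector of parameter values to try
function [best_value, best_acc, accs] = sweep_parameter(evaluate, candidates)
    accs = arrayfun(evaluate, candidates);   % one accuracy per candidate
    [best_acc, idx] = max(accs);             % first maximum wins on ties
    best_value = candidates(idx);
end
```

As a usage example, the vocabulary-size accuracies listed above can be replayed through a lookup instead of rerunning the pipeline:

```matlab
sizes    = [10 50 100 200 400 1000 10000];
reported = [0.421 0.594 0.623 0.661 0.629 0.652 0.686];   % values from the comments above
lookup   = @(v) reported(sizes == v);                     % stands in for a real run
[best_size, best_acc] = sweep_parameter(lookup, sizes);
```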

Results in a table

Vocab Sizes: 10, 50, 100, 1000, 10000

Scene classification results visualization


Accuracy (mean of diagonal of confusion matrix) is 0.661
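
The accuracy figure is the mean of the diagonal of the row-normalized confusion matrix, which can be sketched as follows. `confusion_accuracy` is a hypothetical name, and the sketch assumes categories are encoded as numeric ids from 1 to `num_classes`.

```matlab
% Hypothetical helper: row-normalized confusion matrix and its mean diagonal.
% true_ids, predicted_ids: vectors of numeric class ids in 1..num_classes
function [accuracy, confusion] = confusion_accuracy(true_ids, predicted_ids, num_classes)
    % counts(i, j) = number of images of true class i predicted as class j
    counts = accumarray([true_ids(:), predicted_ids(:)], 1, [num_classes num_classes]);
    row_totals = sum(counts, 2);
    confusion = counts ./ max(row_totals, 1);   % each row sums to 1 (or 0 if the class is empty)
    accuracy = mean(diag(confusion));           % average per-category accuracy
end
```

The diagonal entries are the per-category accuracies, so averaging the fifteen accuracies in the table gives the same 0.661.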

| Category name | Accuracy | False positives with true label | False negatives with wrong predicted label |
|---|---|---|---|
| Kitchen | 0.500 | LivingRoom, Bedroom | TallBuilding, Bedroom |
| Store | 0.510 | InsideCity, LivingRoom | Office, Office |
| Bedroom | 0.460 | Office, LivingRoom | Office, Mountain |
| LivingRoom | 0.430 | Store, Bedroom | Industrial, Industrial |
| Office | 0.780 | Bedroom, Store | Bedroom, Bedroom |
| Industrial | 0.380 | Bedroom, Street | Bedroom, TallBuilding |
| Suburb | 0.900 | InsideCity, Store | LivingRoom, OpenCountry |
| InsideCity | 0.540 | TallBuilding, Store | Office, Office |
| TallBuilding | 0.740 | OpenCountry, InsideCity | Coast, Forest |
| Street | 0.750 | OpenCountry, Kitchen | TallBuilding, TallBuilding |
| Highway | 0.810 | Coast, Store | Coast, TallBuilding |
| OpenCountry | 0.540 | Highway, Industrial | Coast, Coast |
| Coast | 0.820 | OpenCountry, OpenCountry | OpenCountry, OpenCountry |
| Mountain | 0.800 | Kitchen, Store | Suburb, Coast |
| Forest | 0.960 | Store, OpenCountry | Mountain, Mountain |
