# Object Detection using Transformers
```bash
git clone https://github.com/gittygupta/detr-torch.git
cd detr-torch && mkdir saved_models
```
- Download any of the models from drive
- Model nomenclature: `detr_{epoch_number}.pth`
- Experimental results: `detr_4.pth` and `detr_6.pth` work best
- Save the model to the `saved_models` folder
```bash
python inference.py --model detr_{epoch_number}.pth --folder {path/to/images}
```
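For example, using the best-performing checkpoint (the folder name here is just a placeholder):

```bash
python inference.py --model detr_6.pth --folder test_images
```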
The same can be done from Python:

```python
import cv2
import torch

from config import *
from inference import *
from model import DETR

# Load the trained weights into the model and switch to evaluation mode
model_path = 'path/to/model.pth'
model = DETR(num_classes=num_classes, num_queries=num_queries)
model.load_state_dict(torch.load(model_path))
model.eval()

# Read the image and apply the same preprocessing used during training
image = cv2.imread('path/to/image.jpg')
transformed_image = transform(image)

# Run the model on the preprocessed image
confidences, bboxes = run_inference_for_single_image(transformed_image, model, torch.device('cuda'))

# Scale the predicted boxes back to the original image size, draw the ones
# above a 0.5 confidence threshold, and save the result
bboxes = scale_bbox(image.shape[1], image.shape[0], bboxes)
output_image = draw(image, confidences, bboxes, 0.5)
cv2.imwrite('path/to/save/image.jpg', output_image)
```
The current state of the art (SOTA) in object detection is Google's EfficientDet. Due to hardware constraints, EfficientDet-D1, which has 6.6M parameters, has been used here. The Transformer (roughly 17M parameters), on the other hand, uses ResNet-50 as its backbone (roughly 23M parameters), for a total of 41M parameters. The results are as follows:
The image on the left is the output of the Transformer and the one on the right is from EfficientDet-D1. We can see that EfficientDet produces overlapping bounding boxes, whereas the Transformer does not, because of how the attention layers work. EfficientDet and other traditional object detection algorithms (MobileNet, YOLO) need Non-Max Suppression (NMS) to remove the overlaps, which arise from unstable confidence values. Transformers do not suffer from this instability, and hence do not require NMS.
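For reference, the NMS step that traditional detectors rely on (and DETR skips) can be illustrated with `torchvision.ops.nms`; this is a minimal sketch with made-up boxes and scores, not code from this repository:

```python
import torch
from torchvision.ops import nms

# Hypothetical raw detections: two heavily overlapping boxes for the same
# object plus one distinct box, in (x1, y1, x2, y2) pixel format.
boxes = torch.tensor([[100., 100., 210., 210.],
                      [102.,  98., 208., 215.],
                      [400., 300., 480., 380.]])
scores = torch.tensor([0.90, 0.85, 0.75])

# Discard any box whose IoU with a higher-scoring box exceeds 0.5.
keep = nms(boxes, scores, iou_threshold=0.5)
print(keep)  # tensor([0, 2]) -- the duplicate box at index 1 is suppressed
```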
Also, when tested on an NVIDIA GTX 1650 Max-Q (4GB) GPU, the EfficientDet-D1 model runs at 4-5 FPS, whereas DETR runs at 12-15 FPS despite having a much higher parameter count, largely due to the elimination of NMS.
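A rough FPS figure like the one above can be obtained with a simple timing loop. This is a minimal sketch, assuming `model`, `transform` and `run_inference_for_single_image` from the snippet above are in scope; the image path, warm-up count and run count are arbitrary choices:

```python
import time
import cv2
import torch

device = torch.device('cuda')
image = cv2.imread('path/to/image.jpg')
transformed_image = transform(image)

# Warm up so one-off CUDA initialisation costs don't skew the average.
for _ in range(10):
    run_inference_for_single_image(transformed_image, model, device)

# Time repeated forward passes; synchronise so all GPU work is counted.
n_runs = 100
torch.cuda.synchronize()
start = time.time()
for _ in range(n_runs):
    run_inference_for_single_image(transformed_image, model, device)
torch.cuda.synchronize()
print(f'{n_runs / (time.time() - start):.1f} FPS')
```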
Thus, the transformer architecture provides both a boost in speed and more stable prediction confidences.
- Above, it can easily be seen that the Transformer is more accurate, since EfficientDet is not even able to detect the object
- In all the above comparisons, the confidence threshold for both models was set to 0.5