[RFC] YOLO v5 #48
Comments
Hi @oke-aditya, I recently refactored ultralytics's yolov5 following the philosophy of torchvision (see here). The inference part now looks similar to torchvision's RetinaNet, and it is done. And I plan to use quickvision's API.
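For context, this is roughly the inference contract torchvision's RetinaNet exposes, which the refactored yolov5 is said to mirror (a minimal sketch using torchvision only; the yolov5 wrapper itself is not shown):

```python
import torch
import torchvision

# torchvision's RetinaNet contract: a list of image tensors in,
# a list of dicts with "boxes", "scores", "labels" out.
model = torchvision.models.detection.retinanet_resnet50_fpn(
    pretrained=False, pretrained_backbone=False
)
model.eval()

images = [torch.rand(3, 480, 640)]
with torch.no_grad():
    predictions = model(images)

print(predictions[0]["boxes"].shape)   # (N, 4), boxes in xyxy format
print(predictions[0]["scores"].shape)  # (N,)
print(predictions[0]["labels"].shape)  # (N,)
```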
If we totally use …
True, but that significantly reduces the codebase we have to maintain. I would love a custom implementation though; it would need some testing, weights, etc.
Just saw your implementation. That's great. Really nice and so complete.
And the difference in the model weights between my implementation and ultralytics's is just some key names; you can check the model-weight conversion script here.
That's absolutely fine!
True, Ultralytics use the … and this is the only difference between ultralytics's implementation and mine.
I will try it once locally using hub. There are a few problems which I foresee while using hub, but I want to verify them first.
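For reference, a minimal hub-based check might look like the following (the `ultralytics/yolov5` entrypoint, `yolov5s` name and sample image URL are Ultralytics's published ones; network access and their dependencies are assumed):

```python
import torch

# Load the small YOLOv5 variant straight from the Ultralytics hub entrypoint.
# This pulls the repo (and its dependencies) at call time, so network access is needed.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.eval()

# The hub wrapper accepts file paths, URLs, PIL images or numpy arrays.
results = model('https://ultralytics.com/images/zidane.jpg')
results.print()  # per-image summary of detections
```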
Though …
Hoping to see your experiment results here.
Sure, I will share a Colab so that we can dig into their implementation.
Just a doubt. FRCNN and RetinaNet are not differentiable in eval mode, so that will not work. But it will work for DETR, which is differentiable in both train and eval mode. I would like to know: for YOLO, what are your thoughts?
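To make the train/eval asymmetry concrete, this is how torchvision's Faster R-CNN behaves: a differentiable loss dict in train mode, and post-NMS detections (not intended for backprop) in eval mode.

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=False, pretrained_backbone=False
)

images = [torch.rand(3, 300, 400)]
targets = [{
    "boxes": torch.tensor([[10.0, 20.0, 150.0, 200.0]]),
    "labels": torch.tensor([1]),
}]

# train(): forward returns a dict of losses we can backprop through.
model.train()
loss_dict = model(images, targets)
total_loss = sum(loss_dict.values())
total_loss.backward()

# eval(): forward returns post-processed detections (after NMS),
# which are not meant to be backpropagated through.
model.eval()
with torch.no_grad():
    detections = model(images)
print(detections[0].keys())  # dict_keys(['boxes', 'labels', 'scores'])
```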
The behaviour differs between train and eval mode. For yolov5, we can choose either way; no rule of thumb, I think?
No thumb rule here. Best to leave it out, as I think it will be great to keep the same interface in … The idea here is … Sadly we can't do this for FRCNN and RetinaNet, as you mentioned the … Let me know your thoughts!
Here is a discussion of the …
Both … Sometimes the dataset is huge, and I think for detection users should simply do it in a …
Yep, the mechanisms in …
One advantage of doing PostProcess is that it enables converting from YOLO format to Pascal VOC format, from which we can compute evaluation metrics. I think if we too do PostProcess internally and are able to compute these metrics, it would be nice. DETR's PostProcess returns …
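As a concrete illustration of that conversion (a small helper, not part of any library: YOLO boxes here are normalized cx, cy, w, h and Pascal VOC boxes are absolute xmin, ymin, xmax, ymax):

```python
import torch

def yolo_to_voc(boxes: torch.Tensor, img_w: int, img_h: int) -> torch.Tensor:
    """Convert normalized (cx, cy, w, h) boxes to absolute (xmin, ymin, xmax, ymax)."""
    cx, cy, w, h = boxes.unbind(-1)
    xmin = (cx - w / 2) * img_w
    ymin = (cy - h / 2) * img_h
    xmax = (cx + w / 2) * img_w
    ymax = (cy + h / 2) * img_h
    return torch.stack([xmin, ymin, xmax, ymax], dim=-1)

# Example: one box centred in a 640x480 image, covering half of each side.
boxes = torch.tensor([[0.5, 0.5, 0.5, 0.5]])
print(yolo_to_voc(boxes, img_w=640, img_h=480))  # tensor([[160., 120., 480., 360.]])
```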
If we can stick with these output formats for all models in both … Sadly we can't do this for torchvision models, as this is coupled with the model definition. Hence the … So we can return …
What I would propose is … Also, if the user would like very specific post-processing, they can simply instantiate the model and continue on their own. Thoughts?
Just a doubt here. Compared to …
I think the Nested Tensor utils can be re-used from our utils.
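For context, a minimal sketch of what such a nested-tensor utility does, loosely following DETR's padding approach (this is not the exact util in quickvision):

```python
import torch

def pad_to_nested(tensors):
    """Pad a list of (C, H, W) tensors to a common size and return (batch, mask)."""
    max_h = max(t.shape[1] for t in tensors)
    max_w = max(t.shape[2] for t in tensors)
    batch = torch.zeros(len(tensors), tensors[0].shape[0], max_h, max_w)
    mask = torch.ones(len(tensors), max_h, max_w, dtype=torch.bool)  # True = padding
    for i, t in enumerate(tensors):
        c, h, w = t.shape
        batch[i, :, :h, :w] = t
        mask[i, :h, :w] = False
    return batch, mask

imgs = [torch.rand(3, 480, 640), torch.rand(3, 600, 400)]
batch, mask = pad_to_nested(imgs)
print(batch.shape, mask.shape)  # torch.Size([2, 3, 600, 640]) torch.Size([2, 600, 640])
```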
Hi @oke-aditya, here is a slightly abstract model graph visualization comparing … vs … Check my notebooks for more details.
Was a bit away this weekend; I will definitely have a look.
Just a quick update, @zhiqwang.
@oke-aditya Sure, and we can follow …
Hi @oke-aditya, it seems that …
Yes, the release is today 😄 We will come back to this.
Hi @oke-aditya, there are some bugs in the loss computation in my …
Very nice. Till then I will work on the layers API, which will make porting YOLO easier.
Hi @oke-aditya, yolov5 can now be used for training; check out zhiqwang/yolort#25 for the details. There are still some hidden bugs in the master branch, though; I will add more unit tests soon.
Superb. I have been really busy this month; I will get back to working on this at the end of the month.
🚀 Feature

Implement YOLO v5 from `torch.hub`.

This library removes such `dataset` abstraction and aims to provide a clean, modular interface to models. Some key points to note:

- Provide the `train_step`, `val_step`, `fit` API and a lightning trainer (a sketch follows below). Datasets, augmentations and transforms are not needed.

Note that none of quickvision's models can achieve SOTA, the limitations being torchvision's implementations and not using transforms/datasets. But they are faster, easier and more flexible to train, something which torchvision does too.
With this context, we can start adding YOLO v5.
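A rough sketch of how the lightning-trainer path could look, using torchvision's RetinaNet as a stand-in model since the YOLOv5 wrapper is not written yet; everything beyond PyTorch Lightning's own API here is illustrative, not the final quickvision interface.

```python
import torch
import torchvision
import pytorch_lightning as pl

class DetectionTask(pl.LightningModule):
    """Minimal LightningModule around a torchvision-style detection model."""

    def __init__(self, lr: float = 1e-3):
        super().__init__()
        # Stand-in model: the eventual YOLOv5 wrapper would slot in here,
        # provided it also returns a dict of losses in train mode.
        self.model = torchvision.models.detection.retinanet_resnet50_fpn(
            pretrained=False, pretrained_backbone=False
        )
        self.lr = lr

    def training_step(self, batch, batch_idx):
        images, targets = batch
        loss_dict = self.model(list(images), list(targets))
        loss = sum(loss_dict.values())
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.model.parameters(), lr=self.lr, momentum=0.9)
```

Training would then be something like `pl.Trainer(max_epochs=10).fit(DetectionTask(), train_dataloader)`, where `train_dataloader` yields `(images, targets)` batches through a collate function that keeps them as lists.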
Dependencies:

- Avoid `opencv-python` at all costs. OpenCV is not, like PIL, just a library for image reading; it is huge and has a lot of sub-dependencies. Keeping the library light will enable us to use PyTorch Docker containers and directly infer using `torchserve`.
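To keep that dependency out, image reading for inference can stay on PIL plus torchvision transforms, e.g. (the file name is just a placeholder):

```python
from PIL import Image
from torchvision import transforms

# Read and tensorize an image with PIL + torchvision only -- no cv2 required.
img = Image.open("sample.jpg").convert("RGB")   # placeholder path
tensor = transforms.ToTensor()(img)             # (3, H, W), floats in [0, 1]
batch = [tensor]                                # torchvision-style list-of-tensors input
```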
Evaluation mode:

- Support `.eval()`. We don't need `.fuse()` and such methods. We only need to load the model from `torch.hub`.

Currently, we do not have inference scripts for any models, but surely we will in future (#2). So right now let's focus on training.