Any plan for adding oversampling function for imbalanced dataset? #1115
Hello @luvwinnie, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments. If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you. If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.
For more information please visit https://www.ultralytics.com.
@luvwinnie I'm not familiar with SMOTE. We tested a few adaptations to class imbalance that helped in early training, but these overfit faster as well and resulted in lower final mAPs, so our current implementation has no specific class-imbalance adaptations in place when using default settings. In any case, COCO and VOC suffer from severe class imbalances, and these train very well with the default settings.
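For readers curious what the oversampling idea raised here looks like concretely, below is a minimal, hypothetical sketch of naive random oversampling in plain NumPy. It is not part of YOLOv5, and the function name is invented for illustration: minority classes are resampled with replacement until every class matches the majority-class count.

```python
import numpy as np

def oversample_indices(labels, rng=None):
    """Naive random oversampling: repeat minority-class samples until
    every class appears as often as the majority class.
    (Illustrative sketch only -- not part of YOLOv5's pipeline.)"""
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        cls_idx = np.flatnonzero(labels == c)          # indices of class c
        extra = rng.choice(cls_idx, size=target - n, replace=True)
        idx.append(np.concatenate([cls_idx, extra]))
    return np.concatenate(idx)

labels = [0] * 8 + [1] * 2            # imbalanced: 8 vs 2
idx = oversample_indices(labels, rng=0)
print(len(idx))                       # 16 -> both classes now have 8 samples
```

In practice this index array would drive a dataset sampler, so minority images are simply drawn more often during training.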
@luvwinnie ah, BTW, one technique to address custom datasets, including class imbalances, is simply to evolve hyperparameters, which include loss balancers and BCE positive weights for class and conf. See https://docs.ultralytics.com/yolov5
@glenn-jocher Thank you for the reply! I would like to test hyperparameter evolution and SMOTE later. One thing: it seems hyperparameter evolution takes a very long time. Can the evolution run be resumed later with any option?
@luvwinnie yes, evolution is an expensive habit, as you basically want to train 300 times or so. It can be stopped and resumed from the same evolve.txt, and you can also deploy multiple GPUs (to evolve in parallel against a single evolve.txt) and multiple nodes/VMs (to evolve from a central cloud-based evolve.txt).
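The stop/resume behavior works because all evolution state lives in the results file. Below is a hypothetical toy sketch of that pattern, not YOLOv5's actual code: the file name, mutation rule, and fitness function are all made up for illustration. Any worker (or a restarted run) reads the best result so far, mutates it, and appends its own result.

```python
import os
import random

EVOLVE_FILE = "evolve_demo.txt"      # invented name; YOLOv5 uses evolve.txt

def load_best(path):
    """Return (fitness, hyp) of the best line so far, or a default."""
    if not os.path.exists(path):
        return 0.0, {"lr": 0.01}
    rows = [line.split() for line in open(path)]
    best = max(rows, key=lambda r: float(r[0]))
    return float(best[0]), {"lr": float(best[1])}

def evolve_step(fitness_fn, rng):
    _, hyp = load_best(EVOLVE_FILE)
    new_lr = max(1e-5, hyp["lr"] * rng.uniform(0.8, 1.2))  # mutate current best
    new_fit = fitness_fn(new_lr)                           # evaluate ("train")
    with open(EVOLVE_FILE, "a") as f:                      # append shared state
        f.write(f"{new_fit:.6f} {new_lr:.6f}\n")

rng = random.Random(0)
toy_fitness = lambda lr: -(lr - 0.02) ** 2   # toy objective, peak at lr=0.02
for _ in range(10):                          # could stop/restart at any point
    evolve_step(toy_fitness, rng)
best_fit, best_hyp = load_best(EVOLVE_FILE)
os.remove(EVOLVE_FILE)                       # cleanup for the demo
```

Because each step only appends to the shared file, multiple processes can safely interleave their results, which is what makes the multi-GPU and multi-node setups described above possible.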
@glenn-jocher Thank you! I tried with `python -m torch.distributed.launch --nproc_per_node 2 train.py`, but it shows the following errors.
@luvwinnie evolving multi-GPU is done with one GPU per process. I've updated the hyperparameter evolution tutorial https://docs.ultralytics.com/yolov5/tutorials/hyperparameter_evolution with an example.

```bash
# Single-GPU
python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --cache --evolve

# Multi-GPU
for i in 0 1 2 3; do
  python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --cache --evolve --device $i
done
```

EDIT: this shows a bash for loop, but in practice you'd want to run these in detached screens or simply in new terminal windows, one screen/window per CUDA device, with `--device 0`, `--device 1`, etc.
@glenn-jocher thank you for the reply! Wouldn't it be better to use `nohup` and background tasks, as follows?

```bash
#!/bin/bash
### EDITED ###
for i in 0 1 2 3; do
  nohup python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --cache --evolve --device $i > evolve_gpu_$i.log &
done
```
@luvwinnie ah, yes that's the effect I was going for. When running docker images in detached -d mode you can use a simple for loop as I had initially, but when running directly you need to detach the command somehow. In the past I used screens for this: https://superuser.com/questions/454907/how-to-execute-a-command-in-screen-and-detach Does this nohup command do the same?
@glenn-jocher yes. In my environment I didn't use docker to detach the session; to run it as a background process we need the trailing `&`.

EDIT: I have updated the code to redirect stdout to a log file, so the `tail -f` command can be used to check the progress.
@luvwinnie very cool! I've updated the tutorial with your nohup command. How would you check in on a thread to see that it's still logging (to make sure it hasn't crashed, etc.)?
@glenn-jocher Thank you! Normally I just redirect stdout to a log file. As in my edited command, each process's stdout goes to its own log file, named evolve_gpu_$i.log. Say I have 2 GPUs; I just use the following commands to check the progress (not the cleanest, but it works):

```bash
# terminal 1
tail -f evolve_gpu_0.log
# terminal 2
tail -f evolve_gpu_1.log
```
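To answer the crash-monitoring question programmatically rather than by eyeballing `tail -f`, one cheap heuristic is to flag any worker whose log file has stopped growing. A small hypothetical sketch (the function name and idle threshold are invented, not part of YOLOv5):

```python
import os
import tempfile
import time

def stale_workers(log_files, max_idle_s=600):
    """Return log files not written to within max_idle_s seconds --
    a cheap heuristic that the corresponding worker may have crashed."""
    now = time.time()
    return [p for p in log_files
            if not os.path.exists(p) or now - os.path.getmtime(p) > max_idle_s]

# demo: a freshly written log is considered alive, a stale one is flagged
fresh = tempfile.NamedTemporaryFile(delete=False, suffix=".log")
fresh.write(b"evolving...\n")
fresh.close()
old = tempfile.NamedTemporaryFile(delete=False, suffix=".log")
old.close()
os.utime(old.name, (time.time() - 3600, time.time() - 3600))  # backdate mtime
print(stale_workers([fresh.name, old.name]))  # only the backdated log is flagged
```

Running something like this from cron against the evolve_gpu_$i.log files would surface a dead worker without watching every terminal.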
Hyperparameter evolution stopped at the 30th step. How can I resume it from the 30th step?
@Samjith888 you just rerun your same `--evolve` command (don't use `--resume`, that's only for normal training). If an evolve.txt file already exists in your yolov5 directory, evolution resumes from there.
@glenn-jocher I have run evolve and am currently at generation 287; it shows the Hyperparameter Evolution Results below (results output not captured here). Does this mean evolving doesn't help much in my case?
@luvwinnie ok great. These are your evolved metrics. You can paste the labels from finetune.yaml above them, i.e. `# P R mAP.5 mAP.5:.95 box obj cls`. You should compare these to the baseline results you had before you started evolving.
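Since evolution optimizes a single scalar fitness rather than any one metric, comparing a baseline to an evolved run ultimately means comparing that scalar. Below is a hedged sketch of such a weighted-metric comparison; the weights and the example metric values are assumptions for illustration, not necessarily the repo's exact fitness definition.

```python
def fitness(p, r, map50, map5095, w=(0.0, 0.0, 0.1, 0.9)):
    """Weighted combination of P, R, mAP@.5, mAP@.5:.95 -- similar in
    spirit to the scalar fitness evolution targets (weights assumed)."""
    return w[0] * p + w[1] * r + w[2] * map50 + w[3] * map5095

# hypothetical metric values for a baseline and an evolved run
baseline = fitness(0.60, 0.55, 0.50, 0.30)
evolved  = fitness(0.62, 0.56, 0.53, 0.33)
print(evolved > baseline)  # True: the evolved weighted score is higher
```

Note that with these weights, mAP@.5:.95 dominates, so a small drop in precision or recall can still be a net win if the stricter mAP improves.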
@glenn-jocher Thank you for the reply. I have a baseline trained without evolve from pretrained yolov5s.pt. Do you mean I should compare the baseline with the result trained with this command?
@luvwinnie sure. All you do is point the same baseline command at your new hyp_evolved.yaml to get your updated results, then compare the evolved run against your original baseline.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
❔Question
Real-world datasets are often imbalanced. Has this repo tested resampling techniques such as oversampling, undersampling, or SMOTE?
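As a companion to the question, here is a bare-bones sketch of the SMOTE idea: synthesize new minority samples by interpolating between a minority sample and one of its k nearest minority neighbours. All names and parameters are illustrative; a real project would use a maintained implementation such as the one in imbalanced-learn.

```python
import numpy as np

def smote(X, n_new, k=3, rng=None):
    """Minimal SMOTE sketch: create n_new synthetic minority samples by
    interpolating toward one of each sample's k nearest neighbours.
    (Illustrative only -- prefer a maintained library in practice.)"""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X))                      # pick a minority sample
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[1:k + 1]                   # k nearest, skip itself
        j = rng.choice(nn)                            # random neighbour
        gap = rng.random()                            # interpolation factor
        synth.append(X[i] + gap * (X[j] - X[i]))
    return np.stack(synth)

minority = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
new_pts = smote(minority, n_new=4, rng=0)
print(new_pts.shape)  # (4, 2): four new points inside the minority region
```

Because each synthetic point lies on a segment between two existing minority samples, the new data stays inside the minority class's convex region rather than duplicating points exactly, which is the key difference from plain random oversampling.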