Any plan for adding oversampling function for imbalanced dataset? #1115
Hello @luvwinnie, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments. If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you. If this is a custom model or data training question, please note Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.
For more information please visit https://www.ultralytics.com.
@luvwinnie I'm not familiar with SMOTE. We tested a few adaptations to class imbalance that helped in early training, but these overfit faster as well and resulted in lower final mAPs, so our current implementation has no specific class-imbalance adaptations in place when using default settings. In any case, COCO and VOC suffer from severe class imbalances, and these train very well with the default settings.
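For readers curious what the oversampling idea raised here looks like concretely, below is a minimal, hypothetical sketch of naive random oversampling in plain NumPy. It is not part of YOLOv5, and the function name is invented for illustration: minority classes are resampled with replacement until every class matches the majority-class count.

```python
import numpy as np

def oversample_indices(labels, rng=None):
    """Naive random oversampling: repeat minority-class samples until
    every class appears as often as the majority class.
    (Illustrative sketch only -- not part of YOLOv5's pipeline.)"""
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        cls_idx = np.flatnonzero(labels == c)          # indices of class c
        extra = rng.choice(cls_idx, size=target - n, replace=True)
        idx.append(np.concatenate([cls_idx, extra]))
    return np.concatenate(idx)

labels = [0] * 8 + [1] * 2            # imbalanced: 8 vs 2
idx = oversample_indices(labels, rng=0)
print(len(idx))                       # 16 -> both classes now have 8 samples
```

In practice this index array would drive a dataset sampler, so minority images are simply drawn more often during training.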
@luvwinnie ah, BTW, one technique to address custom datasets, including class imbalances, is simply to evolve hyperparameters, which include loss balancers and BCE positive weights for class and conf. See https://docs.ultralytics.com/yolov5
@glenn-jocher Thank you for the reply! I would like to test hyperparameter evolution and SMOTE later. One thing: it seems hyperparameter evolution takes a very long time. Can the evolution run be resumed later with any option?
@luvwinnie yes, evolution is an expensive habit, as you basically want to train 300 times or so. It can be stopped and resumed from the same evolve.txt, and you can also deploy multiple GPUs (to evolve in parallel against a single evolve.txt) and multiple nodes/VMs (to evolve from a central cloud-based evolve.txt).
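The stop/resume behavior works because all evolution state lives in the results file. Below is a hypothetical toy sketch of that pattern, not YOLOv5's actual code: the file name, mutation rule, and fitness function are all made up for illustration. Any worker (or a restarted run) reads the best result so far, mutates it, and appends its own result.

```python
import os
import random

EVOLVE_FILE = "evolve_demo.txt"      # invented name; YOLOv5 uses evolve.txt

def load_best(path):
    """Return (fitness, hyp) of the best line so far, or a default."""
    if not os.path.exists(path):
        return 0.0, {"lr": 0.01}
    rows = [line.split() for line in open(path)]
    best = max(rows, key=lambda r: float(r[0]))
    return float(best[0]), {"lr": float(best[1])}

def evolve_step(fitness_fn, rng):
    _, hyp = load_best(EVOLVE_FILE)
    new_lr = max(1e-5, hyp["lr"] * rng.uniform(0.8, 1.2))  # mutate current best
    new_fit = fitness_fn(new_lr)                           # evaluate ("train")
    with open(EVOLVE_FILE, "a") as f:                      # append shared state
        f.write(f"{new_fit:.6f} {new_lr:.6f}\n")

rng = random.Random(0)
toy_fitness = lambda lr: -(lr - 0.02) ** 2   # toy objective, peak at lr=0.02
for _ in range(10):                          # could stop/restart at any point
    evolve_step(toy_fitness, rng)
best_fit, best_hyp = load_best(EVOLVE_FILE)
os.remove(EVOLVE_FILE)                       # cleanup for the demo
```

Because each step only appends to the shared file, multiple processes can safely interleave their results, which is what makes the multi-GPU and multi-node setups described above possible.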
@glenn-jocher Thank you! I tried with `python -m torch.distributed.launch --nproc_per_node 2 train.py`, but it shows the following errors.
@luvwinnie evolving multi-GPU is done with one GPU per process. I've updated the hyperparameter evolution tutorial https://docs.ultralytics.com/yolov5/tutorials/hyperparameter_evolution with an example.

```bash
# Single-GPU
python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --cache --evolve

# Multi-GPU
for i in 0 1 2 3; do
  python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --cache --evolve --device $i
done
```

EDIT: this shows a bash for loop, but in practice you'd want to run these in detached screens or simply in new terminal windows, one screen/window per CUDA device, with `--device 0`, `--device 1`, etc.
@glenn-jocher thank you for the reply! Wouldn't it be better to use `nohup` and background tasks, as follows?

```bash
#!/bin/bash
### EDITED ###
for i in 0 1 2 3; do
  nohup python train.py --epochs 10 --data coco128.yaml --weights yolov5s.pt --cache --evolve --device $i > evolve_gpu_$i.log &
done
```
@luvwinnie ah, yes that's the effect I was going for. When running docker images in detached -d mode you can use a simple for loop as I had initially, but when running directly you need to detach the command somehow. In the past I used screens for this: https://superuser.com/questions/454907/how-to-execute-a-command-in-screen-and-detach Does this nohup command do the same?
@glenn-jocher yes. In my environment I didn't use docker to detach the session; to run it as a background process we need the trailing `&`.

EDIT: I have updated the code to redirect stdout to a log file, so the `tail -f` command can be used to check the progress.
@luvwinnie very cool! I've updated the tutorial with your nohup command. How would you check in on a thread to see that it's still logging (to make sure it hasn't crashed, etc.)?
@glenn-jocher Thank you! Normally I just redirect stdout to a log file. As in my edited command, each process's stdout goes to its own log file, named evolve_gpu_$i.log. Say I have 2 GPUs; I just use the following commands to check the progress (not the cleanest, but it works):

```bash
# terminal 1
tail -f evolve_gpu_0.log
# terminal 2
tail -f evolve_gpu_1.log
```
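To answer the crash-monitoring question programmatically rather than by eyeballing `tail -f`, one cheap heuristic is to flag any worker whose log file has stopped growing. A small hypothetical sketch (the function name and idle threshold are invented, not part of YOLOv5):

```python
import os
import tempfile
import time

def stale_workers(log_files, max_idle_s=600):
    """Return log files not written to within max_idle_s seconds --
    a cheap heuristic that the corresponding worker may have crashed."""
    now = time.time()
    return [p for p in log_files
            if not os.path.exists(p) or now - os.path.getmtime(p) > max_idle_s]

# demo: a freshly written log is considered alive, a stale one is flagged
fresh = tempfile.NamedTemporaryFile(delete=False, suffix=".log")
fresh.write(b"evolving...\n")
fresh.close()
old = tempfile.NamedTemporaryFile(delete=False, suffix=".log")
old.close()
os.utime(old.name, (time.time() - 3600, time.time() - 3600))  # backdate mtime
print(stale_workers([fresh.name, old.name]))  # only the backdated log is flagged
```

Running something like this from cron against the evolve_gpu_$i.log files would surface a dead worker without watching every terminal.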
Hyperparameter evolution stopped at the 30th step. How can I resume it from the 30th step?
@Samjith888 you just rerun your same `--evolve` command (don't use `--resume`, that's only for normal training). If an evolve.txt file already exists in your yolov5 directory, evolution resumes from there.
@glenn-jocher I have run evolve and am currently at generation 287; it shows the Hyperparameter Evolution Results below (results output not captured here). Does this mean evolving doesn't help much in my case?
@luvwinnie ok great. These are your evolved metrics. You can paste the labels from finetune.yaml above them, i.e. `# P R mAP.5 mAP.5:.95 box obj cls`. You should compare these to the baseline results you had before you started evolving.
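Since evolution optimizes a single scalar fitness rather than any one metric, comparing a baseline to an evolved run ultimately means comparing that scalar. Below is a hedged sketch of such a weighted-metric comparison; the weights and the example metric values are assumptions for illustration, not necessarily the repo's exact fitness definition.

```python
def fitness(p, r, map50, map5095, w=(0.0, 0.0, 0.1, 0.9)):
    """Weighted combination of P, R, mAP@.5, mAP@.5:.95 -- similar in
    spirit to the scalar fitness evolution targets (weights assumed)."""
    return w[0] * p + w[1] * r + w[2] * map50 + w[3] * map5095

# hypothetical metric values for a baseline and an evolved run
baseline = fitness(0.60, 0.55, 0.50, 0.30)
evolved  = fitness(0.62, 0.56, 0.53, 0.33)
print(evolved > baseline)  # True: the evolved weighted score is higher
```

Note that with these weights, mAP@.5:.95 dominates, so a small drop in precision or recall can still be a net win if the stricter mAP improves.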
@glenn-jocher Thank you for the reply. I have a baseline trained without evolve from pretrained yolov5s.pt. Do you mean I should compare the baseline with the result trained with this command?
@luvwinnie sure. All you do is point the same baseline command at your new hyp_evolved.yaml to get your updated results, then compare the evolved run against your original baseline.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
❔Question
Real-world datasets are often imbalanced. Has this repo tested resampling techniques such as oversampling, undersampling, or SMOTE?
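As a companion to the question, here is a bare-bones sketch of the SMOTE idea: synthesize new minority samples by interpolating between a minority sample and one of its k nearest minority neighbours. All names and parameters are illustrative; a real project would use a maintained implementation such as the one in imbalanced-learn.

```python
import numpy as np

def smote(X, n_new, k=3, rng=None):
    """Minimal SMOTE sketch: create n_new synthetic minority samples by
    interpolating toward one of each sample's k nearest neighbours.
    (Illustrative only -- prefer a maintained library in practice.)"""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X))                      # pick a minority sample
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[1:k + 1]                   # k nearest, skip itself
        j = rng.choice(nn)                            # random neighbour
        gap = rng.random()                            # interpolation factor
        synth.append(X[i] + gap * (X[j] - X[i]))
    return np.stack(synth)

minority = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
new_pts = smote(minority, n_new=4, rng=0)
print(new_pts.shape)  # (4, 2): four new points inside the minority region
```

Because each synthetic point lies on a segment between two existing minority samples, the new data stays inside the minority class's convex region rather than duplicating points exactly, which is the key difference from plain random oversampling.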