Update Objects365.yaml to include the official validation set #5194
Conversation
Download the official Objects365 validation set and convert the labels
👋 Hello @farleylai, thank you for submitting a 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:
- ✅ Verify your PR is up-to-date with origin/master. If your PR is behind origin/master an automatic GitHub actions rebase may be attempted by including the /rebase command in a comment body, or by running the following code, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature # <----- replace 'feature' with local branch name
git merge upstream/master
git push -u origin -f
- ✅ Verify all Continuous Integration (CI) checks are passing.
- ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee
@farleylai thanks for the PR! I've cleaned it up a bit without changing the functionality. I don't have the dataset downloaded currently, so I'm not sure what's normal, but 0.1M/1.6M seems like an excessive fraction of corrupted images. Typically datasets might have a few problem images that fall into this category, but usually these are <<1% of the total. I should mention that the one time we downloaded Objects365 before, we had to restart the download script multiple times due to incomplete downloads. The curl commands should be retry-friendly, so they should recognize partially downloaded files and resume downloading where they left off. What do the error messages on most of your corrupted images say, specifically?
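For reference, a resume-friendly fetch can be expressed with curl's continue and retry flags; the URL below is just a placeholder, not the dataset's actual endpoint:

$> curl -L -O -C - --retry 3 https://example.com/objects365/train/patch0.tar.gz  # -C - resumes a partial file, --retry re-attempts transient failures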
@farleylai PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐
I checked the warning messages in the cache and they all say something like "../datasets/Objects365/images/val/objects365_v2_01926665.jpg: non-normalized or out of bounds coordinate labels". Take this image for example:

$> file ../datasets/Objects365/images/val/objects365_v2_01926665.jpg
../datasets/Objects365/images/val/objects365_v2_01926665.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, baseline, precision 8, 1024x748, frames 3

$> cat ../datasets/Objects365/labels/val/objects365_v2_01926665.txt
9 0.50000 0.80787 1.00000 0.38427
95 0.68296 0.13299 0.11006 0.13893
117 0.49981 0.58072 1.00039 0.54321
209 0.54939 0.10846 0.13147 0.17548

Tracing back to the original COCO-format annotation confirms the out-of-bounds coordinates (note the normalized width of 1.00039 in the third row). It is now clear those coordinates must be clamped before normalization.
@farleylai ok got it. We want to clip these as xyxy labels then, and afterwards convert to xywh. We do this for the xView dataset as well, using xyxy2xywhn() (Line 78 in fc36064).
The function only accepts nx4 numpy arrays or torch tensors, so we probably want to do something like this:

xyxy = np.array([x, y, x + w, y + h]).reshape(1, 4)  # pixels
xywhn_clipped = xyxy2xywhn(xyxy, w=image_width, h=image_height, clip=True)  # normalized and clipped

(Lines 535 to 545 in fc36064)
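For context, here is a minimal NumPy-only sketch of what such a conversion with clipping looks like; the repository's actual xyxy2xywhn() also handles torch tensors, so treat this as an approximation rather than the exact implementation:

import numpy as np

def xyxy2xywhn(x, w=640, h=640, clip=False, eps=0.0):
    # nx4 [x1, y1, x2, y2] pixel boxes -> nx4 [x_center, y_center, width, height] normalized
    if clip:
        x[:, [0, 2]] = x[:, [0, 2]].clip(0, w - eps)  # clamp x coordinates to the image width
        x[:, [1, 3]] = x[:, [1, 3]].clip(0, h - eps)  # clamp y coordinates to the image height
    y = np.empty_like(x, dtype=float)
    y[:, 0] = (x[:, 0] + x[:, 2]) / 2 / w  # x center
    y[:, 1] = (x[:, 1] + x[:, 3]) / 2 / h  # y center
    y[:, 2] = (x[:, 2] - x[:, 0]) / w      # width
    y[:, 3] = (x[:, 3] - x[:, 1]) / h      # height
    return y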
@farleylai I just downloaded the dataset over the last few hours on an EC2 instance. I seem to have downloaded everything successfully, though I can't tell whether all of the patches succeeded in downloading and unzipping; unfortunately we haven't built any overall report into this. My stats look a little larger than yours though: I have 1742289 training images, so it's likely you are missing a few training patches. The download occupies about 700 GB on my hard drive (including undeleted zips).

train: Scanning '../datasets/Objects365/labels/train' images and labels...1742289 found, 0 missing, 0 empty, 111771 corrupted: 100%|███████| 1742289/1742289 [09:21<00:00, 3101.89it/s]
val: Scanning '../datasets/Objects365/labels/val' images and labels...80000 found, 0 missing, 0 empty, 7312 corrupted: 100%|███████| 80000/80000 [00:06<00:00, 12564.88it/s]

I also see a significant number of 'corrupted' messages due to duplicate labels, i.e. two identical rows (same class, same exact coordinates). We currently handle these as errors rather than warnings, so the image is rejected if any of these occur:

train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00051762.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00051841.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00051849.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00051921.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00051929.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052000.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052013.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052022.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052046.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052076.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052150.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052158.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052161.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052200.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052205.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052225.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052235.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052255.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052318.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052417.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052420.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052444.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052472.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052486.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052490.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052528.jpg: duplicate labels
I do not have enough bandwidth to test the full training-set download in a few hours, so there could be some failed patches earlier on. I will see if I can reproduce the same numbers as yours soon. As for the rejected labels, it is indeed an issue and must be fixed before this dataset can be useful. Hopefully a PR can be submitted within a day.
@farleylai ok! Don't worry about the duplicate labels, I'll push a separate PR to handle those better inside the dataset checks. I think I'll convert them from errors to warnings and fix them automatically.
@farleylai duplicate labels are now handled automatically (warning + automatic fix), and all other label errors feature improved introspection that points users to the exact values causing the problem. See PR #5210, merged into master now.
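Conceptually, the automatic fix amounts to dropping exact-duplicate rows from the nx5 label array with np.unique; a sketch of the idea, not necessarily the merged code:

import numpy as np

labels = np.array([[9, 0.5, 0.5, 0.2, 0.2],
                   [9, 0.5, 0.5, 0.2, 0.2],    # exact duplicate row
                   [63, 0.7, 0.1, 0.1, 0.1]])
_, idx = np.unique(labels, axis=0, return_index=True)  # first occurrence of each unique row
if len(idx) < len(labels):                             # duplicates were present -> warn and fix
    labels = labels[np.sort(idx)]                      # keep rows in their original order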
Glad those duplicate labels can be safely handled.
@farleylai yeah, it would be useful to capture and report download failures as part of download.py. It would also be much better if there were a way to print multithreaded download progress to the screen more clearly; that by itself would help spot failed downloads. Right now the 8 threads print over each other and the resulting terminal window becomes unreadable. I don't know an easy fix, but I know it's possible, e.g. Docker does this when downloading from multiple sources.
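As a rough sketch of that kind of reporting, collecting failures from worker threads and printing a single summary at the end (the helper names here are hypothetical, not the repo's download() utility):

from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request

def download_one(url, dest):  # hypothetical single-file download helper
    urllib.request.urlretrieve(url, dest)

def download_all(urls, threads=8):
    failures = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(download_one, u, u.split('/')[-1]): u for u in urls}
        for f in as_completed(futures):
            try:
                f.result()
            except Exception as e:
                failures.append((futures[f], e))
    # one consolidated report instead of 8 threads printing over each other
    print(f'{len(urls) - len(failures)}/{len(urls)} downloads succeeded')
    for url, err in failures:
        print(f'FAILED: {url} ({err})')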
After those patches finally downloaded in full, there seems to be only a one-image difference from your download in the training set. However, after moving the jpg images into the same directory, the total training-set image count becomes 1742289: the tarballs list 1742290 entries, of which 1742289 are distinct.

Per-patch tarball image counts:

Training set with 1742290 (1742289 distinct) images in total:

$> find train -mindepth 1 -type f -name "patch*.tar.gz" -print0 | xargs -0 -I {} sh -c "echo {}'\t'\`tar -tzf {} | grep jpg | wc -l\`"
train/patch46.tar.gz 36000
train/patch40.tar.gz 34366
train/patch19.tar.gz 34369
train/patch2.tar.gz 34915
train/patch20.tar.gz 34337
train/patch10.tar.gz 34529
train/patch31.tar.gz 34314
train/patch18.tar.gz 34374
train/patch11.tar.gz 34053
train/patch36.tar.gz 34364
train/patch7.tar.gz 34565
train/patch50.tar.gz 34357
train/patch22.tar.gz 34354
train/patch35.tar.gz 34288
train/patch45.tar.gz 19437
train/patch0.tar.gz 34797
train/patch4.tar.gz 34520
train/patch32.tar.gz 34424
train/patch9.tar.gz 34758
train/patch17.tar.gz 34422
train/patch14.tar.gz 33156
train/patch38.tar.gz 34468
train/patch15.tar.gz 31477
train/patch21.tar.gz 34391
train/patch5.tar.gz 34468
train/patch28.tar.gz 34404
train/patch49.tar.gz 36000
train/patch39.tar.gz 34366
train/patch30.tar.gz 34450
train/patch23.tar.gz 34346
train/patch47.tar.gz 36000
train/patch37.tar.gz 34377
train/patch8.tar.gz 34555
train/patch48.tar.gz 36000
train/patch41.tar.gz 34341
train/patch43.tar.gz 34829
train/patch13.tar.gz 32938
train/patch27.tar.gz 34435
train/patch16.tar.gz 34341
train/patch42.tar.gz 34349
train/patch44.tar.gz 36040
train/patch25.tar.gz 34454
train/patch34.tar.gz 34310
train/patch12.tar.gz 32891
train/patch24.tar.gz 34410
train/patch26.tar.gz 34357
train/patch6.tar.gz 34497
train/patch1.tar.gz 34722
train/patch33.tar.gz 34444
train/patch29.tar.gz 34494
train/patch3.tar.gz 34437

Validation set with 80000 images in total:

$> find val -mindepth 1 -type f -name "patch*.tar.gz" -print0 | xargs -0 -I {} sh -c "echo {}'\t'\`tar -tzf {} | grep jpg | wc -l\`"
val/patch40.tar.gz 1807
val/patch19.tar.gz 1851
val/patch2.tar.gz 1246
val/patch20.tar.gz 1778
val/patch10.tar.gz 1570
val/patch31.tar.gz 1887
val/patch18.tar.gz 1828
val/patch11.tar.gz 2186
val/patch36.tar.gz 1766
val/patch7.tar.gz 1641
val/patch22.tar.gz 1805
val/patch35.tar.gz 1847
val/patch0.tar.gz 1311
val/patch4.tar.gz 1595
val/patch32.tar.gz 1789
val/patch9.tar.gz 1501
val/patch17.tar.gz 1769
val/patch14.tar.gz 3115
val/patch38.tar.gz 1762
val/patch15.tar.gz 1038
val/patch21.tar.gz 1798
val/patch5.tar.gz 1679
val/patch28.tar.gz 1723
val/patch39.tar.gz 1761
val/patch30.tar.gz 1779
val/patch23.tar.gz 1826
val/patch37.tar.gz 1780
val/patch8.tar.gz 1705
val/patch41.tar.gz 1828
val/patch43.tar.gz 1293
val/patch13.tar.gz 3392
val/patch27.tar.gz 1740
val/patch16.tar.gz 1789
val/patch42.tar.gz 1840
val/patch25.tar.gz 1755
val/patch34.tar.gz 1889
val/patch12.tar.gz 3564
val/patch24.tar.gz 1748
val/patch26.tar.gz 1758
val/patch6.tar.gz 1747
val/patch1.tar.gz 1249
val/patch33.tar.gz 1785
val/patch29.tar.gz 1709
val/patch3.tar.gz 1771
@farleylai great! #5214 implements the clipping we discussed. The dataset now caches with zero corrupt images.
You're quick, and I wish I had computing resources as fast for validation. BTW, I can confirm those duplicate labels are indeed in the original annotations. Perhaps they were introduced when the creators merged annotations from multiple workers. Take one image's annotations (category id followed by its bbox), for example:

(9, ['0.56549', '0.54617', '101.01782', '106.62991'])
(63, ['488.57837', '101.07703', '144.38501', '126.85852'])
(63, ['488.57837', '101.07703', '144.38501', '126.85852'])
(63, ['495.71875', '237.90430', '144.43140', '171.04572'])
(63, ['495.71875', '237.90430', '144.43140', '171.04572'])
(83, ['156.13245', '238.91760', '53.47083', '50.74274'])
(83, ['156.13245', '238.91760', '53.47083', '50.74274'])
(83, ['194.32593', '206.18036', '45.28650', '43.10406'])
(83, ['194.32593', '206.18036', '45.28650', '43.10406'])
(83, ['215.60510', '145.78528', '41.15546', '28.99585'])
(83, ['215.60510', '145.78528', '41.15546', '28.99585'])
(105, ['183.95911', '291.84283', '48.56024', '50.74280'])
(105, ['209.05768', '251.46692', '51.83398', '51.28833'])
(105, ['435.12170', '84.78058', '28.64856', '20.17505'])
(105, ['435.12170', '84.78058', '28.64856', '20.17505'])
(105, ['460.94580', '65.00900', '26.63110', '26.63107'])
(105, ['460.94580', '65.00900', '26.63110', '26.63107'])
(108, ['243.19794', '368.39868', '53.78265', '55.65332'])
(108, ['243.19794', '368.39868', '53.78265', '55.65332'])
(108, ['275.93524', '355.30377', '22.91608', '24.78680'])
(108, ['275.93524', '355.30377', '22.91608', '24.78680'])
(108, ['285.28870', '323.50183', '57.05640', '52.84735'])
(108, ['285.28870', '323.50183', '57.05640', '52.84735'])
(108, ['294.17456', '374.47845', '55.18567', '52.37964'])
(108, ['294.17456', '374.47845', '55.18567', '52.37964'])
(108, ['317.09064', '291.23230', '42.55841', '39.28467'])
(108, ['317.09064', '291.23230', '42.55841', '39.28467'])
(108, ['334.86230', '357.17450', '40.68774', '32.26953'])
(108, ['334.86230', '357.17450', '40.68774', '32.26953'])
(108, ['341.87744', '318.35742', '43.49377', '44.89685'])
(108, ['341.87744', '318.35742', '43.49377', '44.89685'])
(142, ['243.39313', '18.17514', '87.40271', '81.65799'])
(142, ['243.39313', '18.17514', '87.40271', '81.65799'])
(142, ['294.40613', '-1.30167', '40.78308', '28.20508'])
(142, ['294.40613', '-1.30167', '40.78308', '28.20508'])
(142, ['322.17871', '0.12009', '66.47534', '67.70639'])
(142, ['322.17871', '0.12009', '66.47534', '67.70639'])
(153, ['27.74731', '265.87366', '158.26715', '155.37903'])
(153, ['27.74731', '265.87366', '158.26715', '155.37903'])
(159, ['324.27277', '171.03973', '62.93573', '103.82391'])
(159, ['324.27277', '171.03973', '62.93573', '103.82391'])
(196, ['107.10449', '155.60645', '77.16638', '107.09766'])
(196, ['107.10449', '155.60645', '77.16638', '107.09766'])
(217, ['254.92322', '275.26453', '44.09509', '44.49591'])
(217, ['254.92322', '275.26453', '44.09509', '44.49591'])
(217, ['295.96729', '265.96484', '43.49384', '41.38922'])
(217, ['295.96729', '265.96484', '43.49384', '41.38922'])
(251, ['372.37653', '171.03973', '68.54779', '77.36688'])
(251, ['372.37653', '171.03973', '68.54779', '77.36688'])
(251, ['451.74768', '193.08728', '96.60828', '67.74609'])
(251, ['451.74768', '193.08728', '96.60828', '67.74609'])
(262, ['127.67511', '96.63177', '102.23822', '75.09027'])
(262, ['127.67511', '96.63177', '102.23822', '75.09027'])
(265, ['0.67590', '91.43323', '138.55151', '333.86279'])
(289, ['158.08105', '168.23370', '75.76337', '75.76337'])
(289, ['158.08105', '168.23370', '75.76337', '75.76337'])
(289, ['230.10303', '156.07416', '50.50891', '71.55432'])
(289, ['230.10303', '156.07416', '50.50891', '71.55432'])
(297, ['45.11407', '289.19269', '89.52289', '112.98071'])
(297, ['64.12659', '290.98798', '123.71802', '120.64038'])
A recent approach combining Copy-Paste augmentation with self-training on Objects365, which seemingly boosts COCO performance by 1.5% without TTA, may deserve a look: https://arxiv.org/abs/2012.07177
@farleylai yes, the data will always have issues, so the best thing to do is fix what's fixable and ignore (but notify the user about) problem images/labels. Though I'm also surprised that an organization would expend the resources to label almost 2 million images and not do basic cleaning and checking of their data.
@farleylai I trained a YOLOv5m model on Objects365 following this PR and the other related fixes. Everything works well. mAP@0.5:0.95 was only 18.5 after 30 epochs, but person mAP was similar to COCO, about 55 mAP@0.5:0.95. I'm sure this could be improved with more epochs and additional tweaks, but at first glance all is good here. DDP train command:
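A representative 8-GPU DDP invocation for the A100 setup mentioned later in the thread would look roughly like this (batch size and device list are assumptions, not the exact command used):

$> python -m torch.distributed.run --nproc_per_node 8 train.py --data Objects365.yaml --weights yolov5m.pt --epochs 30 --batch-size 256 --device 0,1,2,3,4,5,6,7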
Results:
# Class  Images  Labels  P  R  mAP@.5  mAP@.5:.95

# YOLOv5m v6.0 COCO 300 epochs
all 5000 36335 0.726 0.569 0.633 0.439
person 5000 10777 0.792 0.735 0.807 0.554

# YOLOv5m v6.0 Objects365 30 epochs
all 80000 1239576 0.626 0.265 0.273 0.185
Person 80000 80332 0.599 0.765 0.759 0.57
Looks very promising and somewhat manageable compared with OpenImages.
@farleylai trained model uploaded to https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5m_Objects365.pt. This is just a first training run, so I'm sure there's room for improvement in the results. Yes, I agree, I like this dataset. It's got more classes and more images than COCO. Here are YOLOv5m detections for both; with Objects365 you get additional useful categories, e.g. shoes and sunglasses.
While COCO is widely used for benchmarking, its limited number of classes does not help much with detecting the rich contextual objects beyond persons that diverse real applications need. Though transfer from Objects365 to COCO is likely to improve benchmark results, the other direction, from COCO to Objects365, could be more useful in practice. Before that, a well-tuned baseline would be necessary, and the results should be at least as good as, or much better than, those of v1.
@farleylai yes, good points!
Thank you for your wonderful work.
@ahong007007 yes, we used an AWS P4d instance with 8 A100s with DDP for Objects365 training. For 30 epochs of YOLOv5m it was pretty fast, about 1.5 days. Training command in #5194 (comment).
Update Objects365.yaml to include the official validation set (ultralytics#5194)

* Update Objects365.yaml: download the official Objects365 validation set and convert the labels
* Enforce 4-space indent, reformat and cleanup
* Shorten list comprehension

Co-authored-by: Glenn Jocher <[email protected]>
According to https://docs.ultralytics.com/yolov5/tutorials/multi_gpu_training/#faq — how should multi-GPU DDP training be launched here? Thanks! @glenn-jocher
@sibozhang training using DDP with multiple GPUs can be done with PyTorch's torch.distributed.run launcher, as described in that tutorial.
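For example, following the linked tutorial, a two-GPU run would look like this (adjust --nproc_per_node, --batch-size and --device to your hardware):

$> python -m torch.distributed.run --nproc_per_node 2 train.py --data Objects365.yaml --weights yolov5m.pt --batch-size 64 --device 0,1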
This command will distribute the training across the specified GPUs. Adjust the batch size, number of epochs, and other parameters as desired. Good luck with your training!
[E ProcessGroupNCCL.cpp:828] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=3, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1808113 milliseconds before timing out.

I cannot start training Objects365 on 8x A100 40GB because of this NCCL timeout. I also tried adding the --cache option.
How to change NCCL timeout settings? @glenn-jocher
@sibozhang it looks like you're encountering NCCL timeout issues during training on 8x A100 40GB. You can try raising the NCCL collective timeout, e.g. via the timeout argument of torch.distributed.init_process_group. If the issue persists, please refer to the official NVIDIA NCCL documentation or reach out to the NVIDIA support channels for further assistance. Good luck with your training!
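For illustration, the collective timeout can be raised where the process group is initialized; the 30-minute default corresponds to the 1800000 ms in the log above (where best to apply this inside train.py is an assumption left to the reader):

from datetime import timedelta
import torch.distributed as dist

# The default is timedelta(seconds=1800), i.e. the 1800000 ms NCCL timeout seen in the log;
# raise it if dataset caching or the first epoch takes longer than 30 minutes.
dist.init_process_group(backend='nccl', timeout=timedelta(hours=3))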
@glenn-jocher are there plans to release new Objects365 models for new versions of YOLO? I have made a room classifier (https://github.com/p2p-sys/yolo5-classificator) using the COCO and Objects365 models, and I would like to use the new versions of YOLO. Unfortunately I could not train on Objects365 myself; during training the program is killed by the system.
Hello @p2p-sys! We're always working on improving and updating our models, including those trained on different datasets like Objects365. Keep an eye on our GitHub releases for any updates on new versions of YOLO trained with Objects365. Regarding the issue with training being killed, it might be related to system resource limitations. Ensure you have sufficient memory and processing power, or consider reducing the batch size or using a simpler model. If the problem persists, please open an issue with detailed logs and system specs for further assistance. Thank you for using YOLOv5 for your room classifier project! 🚀
Include the official Objects365 validation set in the download and convert the labels
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhanced Objects365.yaml to support both training and validation data splits.

📊 Key Changes

🎯 Purpose & Impact
Users of the yolov5 repository can now expect more streamlined download and setup processes for the Objects365 dataset, potentially leading to more robust model training and validation.