Update Objects365.yaml to include the official validation set #5194
Conversation
Download the official Objects365 validation set and convert the labels
👋 Hello @farleylai, thank you for submitting a 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:
- ✅ Verify your PR is up-to-date with origin/master. If your PR is behind origin/master an automatic GitHub actions rebase may be attempted by including the /rebase command in a comment body, or by running the following code, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature # <----- replace 'feature' with local branch name
git merge upstream/master
git push -u origin -f
- ✅ Verify all Continuous Integration (CI) checks are passing.
- ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee
@farleylai thanks for the PR! I've cleaned it up a bit without changing the functionality. I don't have the dataset downloaded currently, so I'm not sure what's normal, but 0.1M/1.6M seems like an excessive fraction of corrupted images. Typically datasets might have a few problem images that fall into this category, but usually these are <<1% of the total. I should mention that the one time we downloaded Objects365 before, we had to restart the download script multiple times due to incomplete downloads. The curl commands should be retry-friendly, so they should recognize partially downloaded files and resume downloading where they left off. What do the error messages on most of your corrupted images say, specifically?
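For reference, a resume-friendly fetch can be expressed with curl's continue and retry flags; the URL below is just a placeholder, not the dataset's actual endpoint:

$> curl -L -O -C - --retry 3 https://example.com/objects365/train/patch0.tar.gz  # -C - resumes a partial file, --retry re-attempts transient failures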
@farleylai PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐
I checked the warning messages in the cache and they all say something like "../datasets/Objects365/images/val/objects365_v2_01926665.jpg: non-normalized or out of bounds coordinate labels". Take this image for example:

$> file ../datasets/Objects365/images/val/objects365_v2_01926665.jpg
../datasets/Objects365/images/val/objects365_v2_01926665.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, baseline, precision 8, 1024x748, frames 3

$> cat ../datasets/Objects365/labels/val/objects365_v2_01926665.txt
9 0.50000 0.80787 1.00000 0.38427
95 0.68296 0.13299 0.11006 0.13893
117 0.49981 0.58072 1.00039 0.54321
209 0.54939 0.10846 0.13147 0.17548

Tracing back to the original COCO-format annotation confirms the out-of-bounds coordinates (note the normalized width of 1.00039 in the third row). It is now clear those coordinates must be clamped before normalization.
@farleylai ok got it. We want to clip these as xyxy labels then, and afterwards convert to xywh. We do this for the xView dataset as well, using xyxy2xywhn() (Line 78 in fc36064).
The function only accepts nx4 numpy arrays or torch tensors, so we probably want to do something like this:

xyxy = np.array([x, y, x + w, y + h]).reshape(1, 4)  # pixels
xywhn_clipped = xyxy2xywhn(xyxy, w=image_width, h=image_height, clip=True)  # normalized and clipped

(Lines 535 to 545 in fc36064)
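For context, here is a minimal NumPy-only sketch of what such a conversion with clipping looks like; the repository's actual xyxy2xywhn() also handles torch tensors, so treat this as an approximation rather than the exact implementation:

import numpy as np

def xyxy2xywhn(x, w=640, h=640, clip=False, eps=0.0):
    # nx4 [x1, y1, x2, y2] pixel boxes -> nx4 [x_center, y_center, width, height] normalized
    if clip:
        x[:, [0, 2]] = x[:, [0, 2]].clip(0, w - eps)  # clamp x coordinates to the image width
        x[:, [1, 3]] = x[:, [1, 3]].clip(0, h - eps)  # clamp y coordinates to the image height
    y = np.empty_like(x, dtype=float)
    y[:, 0] = (x[:, 0] + x[:, 2]) / 2 / w  # x center
    y[:, 1] = (x[:, 1] + x[:, 3]) / 2 / h  # y center
    y[:, 2] = (x[:, 2] - x[:, 0]) / w      # width
    y[:, 3] = (x[:, 3] - x[:, 1]) / h      # height
    return y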
@farleylai I just downloaded the dataset over the last few hours on an EC2 instance. I seem to have downloaded everything successfully, though I can't tell whether all of the patches succeeded in downloading and unzipping; unfortunately we haven't built any overall report into this. My stats look a little larger than yours though: I have 1742289 training images, so it's likely you are missing a few training patches. The download occupies about 700 GB on my hard drive (including undeleted zips).

train: Scanning '../datasets/Objects365/labels/train' images and labels...1742289 found, 0 missing, 0 empty, 111771 corrupted: 100%|███████| 1742289/1742289 [09:21<00:00, 3101.89it/s]
val: Scanning '../datasets/Objects365/labels/val' images and labels...80000 found, 0 missing, 0 empty, 7312 corrupted: 100%|███████| 80000/80000 [00:06<00:00, 12564.88it/s]

I also see a significant number of 'corrupted' messages due to duplicate labels, i.e. two identical rows (same class, same exact coordinates). We currently handle these as errors rather than warnings, so the image is rejected if any of these occur:

train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00051762.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00051841.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00051849.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00051921.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00051929.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052000.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052013.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052022.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052046.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052076.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052150.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052158.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052161.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052200.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052205.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052225.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052235.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052255.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052318.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052417.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052420.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052444.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052472.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052486.jpg: non-normalized or out of bounds coordinate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052490.jpg: duplicate labels
train: WARNING: Ignoring corrupted image and/or label ../datasets/Objects365/images/train/objects365_v1_00052528.jpg: duplicate labels
I do not have enough bandwidth to test the full training-set download in a few hours, so there could be some failed patches earlier on. I will see if I can reproduce the same numbers as yours soon. As for the rejected labels, it is indeed an issue and must be fixed before this dataset can be useful. Hopefully a PR can be submitted within a day.
@farleylai ok! Don't worry about the duplicate labels, I'll push a separate PR to handle those better inside the dataset checks. I think I'll convert them from errors to warnings and fix them automatically.
@farleylai duplicate labels are now handled automatically (warning + automatic fix), and all other label errors feature improved introspection that points users to the exact values causing the problem. See PR #5210, merged into master now.
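Conceptually, the automatic fix amounts to dropping exact-duplicate rows from the nx5 label array with np.unique; a sketch of the idea, not necessarily the merged code:

import numpy as np

labels = np.array([[9, 0.5, 0.5, 0.2, 0.2],
                   [9, 0.5, 0.5, 0.2, 0.2],    # exact duplicate row
                   [63, 0.7, 0.1, 0.1, 0.1]])
_, idx = np.unique(labels, axis=0, return_index=True)  # first occurrence of each unique row
if len(idx) < len(labels):                             # duplicates were present -> warn and fix
    labels = labels[np.sort(idx)]                      # keep rows in their original order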
Glad those duplicate labels can be safely handled.
@farleylai yeah, it would be useful to capture and report download failures as part of download.py. It would also be much better if there were a way to print multithreaded download progress to the screen more clearly; that by itself would help spot failed downloads. Right now the 8 threads print over each other and the resulting terminal window becomes unreadable. I don't know an easy fix, but I know it's possible, e.g. Docker does this when downloading from multiple sources.
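As a rough sketch of that kind of reporting, collecting failures from worker threads and printing a single summary at the end (the helper names here are hypothetical, not the repo's download() utility):

from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request

def download_one(url, dest):  # hypothetical single-file download helper
    urllib.request.urlretrieve(url, dest)

def download_all(urls, threads=8):
    failures = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(download_one, u, u.split('/')[-1]): u for u in urls}
        for f in as_completed(futures):
            try:
                f.result()
            except Exception as e:
                failures.append((futures[f], e))
    # one consolidated report instead of 8 threads printing over each other
    print(f'{len(urls) - len(failures)}/{len(urls)} downloads succeeded')
    for url, err in failures:
        print(f'FAILED: {url} ({err})')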
After those patches finally downloaded in full, there seems to be only a one-image difference from your download in the training set. However, after moving the jpg images into the same directory, the total training-set image count becomes 1742289: the tarballs list 1742290 entries, of which 1742289 are distinct.

Per-patch tarball image counts:

Training set with 1742290 (1742289 distinct) images in total:

$> find train -mindepth 1 -type f -name "patch*.tar.gz" -print0 | xargs -0 -I {} sh -c "echo {}'\t'\`tar -tzf {} | grep jpg | wc -l\`"
train/patch46.tar.gz 36000
train/patch40.tar.gz 34366
train/patch19.tar.gz 34369
train/patch2.tar.gz 34915
train/patch20.tar.gz 34337
train/patch10.tar.gz 34529
train/patch31.tar.gz 34314
train/patch18.tar.gz 34374
train/patch11.tar.gz 34053
train/patch36.tar.gz 34364
train/patch7.tar.gz 34565
train/patch50.tar.gz 34357
train/patch22.tar.gz 34354
train/patch35.tar.gz 34288
train/patch45.tar.gz 19437
train/patch0.tar.gz 34797
train/patch4.tar.gz 34520
train/patch32.tar.gz 34424
train/patch9.tar.gz 34758
train/patch17.tar.gz 34422
train/patch14.tar.gz 33156
train/patch38.tar.gz 34468
train/patch15.tar.gz 31477
train/patch21.tar.gz 34391
train/patch5.tar.gz 34468
train/patch28.tar.gz 34404
train/patch49.tar.gz 36000
train/patch39.tar.gz 34366
train/patch30.tar.gz 34450
train/patch23.tar.gz 34346
train/patch47.tar.gz 36000
train/patch37.tar.gz 34377
train/patch8.tar.gz 34555
train/patch48.tar.gz 36000
train/patch41.tar.gz 34341
train/patch43.tar.gz 34829
train/patch13.tar.gz 32938
train/patch27.tar.gz 34435
train/patch16.tar.gz 34341
train/patch42.tar.gz 34349
train/patch44.tar.gz 36040
train/patch25.tar.gz 34454
train/patch34.tar.gz 34310
train/patch12.tar.gz 32891
train/patch24.tar.gz 34410
train/patch26.tar.gz 34357
train/patch6.tar.gz 34497
train/patch1.tar.gz 34722
train/patch33.tar.gz 34444
train/patch29.tar.gz 34494
train/patch3.tar.gz 34437

Validation set with 80000 images in total:

$> find val -mindepth 1 -type f -name "patch*.tar.gz" -print0 | xargs -0 -I {} sh -c "echo {}'\t'\`tar -tzf {} | grep jpg | wc -l\`"
val/patch40.tar.gz 1807
val/patch19.tar.gz 1851
val/patch2.tar.gz 1246
val/patch20.tar.gz 1778
val/patch10.tar.gz 1570
val/patch31.tar.gz 1887
val/patch18.tar.gz 1828
val/patch11.tar.gz 2186
val/patch36.tar.gz 1766
val/patch7.tar.gz 1641
val/patch22.tar.gz 1805
val/patch35.tar.gz 1847
val/patch0.tar.gz 1311
val/patch4.tar.gz 1595
val/patch32.tar.gz 1789
val/patch9.tar.gz 1501
val/patch17.tar.gz 1769
val/patch14.tar.gz 3115
val/patch38.tar.gz 1762
val/patch15.tar.gz 1038
val/patch21.tar.gz 1798
val/patch5.tar.gz 1679
val/patch28.tar.gz 1723
val/patch39.tar.gz 1761
val/patch30.tar.gz 1779
val/patch23.tar.gz 1826
val/patch37.tar.gz 1780
val/patch8.tar.gz 1705
val/patch41.tar.gz 1828
val/patch43.tar.gz 1293
val/patch13.tar.gz 3392
val/patch27.tar.gz 1740
val/patch16.tar.gz 1789
val/patch42.tar.gz 1840
val/patch25.tar.gz 1755
val/patch34.tar.gz 1889
val/patch12.tar.gz 3564
val/patch24.tar.gz 1748
val/patch26.tar.gz 1758
val/patch6.tar.gz 1747
val/patch1.tar.gz 1249
val/patch33.tar.gz 1785
val/patch29.tar.gz 1709
val/patch3.tar.gz 1771
@farleylai great! #5214 implements the clipping we discussed. The dataset now caches with zero corrupt images.
You're quick, and I wish I had computing resources as fast for validation. BTW, I can confirm those duplicate labels are indeed in the original annotations. Perhaps they were introduced when the creators merged annotations from multiple workers. Take one image's annotations (category id followed by its bbox), for example:

(9, ['0.56549', '0.54617', '101.01782', '106.62991'])
(63, ['488.57837', '101.07703', '144.38501', '126.85852'])
(63, ['488.57837', '101.07703', '144.38501', '126.85852'])
(63, ['495.71875', '237.90430', '144.43140', '171.04572'])
(63, ['495.71875', '237.90430', '144.43140', '171.04572'])
(83, ['156.13245', '238.91760', '53.47083', '50.74274'])
(83, ['156.13245', '238.91760', '53.47083', '50.74274'])
(83, ['194.32593', '206.18036', '45.28650', '43.10406'])
(83, ['194.32593', '206.18036', '45.28650', '43.10406'])
(83, ['215.60510', '145.78528', '41.15546', '28.99585'])
(83, ['215.60510', '145.78528', '41.15546', '28.99585'])
(105, ['183.95911', '291.84283', '48.56024', '50.74280'])
(105, ['209.05768', '251.46692', '51.83398', '51.28833'])
(105, ['435.12170', '84.78058', '28.64856', '20.17505'])
(105, ['435.12170', '84.78058', '28.64856', '20.17505'])
(105, ['460.94580', '65.00900', '26.63110', '26.63107'])
(105, ['460.94580', '65.00900', '26.63110', '26.63107'])
(108, ['243.19794', '368.39868', '53.78265', '55.65332'])
(108, ['243.19794', '368.39868', '53.78265', '55.65332'])
(108, ['275.93524', '355.30377', '22.91608', '24.78680'])
(108, ['275.93524', '355.30377', '22.91608', '24.78680'])
(108, ['285.28870', '323.50183', '57.05640', '52.84735'])
(108, ['285.28870', '323.50183', '57.05640', '52.84735'])
(108, ['294.17456', '374.47845', '55.18567', '52.37964'])
(108, ['294.17456', '374.47845', '55.18567', '52.37964'])
(108, ['317.09064', '291.23230', '42.55841', '39.28467'])
(108, ['317.09064', '291.23230', '42.55841', '39.28467'])
(108, ['334.86230', '357.17450', '40.68774', '32.26953'])
(108, ['334.86230', '357.17450', '40.68774', '32.26953'])
(108, ['341.87744', '318.35742', '43.49377', '44.89685'])
(108, ['341.87744', '318.35742', '43.49377', '44.89685'])
(142, ['243.39313', '18.17514', '87.40271', '81.65799'])
(142, ['243.39313', '18.17514', '87.40271', '81.65799'])
(142, ['294.40613', '-1.30167', '40.78308', '28.20508'])
(142, ['294.40613', '-1.30167', '40.78308', '28.20508'])
(142, ['322.17871', '0.12009', '66.47534', '67.70639'])
(142, ['322.17871', '0.12009', '66.47534', '67.70639'])
(153, ['27.74731', '265.87366', '158.26715', '155.37903'])
(153, ['27.74731', '265.87366', '158.26715', '155.37903'])
(159, ['324.27277', '171.03973', '62.93573', '103.82391'])
(159, ['324.27277', '171.03973', '62.93573', '103.82391'])
(196, ['107.10449', '155.60645', '77.16638', '107.09766'])
(196, ['107.10449', '155.60645', '77.16638', '107.09766'])
(217, ['254.92322', '275.26453', '44.09509', '44.49591'])
(217, ['254.92322', '275.26453', '44.09509', '44.49591'])
(217, ['295.96729', '265.96484', '43.49384', '41.38922'])
(217, ['295.96729', '265.96484', '43.49384', '41.38922'])
(251, ['372.37653', '171.03973', '68.54779', '77.36688'])
(251, ['372.37653', '171.03973', '68.54779', '77.36688'])
(251, ['451.74768', '193.08728', '96.60828', '67.74609'])
(251, ['451.74768', '193.08728', '96.60828', '67.74609'])
(262, ['127.67511', '96.63177', '102.23822', '75.09027'])
(262, ['127.67511', '96.63177', '102.23822', '75.09027'])
(265, ['0.67590', '91.43323', '138.55151', '333.86279'])
(289, ['158.08105', '168.23370', '75.76337', '75.76337'])
(289, ['158.08105', '168.23370', '75.76337', '75.76337'])
(289, ['230.10303', '156.07416', '50.50891', '71.55432'])
(289, ['230.10303', '156.07416', '50.50891', '71.55432'])
(297, ['45.11407', '289.19269', '89.52289', '112.98071'])
(297, ['64.12659', '290.98798', '123.71802', '120.64038'])
A recent approach combining Copy-Paste augmentation with self-training on Objects365, which seemingly boosts COCO performance by 1.5% without TTA, may deserve a look: https://arxiv.org/abs/2012.07177
@farleylai yes, the data will always have issues, so the best thing to do is fix what's fixable and ignore (but notify the user about) problem images/labels. Though I'm also surprised that an organization would expend the resources to label almost 2 million images and not do basic cleaning and checking of their data.
@farleylai I trained a YOLOv5m model on Objects365 following this PR and the other related fixes. Everything works well. mAP@0.5:0.95 was only 18.5 after 30 epochs, but person mAP was similar to COCO, about 55 mAP@0.5:0.95. I'm sure this could be improved with more epochs and additional tweaks, but at first glance all is good here. DDP train command:
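A representative 8-GPU DDP invocation for the A100 setup mentioned later in the thread would look roughly like this (batch size and device list are assumptions, not the exact command used):

$> python -m torch.distributed.run --nproc_per_node 8 train.py --data Objects365.yaml --weights yolov5m.pt --epochs 30 --batch-size 256 --device 0,1,2,3,4,5,6,7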
Results:
# Class  Images  Labels  P  R  mAP@.5  mAP@.5:.95

# YOLOv5m v6.0 COCO 300 epochs
all 5000 36335 0.726 0.569 0.633 0.439
person 5000 10777 0.792 0.735 0.807 0.554

# YOLOv5m v6.0 Objects365 30 epochs
all 80000 1239576 0.626 0.265 0.273 0.185
Person 80000 80332 0.599 0.765 0.759 0.57
Looks very promising and somewhat manageable compared with OpenImages.
@farleylai trained model uploaded to https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5m_Objects365.pt. This is just a first training run, so I'm sure there's room for improvement in the results. Yes, I agree, I like this dataset. It's got more classes and more images than COCO. Here are YOLOv5m detections for both; with Objects365 you get additional useful categories, e.g. shoes and sunglasses.
While COCO is widely used for benchmarking, its limited number of classes does not help much with detecting the rich contextual objects beyond persons that diverse real applications need. Though transfer from Objects365 to COCO is likely to improve benchmark results, the other direction, from COCO to Objects365, could be more useful in practice. Before that, a well-tuned baseline would be necessary, and the results should be at least as good as, or much better than, those of v1.
@farleylai yes, good points!
Thank you for your wonderful work.
@ahong007007 yes, we used an AWS P4d instance with 8 A100s with DDP for Objects365 training. For 30 epochs of YOLOv5m it was pretty fast, about 1.5 days. Training command in #5194 (comment).
Update Objects365.yaml to include the official validation set (ultralytics#5194)

* Update Objects365.yaml: download the official Objects365 validation set and convert the labels
* Enforce 4-space indent, reformat and cleanup
* Shorten list comprehension

Co-authored-by: Glenn Jocher <[email protected]>
According to https://docs.ultralytics.com/yolov5/tutorials/multi_gpu_training/#faq — how should multi-GPU DDP training be launched here? Thanks! @glenn-jocher
@sibozhang training using DDP with multiple GPUs can be done with PyTorch's torch.distributed.run launcher, as described in that tutorial.
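For example, following the linked tutorial, a two-GPU run would look like this (adjust --nproc_per_node, --batch-size and --device to your hardware):

$> python -m torch.distributed.run --nproc_per_node 2 train.py --data Objects365.yaml --weights yolov5m.pt --batch-size 64 --device 0,1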
This command will distribute the training across the specified GPUs. Adjust the batch size, number of epochs, and other parameters as desired. Good luck with your training!
[E ProcessGroupNCCL.cpp:828] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=3, OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1808113 milliseconds before timing out.

I cannot start training Objects365 on 8x A100 40GB because of this NCCL timeout. I also tried adding the --cache option.
How to change NCCL timeout settings? @glenn-jocher
@sibozhang it looks like you're encountering NCCL timeout issues during training on 8x A100 40GB. You can try raising the NCCL collective timeout, e.g. via the timeout argument of torch.distributed.init_process_group. If the issue persists, please refer to the official NVIDIA NCCL documentation or reach out to the NVIDIA support channels for further assistance. Good luck with your training!
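For illustration, the collective timeout can be raised where the process group is initialized; the 30-minute default corresponds to the 1800000 ms in the log above (where best to apply this inside train.py is an assumption left to the reader):

from datetime import timedelta
import torch.distributed as dist

# The default is timedelta(seconds=1800), i.e. the 1800000 ms NCCL timeout seen in the log;
# raise it if dataset caching or the first epoch takes longer than 30 minutes.
dist.init_process_group(backend='nccl', timeout=timedelta(hours=3))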
@glenn-jocher are there plans to release new Objects365 models for new versions of YOLO? I have made a room classifier (https://github.com/p2p-sys/yolo5-classificator) using the COCO and Objects365 models, and I would like to use the new versions of YOLO. Unfortunately I could not train on Objects365 myself; during training the program is killed by the system.
Hello @p2p-sys! We're always working on improving and updating our models, including those trained on different datasets like Objects365. Keep an eye on our GitHub releases for any updates on new versions of YOLO trained with Objects365. Regarding the issue with training being killed, it might be related to system resource limitations. Ensure you have sufficient memory and processing power, or consider reducing the batch size or using a simpler model. If the problem persists, please open an issue with detailed logs and system specs for further assistance. Thank you for using YOLOv5 for your room classifier project! 🚀
Include the official Objects365 validation set in the download and convert the labels
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Enhanced Objects365.yaml to support both training and validation data splits.

📊 Key Changes

🎯 Purpose & Impact
Users of the yolov5 repository can now expect more streamlined download and setup processes for the Objects365 dataset, potentially leading to more robust model training and validation.