Training crashing for instance segmentation at U7 #795

Open

alrightkami opened this issue Sep 15, 2022 · 9 comments

@alrightkami

I'm trying to start training for instance segmentation on the u7 branch, but I'm getting an error and can't figure out what it refers to.


Starting training for 300 epochs...
      Epoch    GPU_mem   box_loss   seg_loss   obj_loss   cls_loss  Instances       Size
  0%|          | 0/126 [00:00<?, ?it/s]                                         
Traceback (most recent call last):
  File "/home/data/yolov7/seg/segment/train.py", line 681, in <module>
    main(opt)
  File "/home/data/yolov7/seg/segment/train.py", line 577, in main
    train(opt.hyp, opt, device, callbacks)
  File "/home/data/yolov7/seg/segment/train.py", line 295, in train
    for i, (imgs, targets, paths, _, masks) in pbar:  # batch ------------------------------------------------------
  File "/opt/conda/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/data/yolov7/seg/utils/dataloaders.py", line 171, in __iter__
    yield next(self.iterator)
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/data/yolov7/seg/utils/segment/dataloaders.py", line 116, in __getitem__
    img, labels, segments = mixup(img, labels, segments, *self.load_mosaic(random.randint(0, self.n - 1)))
  File "/home/data/yolov7/seg/utils/segment/augmentations.py", line 21, in mixup
    segments = np.concatenate((segments, segments2), 0)
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 3 dimension(s) and the array at index 1 has 1 dimension(s)


The script:
!cd data/yolov7/seg && python segment/train.py --data data/graffiti.yaml --batch 16 --weights yolov7-seg.pt --cfg yolov7-seg.yaml --epochs 300 --name yolov7-seg --img 640 --hyp hyp.scratch-high.yaml

To generate data I used this script.

@aqsc

aqsc commented Sep 16, 2022

You should use segmentation labels instead of detection labels.
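
For illustration (my own made-up values, not from the repo docs), a detection label line versus a segmentation label line for the same class; detection uses a normalized box, segmentation uses normalized polygon vertices:

# detection label: class x_center y_center width height
0 0.50 0.50 0.20 0.10
# segmentation label: class x1 y1 x2 y2 ... xn yn
0 0.40 0.45 0.60 0.45 0.60 0.55 0.40 0.55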

@alrightkami

@aqsc I already do; I converted them from COCO format with the tool mentioned above, which was suggested by WongKinYiu. This is an example of my label file:

0 0.00394477 0.00758663 0.00760355 0.00758663 0.0515335 0.00605611 0.10645 0.00605611 0.168683 0.00605611 0.205291 0.0106518 0.245562 0.0152434 0.273018 0.0167781 0.315118 0.0121823 0.371859 0.00299505 0.41396 0.0167781 0.434093 0.0351526 0.439586 0.0719059 0.439586 0.108659 0.42677 0.120912 0.377352 0.120912 0.3261 0.108659 0.30858 0.0872112 0.282002 0.0993399 0.227253 0.105598 0.190646 0.0902847 0.174172 0.0765017 0.154038 0.0795627 0.110108 0.0918152 0.0606854 0.08875 0.0368935 0.0872195 0.0222485 0.0826279 0.000281065 0.0657797
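
For reference, here is a minimal parsing sketch (my own helper, not part of the repo) that reads one line in this format into a class id and an (n, 2) polygon array:

import numpy as np

def parse_seg_label_line(line: str):
    # format: class_id x1 y1 x2 y2 ... xn yn, all coordinates normalized to 0..1
    parts = line.split()
    class_id = int(parts[0])
    coords = np.array(parts[1:], dtype=np.float32)
    assert coords.size % 2 == 0, "polygon needs an even number of coordinates"
    polygon = coords.reshape(-1, 2)  # (num_points, 2) as (x, y)
    assert ((polygon >= 0.0) & (polygon <= 1.0)).all(), "coordinates must be normalized"
    return class_id, polygon

cid, poly = parse_seg_label_line("0 0.00394477 0.00758663 0.00760355 0.00758663 0.0515335 0.00605611")
print(cid, poly.shape)  # -> 0 (3, 2)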

@prateekgml

@alrightkami I was able to train a custom instance segmentation model. Maybe this post can help:
https://dsbyprateekg.blogspot.com/2022/09/how-to-train-custom-dataset-with-yolov7.html

@dilpreetsingh

I'm getting the same crash, but mine occurs somewhat randomly. Here the crash occurred on the 8th epoch:

Epoch    GPU_mem   box_loss   seg_loss   obj_loss   cls_loss  Instances       Size
      8/299      7.44G    0.04525    0.02051    0.02537     0.0155         33        640:  14%|█▍
Traceback (most recent call last):
  File "segment/train.py", line 681, in <module>
    main(opt)
  File "segment/train.py", line 577, in main
    train(opt.hyp, opt, device, callbacks)
  File "segment/train.py", line 295, in train
    for i, (imgs, targets, paths, _, masks) in pbar:  # batch ------------------------------------------------------
  File "/home/dilpreet/anaconda3/envs/yolov7/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/ssd/home/dilpreet/Documents/YOLOv7Seg/yolov7/seg/utils/dataloaders.py", line 171, in __iter__
    yield next(self.iterator)
  File "/home/dilpreet/anaconda3/envs/yolov7/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/dilpreet/anaconda3/envs/yolov7/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1204, in _next_data
    return self._process_data(data)
  File "/home/dilpreet/anaconda3/envs/yolov7/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
    data.reraise()
  File "/home/dilpreet/anaconda3/envs/yolov7/lib/python3.8/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 4.
Original Traceback (most recent call last):
  File "/home/dilpreet/anaconda3/envs/yolov7/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/dilpreet/anaconda3/envs/yolov7/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/dilpreet/anaconda3/envs/yolov7/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/ssd/home/dilpreet/Documents/YOLOv7Seg/yolov7/seg/utils/segment/dataloaders.py", line 116, in __getitem__
    img, labels, segments = mixup(img, labels, segments, *self.load_mosaic(random.randint(0, self.n - 1)))
  File "/ssd/home/dilpreet/Documents/YOLOv7Seg/yolov7/seg/utils/segment/augmentations.py", line 21, in mixup
    segments = np.concatenate((segments, segments2), 0)
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 3 dimension(s) and the array at index 1 has 1 dimension(s)

@Nikunj2696

@dilpreetsingh Can you share which tool you used for annotation? I am fairly sure the issue is with the annotations.

@dilpreetsingh

@Nikunj2696 I wrote my own polygon conversion script because my data is in PascalVOC and I couldn't find anything that would go directly from PascalVOC to Darknet. As I understand it, the format is simply:

label_id, x1, y1, x2, y2, ..., xn, yn

It seems to work fine for a certain number of epochs but randomly runs into trouble with the mixup augmentation. I ran an experiment where I set the mixup probability to 0.0, and that worked perfectly. My current theory is that the mixup augmentation doesn't handle images with 0 annotations/segments (I have some no-annotation images in the dataset).
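
To illustrate the theory, here is a standalone reproduction of the same ValueError (not the repo's mixup code; the shapes are only illustrative of a labelled image versus an unlabelled one):

import numpy as np

# segments for an image with two instances, each polygon resampled to a
# fixed number of points: a 3-dimensional array (num_instances, num_points, 2)
segments = np.zeros((2, 500, 2), dtype=np.float32)

# segments for an image with no annotations: an empty, 1-dimensional array
segments2 = np.array([], dtype=np.float32)

try:
    # mirrors the failing line in utils/segment/augmentations.py
    np.concatenate((segments, segments2), 0)
except ValueError as e:
    print(e)
# all the input arrays must have same number of dimensions, but the array
# at index 0 has 3 dimension(s) and the array at index 1 has 1 dimension(s)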

@Nikunj2696

@dilpreetsingh Please use https://roboflow.com/ for annotation. I faced the same issue and solved it; this annotation tool will help you.

@rrichards7

rrichards7 commented Sep 27, 2022

I ran into this problem when using the "hyp.scratch-high.yaml" hyperparameter config file, which has mixup enabled.
I think the crashing happens "randomly" because mixup is applied with a random probability at each step, so it only occasionally hits a problematic image.

I would suggest either using hyp.scratch-low.yaml to work around this problem temporarily if mixup is not needed, or keeping hyp.scratch-high.yaml but setting mixup to 0.0.
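
If you want to keep the rest of hyp.scratch-high.yaml, one quick way is to write a copy with mixup zeroed out and point --hyp at it (a sketch assuming PyYAML is installed; the file paths are placeholders, adjust them to your checkout):

import yaml

with open("hyp.scratch-high.yaml") as f:
    hyp = yaml.safe_load(f)

hyp["mixup"] = 0.0  # disable the mixup augmentation entirely

with open("hyp.scratch-high-nomixup.yaml", "w") as f:
    yaml.safe_dump(hyp, f, sort_keys=False)

# then: python segment/train.py ... --hyp hyp.scratch-high-nomixup.yaml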

@Joe-KI333

Joe-KI333 commented Sep 29, 2022

To learn how to train on a custom dataset for instance segmentation, read this blog post:

https://medium.com/augmented-startups/yolov7-segmentation-on-crack-using-roboflow-dataset-f13ae81b9958
