training error! #2

lucheng07082221 · 2018-11-14T07:13:45Z

Hi,
Training fails with an error at this line:

loss_cls = (1 / nB) * self.ce_loss(pred_cls[mask], torch.argmax(tcls[mask], dim=1))

Traceback (most recent call last):
File "/home/lc/work/yolov3-network-slimming/sparsity_train.py", line 159, in <module>
train()
File "/home/lc/work/yolov3-network-slimming/sparsity_train.py", line 107, in train
loss = model(imgs, targets)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/lc/work/yolov3-network-slimming/yolomodel.py", line 365, in forward
x, *losses = self.module_list[i][0](x, targets)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/lc/work/yolov3-network-slimming/yolomodel.py", line 147, in forward
print(torch.argmax(tcls[mask], dim=1))
File "/usr/local/lib/python3.6/dist-packages/torch/functional.py", line 374, in argmax
return torch._argmax(input, dim, keepdim)
RuntimeError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

talebolano · 2018-11-15T03:25:17Z

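Hi,
Does the error occur right at the start, or partway through training? It may be caused by labels in the training set that do not match.
My YOLO source code comes from https://github.com/eriklindernoren/PyTorch-YOLOv3
It looks like others have run into a similar problem.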

lucheng07082221 · 2018-11-15T07:28:23Z

@talebolano This is solved. The labels simply were not being found: label and images need to be under the same directory. Thanks!
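The missing-label cause described above can be checked before training starts. The following is a minimal sketch, not code from this repository: `label_path_for` and `find_missing_labels` are hypothetical helpers, and the mapping from an `images` directory to a sibling `labels` directory with `.txt` files is an assumption about the dataset layout.

```python
import os


def label_path_for(image_path):
    """Hypothetical mapping: .../images/x.jpg -> .../labels/x.txt (an assumption)."""
    directory, filename = os.path.split(image_path)
    stem, _ = os.path.splitext(filename)
    parent, leaf = os.path.split(directory)
    # If the image does not sit in a directory named "images", look next to it.
    label_dir = os.path.join(parent, "labels") if leaf == "images" else directory
    return os.path.join(label_dir, stem + ".txt")


def find_missing_labels(image_paths):
    """Return the images whose label file is absent or zero bytes long."""
    missing = []
    for image_path in image_paths:
        label_path = label_path_for(image_path)
        if not os.path.isfile(label_path) or os.path.getsize(label_path) == 0:
            missing.append(image_path)
    return missing
```

Running `find_missing_labels` over the paths in the training list and getting a non-empty result would suggest the loader is seeing images with no targets, which matches the fix reported here.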

lucheng07082221 · 2018-11-15T09:15:12Z

@talebolano Can this be trained on multiple GPUs?

talebolano · 2018-11-15T13:47:10Z

@talebolano Can this be trained on multiple GPUs?

Not yet. I will add multi-GPU code in the near future.

Liqing6668 · 2019-02-27T09:50:08Z

@talebolano
Hello, I get an error during training. Could you please take a look?
[Epoch 0/2000, Batch 155/4402] [Losses: x 0.184448, y 0.152233, w 0.473080, h 0.243340, conf 0.419103, cls 1.269924, total 2.742128, recall: 0.59055, precision: 0.05823]
[Epoch 0/2000, Batch 156/4402] [Losses: x 0.161208, y 0.169744, w 0.484731, h 0.234620, conf 0.445205, cls 1.283632, total 2.779141, recall: 0.61207, precision: 0.05506]
[Epoch 0/2000, Batch 157/4402] [Losses: x 0.180584, y 0.142397, w 0.394638, h 0.239581, conf 0.393050, cls 1.264419, total 2.614670, recall: 0.61757, precision: 0.06406]
[Epoch 0/2000, Batch 158/4402] [Losses: x 0.168928, y 0.124468, w 0.472078, h 0.200339, conf 0.371993, cls 1.261448, total 2.599255, recall: 0.61994, precision: 0.05572]
[Epoch 0/2000, Batch 159/4402] [Losses: x 0.156252, y 0.146104, w 0.336951, h 0.280383, conf 0.460892, cls 1.268392, total 2.648975, recall: 0.60119, precision: 0.05369]
[Epoch 0/2000, Batch 160/4402] [Losses: x 0.186453, y 0.147905, w 0.433254, h 0.227833, conf 0.377308, cls 1.252797, total 2.625550, recall: 0.60784, precision: 0.08115]
[Epoch 0/2000, Batch 161/4402] [Losses: x 0.181762, y 0.173695, w 0.988128, h 0.325833, conf 0.431874, cls 1.305756, total 3.407048, recall: 0.43548, precision: 0.05132]
[Epoch 0/2000, Batch 162/4402] [Losses: x 0.174461, y 0.164167, w 0.488938, h 0.336979, conf 0.619493, cls 1.290267, total 3.074305, recall: 0.52143, precision: 0.05007]
[Epoch 0/2000, Batch 163/4402] [Losses: x 0.209021, y 0.128292, w 0.700684, h 0.202287, conf 0.369960, cls 1.244245, total 2.854489, recall: 0.56198, precision: 0.04331]
[Epoch 0/2000, Batch 164/4402] [Losses: x 0.216554, y 0.121116, w 0.389169, h 0.224613, conf 0.367337, cls 1.234381, total 2.553170, recall: 0.58206, precision: 0.07246]
[Epoch 0/2000, Batch 165/4402] [Losses: x 0.202888, y 0.139989, w 0.332888, h 0.200512, conf 0.407665, cls 1.273087, total 2.557030, recall: 0.58230, precision: 0.06962]
[Epoch 0/2000, Batch 166/4402] [Losses: x 0.164832, y 0.102894, w 0.821020, h 0.175726, conf 0.359270, cls 1.234681, total 2.858424, recall: 0.60841, precision: 0.05036]
[Epoch 0/2000, Batch 167/4402] [Losses: x 0.171323, y 0.160461, w 0.659578, h 0.529196, conf 0.378674, cls 1.251355, total 3.150587, recall: 0.55263, precision: 0.05115]
[Epoch 0/2000, Batch 168/4402] [Losses: x 0.178747, y 0.127256, w 0.795630, h 0.410524, conf 0.363543, cls 1.255784, total 3.131485, recall: 0.53846, precision: 0.03526]
Traceback (most recent call last):
File "sparsity_train.py", line 153, in <module>
train()
File "sparsity_train.py", line 100, in train
loss = model(imgs, targets)
File "/opt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 492, in __call__
result = self.forward(*input, **kwargs)
File "/home/Liqing/Liqing/yolov3-network-slimming-master/yolomodel.py", line 349, in forward
x, *losses = self.module_list[i][0](x, targets)
File "/opt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 492, in __call__
result = self.forward(*input, **kwargs)
File "/home/Liqing/Liqing/yolov3-network-slimming-master/yolomodel.py", line 102, in forward
img_dim=self.image_dim,
File "/home/Liqing/Liqing/yolov3-network-slimming-master/util.py", line 187, in build_targets
conf_mask[b, anch_ious > ignore_thres, gj, gi] = 0
IndexError: index 21 is out of bounds for dimension 3 with size 17

liaoyunkun · 2019-07-19T09:04:48Z

@lucheng07082221 Hello, I ran into this problem on the Pascal VOC dataset. Which dataset were you testing on when you hit it, and how exactly did you solve it?

pandasong · 2019-08-08T03:45:02Z

@Liqing6668 Hello, have you solved the "IndexError: index 21 is out of bounds for dimension 3 with size 17" problem, and if so, how? I am running into the same problem. Thank you very much!
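Nothing in this thread confirms the cause of that IndexError. One possibility worth ruling out is a label whose center coordinate is not normalized to [0, 1]: if the grid cell is computed as `int(x_center * grid_size)`, an x value near 1.25 on a 17-cell grid gives index 21. The sketch below scans label lines for such boxes. The helper name `find_out_of_grid_boxes`, the "class x_center y_center width height" line format, and the cell-index formula are all assumptions rather than code taken from `util.py`.

```python
def find_out_of_grid_boxes(label_lines, grid_size):
    """Return (line_number, gi, gj) for boxes whose cell index falls outside the grid.

    Assumes each line is "class x_center y_center width height" with
    coordinates expected to be normalized to [0, 1] (an assumption).
    """
    bad = []
    for line_number, line in enumerate(label_lines, start=1):
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        x_center, y_center = float(fields[1]), float(fields[2])
        gi = int(x_center * grid_size)  # column index of the responsible cell
        gj = int(y_center * grid_size)  # row index of the responsible cell
        if not (0 <= gi < grid_size and 0 <= gj < grid_size):
            bad.append((line_number, gi, gj))
    return bad


# Illustrative labels only; the second box has x_center > 1.
labels = ["0 0.50 0.50 0.20 0.20", "3 1.25 0.40 0.10 0.10"]
print(find_out_of_grid_boxes(labels, grid_size=17))  # prints [(2, 21, 6)]
```

If a scan like this flags any lines, re-normalizing or regenerating those labels would be a reasonable first thing to try.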

hemp110 mentioned this issue Aug 4, 2019

Abnormal sparsity-training metrics #34

Open
