training error! #2

Open
lucheng07082221 opened this issue Nov 14, 2018 · 7 comments

Comments

@lucheng07082221

Hi,
During training I get an error at this line:

loss_cls = (1 / nB) * self.ce_loss(pred_cls[mask], torch.argmax(tcls[mask], dim=1))

Traceback (most recent call last):
  File "/home/lc/work/yolov3-network-slimming/sparsity_train.py", line 159, in <module>
    train()
  File "/home/lc/work/yolov3-network-slimming/sparsity_train.py", line 107, in train
    loss = model(imgs, targets)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lc/work/yolov3-network-slimming/yolomodel.py", line 365, in forward
    x, *losses = self.module_list[i][0](x, targets)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lc/work/yolov3-network-slimming/yolomodel.py", line 147, in forward
    print(torch.argmax(tcls[mask], dim=1))
  File "/usr/local/lib/python3.6/dist-packages/torch/functional.py", line 374, in argmax
    return torch._argmax(input, dim, keepdim)
RuntimeError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

@talebolano
Owner

talebolano commented Nov 15, 2018

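Hi,
Does the error occur right at the start, or partway through training? It may be caused by labels in the training set that do not match the images.
My YOLO source code comes from https://github.com/eriklindernoren/PyTorch-YOLOv3
It looks like other people have run into a similar problem there.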

@lucheng07082221
Author

@talebolano This is solved: the labels simply were not being found. label and images have to be under the same directory. Thanks!
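
For anyone hitting the same error, here is a minimal sketch of a pre-training check that lists images whose label file cannot be found. It assumes the layout described above, with sibling `images` and `labels` folders, and a label file named after the image with a `.txt` extension. The helper names `label_path_for` and `find_missing_labels` are hypothetical and are not part of the repository.

```python
from pathlib import Path


def label_path_for(image_path):
    """Map .../images/name.jpg to .../labels/name.txt.

    Assumption: labels live in a sibling "labels" folder next to "images".
    """
    p = Path(image_path)
    # Swap every path component named exactly "images" for "labels".
    parts = ["labels" if part == "images" else part for part in p.parts]
    return Path(*parts).with_suffix(".txt")


def find_missing_labels(image_paths):
    """Return the image paths whose label file is missing or empty."""
    missing = []
    for image_path in image_paths:
        label = label_path_for(image_path)
        if not label.is_file() or label.stat().st_size == 0:
            missing.append(str(image_path))
    return missing
```

Running `find_missing_labels` over the image paths in the training list before starting training should return an empty list. Any path it returns is an image the loader would feed to the model without targets.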

@lucheng07082221
Author

@talebolano Can this be trained on multiple GPUs?

@talebolano
Owner

@talebolano Can this be trained on multiple GPUs?

Not yet. I will add code for running on multiple GPUs soon.

@Liqing6668

@talebolano
Hello, I get an error during training. Could you please take a look?
[Epoch 0/2000, Batch 155/4402] [Losses: x 0.184448, y 0.152233, w 0.473080, h 0.243340, conf 0.419103, cls 1.269924, total 2.742128, recall: 0.59055, precision: 0.05823]
[Epoch 0/2000, Batch 156/4402] [Losses: x 0.161208, y 0.169744, w 0.484731, h 0.234620, conf 0.445205, cls 1.283632, total 2.779141, recall: 0.61207, precision: 0.05506]
[Epoch 0/2000, Batch 157/4402] [Losses: x 0.180584, y 0.142397, w 0.394638, h 0.239581, conf 0.393050, cls 1.264419, total 2.614670, recall: 0.61757, precision: 0.06406]
[Epoch 0/2000, Batch 158/4402] [Losses: x 0.168928, y 0.124468, w 0.472078, h 0.200339, conf 0.371993, cls 1.261448, total 2.599255, recall: 0.61994, precision: 0.05572]
[Epoch 0/2000, Batch 159/4402] [Losses: x 0.156252, y 0.146104, w 0.336951, h 0.280383, conf 0.460892, cls 1.268392, total 2.648975, recall: 0.60119, precision: 0.05369]
[Epoch 0/2000, Batch 160/4402] [Losses: x 0.186453, y 0.147905, w 0.433254, h 0.227833, conf 0.377308, cls 1.252797, total 2.625550, recall: 0.60784, precision: 0.08115]
[Epoch 0/2000, Batch 161/4402] [Losses: x 0.181762, y 0.173695, w 0.988128, h 0.325833, conf 0.431874, cls 1.305756, total 3.407048, recall: 0.43548, precision: 0.05132]
[Epoch 0/2000, Batch 162/4402] [Losses: x 0.174461, y 0.164167, w 0.488938, h 0.336979, conf 0.619493, cls 1.290267, total 3.074305, recall: 0.52143, precision: 0.05007]
[Epoch 0/2000, Batch 163/4402] [Losses: x 0.209021, y 0.128292, w 0.700684, h 0.202287, conf 0.369960, cls 1.244245, total 2.854489, recall: 0.56198, precision: 0.04331]
[Epoch 0/2000, Batch 164/4402] [Losses: x 0.216554, y 0.121116, w 0.389169, h 0.224613, conf 0.367337, cls 1.234381, total 2.553170, recall: 0.58206, precision: 0.07246]
[Epoch 0/2000, Batch 165/4402] [Losses: x 0.202888, y 0.139989, w 0.332888, h 0.200512, conf 0.407665, cls 1.273087, total 2.557030, recall: 0.58230, precision: 0.06962]
[Epoch 0/2000, Batch 166/4402] [Losses: x 0.164832, y 0.102894, w 0.821020, h 0.175726, conf 0.359270, cls 1.234681, total 2.858424, recall: 0.60841, precision: 0.05036]
[Epoch 0/2000, Batch 167/4402] [Losses: x 0.171323, y 0.160461, w 0.659578, h 0.529196, conf 0.378674, cls 1.251355, total 3.150587, recall: 0.55263, precision: 0.05115]
[Epoch 0/2000, Batch 168/4402] [Losses: x 0.178747, y 0.127256, w 0.795630, h 0.410524, conf 0.363543, cls 1.255784, total 3.131485, recall: 0.53846, precision: 0.03526]
Traceback (most recent call last):
  File "sparsity_train.py", line 153, in <module>
    train()
  File "sparsity_train.py", line 100, in train
    loss = model(imgs, targets)
  File "/opt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 492, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/Liqing/Liqing/yolov3-network-slimming-master/yolomodel.py", line 349, in forward
    x, *losses = self.module_list[i][0](x, targets)
  File "/opt/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 492, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/Liqing/Liqing/yolov3-network-slimming-master/yolomodel.py", line 102, in forward
    img_dim=self.image_dim,
  File "/home/Liqing/Liqing/yolov3-network-slimming-master/util.py", line 187, in build_targets
    conf_mask[b, anch_ious > ignore_thres, gj, gi] = 0
IndexError: index 21 is out of bounds for dimension 3 with size 17
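
A possible diagnostic for the IndexError above, which nobody in this thread has confirmed: the sketch below assumes that `build_targets` derives the grid cell as `int(x_center * grid_size)` from labels in the normalized `class x_center y_center width height` format. Under that assumption, an x coordinate of 1.0 or more produces an index at or past the grid size, for example 1.25 * 17 gives index 21. The function name `grid_index_errors` is hypothetical.

```python
def grid_index_errors(label_lines, grid_size):
    """Return (line_no, gi, gj) for boxes whose cell index falls outside the grid.

    Assumption: each line is "class x_center y_center width height" with
    coordinates normalized to the image size. Lines that do not have exactly
    five fields are reported as (line_no, None, None).
    """
    bad = []
    for line_no, line in enumerate(label_lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            bad.append((line_no, None, None))
            continue
        x_center, y_center = float(fields[1]), float(fields[2])
        # Same truncation that integer indexing into the grid would apply.
        gi = int(x_center * grid_size)
        gj = int(y_center * grid_size)
        if not (0 <= gi < grid_size and 0 <= gj < grid_size):
            bad.append((line_no, gi, gj))
    return bad
```

Applied to every label file with the grid size from the error message (17 here), any reported line points at an annotation worth inspecting. Typical suspects are coordinates given in pixels instead of normalized values, or boxes that extend past the image border.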

@liaoyunkun

@lucheng07082221 Hello, I ran into this problem on the Pascal VOC dataset. Which dataset were you testing on when you hit it, and how exactly did you solve it?

@pandasong

@Liqing6668 Hello, have you solved the "IndexError: index 21 is out of bounds for dimension 3 with size 17" problem, and if so, how? I am running into the same problem. Thank you very much!

5 participants