
train_loss is NAN when use resnet-101 #33

Closed
ElbertFang opened this issue Mar 17, 2018 · 1 comment
@ElbertFang

Thanks for your great work, first of all.
I successfully trained SCST using ResNet-152, but problems arise when I use ResNet-101. I saved the features extracted by ResNet-101 in a separate file, load them via an absolute path, and train_loss becomes NaN all the time.
I printed some data to locate the problem:

```python
fc_feats, att_feats, labels, masks = tmp
print(fc_feats)
```

and I get this:

```
Variable containing:
inf inf inf ... 2.3545e-02 4.4336e-01 4.5093e-02
inf inf inf ... 2.3545e-02 4.4336e-01 4.5093e-02
inf inf inf ... 2.3545e-02 4.4336e-01 4.5093e-02
... ⋱ ...
9.7345e-01 1.5666e+00 1.1705e-01 ... 0.0000e+00 2.7455e-01 3.6576e-04
9.7345e-01 1.5666e+00 1.1705e-01 ... 0.0000e+00 2.7455e-01 3.6576e-04
9.7345e-01 1.5666e+00 1.1705e-01 ... 0.0000e+00 2.7455e-01 3.6576e-04
[torch.cuda.FloatTensor of size 50x2048 (GPU 0)]
```
It looks like I didn't get the true fc feats. But I changed nothing in prepro_labels.py, and it works well when extracting the features with ResNet-152.
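A quick way to confirm whether the corruption is already in the saved feature files (rather than introduced by the data loader) is to scan them for non-finite values before training. The sketch below is a minimal, hypothetical check, assuming the features are stored as NumPy arrays; the function name and the synthetic 50x2048 matrix are illustrative, not part of the repo:

```python
import numpy as np

def report_nonfinite(feats):
    """Print which rows of a feature array contain inf or NaN entries."""
    bad = ~np.isfinite(feats)
    if bad.any():
        rows = np.unique(np.nonzero(bad)[0])
        print(f"{int(bad.sum())} non-finite entries in rows: {rows.tolist()}")
    else:
        print("all values finite")

# Synthetic example mimicking the printout above: first rows poisoned with inf
feats = np.random.rand(50, 2048).astype(np.float32)
feats[:3, :5] = np.inf
report_nonfinite(feats)
```

Running this over each saved `.npy` file (e.g. `report_nonfinite(np.load(path))`) would tell you whether the ResNet-101 extraction step itself produced the `inf` rows, which would explain the NaN loss.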

@ruotianluo
Owner

ruotianluo commented Dec 31, 2019

I have no idea; really weird. Closing it for now.
