The results seem to be very low when I re-run the code for 30 epochs? #6

happinesslz · 2019-07-05T09:28:05Z

mAP0.250000: 0.042591
mAP0.500000: 0.01734
obj_accuracy: 0.86129
The results seem very low after I re-ran the code for 30 epochs, which took me about 5 days. Why?

The output of the log.log file:
[0705 13:50:41 @monitor.py:467] lr: 0.001
[0705 13:50:41 @monitor.py:467] mAP0.250000: 0.042591
[0705 13:50:41 @monitor.py:467] mAP0.500000: 0.01734
[0705 13:50:41 @monitor.py:467] obj_accuracy: 0.86129
[0705 13:50:41 @monitor.py:467] param-summary/fp1/conv_0/W-rms: 0.24582
[0705 13:50:41 @monitor.py:467] param-summary/fp1/conv_1/W-rms: 0.26173
[0705 13:50:41 @monitor.py:467] param-summary/fp2/conv_0/W-rms: 0.25195
[0705 13:50:41 @monitor.py:467] param-summary/fp2/conv_1/W-rms: 0.28531
[0705 13:50:41 @monitor.py:467] param-summary/proposal/conv0/W-rms: 0.27876
........
.......
PeriodicTrigger-Evaluator: 2 hours 18 minutes 58 seconds
[0705 13:50:41 @base.py:275] Start Epoch 30 ...
[0705 14:20:26 @base.py:285] Epoch 30 (global_step 79260) finished, time:29 minutes 45 seconds.
[0705 14:20:26 @saver.py:79] Model saved to train_log/run/model-79260.
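
To check whether mAP moves at all between evaluations, the monitor lines can be pulled out of log.log with a short script. This is only a sketch that assumes the line format shown in the excerpt above; `collect_metrics` and `METRIC_LINE` are hypothetical names, and the pattern tolerates leftover terminal color codes.

```python
import re
from collections import defaultdict

# Matches monitor lines such as
#   [0705 13:50:41 @monitor.py:467] mAP0.250000: 0.042591
# \S* absorbs any color-code remnant that directly follows the closing bracket.
METRIC_LINE = re.compile(
    r"@monitor\.py:\d+\]\S*\s+(mAP[\d.]+|obj_accuracy):\s+(\S+)"
)

def collect_metrics(lines):
    """Group metric values by name, in the order they appear in the log."""
    history = defaultdict(list)
    for line in lines:
        match = METRIC_LINE.search(line)
        if match:
            history[match.group(1)].append(float(match.group(2)))
    return dict(history)

# Sample lines copied from the log excerpt (the first one is not a tracked metric).
sample = [
    "[32m[0705 13:50:41 @monitor.py:467][0m lr: 0.001",
    "[32m[0705 13:50:41 @monitor.py:467][0m mAP0.250000: 0.042591",
    "[0705 13:50:41 @monitor.py:467] obj_accuracy: 0.86129",
]
metrics = collect_metrics(sample)
```

Running it on the full log with `collect_metrics(open("log.log"))` gives one list per metric, which makes it easy to see whether the values are flat from the first evaluation or degrade later.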

xudi1227 · 2019-07-06T14:07:28Z

Hi, may I know which GPU you are using? The original work used a Volta Quadro GP100.

happinesslz · 2019-07-09T10:28:50Z

@xudi1227 Thanks! I tried an RTX 2080 Ti with CUDA 10.0 + TF 1.13 and a GTX 1060 with CUDA 9.0 + TF 1.12. Both give similar results (mAP0.25 is about 4.26% and mAP0.50 is about 1.7%), so I don't think a different GPU can explain such a large gap in mAP.

f3rhoodn · 2019-07-19T22:31:08Z

@happinesslz Which cuDNN version are you using? I am using an RTX 2060 with CUDA 10.0 and TF 1.13, and it does not work. With TF 1.14 it only runs on the CPU, not the GPU.

happinesslz · 2019-07-20T06:11:04Z

@f3rhoodn My cuDNN package is cudnn-10.0-linux-x64-v7.4.2.24.tgz (version v7.4.2).

qq456cvb · 2019-08-01T08:41:10Z

Hey guys, I've updated my code so that it runs much faster by caching the training/testing data in CPU memory. Before this commit, all data had to be read from disk and then preprocessed on the CPU side, which was a bottleneck.
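
As a rough illustration of the caching idea (not the repository's actual code), each sample can be loaded and preprocessed once and then served from memory in later epochs. `load_and_preprocess` and `CachedDataset` are hypothetical stand-ins introduced only for this sketch.

```python
def load_and_preprocess(index):
    """Hypothetical stand-in for reading one sample from disk and preprocessing it."""
    load_and_preprocess.calls += 1
    return [index * 0.5, index * 2.0]

load_and_preprocess.calls = 0  # counts how often the slow path is taken


class CachedDataset:
    """Keeps every loaded sample in host memory so later epochs skip the disk."""

    def __init__(self, loader, size):
        self._loader = loader
        self._size = size
        self._cache = {}

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        # Only the first access per index pays the loading cost.
        if index not in self._cache:
            self._cache[index] = self._loader(index)
        return self._cache[index]


dataset = CachedDataset(load_and_preprocess, size=4)
for epoch in range(3):
    for i in range(len(dataset)):
        sample = dataset[i]
```

After three epochs over four samples the loader has run four times instead of twelve. The trade-off is that the whole preprocessed dataset has to fit in RAM.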

WangZhouTao · 2019-08-04T07:39:05Z

Hey, I ran into the same situation. I re-ran run.py and trained for about 60 epochs, and the results also seem very low. Have you solved this problem yet? Thank you very much.

NUAAXQ · 2019-08-16T09:01:24Z

@happinesslz The total_cost is always nan during training. Did you run into the same problem?
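
One way to narrow this down is to find the first step at which the cost stops being finite, since a nan that appears at step 0 usually points at the input data or setup, while one that appears later points more toward the optimization diverging. A minimal sketch in plain Python, assuming the per-step cost values have already been collected into a list; `first_non_finite_step` is a hypothetical helper, not part of the repository.

```python
import math

def first_non_finite_step(costs):
    """Return the index of the first nan/inf cost, or None if all are finite."""
    for step, cost in enumerate(costs):
        if not math.isfinite(cost):
            return step
    return None

# Made-up values purely for illustration.
example_costs = [0.91, 0.74, float("nan"), float("nan")]
bad_step = first_non_finite_step(example_costs)
```

Inside a TensorFlow 1.x graph, `tf.check_numerics` can serve a similar purpose by raising an error as soon as a tensor contains nan or inf.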
