The results seem to be very low when I re-run the code for 30 epochs? #6

Open
happinesslz opened this issue Jul 5, 2019 · 7 comments

Comments

@happinesslz

happinesslz commented Jul 5, 2019

mAP0.250000: 0.042591
mAP0.500000: 0.01734
obj_accuracy: 0.86129
The results seem to be very low after re-running the code for 30 epochs, which took me about 5 days. Why?

The output of the log.log file:
[0705 13:50:41 @monitor.py:467] lr: 0.001
[0705 13:50:41 @monitor.py:467] mAP0.250000: 0.042591
[0705 13:50:41 @monitor.py:467] mAP0.500000: 0.01734
[0705 13:50:41 @monitor.py:467] obj_accuracy: 0.86129
[0705 13:50:41 @monitor.py:467] param-summary/fp1/conv_0/W-rms: 0.24582
[0705 13:50:41 @monitor.py:467] param-summary/fp1/conv_1/W-rms: 0.26173
[0705 13:50:41 @monitor.py:467] param-summary/fp2/conv_0/W-rms: 0.25195
[0705 13:50:41 @monitor.py:467] param-summary/fp2/conv_1/W-rms: 0.28531
[0705 13:50:41 @monitor.py:467] param-summary/proposal/conv0/W-rms: 0.27876
........
.......
PeriodicTrigger-Evaluator: 2 hours 18 minutes 58 seconds
[0705 13:50:41 @base.py:275] Start Epoch 30 ...
[0705 14:20:26 @base.py:285] Epoch 30 (global_step 79260) finished, time:29 minutes 45 seconds.
[0705 14:20:26 @saver.py:79] Model saved to train_log/run/model-79260.
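When comparing runs like this one, it can help to pull the metrics and the per-epoch timing out of log.log programmatically instead of reading them by eye. The sketch below is a hypothetical helper (`parse_log` and its regexes are not part of this repository or of tensorpack). It assumes the line format shown in the log above, and that the color codes may survive with or without their ESC byte, as happened when this log was pasted. It also derives steps per epoch and seconds per step from the last "Epoch N (global_step S) finished" line, assuming every epoch runs the same number of steps.

```python
import re

# ANSI SGR color codes, with or without the leading ESC byte
# (pasted logs often keep only "[32m" / "[0m").
ANSI_RE = re.compile(r"\x1b?\[[0-9;]*m")
# "[... @monitor.py:467] name: value" -> (name, value)
METRIC_RE = re.compile(r"@monitor\.py:\d+\]\s+(\S+):\s+([-+0-9.eE]+)$")
# "Epoch 30 (global_step 79260) finished, time:29 minutes 45 seconds."
EPOCH_RE = re.compile(
    r"Epoch (\d+) \(global_step (\d+)\) finished, time:\s*"
    r"(?:(\d+) hours? )?(?:(\d+) minutes? )?(?:(\d+) seconds?)?"
)


def parse_log(lines):
    """Return (latest value per metric, [(epoch, global_step, seconds), ...])."""
    metrics, epochs = {}, []
    for raw in lines:
        line = ANSI_RE.sub("", raw).strip()
        m = METRIC_RE.search(line)
        if m:
            metrics[m.group(1)] = float(m.group(2))  # later values overwrite earlier ones
            continue
        e = EPOCH_RE.search(line)
        if e:
            h, mi, s = (int(g) if g else 0 for g in e.group(3, 4, 5))
            epochs.append((int(e.group(1)), int(e.group(2)), 3600 * h + 60 * mi + s))
    return metrics, epochs


log = [
    "[32m[0705 13:50:41 @monitor.py:467][0m mAP0.250000: 0.042591",
    "[32m[0705 13:50:41 @monitor.py:467][0m mAP0.500000: 0.01734",
    "[32m[0705 13:50:41 @monitor.py:467][0m obj_accuracy: 0.86129",
    "[32m[0705 14:20:26 @base.py:285][0m Epoch 30 (global_step 79260) "
    "finished, time:29 minutes 45 seconds.",
]
metrics, epochs = parse_log(log)
epoch, step, secs = epochs[-1]
steps_per_epoch = step / epoch  # assumes a constant number of steps per epoch
print(metrics["mAP0.250000"], steps_per_epoch, round(secs / steps_per_epoch, 3))
# 0.042591 2642.0 0.676
```

For this log that works out to 2642 steps per epoch at roughly 0.68 s per step, separate from the time spent in the periodic evaluator reported above.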

@xudi1227

xudi1227 commented Jul 6, 2019

Hi, may I know what kind of GPU you are using? The original work used Volta Quadro GP100

@happinesslz
Author

@xudi1227 Thanks! I have tried an RTX 2080 Ti with CUDA 10.0 + TF 1.13 and a GTX 1060 with CUDA 9.0 + TF 1.12. Both give similar results (mAP0.25 is about 4.26% and mAP0.50 is about 1.7%), so I don't think a different GPU could explain such a large gap in mAP.

@f3rhoodn

@happinesslz What cuDNN version are you using? I am using an RTX 2060 with CUDA 10.0 and TF 1.13, and it does not work. With TF 1.14 it runs only on the CPU, not on the GPU.

@happinesslz
Author

@f3rhoodn My cuDNN package is cudnn-10.0-linux-x64-v7.4.2.24.tgz (version 7.4.2).

@qq456cvb
Owner

qq456cvb commented Aug 1, 2019

Hey guys, I've updated the code so that it runs much faster by caching the training/testing data on the CPU side. Before this commit, all data had to be read from disk and then preprocessed on the CPU for every pass, which was the actual bottleneck.
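The idea behind the caching change can be illustrated with a minimal in-memory cache: each sample is read from disk and preprocessed once, and every later epoch is served from RAM. This is a hypothetical sketch, not the repository's actual implementation; `CachedDataset`, `fake_load`, and the scene names are illustrative assumptions.

```python
from typing import Any, Callable, Dict, Hashable, Iterator, List


class CachedDataset:
    """Serve samples from RAM after the first (disk + preprocessing) load.

    Hypothetical sketch, not the repository's actual implementation.
    """

    def __init__(self, keys: List[Hashable], load_fn: Callable[[Hashable], Any]):
        self._keys = list(keys)
        self._load_fn = load_fn  # reads one sample from disk and preprocesses it
        self._cache: Dict[Hashable, Any] = {}

    def __len__(self) -> int:
        return len(self._keys)

    def __getitem__(self, idx: int) -> Any:
        key = self._keys[idx]
        if key not in self._cache:
            # Only the first access pays the disk-read + CPU preprocessing cost.
            self._cache[key] = self._load_fn(key)
        return self._cache[key]

    def __iter__(self) -> Iterator[Any]:
        for i in range(len(self)):
            yield self[i]


# Usage with a stand-in loader (hypothetical scene names).
calls: List[str] = []


def fake_load(scene_id: str) -> str:
    calls.append(scene_id)   # record each expensive load
    return scene_id.upper()  # stands in for the preprocessed sample


ds = CachedDataset(["scene0000", "scene0001"], fake_load)
for epoch in range(3):
    samples = list(ds)
print(len(calls))  # 2: each scene is loaded once, not once per epoch
```

Caching like this only pays off for the deterministic part of preprocessing: any random augmentation should still be applied after the cache lookup so each epoch sees fresh variations, and the cached dataset has to fit in host memory.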

@WangZhouTao

WangZhouTao commented Aug 4, 2019

Hey, I ran into the same situation. I re-ran run.py and trained for about 60 epochs, and the results also seem very low. Have you solved this problem yet? Thank you very much.

@NUAAXQ

NUAAXQ commented Aug 16, 2019

@happinesslz The total_cost is always NaN during training. Did you run into the same problem?

6 participants