Cannot use --init_model_path #1738
Comments
@KeepLearning12138 Please refer to this page for how to run PaddlePaddle: https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst
@livc @helinwang Thanks a lot for your reply. Following your comments, I installed the Docker image on an isolated server, ran the demo code shown in ch. 7 of the paddle-book (i.e., train.py), and got the following results: Pass 0, Batch 0, Cost 164.959180, {'classification_error_evaluator': 1.0} Can we use "train.py" directly? Best wishes
@KeepLearning12138 Sure, please see the link that I posted in the previous comment: https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst. The specific line is: docker run --rm -v ~/workspace:/workspace paddlepaddle/paddle:0.10.0rc2 python /workspace/train.py The above comment mounts
@helinwang Thanks for the quick reply. I did follow your suggestion. What I mean is this: although I can run "train.py", my training easily ends up with NaN. When I set the learning rate to 0.0, the NaN disappears. I just want to confirm where the error comes from.
Good to know that you are experimenting around! However, with the learning rate set to 0.0 the network will stop learning anything :)
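To illustrate the point about a learning rate of 0.0, here is a minimal sketch of a plain SGD update in Python. It is not PaddlePaddle code; the function name `sgd_step` and the gradient values are hypothetical and only show why the NaN vanishes: with a zero learning rate the weights are never updated, so large gradients cannot push them anywhere.

```python
def sgd_step(weights, grads, learning_rate):
    """Plain SGD update: w <- w - learning_rate * g (illustrative only)."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]


weights = [0.5, -1.0]
grads = [1e6, -2e6]  # hypothetical, very large gradients

# With learning_rate=0.0 the update is a no-op: the weights stay as they were.
frozen = sgd_step(weights, grads, learning_rate=0.0)

# With a non-zero learning rate the same gradients move the weights a lot.
moved = sgd_step(weights, grads, learning_rate=1e-3)

print(frozen)
print(moved)
```

So a run with learning rate 0.0 only shows that the forward pass is finite for the initial parameters; it says nothing about whether the updates themselves are stable.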
Thanks for the reply.
@KeepLearning12138 after #1750, you can use
instead of
to do a warm start.
It still does not work. Many NaNs are detected. Only when I set batch_size to 1 do the NaNs seem to disappear. Testing is OK, but training does not work. The command line is: And the log is: [==================================================] Besides, I tried some of the sentiment classification examples and got reasonable results.
I'm having a similar issue with the cost going to
@alvations How about trying to reduce the learning rate?
I've tried the older setup which used to work in v0.8.0 and v0.9.0 with the v0.10.0 code
The current default from https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/train.py#L143 (as below) is frustratingly slow.
What changed in the optimizer such that the old settings don't work any more? @livc I've also tried lowering the learning rate, but at some point the cost still goes to NaN and the training breaks =( Is it because gradient clipping was made global at 2e4c0bd?
Gradient clipping has been fixed in the current develop branch. We also fixed a terrible bug in sequence_softmax. The NMT training in 0.10.0 has been fixed now. I am closing this issue due to inactivity; please feel free to reopen it.
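For readers unfamiliar with gradient clipping, the following is a rough Python sketch of two common variants. The thread does not spell out what "global" means in commit 2e4c0bd, and this is not PaddlePaddle's implementation; the function names and values are hypothetical. The sketch assumes one reading: clamping each gradient value separately versus rescaling all gradients by their joint L2 norm.

```python
import math


def clip_elementwise(grads, threshold):
    """Clamp every gradient value into [-threshold, threshold]."""
    return [max(-threshold, min(threshold, g)) for g in grads]


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together so their joint L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


grads = [3.0, 4.0, 0.1]  # hypothetical gradient values

# Element-wise clamping changes the direction of the gradient vector.
print(clip_elementwise(grads, 1.0))

# Norm-based rescaling keeps the direction and only shrinks the length.
print(clip_by_global_norm(grads, 1.0))
```

The two variants can behave very differently for the same threshold, which is one plausible reason a setting that worked under one scheme stops working under another.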
I am trying to use the NMT model you trained (fr->en).
When I use --init_model_path=<your_pretrained_model>, I get the following error:
I0401 16:41:14.938364 3162 TrainerInternal.cpp:165] Batch=2 samples=100 AvgCost=nan CurrentCost=nan Eval: classification_error_evaluator=0.696074 CurrentEval: classification_error_evaluator=1
The command line is
paddle train \
  --config='translation/train.conf' \
  --save_dir='translation/model/wmt14_model' \
  --use_gpu=1 \
  --num_passes=16 \
  --show_parameter_stats_period=1 \
  --trainer_count=1 \
  --log_period=1 \
  --dot_period=5 \
  --init_model_path=model/wmt14_model/pass-00012 \
  2>&1 | tee 'translation/train.log'
Can anyone help me?
Thanks
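When chasing a NaN like the one above, it can help to find the first batch where the cost becomes NaN. The following is a small Python sketch that scans log lines for that. The regular expression is inferred only from the single log line quoted in this issue, so it is an assumption about the format, and the helper name `first_nan_batch` is hypothetical.

```python
import math
import re

# Pattern inferred from the one log line quoted above; other formats are ignored.
LINE_RE = re.compile(
    r"Batch=(\d+)\s+samples=(\d+)\s+AvgCost=(\S+)\s+CurrentCost=(\S+)"
)


def first_nan_batch(lines):
    """Return the first batch number whose AvgCost or CurrentCost is NaN, else None."""
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue
        try:
            costs = (float(match.group(3)), float(match.group(4)))
        except ValueError:
            continue  # skip lines whose cost fields are not numbers
        if any(math.isnan(c) for c in costs):
            return int(match.group(1))
    return None


sample = (
    "I0401 16:41:14.938364 3162 TrainerInternal.cpp:165] Batch=2 samples=100 "
    "AvgCost=nan CurrentCost=nan Eval: classification_error_evaluator=0.696074 "
    "CurrentEval: classification_error_evaluator=1"
)
print(first_nan_batch([sample]))
```

To scan the log written by the command above, pass an open file instead of a list, e.g. `first_nan_batch(open('translation/train.log'))`. With `--log_period=1` every batch is logged, so the result points at the exact batch where the cost first breaks.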