Baseline results are far off from Lua version #21
I'm not sure if this is related, but I noticed that each evaluation step (which by default happens every 5 hours), after the model restoration, throws the loss somewhat back, and it then takes ~10k-15k steps to return to the same numbers as before the evaluation.
Thanks for the comparison. Do you confirm that you compared RNN with BRNN? I already identified some improvements to make, both for speed and performance. How does the Transformer result compare with other implementations?
Yes, it is rnn vs. brnn, but the Lua rnn version is only slightly off from its brnn. I took brnn by mistake, but there is no major change.
Just one strange thing: for Fr->En I get 22.5 BLEU after 500k steps with batch size 32, and 19.5 BLEU after 100k steps with batch size 58.
Thanks for testing. I recently pushed this commit that could have a non-negligible impact on the training. With the master version, I easily reached about 37 BLEU with the Transformer on the 1M ENFR baseline (by the way, a major speedup for this model is coming very soon!)

This could be a real issue with some datasets and is covered in the TensorFlow documentation: […]

That means if […]. So either: […]

Neither case is convenient, and a simple misconfiguration can significantly impact the training. This should be revised, maybe by enforcing the evaluation delay.
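As an aside on why a small shuffle buffer hurts: the following is a minimal pure-Python model of how a fixed-size shuffle buffer behaves (in the spirit of `tf.data.Dataset.shuffle`); it is an illustration only, not OpenNMT-tf code. With a buffer of 1, the data passes through in its original order, i.e. no shuffling happens at all.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Model of a fixed-size shuffle buffer: fill the buffer, then for
    each incoming element emit a uniformly sampled buffered element and
    replace it. Randomness quality degrades as buffer_size shrinks."""
    rng = random.Random(seed)
    buffer, out = [], []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)
        else:
            i = rng.randrange(buffer_size)
            out.append(buffer[i])
            buffer[i] = item
    rng.shuffle(buffer)  # drain whatever is left at the end
    out.extend(buffer)
    return out

# buffer_size=1 reproduces the input order exactly; a buffer as large
# as the dataset gives a full uniform shuffle.
```

This is why shuffling the training file on disk beforehand is a safe default whenever the buffer cannot hold the whole dataset.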
Yeah, I incorporated it in the middle of the 500k training, and the 100k+ run is already using it.
I use WMT14 ENFR corpus for FR->EN, but results are not as promising so far.
Yeah, I did a stupid thing there, thanks for pointing that out! For the 100k+ training I had decreased that number to 10h, and I just returned it to 5h, so we'll see what happens.
I believe it strongly depends on the GPU: if you use a Titan X or a 1080 Ti, it might be 1.5-2 times faster than my 1070. Thank you for your fast reply and great insights into the config settings! I'll try them out.
Great news! Would it involve the training or inference phase, or both?
Both. I'm preparing the PR. |
The models became much faster indeed: training is about 1.5 times faster, and inference is "a hell of a lot faster", maybe 10 times or even more. At the same time, after 220k steps my Fr->En translator gives 21.34 BLEU, which seems a bit too low. All the settings seem to be default, except maybe the tokenization, though I don't think there's an issue there. Any thoughts/ideas on what might be wrong? I don't think that Fr->En is so much more complicated than En->Fr. I'm also not sure if BPE gives a big accuracy boost...
What configuration(s) are you actually using? In particular, do you use […]? You should get good results by now.
Here are my configs: config/opennmt-transf.yml
config/optim/adam_with_noam_decay.yml
config/data/enfr.yml
command to run:
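For reference, the `adam_with_noam_decay.yml` optimization referenced above uses the noam schedule from the Transformer paper ("Attention Is All You Need"): linear warmup followed by inverse-square-root decay. A minimal sketch, with `d_model` and `warmup_steps` defaults assumed here rather than taken from the actual config files:

```python
def noam_learning_rate(step, d_model=512, warmup_steps=4000):
    """Noam schedule: linear warmup for warmup_steps, then
    inverse-square-root decay. Peaks at step == warmup_steps."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

With these assumed values the peak rate is 512^-0.5 * 4000^-0.5, roughly 7e-4, which is part of why Transformer results are so sensitive to the warmup and batch-size settings discussed later in this thread.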
The configurations look good. I'm starting a comparative run on the same dataset and will check. |
Great! I'm looking forward to hearing about your results.
Regarding the initial issue, I recently pushed some commits that make the implementation and training even closer to the Lua version. On the 1M ENFR baseline, I obtain 36.73 BLEU after 200k steps with a brnn 2x500 and the configuration below:

```yaml
model_dir: run/baseline-enfr-rnn

data:
  train_features_file: /training/Users/klein/baseline-1M-enfr/baseline-1M_train.en.light_tok
  train_labels_file: /training/Users/klein/baseline-1M-enfr/baseline-1M_train.fr.light_tok
  eval_features_file: /training/Users/klein/baseline-1M-enfr/baseline-1M_test.en.light_tok
  eval_labels_file: /training/Users/klein/baseline-1M-enfr/baseline-1M_test.fr.light_tok
  source_words_vocabulary: /training/Users/klein/baseline-1M-enfr/en-vocab.txt
  target_words_vocabulary: /training/Users/klein/baseline-1M-enfr/fr-vocab.txt

params:
  optimizer: GradientDescentOptimizer
  learning_rate: 1.0
  clip_gradients: 5.0
  param_init: 0.1
  decay_type: exponential_decay
  decay_rate: 0.7
  decay_steps: 20000
  start_decay_steps: 140000
  beam_width: 5
  maximum_iterations: 250

train:
  batch_size: 64
  save_checkpoints_steps: 5000
  save_summary_steps: 200
  train_steps: 200000
  eval_delay: 7200  # Every 2 hours.
  maximum_features_length: 50
  maximum_labels_length: 50
  save_eval_predictions: true
  external_evaluators: BLEU

infer:
  batch_size: 30
```

I would now call that on par with the Lua version, so I'm closing this issue.

@gsoul Regarding the giga-fren training, an important aspect appears to be proper data shuffling. Manually shuffling the data before the training is a good start to increase the learning efficiency. Otherwise, I introduced a new […]. My Transformer training on this dataset is now at 25.70 BLEU after 380k steps. Does anyone know what score we should expect on […]?

However, the Transformer is another piece of work. As @vince62s shared on the tensor2tensor repository, the model is sensitive to hyper-parameters, in particular the batch size. In any case, let's keep this out of the comparison with the Lua version and maybe open a new issue to track the tuning of this model.
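The SGD schedule in this configuration can be sketched as follows. This is a staircase reading of `exponential_decay` with the `decay_rate`, `decay_steps`, and `start_decay_steps` values above; the exact step semantics in the framework are an assumption here:

```python
def exponential_decay_lr(step, base_lr=1.0, decay_rate=0.7,
                         decay_steps=20000, start_decay_steps=140000):
    """SGD learning rate implied by the config: constant until
    start_decay_steps, then multiplied by decay_rate once every
    decay_steps (staircase decay)."""
    if step < start_decay_steps:
        return base_lr
    return base_lr * decay_rate ** ((step - start_decay_steps) // decay_steps)
```

Under this reading, the rate stays at 1.0 for the first 140k steps, then decays three times by the end of training, ending around 0.7^3 = 0.343 at step 200k.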
@guillaumekln I redid my comparison and it is not so good.

Baseline 1M enfr:

- Lua (brnn 2x500): BLEU on the test set without replace_unk = 35.65 (beam 1), 37.00 (beam 5); 62 min per epoch on a GTX 1080
- TF (2x500 rnn), optim sgd: BLEU 26.47 after 200k steps; 1h24 per 15k steps (approx. 1 epoch) on a GTX 1080 Ti
- TF (2x500 rnn), optim noam: BLEU 28.87 after 200k steps
- TF (transformer), optim noam: BLEU 33.23 after 100k steps