How to reproduce the result on WMT14 En-De #202
Comments
Yes, you are right. Originally I used the Google dataset [1], but was hoping to reproduce the results with our script, because it's not clear how the Google version was preprocessed. I'm working on an updated preprocessing script that should better match the Google version (~4.5M pairs). I'll post it here and update the README shortly.
Please try this dataset: #203. I just ran it on 128 GPUs and get the same results as (actually a little better than) the paper now.
Thanks @myleott! I'm running on the new dataset (with 8 GPUs) and will get back to you with the latest result.
Hi @myleott, after running on 8 M40 GPUs for about 5 days, I obtained a BLEU of 28.77 on WMT14 En-De. Thanks again for the code and help! BTW, do you have plans to share the detailed config/command to reproduce the result on WMT14 En-Fr? Thanks!
For En-Fr, I used the standard fairseq En-Fr dataset with 40k BPE tokens, available here: https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-wmt14en2fr.sh. For preprocessing, make sure to add the
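The comment above is cut off before naming the preprocessing flag. As a sketch only, a plausible invocation might look like the following; the directory names and the final flag (`--joined-dictionary`, which fairseq's preprocess.py does accept, and which would match the `wmt14_en_de_joined_dict` naming used elsewhere in this thread) are assumptions, not confirmed by the truncated comment:

```shell
# Sketch: run the data-prep script, then binarize with fairseq's preprocess.py.
# Paths and the trailing flag are assumptions; the original comment is truncated.
bash prepare-wmt14en2fr.sh
python preprocess.py \
  --source-lang en --target-lang fr \
  --trainpref wmt14_en_fr/train \
  --validpref wmt14_en_fr/valid \
  --testpref wmt14_en_fr/test \
  --destdir data-bin/wmt14_en_fr \
  --joined-dictionary   # assumed flag; share one BPE vocab across en/fr
```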
Thanks!
Hi @myleott @ustctf, if I use the newly processed WMT14 En-De data provided by Google, should I also do some postprocessing (like get_ende_bleu.sh in tensor2tensor) to get a good BLEU?
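For context, the main content-affecting step in tensor2tensor's get_ende_bleu.sh is splitting hyphenated compounds in both hypothesis and reference before scoring (the script also retokenizes with the mteval tokenizer, not shown here). A minimal Python sketch of just that compound-splitting step, approximating the script's perl substitution:

```python
import re

def split_compounds(line):
    # Approximate the hyphen-splitting step from tensor2tensor's
    # get_ende_bleu.sh: hyphenated compounds are broken apart with a
    # ##AT##-##AT## placeholder, so "US-Präsident" and "US - Präsident"
    # can match under n-gram BLEU. Applied to hypothesis and reference alike.
    return re.sub(r"(\S)-(\S)", r"\1 ##AT##-##AT## \2", line)

print(split_compounds("der US-Präsident"))  # der US ##AT##-##AT## Präsident
```

Whether this postprocessing is needed depends on whether the reference you score against was prepared the same way; comparing BLEU with and without it is the safest check.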
Hi @ustctf, can you provide the BLEU score for En-Fr obtained using this script: https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-wmt14en2fr.sh?
@gvskalyan Sorry, I have no records. Maybe you can ask the maintainers for official help.
Yeah, thank you.
Hi,
Thank you for providing such an impressive toolkit!
For replicating the WMT14 En-De translation result, I followed the instructions here, but after running on 8 M40 GPUs for 5.5 days, the test-set BLEU (<27) does not match the one stated in the paper, or even the one in the original T2T paper (28.4). May I know what's wrong on my side? Here is the running script:
(I do not use --fp16 and slightly enlarge the batch size from 3584 to 4096.)
Here is the test script:
python generate.py ${REMOTE_DATA_PATH}/wmt14_en_de_joined_dict --path ${REMOTE_MODEL_PATH}/${model}/${PROBLEM}/${SETTING}/checkpoint_best.pt --batch-size 128 --beam 4 --lenpen 0.6 --quiet --remove-bpe --no-progress-bar
It outputs (after training for 5.5 days): Generate test with beam=4: BLEU4 = 26.66, 57.9/32.3/20.4/13.2 (BP=1.000, ratio=1.013, syslen=66179, reflen=65346)
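As a sanity check that the printed score is internally consistent, corpus BLEU is the brevity penalty times the geometric mean of the n-gram precisions. Recomputing from the numbers generate.py printed:

```python
import math

# Recompute corpus BLEU from the printed n-gram precisions and brevity
# penalty: BLEU = BP * exp(mean of log n-gram precisions).
precisions = [57.9, 32.3, 20.4, 13.2]  # 1- to 4-gram precisions (%)
bp = 1.0  # brevity penalty (ratio 1.013 >= 1, so no penalty applies)
bleu = bp * math.exp(sum(math.log(p) for p in precisions) / len(precisions))
print(round(bleu, 2))  # close to the reported 26.66 (precisions are rounded)
```

This confirms the 26.66 figure follows from its components; the gap to the paper is in the precisions themselves, not the scoring.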
BTW, it seems the dataset generated using prepare-wmt14en2de.sh has <4M training pairs, not the ~4.5M mentioned above; could that be the reason?
Thanks a lot.