Fix error when training multilingual_translation task with multi-GPU
Summary:
D10052908 introduced the multilingual_translation task, but it raises an exception when training with multiple GPUs: P60202593

With Myle's help, we found that the cause is an improperly handled dummy batch data type, which results in optimizer.backward() not being executed the same number of times across GPUs.

Reviewed By: xianxl

Differential Revision: D12964263

fbshipit-source-id: 4991039030bf373f0c484e131acc4736487be4d8
pipibjc authored and facebook-github-bot committed Nov 8, 2018
1 parent 8eb232c commit 189fcab
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions fairseq/data/round_robin_zip_datasets.py
@@ -59,6 +59,8 @@ def __len__(self):

     def collater(self, samples):
         """Merge a list of samples to form a mini-batch."""
+        if len(samples) == 0:
+            return None
         if self.eval_key is None:
             return OrderedDict([
                 (key, dataset.collater([sample[key] for sample in samples]))
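The effect of the guard in this hunk can be illustrated with a minimal, self-contained sketch. It does not import fairseq: `ToyDataset` and `ToyRoundRobinZip` are hypothetical stand-ins, and the `eval_key` branch is an assumption, since the hunk is truncated before that code. Without the guard, an empty sample list would presumably be collated into a non-empty dictionary of per-key empty batches rather than `None`.

```python
from collections import OrderedDict


class ToyDataset:
    """Hypothetical stand-in for a fairseq dataset (not part of fairseq)."""

    def collater(self, samples):
        # A real collater would pad and stack tensors; this one just wraps the list.
        return {"batch": list(samples)}


class ToyRoundRobinZip:
    """Simplified sketch of the collater logic shown in the diff."""

    def __init__(self, datasets, eval_key=None):
        self.datasets = OrderedDict(datasets)
        self.eval_key = eval_key

    def collater(self, samples):
        """Merge a list of samples to form a mini-batch."""
        # Guard added by this commit: an empty sample list yields None
        # instead of a dictionary of empty per-key batches.
        if len(samples) == 0:
            return None
        if self.eval_key is None:
            return OrderedDict([
                (key, dataset.collater([sample[key] for sample in samples]))
                for key, dataset in self.datasets.items()
            ])
        # Assumed branch (not visible in the truncated hunk): collate with
        # the single dataset selected by eval_key.
        return self.datasets[self.eval_key].collater(samples)


zipped = ToyRoundRobinZip([("en-de", ToyDataset()), ("en-fr", ToyDataset())])

# An empty sample list now produces None.
empty_batch = zipped.collater([])

# A non-empty list is still collated per key, as before.
full_batch = zipped.collater([
    {"en-de": 1, "en-fr": 3},
    {"en-de": 2, "en-fr": 4},
])
```

Here `empty_batch` is `None`, so a caller can test for it directly, while `full_batch` keeps one collated entry per key in insertion order.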
