
Bug in decoder_padding_mask in BPE training #242

Open
csukuangfj opened this issue Aug 3, 2021 · 0 comments

Comments

@csukuangfj
Collaborator

See the code below:

eos_id = self.decoder_num_class - 1
ys_in_pad = pad_list(ys_in, eos_id)
tgt_key_padding_mask = decoder_padding_mask(ys_in_pad)

def decoder_padding_mask(ys_pad: Tensor, ignore_id: int = -1) -> Tensor:
    ys_mask = ys_pad == ignore_id
    return ys_mask


You can see that ys_in_pad is padded with eos_id, which is a positive word piece ID.

However, decoder_padding_mask compares the padded tensor against its default ignore_id of -1, so the resulting mask never marks any position as padding.
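
For concreteness, here is a minimal sketch (not code from the repository; eos_id = 499 and the tensors are made up) showing that the mask comes out all False when the padding value is eos_id but ignore_id stays at its default of -1:

import torch
from torch import Tensor


def decoder_padding_mask(ys_pad: Tensor, ignore_id: int = -1) -> Tensor:
    ys_mask = ys_pad == ignore_id
    return ys_mask


eos_id = 499  # made-up word piece ID; any positive value shows the problem
ys_in_pad = torch.tensor([
    [1, 7, 23, 42],
    [1, 7, eos_id, eos_id],  # last two positions are padding
])

print(decoder_padding_mask(ys_in_pad))
# tensor([[False, False, False, False],
#         [False, False, False, False]])
# No padded position is masked, because the padding value is eos_id, not -1.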


This bug may explain why the WERs vary with batch size during decoding. I suspect it also affects training.
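
One possible way to avoid relying on a magic padding value at all (just a sketch of mine, not a fix adopted in the repository) is to build the mask from the unpadded sequence lengths:

import torch
from torch import Tensor
from typing import List


def padding_mask_from_lengths(ys_in: List[Tensor], ys_in_pad: Tensor) -> Tensor:
    # True at padded positions, False at real tokens.
    lengths = torch.tensor([y.size(0) for y in ys_in])
    max_len = ys_in_pad.size(1)
    return torch.arange(max_len).unsqueeze(0) >= lengths.unsqueeze(1)

Alternatively, calling decoder_padding_mask(ys_in_pad, ignore_id=eos_id) would also produce the right mask here, provided eos_id can only appear in ys_in_pad as padding.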
