
Bug in decoder_padding_mask in BPE training #242

Open
csukuangfj opened this issue Aug 3, 2021 · 0 comments

Comments

@csukuangfj
Collaborator

See the code below:

eos_id = self.decoder_num_class - 1
ys_in_pad = pad_list(ys_in, eos_id)
tgt_key_padding_mask = decoder_padding_mask(ys_in_pad)

def decoder_padding_mask(ys_pad: Tensor, ignore_id: int = -1) -> Tensor:
    ys_mask = ys_pad == ignore_id
    return ys_mask


You can see that ys_in_pad is padded with eos_id, which is a positive word piece ID.

However, decoder_padding_mask compares the padded tensor against its default ignore_id of -1, so the resulting mask never marks any position as padding.
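
For concreteness, here is a minimal sketch (not code from the repository; eos_id = 499 and the tensors are made up) showing that the mask comes out all False when the padding value is eos_id but ignore_id stays at its default of -1:

import torch
from torch import Tensor


def decoder_padding_mask(ys_pad: Tensor, ignore_id: int = -1) -> Tensor:
    ys_mask = ys_pad == ignore_id
    return ys_mask


eos_id = 499  # made-up word piece ID; any positive value shows the problem
ys_in_pad = torch.tensor([
    [1, 7, 23, 42],
    [1, 7, eos_id, eos_id],  # last two positions are padding
])

print(decoder_padding_mask(ys_in_pad))
# tensor([[False, False, False, False],
#         [False, False, False, False]])
# No padded position is masked, because the padding value is eos_id, not -1.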


This bug may explain why the WERs vary with batch size during decoding. I suspect it also affects training.
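
One possible way to avoid relying on a magic padding value at all (just a sketch of mine, not a fix adopted in the repository) is to build the mask from the unpadded sequence lengths:

import torch
from torch import Tensor
from typing import List


def padding_mask_from_lengths(ys_in: List[Tensor], ys_in_pad: Tensor) -> Tensor:
    # True at padded positions, False at real tokens.
    lengths = torch.tensor([y.size(0) for y in ys_in])
    max_len = ys_in_pad.size(1)
    return torch.arange(max_len).unsqueeze(0) >= lengths.unsqueeze(1)

Alternatively, calling decoder_padding_mask(ys_in_pad, ignore_id=eos_id) would also produce the right mask here, provided eos_id can only appear in ys_in_pad as padding.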
