The training code still uses the causal attention mask. You need to set full_context_alignment=True in decoder.forward to turn on the non-causal attention mask. Is this a mistake?
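For context, here is a minimal sketch of what I mean, assuming a fairseq-style TransformerDecoder (whose forward accepts full_context_alignment); decoder, prev_output_tokens, and encoder_out are placeholders for whatever the training loop already has:

```python
def decoder_forward_non_causal(decoder, prev_output_tokens, encoder_out):
    """Run a fairseq-style TransformerDecoder without the causal mask.

    With full_context_alignment=True, the decoder skips the triangular
    future mask (self_attn_mask is None), so self-attention is
    bidirectional over the full target context.
    """
    return decoder(
        prev_output_tokens,
        encoder_out=encoder_out,
        full_context_alignment=True,  # non-causal self-attention
    )
```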
Same question here. I'm benchmarking model performance with and without the causal mask and found a small difference of about 1 BLEU in the final ASR-BLEU evaluation.
Just want to confirm with the author @Rongjiehuang: was this causal mask used for the results reported in the paper?
P.S. @youngsheen I saw you are the first author of DiffS2UT, and I'm wondering whether the code for that paper has been released anywhere? I'm very interested in your approach as well.