The training code still uses the causal attention mask. You need to set full_context_alignment=True in decoder.forward to turn on the non-causal attention mask. Is this a mistake?
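For context, here is a minimal sketch of what I mean, assuming a fairseq-style TransformerDecoder (whose forward accepts full_context_alignment); decoder, prev_output_tokens, and encoder_out are placeholders for whatever the training loop already has:

```python
def decoder_forward_non_causal(decoder, prev_output_tokens, encoder_out):
    """Run a fairseq-style TransformerDecoder without the causal mask.

    With full_context_alignment=True, the decoder skips the triangular
    future mask (self_attn_mask is None), so self-attention is
    bidirectional over the full target context.
    """
    return decoder(
        prev_output_tokens,
        encoder_out=encoder_out,
        full_context_alignment=True,  # non-causal self-attention
    )
```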
Same question here. I'm benchmarking model performance with and without the causal mask and found a small difference of about 1 BLEU in the final ASR-BLEU evaluation.
Just want to confirm with the author @Rongjiehuang: was this causal mask used for the results reported in the paper?
P.S. @youngsheen I saw you are the first author of DiffS2UT, and I'm wondering whether the code for that paper has been released anywhere? I'm very interested in your approach as well.