Allow an arbitrary mask to be used in the self attention #8235
Conversation
I think this is fine with the minor proposed change.
Thanks for the PR, looks good to me.
/build
It seems there is a TorchScript conversion issue caused by this addition.
It seems to be due to the type annotation on `attn_mask` (`torch.Tensor | None`). This syntax is only available on Python 3.10 and later, but it seems that Python 3.9 is being used in the test environment.
I used this notation because I thought I'd already seen it in MONAI. The older form would be:

```python
from typing import Optional
...
attn_mask: Optional[torch.Tensor] = None
```

I can change it, as you prefer!
Yes, could you help convert this to the older typing syntax, as TorchScript does not support the `|` union syntax?
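For reference, a minimal sketch of the annotation change under discussion; the function body is a placeholder for illustration, not the actual `SABlock` code:

```python
# TorchScript cannot parse the PEP 604 "X | None" union syntax,
# so the older typing.Optional form is used instead.
from typing import Optional

import torch


def forward_sketch(x: torch.Tensor, attn_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    # Placeholder body; the real SABlock.forward applies attention here.
    return x
```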
/build
Description
The aim of this PR is to enable the use of an arbitrary mask in the self-attention module, which is very useful in the case of missing data or masked modeling.

Official torch implementations allow the use of an arbitrary mask, and in MONAI masking is already possible through the `causal` argument; this PR simply generalizes it directly in the forward pass. In `SABlock` and `TransformerBlock`, it is now possible to pass a boolean mask of shape `(BS, Seq_length)` (see the sketch below). Only the columns of the masked tokens are set to `-inf`, not the rows, since masking rows is rarely done in common implementations and masked tokens don't contribute to the gradient anyway. When causal attention is required, passing a mask is not supported, to avoid overlapping masks.
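As an illustration, a hedged usage sketch; the block hyperparameters and the mask polarity (True assumed to mark tokens that take part in attention) are assumptions for the example, not taken from the PR diff:

```python
import torch
from monai.networks.blocks.selfattention import SABlock

# Illustrative sizes, not from the PR.
bs, seq_len, hidden = 2, 16, 128
block = SABlock(hidden_size=hidden, num_heads=4)

x = torch.randn(bs, seq_len, hidden)

# Boolean mask of shape (BS, Seq_length); True is assumed to mean "attend to this token".
attn_mask = torch.ones(bs, seq_len, dtype=torch.bool)
attn_mask[:, -4:] = False  # e.g. padding tokens at the end of the sequence

out = block(x, attn_mask=attn_mask)  # output has the same shape as x
```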
I haven't implemented an additive mask on the attention matrix, which would allow values other than `-inf` in certain cases, as in `torch.nn.functional.scaled_dot_product_attention`: https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html. If you think it's relevant, it could be added (see the comparison sketch below).
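For comparison, a minimal sketch of PyTorch's boolean vs. additive float masks in `torch.nn.functional.scaled_dot_product_attention` (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

q = k = v = torch.randn(2, 4, 16, 32)  # (batch, heads, seq, head_dim)

# Boolean mask: True means "attend"; False positions are filled with -inf internally.
bool_mask = torch.ones(2, 1, 16, 16, dtype=torch.bool)
bool_mask[..., -4:] = False
out_bool = F.scaled_dot_product_attention(q, k, v, attn_mask=bool_mask)

# Additive float mask: arbitrary biases are added to the attention logits,
# so values other than -inf (e.g. relative-position biases) are possible.
float_mask = torch.zeros(2, 1, 16, 16)
float_mask[..., -4:] = float("-inf")
out_add = F.scaled_dot_product_attention(q, k, v, attn_mask=float_mask)
```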
Types of changes
- Tests passing locally: `./runtests.sh -f -u --net --coverage`.
- Quick tests passing locally: `./runtests.sh --quick --unittests --disttests`.
- Documentation updated, tested with the `make html` command in the `docs/` folder.