Add dimensionality of heads argument to SABlock #7664
Conversation
Hi @NabJa, thanks for the contribution. You'll have to fix some of the issues that you're seeing, but some others may be related to our testing system. Please follow the DCO instructions for doing a remediation commit, and please run
Hi @NabJa, I guess it's expected behavior: increasing num_heads in the self-attention block (SABlock) does not increase the number of trainable parameters. But @ericspod and @marksgraham might have more expertise on this. What are your thoughts?
Hi @KumoLiu, thank you for the references! Indeed, the official PyTorch implementation splits the embeddings across all heads, resulting in a head dimension of hidden_size divided by num_heads.
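A minimal check of the behaviour discussed above (illustrative only, not code from this PR), using torch.nn.MultiheadAttention to show that the parameter count is independent of num_heads because the embedding dimension is split across heads:

```python
import torch.nn as nn

def n_params(m: nn.Module) -> int:
    """Count trainable parameters."""
    return sum(p.numel() for p in m.parameters() if p.requires_grad)

# embed_dim is split across heads, so each head works on embed_dim // num_heads
# channels and the total parameter count does not change with num_heads.
attn_4_heads = nn.MultiheadAttention(embed_dim=128, num_heads=4)
attn_8_heads = nn.MultiheadAttention(embed_dim=128, num_heads=8)

print(n_params(attn_4_heads) == n_params(attn_8_heads))  # True
```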
We need to be careful not to break backwards compatibility here, so the default behaviour stays the same. If that holds, I don't see anything that would affect the generative code merge.
Signed-off-by: NabJa <[email protected]>
DCO Remediation Commit for NabJa <[email protected]>
I, NabJa <[email protected]>, hereby add my Signed-off-by to this commit: 139182e
Signed-off-by: NabJa <[email protected]>
@marksgraham complete backward compatibility should be guaranteed with 1ccb5de.
Would it be possible to hold off merging this for a couple of days? I'm working on some changes to self-attention/cross-attention for the MONAI Generative merge, and it would be good to get a little further into them so I can work out whether I need to make any further changes to accommodate this PR.
I think we're good to delay until you're set, thanks. |
/build
Fixes #7661.
Description
The changes add a parameter (dim_head) to set the output dimension of all heads in the self-attention block (SABlock). Currently the output dimension is fixed to hidden_size, and when the number of heads is increased, hidden_size is distributed equally among all heads.
Example
The original implementation automatically determines equally_distributed_head_dim:
qkv * num_heads * equally_distributed_head_dim = 3 * hidden_size
(in this example: 3 * 8 * 16 = 384)
The proposed implementation addresses this by setting the new argument dim_head explicitly, as sketched below.
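A short usage sketch (assuming the SABlock constructor from monai.networks.blocks with the new dim_head keyword proposed here; exact signature and defaults may differ):

```python
import torch
from monai.networks.blocks import SABlock

x = torch.randn(2, 64, 128)  # (batch, sequence length, hidden_size)

# Original behaviour (unchanged default): hidden_size is split equally across
# the heads, so each of the 8 heads gets 128 // 8 = 16 channels and the qkv
# projection outputs 3 * 8 * 16 = 384 = 3 * hidden_size features.
block_default = SABlock(hidden_size=128, num_heads=8)

# Proposed behaviour: dim_head sets the per-head dimension explicitly, so the
# inner dimension becomes 8 * 32 = 256 and the number of trainable parameters
# now grows with num_heads instead of staying constant.
block_wide = SABlock(hidden_size=128, num_heads=8, dim_head=32)

# Both blocks project back to hidden_size, so the output shape is unchanged
# and backward compatibility is preserved.
print(block_default(x).shape)  # torch.Size([2, 64, 128])
print(block_wide(x).shape)     # torch.Size([2, 64, 128])
```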
Types of changes
- Integration tests passed locally by running ./runtests.sh -f -u --net --coverage
- Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests
- Documentation updated, tested the make html command in the docs/ folder.