Hello authors, your work is absolutely amazing! Thank you so much for making the results and the code publicly available.
However, I found a small issue with the EfficientAdditiveAttention module while adapting it for my own research:
```python
import torch.nn as nn


class EfficientAdditiveAttnetion(nn.Module):
    """
    Efficient Additive Attention module for SwiftFormer.
    Input: tensor in shape [B, N, D]
    Output: tensor in shape [B, N, D]
    """
    def __init__(self, in_dims=20, token_dim=768, num_heads=2):
        super().__init__()
        self.to_query = nn.Linear(in_dims, token_dim * num_heads)
        self.to_key = nn.Linear(in_dims, token_dim * num_heads)
        ...
```
If the input tensor has shape B x N x D, where B is the batch size, N corresponds to in_dims, and D to token_dim, then shouldn't the input dimension of these linear layers be token_dim? nn.Linear is applied to the last dimension, so otherwise the matrix multiplication cannot be performed.
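To make sure I read this correctly: nn.Linear matches its in_features against the last dimension of the input, so with a [B, N, D] tensor it is D (token_dim, in my reading) that has to equal in_features. A tiny standalone check, where the concrete values are just illustrative:

```python
import torch
import torch.nn as nn

B, N, D = 2, 100, 768           # illustrative values for [B, N, D]

proj = nn.Linear(D, 2 * D)      # in_features equals the last dimension D
y = proj(torch.randn(B, N, D))  # works
print(y.shape)                  # torch.Size([2, 100, 1536])

bad = nn.Linear(20, 2 * D)      # in_features = 20 != D = 768
# bad(torch.randn(B, N, D))     # would raise: mat1 and mat2 shapes cannot be multiplied
```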
I did a little test as follows:
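Roughly the following, reusing the class as quoted above with its default arguments (the batch size and sequence length below are arbitrary values chosen only for illustration):

```python
import torch

# Instantiate with the default arguments: in_dims=20, token_dim=768, num_heads=2.
attn = EfficientAdditiveAttnetion()

# An input of shape [B, N, D] with D = token_dim = 768.
x = torch.randn(2, 100, 768)

# nn.Linear acts on the last dimension (768 here), but to_query was
# built with in_features = in_dims = 20, so the shapes do not match.
q = attn.to_query(x)  # fails here
```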
After executing the above code, the interpreter reports a RuntimeError complaining that the mat1 and mat2 shapes cannot be multiplied.
Am I misunderstanding something? I hope you can help me out. 🙏