Thanks for your excellent work!
I have a question about the pseudocode of LSDA, which is implemented in only ten lines of code using nothing but reshape and permute operations:
if type == "SDA":
    x = x.reshape(H // G, G, W // G, G, D).permute(0, 2, 1, 3, 4)
elif type == "LDA":
    x = x.reshape(G, H // G, G, W // G, D).permute(1, 3, 0, 2, 4)
The two branches clearly reshape the input differently, but I am still unsure about the reason behind this particular design. Could you explain, from another perspective, why these two reshape/permute patterns correspond to long-distance and short-distance attention, respectively?
Thanks a lot!
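One way to see the difference is to track which token positions land in the same attention group under each reshape. Below is a minimal sketch using NumPy (`transpose` in place of PyTorch's `permute`); the values of `H`, `W`, `G`, and `D` are hypothetical small numbers chosen to make the grouping visible:

```python
import numpy as np

H = W = 4   # feature-map height/width (hypothetical small values)
G = 2       # group size from the pseudocode
D = 1       # embedding dim, kept at 1 so each token is just its index

# Label every spatial position with a unique index so we can track it.
x = np.arange(H * W).reshape(H, W, D)

# SDA: split each axis as (H//G, G), so the fast-varying factor G indexes
# adjacent rows/cols -> each group is a contiguous G x G window.
sda = x.reshape(H // G, G, W // G, G, D).transpose(0, 2, 1, 3, 4)
sda_groups = sda.reshape(-1, G * G)   # each row = one group of G*G tokens

# LDA: split each axis as (G, H//G), so tokens in the same group share the
# inner index and sit a fixed stride (H//G) apart -> dilated sampling.
lda = x.reshape(G, H // G, G, W // G, D).transpose(1, 3, 0, 2, 4)
lda_groups = lda.reshape(-1, G * G)

print(sda_groups)
print(lda_groups)
```

For this 4x4 grid, SDA's first group is the contiguous 2x2 window [0, 1, 4, 5] (short-distance), while LDA's first group is the strided set [0, 2, 8, 10], i.e. tokens spaced H//G apart (long-distance).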