I am reading the Torch implementation, your implementation, and the PyTorch implementation. Your implementation and the Torch implementation both use a mask, but the PyTorch implementation does not. Is the role of the mask to keep only the valid entries? Without a mask, how would performance and results be affected?
I am training the PyTorch implementation on a handwritten dataset, and I see a lot of repetition in the decoded results, as shown below. Is this because I did not use a mask in the attention operation?
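To make the question concrete, here is a minimal sketch of what I understand an attention mask to do. It is plain Python and is not taken from any of the three implementations; the names `masked_softmax`, `scores`, and `mask` are hypothetical, and the numbers are made up for illustration. It assumes the mask marks padded positions in a batch, and that at least one position is valid.

```python
import math

def masked_softmax(scores, mask):
    """Softmax over `scores`, giving zero weight where `mask` is 0.

    scores: list of floats, one attention score per encoder position.
    mask:   list of 0/1 flags (1 = real position, 0 = padding).
    Assumes at least one position is unmasked.
    """
    # Replace scores at padded positions with -inf so exp() yields 0 there.
    masked = [s if m else float("-inf") for s, m in zip(scores, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative scores only: the last two positions stand in for padding.
scores = [2.0, 1.0, 0.5, 0.5]
mask = [1, 1, 0, 0]

with_mask = masked_softmax(scores, mask)
without_mask = masked_softmax(scores, [1, 1, 1, 1])
```

In this sketch `with_mask` puts all attention weight on the first two positions, while `without_mask` leaves some weight on the padded positions. Whether that leakage explains the repeated output is exactly what I am unsure about.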