Only the first token is used for classification.
Please refer to the paper.
I believe the paper uses average pooling.
Of course, using only the first token might still give you great results, but using information from all 16 tokens should be better.
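To make the difference concrete, here is a minimal sketch of the two options in plain Python. It is not the repository's code: the function names `first_token_pool` and `mean_pool`, the toy values, and the 4-dimensional embedding size are all assumptions for illustration. Only the count of 16 tokens comes from this thread.

```python
from typing import List

Vector = List[float]


def first_token_pool(tokens: List[Vector]) -> Vector:
    """Use only the first token's embedding as the sequence representation."""
    return list(tokens[0])


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average all token embeddings, one dimension at a time."""
    n = len(tokens)
    return [sum(dim) / n for dim in zip(*tokens)]


# Hypothetical toy input: 16 tokens, each a 4-dimensional embedding.
tokens = [[float(i + j) for j in range(4)] for i in range(16)]

first = first_token_pool(tokens)  # depends on token 0 only
mean = mean_pool(tokens)          # depends on all 16 tokens
```

Either vector would then go to the classification head. With average pooling every token contributes to the prediction, whereas the first-token variant relies on the earlier layers having already moved the relevant information into position 0.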
In 228, why do you use only the first token for classification?