confused about "SampledSoftmaxLoss" func #88
Hey, congratulations on your creative and impressive work!

When I read the implementation code here, I was very confused about SampledSoftmaxLoss, and I have some questions about it:

Please give me some advice if you are free, thanks~

Comments
Hi, I have the same question. I did some debugging on the training code provided by the authors for the public dataset, and below is my analysis of this loss function:

These are my personal understandings and there may be some mistakes. Discussion is welcome, and it would be even better if the authors could provide an official explanation!
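For readers following the thread, here is a minimal, self-contained sketch of how a sampled softmax loss is typically computed, including the logQ correction for the negative-sampling distribution. This illustrates the general technique under assumed names and shapes (`sampled_softmax_loss`, `negative_sampling_probs`, etc.); it is not the repository's actual implementation.

```python
import torch
import torch.nn.functional as F


def sampled_softmax_loss(
    query_embeddings: torch.Tensor,         # (B, D) sequence/user representations
    positive_embeddings: torch.Tensor,      # (B, D) embeddings of the true next items
    negative_embeddings: torch.Tensor,      # (B, N, D) embeddings of N sampled negatives
    negative_sampling_probs: torch.Tensor,  # (B, N) probability each negative was drawn
) -> torch.Tensor:
    # Positive logits: similarity between each query and its true item -> (B, 1).
    pos_logits = (query_embeddings * positive_embeddings).sum(dim=-1, keepdim=True)
    # Negative logits: similarity between each query and its negatives -> (B, N).
    neg_logits = torch.einsum("bd,bnd->bn", query_embeddings, negative_embeddings)
    # logQ correction: subtracting log(sampling probability) makes the sampled
    # softmax an asymptotically unbiased estimate of the full softmax.
    neg_logits = neg_logits - torch.log(negative_sampling_probs.clamp(min=1e-9))
    # Standard cross-entropy where the positive item is always class 0.
    logits = torch.cat([pos_logits, neg_logits], dim=1)  # (B, 1 + N)
    labels = torch.zeros(logits.shape[0], dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```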
Hi, thanks for your interest in our work and for @Blank-z0's explanations! Points 1-4 are correct. To elaborate a bit more on point 3: we abstract out similarity-function computations in this codebase in order to support alternative learned similarity functions, such as FMs and MoL, besides dot products in a unified API. The experiments reported in the ICML paper were all conducted with dot products / cosine similarity to simplify discussion. Further references and discussion of learned similarities can be found in Revisiting Neural Retrieval on Accelerators (KDD '23), with follow-up work by LinkedIn in LiNR: Model Based Neural Retrieval on GPUs at LinkedIn (CIKM '24); we have also provided experimental results that integrate HSTU and MoL in Efficient Retrieval with Learned Similarities (though that paper is more about the theoretical justification for using learned similarities).
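As a rough sketch of what "abstracting out similarity computations in a unified API" can look like, the example below puts a dot-product similarity and a heavily simplified mixture-of-logits (MoL) style learned similarity behind one interface. The class and method names here are hypothetical and chosen for illustration; they are not the codebase's actual API.

```python
import torch


class SimilarityModule(torch.nn.Module):
    """Hypothetical unified interface: score (B, D) queries against (B, N, D) items."""

    def forward(self, queries: torch.Tensor, items: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError


class DotProductSimilarity(SimilarityModule):
    def forward(self, queries: torch.Tensor, items: torch.Tensor) -> torch.Tensor:
        # (B, D) x (B, N, D) -> (B, N)
        return torch.einsum("bd,bnd->bn", queries, items)


class MoLSimilarity(SimilarityModule):
    """Simplified MoL-style similarity: mix P component dot products with learned gates."""

    def __init__(self, dim: int, num_components: int) -> None:
        super().__init__()
        self.query_proj = torch.nn.Linear(dim, dim * num_components)
        self.gate = torch.nn.Linear(dim, num_components)
        self.num_components = num_components

    def forward(self, queries: torch.Tensor, items: torch.Tensor) -> torch.Tensor:
        B, N, D = items.shape
        # Split each query into P sub-embeddings -> (B, P, D).
        q = self.query_proj(queries).view(B, self.num_components, D)
        # Per-component dot products -> (B, P, N).
        component_logits = torch.einsum("bpd,bnd->bpn", q, items)
        # Query-dependent gates mix the components -> (B, N).
        gates = torch.softmax(self.gate(queries), dim=-1)
        return torch.einsum("bp,bpn->bn", gates, component_logits)
```

With an interface like this, a sampled softmax loss can accept any SimilarityModule instead of hard-coding dot products, which is the kind of flexibility the reply above describes.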