Question regarding loss calculation #2
Comments
The augmented loss is calculated as follows (p. 2, formula 3.3), so we have to choose the parameter α. The author describes how to compute α in the Appendix (p. 12): α = γτ. I implemented this here.
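The weighting described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the repository's code. The function names, the `soft_targets` argument (the target distribution for the augmented loss), the KL direction, and the defaults γ = 0.5 and τ = 10 (taken from the numbers quoted in the question) are all assumptions made for the example.

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Numerically stable softmax of z / temperature along the last axis."""
    z = np.asarray(z, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def total_loss(logits, label, soft_targets, gamma=0.5, tau=10.0, eps=1e-12):
    """Ordinary cross entropy plus alpha * augmented loss, with alpha = gamma * tau.

    logits       : 1-D array standing in for Wh + b (hypothetical input)
    label        : index of the correct word
    soft_targets : assumed target distribution for the augmented loss
    """
    soft_targets = np.asarray(soft_targets, dtype=float)

    # Ordinary loss: cross entropy on softmax(Wh + b), no temperature.
    p = softmax(logits)
    ce = -np.log(p[label] + eps)

    # Augmented loss (assumed form): KL(soft_targets || softmax((Wh + b) / tau)).
    q = softmax(logits, temperature=tau)
    aug = np.sum(soft_targets * (np.log(soft_targets + eps) - np.log(q + eps)))

    alpha = gamma * tau  # the weighting alpha = gamma * tau from the Appendix
    return ce + alpha * aug, ce, aug, alpha
```

With γ = 0.5 and τ = 10 this gives α = 5, which matches the `alpha * temperature * augmented_loss` term in the question if the `alpha` there plays the role of γ.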
Thank you for the answer.
Sorry, but I have another question, just to make sure. I think you didn't divide Wh+b by the temperature when calculating the cross entropy, as in eq. (3.1) of the paper.
You can ignore the last two sentences of my previous comment.
I think softmax((Wh+b)/t) is used to calculate the augmented loss only.
I think so too.
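To make the distinction concrete, here is a small sketch comparing the two distributions. The logits are made-up numbers standing in for Wh+b, and t = 10 is the temperature quoted in the question; neither comes from the repository.

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Numerically stable softmax of z / temperature."""
    z = np.asarray(z, dtype=float) / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.0])  # hypothetical values of Wh + b

# Distribution for the ordinary cross entropy: no temperature.
p_ce = softmax(logits)

# Distribution for the augmented loss only: logits divided by t = 10.
p_aug = softmax(logits, temperature=10.0)

print("softmax(Wh+b)     :", p_ce)
print("softmax((Wh+b)/t) :", p_aug)
```

Dividing by the temperature flattens the distribution, so the second vector is much closer to uniform than the first.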
It's a minor glitch, but in formulation.png it looks like you used softmax(Wh/t) to calculate both CE and KL.
For the augmented model, you added alpha (0.5) * temperature (10) * augmented_loss to the ordinary loss.
How did you choose alpha and the temperature?
And why did you multiply the augmented loss by the temperature? That isn't shown in the paper.
Also, have you tested using only the augmented loss, without adding it to the ordinary loss?
I don't think that is explicitly mentioned in the paper.
TY