Question regarding loss calculation #2

Closed
kwonmha opened this issue Aug 2, 2017 · 7 comments

Comments

kwonmha commented Aug 2, 2017

For the augmented model, you added alpha (0.5) * temperature (10) * augmented_loss to the ordinary loss.
How did you choose alpha and temperature?
And why did you multiply the augmented loss by the temperature?
That is not shown in the paper.
Also, have you tested using only the augmented loss, without adding it to the ordinary loss?
I don't think that is explicitly mentioned in the paper.
TY

Owner

icoxfog417 commented Aug 2, 2017

The augmented loss is calculated as follows (p2, formula 3.3).

[image: formula 3.3 from the paper]

So we have to decide the parameter α. The authors describe how to compute α in the Appendix (p12).

[image: excerpt from the Appendix]

α = γτ. I implemented this here.
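
A minimal sketch of how the weighting discussed above could be computed, in plain Python rather than the repository's actual code. It assumes that the 0.5 in the question is γ and the 10 is τ (so α = γτ = 5), and that the augmented loss is a KL divergence between a temperature-softened target distribution and the temperature-softened prediction. The function names and the `soft_target` argument are hypothetical, and the logits stand in for Wh + b.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_index, probs):
    """Ordinary loss: negative log-probability of the true word."""
    return -math.log(probs[target_index])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def total_loss(logits, target_index, soft_target, gamma=0.5, temperature=10.0):
    """Ordinary loss plus alpha * augmented loss, with alpha = gamma * temperature."""
    alpha = gamma * temperature
    # The ordinary loss uses the plain softmax (no temperature).
    ordinary = cross_entropy(target_index, softmax(logits))
    # Only the augmented loss uses the temperature-softened prediction.
    augmented = kl_divergence(soft_target, softmax(logits, temperature))
    return ordinary + alpha * augmented
```

With the default values, the augmented term is weighted by 5.0. If `soft_target` happens to equal the softened prediction, the KL term vanishes and the total reduces to the ordinary cross entropy.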

Author

kwonmha commented Aug 2, 2017

Thank you for the answer.
I seem to have overlooked the appendix.

Author

kwonmha commented Aug 2, 2017

Sorry, but I have another question, just to be sure.
How should I get the output of the network for validation or testing?
Is it softmax( (Wh+b)/t )?

Also, I think you didn't divide Wh+b by the temperature when calculating the cross entropy, as in eq. (3.1) of the paper.

Author

kwonmha commented Aug 3, 2017

You can ignore the last two sentences of my previous comment.
I got confused.
Regarding the output of the network for validation or testing, it can be softmax(Wh+b), right?

@icoxfog417
Owner

I think softmax( (Wh+b)/t ) is used only to calculate the augmented loss.
So the network output will be softmax(Wh+b).
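
As a rough illustration of the point above (a sketch, not the repository's code), the snippet below compares the plain softmax used as the network output with the temperature-softened softmax used only for the augmented loss. The logits are made-up numbers standing in for Wh + b, and the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Network output for validation or testing: softmax without temperature."""
    return softmax(logits)


# Made-up logits for a three-word vocabulary.
logits = [2.0, 1.0, 0.1]
eval_probs = predict(logits)
soft_probs = softmax(logits, temperature=10.0)
```

Dividing by a temperature greater than 1 flattens the distribution but does not change which word ranks highest, so the predicted word is the same either way. The probabilities themselves differ, which matters for metrics such as perplexity that are computed from them.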

Author

kwonmha commented Aug 5, 2017

I think so too.
Thank you!

Author

kwonmha commented Aug 11, 2017

It's a minor glitch, but in formulation.png it looks like you used softmax(Wh/t) to calculate both CE and KL,
which may not be the case.
