In your paper, the gradient is given by the following equation.
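If I am reading it correctly, it is the REINFORCE-style policy gradient, i.e. roughly (writing $R_t$ for the reward of the word sampled at step $t$ and $p_\theta$ for the generator's distribution):

$$
\nabla_\theta J(\theta) \;=\; \sum_{t=1}^{T} R_t \,\nabla_\theta \log p_\theta(y_t \mid y_{1:t-1})
$$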
In your code, you first compute the loss and then use `tf.gradients` to derive the gradient:
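My rough mental model of that part of the code is something like the sketch below; the variable names and shapes here are placeholders rather than the exact ones in your repo:

```python
import tensorflow as tf  # TF 1.x graph-mode style, matching the use of tf.gradients

batch_size, seq_length, vocab_size = 4, 20, 5000
learning_rate = 1e-3

# Sampled word ids and the per-word rewards (e.g. from rollouts scored by the discriminator).
x = tf.placeholder(tf.int32, [batch_size, seq_length])
rewards = tf.placeholder(tf.float32, [batch_size, seq_length])

# Stand-in for the generator: presumably a recurrent net produces these logits in the
# real code; a single trainable tensor keeps this sketch self-contained.
logits = tf.get_variable("g_logits", [batch_size, seq_length, vocab_size])
log_probs = tf.nn.log_softmax(logits)                                   # [B, T, V]

# log p(word_t | previous words) for the words that were actually sampled.
picked = tf.reduce_sum(tf.one_hot(x, vocab_size) * log_probs, axis=-1)  # [B, T]

# The loss: minus the sum over all words of (reward * log probability).
g_loss = -tf.reduce_sum(rewards * picked)

# Gradient of that scalar loss w.r.t. the parameters, then the update op.
g_params = tf.trainable_variables()
g_grad = tf.gradients(g_loss, g_params)
g_updates = tf.train.AdamOptimizer(learning_rate).apply_gradients(
    list(zip(g_grad, g_params)))
```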
My understanding of your code is that `self.g_loss` is the sum, over timesteps, of the log probability of the word generated at each step given the previous words, with each log probability multiplied by its respective reward. Based on this loss and the `tf.gradients` op, you then compute the gradient `self.g_grad`.

However, in your paper the gradient is calculated in a different way: it appears to be the sum of the gradient of each log probability, multiplied by the reward. Your implementation seems not to compute this gradient directly; instead it uses that expression as the loss and applies `tf.gradients` to it?
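In other words, if I am reading both correctly, the paper computes

$$
\sum_{t} R_t \,\nabla_\theta \log p_\theta(y_t \mid y_{1:t-1})
$$

directly, whereas the code computes

$$
\nabla_\theta \Big( \sum_{t} R_t \,\log p_\theta(y_t \mid y_{1:t-1}) \Big),
$$

i.e. it differentiates the scalar loss. Is the intention that these coincide because the rewards $R_t$ are treated as constants with respect to $\theta$?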
Could you please correct me if I am wrong? Thank you