gradient descent implementation #62

Open
o20021106 opened this issue Apr 19, 2019 · 0 comments
@o20021106
In your paper, the gradient is given by the following equation:

[image: gradient equation from the paper]
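Since the image may not render here, my reading of that equation (writing G_\theta for the generator, and transcribing the policy-gradient form I describe below, so please correct me if I copied it wrong) is roughly:

\nabla_\theta J(\theta) \approx \sum_{t} \nabla_\theta \log G_\theta\!\left(y_t \mid Y_{1:t-1}\right) \cdot Q\!\left(Y_{1:t-1}, y_t\right)

where Q(Y_{1:t-1}, y_t) is the reward (action value) for the word chosen at timestep t.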

In your code, you first calculate the loss and then use tf.gradients to derive the gradient:

# reward-weighted negative log-likelihood of the words in self.x
self.g_loss = -tf.reduce_sum(
    tf.reduce_sum(
        tf.one_hot(tf.to_int32(tf.reshape(self.x, [-1])), self.num_emb, 1.0, 0.0) * tf.log(
            tf.clip_by_value(tf.reshape(self.g_predictions, [-1, self.num_emb]), 1e-20, 1.0)
        ), 1) * tf.reshape(self.rewards, [-1])
)

g_opt = self.g_optimizer(self.learning_rate)

# gradient of that loss w.r.t. the generator parameters, clipped by global norm, then the update op
self.g_grad, _ = tf.clip_by_global_norm(tf.gradients(self.g_loss, self.g_params), self.grad_clip)
self.g_updates = g_opt.apply_gradients(zip(self.g_grad, self.g_params))
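If I write this out (my own notation: p_\theta is the generator's softmax output, y_t is the word actually in self.x at step t, and r_t is the corresponding entry of self.rewards), I believe the loss amounts to

g\_loss(\theta) = -\sum_{t} r_t \, \log p_\theta\!\left(y_t \mid y_{1:t-1}\right)

with the probabilities clipped to [10^{-20}, 1] before the log is taken.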

My understanding of your code is that self.g_loss is the negative sum, over timesteps, of the log probability of each word given the previous words, with each log probability multiplied by its respective reward.

Based on this loss and the tf.gradients op, you then compute the gradient self.g_grad.

However, in your paper the gradient is calculated in a different way. As I read it, the paper's gradient is the sum, over timesteps, of the gradient of the log probability multiplied by the reward. Your implementation does not seem to compute that expression directly; instead, it uses the reward-weighted log-likelihood as the loss and applies tf.gradients to that loss. Are these two formulations equivalent?
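To make the comparison concrete, here is a small toy sketch of what I mean (plain NumPy, not the repository's code; names such as surrogate_loss and reward_weighted_logprob_grad are made up for illustration). It compares the gradient obtained by differentiating the reward-weighted log-likelihood loss against the reward-weighted log-probability gradients, up to the minus sign that comes from minimizing rather than maximizing:

import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def surrogate_loss(logits, targets, rewards):
    # the loss used in the code:  -sum_t  r_t * log p(y_t | y_<t)
    loss = 0.0
    for t in range(len(targets)):
        p = softmax(logits[t])
        loss -= rewards[t] * np.log(p[targets[t]])
    return loss

def reward_weighted_logprob_grad(logits, targets, rewards):
    # the paper-style quantity per timestep:  -r_t * d(log p(y_t))/d(logits_t)
    # for a softmax, d(-log p[y])/d(logits) = p - onehot(y)
    grad = np.zeros_like(logits)
    for t in range(len(targets)):
        p = softmax(logits[t])
        g = p.copy()
        g[targets[t]] -= 1.0
        grad[t] = rewards[t] * g
    return grad

rng = np.random.default_rng(0)
T, V = 4, 6                                   # 4 timesteps, vocabulary of 6 words
logits = rng.normal(size=(T, V))              # stand-in for the generator's pre-softmax outputs
targets = rng.integers(0, V, size=T)          # stand-in for self.x
rewards = rng.uniform(0.0, 1.0, size=T)       # stand-in for self.rewards

# numerically approximate the gradient of the surrogate loss
# (tf.gradients would compute this symbolically by backpropagation)
eps = 1e-6
num_grad = np.zeros_like(logits)
for t in range(T):
    for v in range(V):
        up, down = logits.copy(), logits.copy()
        up[t, v] += eps
        down[t, v] -= eps
        num_grad[t, v] = (surrogate_loss(up, targets, rewards) -
                          surrogate_loss(down, targets, rewards)) / (2 * eps)

print(np.allclose(num_grad, reward_weighted_logprob_grad(logits, targets, rewards), atol=1e-5))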

Could you please correct me if I am wrong? Thank you
