In your paper, the gradient is given by the following equation.
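If I am reading it correctly, it is the REINFORCE-style policy gradient, i.e. roughly (writing $R_t$ for the reward of the word sampled at step $t$ and $p_\theta$ for the generator's distribution):

$$
\nabla_\theta J(\theta) \;=\; \sum_{t=1}^{T} R_t \,\nabla_\theta \log p_\theta(y_t \mid y_{1:t-1})
$$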
In your code, you first compute the loss and then use `tf.gradients` to derive the gradient:
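My rough mental model of that part of the code is something like the sketch below; the variable names and shapes here are placeholders rather than the exact ones in your repo:

```python
import tensorflow as tf  # TF 1.x graph-mode style, matching the use of tf.gradients

batch_size, seq_length, vocab_size = 4, 20, 5000
learning_rate = 1e-3

# Sampled word ids and the per-word rewards (e.g. from rollouts scored by the discriminator).
x = tf.placeholder(tf.int32, [batch_size, seq_length])
rewards = tf.placeholder(tf.float32, [batch_size, seq_length])

# Stand-in for the generator: presumably a recurrent net produces these logits in the
# real code; a single trainable tensor keeps this sketch self-contained.
logits = tf.get_variable("g_logits", [batch_size, seq_length, vocab_size])
log_probs = tf.nn.log_softmax(logits)                                   # [B, T, V]

# log p(word_t | previous words) for the words that were actually sampled.
picked = tf.reduce_sum(tf.one_hot(x, vocab_size) * log_probs, axis=-1)  # [B, T]

# The loss: minus the sum over all words of (reward * log probability).
g_loss = -tf.reduce_sum(rewards * picked)

# Gradient of that scalar loss w.r.t. the parameters, then the update op.
g_params = tf.trainable_variables()
g_grad = tf.gradients(g_loss, g_params)
g_updates = tf.train.AdamOptimizer(learning_rate).apply_gradients(
    list(zip(g_grad, g_params)))
```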
My understanding of your code is that `self.g_loss` is the sum, over timesteps, of the log probability of the word generated at each step given the previous words, with each log probability multiplied by its respective reward. Based on this loss and the `tf.gradients` op, you then compute the gradient `self.g_grad`.

However, in your paper the gradient is calculated in a different way: it appears to be the sum of the gradient of each log probability, multiplied by the reward. Your implementation seems not to compute this gradient directly; instead it uses that expression as the loss and applies `tf.gradients` to it?
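In other words, if I am reading both correctly, the paper computes

$$
\sum_{t} R_t \,\nabla_\theta \log p_\theta(y_t \mid y_{1:t-1})
$$

directly, whereas the code computes

$$
\nabla_\theta \Big( \sum_{t} R_t \,\log p_\theta(y_t \mid y_{1:t-1}) \Big),
$$

i.e. it differentiates the scalar loss. Is the intention that these coincide because the rewards $R_t$ are treated as constants with respect to $\theta$?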
Could you please correct me if I am wrong? Thank you