In the first iteration, grad_i is updated to proj(grad_i), but it is not updated in the grads_task list. So in the next iteration, when we compute proj(grad_j), the original grad_i is taken into account rather than proj(grad_i). Is that reasonable?
This code seems to align with the pseudocode in the paper (which does not change the gradients in place). I'm not sure why they opted not to project in place, though. Maybe because the last gradient in the outer loop would otherwise not be projected at all, since all the other gradients would already have been aligned with it. This could have the side effect of slower training (the gradients don't point exactly the way they're supposed to), but they state in the paper that this doesn't seem to happen in practice.
PCGrad/PCGrad_tf.py
Line 51 in c5fbd7c
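To make the two behaviors concrete, here is a minimal sketch of the PCGrad projection step in plain Python (not the repo's actual TensorFlow code; `pcgrad` and `project_away` are hypothetical helper names). The `project_in_place` flag distinguishes the paper's variant, where every projection is computed against the original task gradients, from the variant asked about above, where later projections see already-projected gradients.

```python
def project_away(g, ref):
    """If g conflicts with ref (negative dot product), remove from g
    its component along ref; otherwise return g unchanged."""
    dot = sum(a * b for a, b in zip(g, ref))
    if dot >= 0:
        return g
    scale = dot / sum(r * r for r in ref)
    return [a - scale * r for a, r in zip(g, ref)]

def pcgrad(grads, project_in_place=False):
    """Sketch of the PCGrad projection loop (hypothetical helper).

    project_in_place=False follows the paper's pseudocode (and this
    repo): each grad_i is projected against the *original* gradients
    of the other tasks. project_in_place=True is the alternative
    raised in this issue: later projections use the already-projected
    gradients instead.
    """
    grads = [list(g) for g in grads]
    projected = [list(g) for g in grads]
    for i in range(len(projected)):
        # Reference set: original grads (paper) vs. current projected grads.
        refs = projected if project_in_place else grads
        for j in range(len(grads)):
            if i != j:
                projected[i] = project_away(projected[i], refs[j])
    return projected
```

For example, with two conflicting gradients `[1, 0]` and `[-1, 1]`, the paper's variant projects each against the other's original direction, yielding `[0.5, 0.5]` and `[0, 1]`; the in-place variant can give a different result because the second projection sees the already-modified first gradient.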