
Reasoning behind constraint in grads_and_grad_moms #1

Open

aomader opened this issue Feb 12, 2017 · 4 comments

aomader commented Feb 12, 2017

Hey,

I'm wondering about the reasoning behind this constraint in grads_and_grad_moms.
The given variables are indeed used in multiple operations in the loss computation graph, but that shouldn't prevent computing their gradients.

Could you shed some light on this?

lballes commented Feb 13, 2017

Hey!

The second moment of gradients, which is needed for the gradient variance computation, is computed implicitly using a "trick" explained in this note. This trick is not directly applicable when a variable receives more than one incoming gradient during back-propagation, which is the case whenever the variable is consumed by more than one operation.
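For readers without the note at hand, here is a minimal sketch of the kind of identity such a trick relies on, assuming it is the squared-inputs/squared-output-gradients identity for a single matmul layer (names and setup are illustrative, not the repo's code):

```python
import numpy as np

# Sketch under the above assumption: for a linear layer y = x @ W with
# per-example losses l_i, the per-example gradient of l_i w.r.t. W is the
# outer product of x_i and g_i, where g_i = dl_i/dy_i is the back-propagated
# output gradient. The sum of element-wise *squared* per-example gradients can
# then be formed without materializing any individual per-example gradient:
#
#     sum_i outer(x_i, g_i)**2 == (x**2).T @ (g**2)
#
# which only holds while W receives a single incoming gradient, i.e. is
# consumed by exactly one operation.

rng = np.random.default_rng(0)
batch, d_in, d_out = 8, 5, 3
x = rng.normal(size=(batch, d_in))   # layer inputs, one row per example
g = rng.normal(size=(batch, d_out))  # output gradients dl_i/dy_i, one row per example

# Naive reference: build every per-example gradient explicitly and square it.
naive = sum(np.outer(x[i], g[i]) ** 2 for i in range(batch))

# Implicit computation from squared inputs and squared output gradients.
implicit = (x ** 2).T @ (g ** 2)

assert np.allclose(naive, implicit)
```

If W fed two different operations, its per-example gradient would be a sum of two such outer products, and the square of a sum is not the sum of squares, which is exactly where an identity like this breaks down.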

Since we currently don't have an efficient way to compute the gradient moment in such a case, we ruled out multiple consumer operations. We are trying to come up with a better implementation of the gradient moment computation; any ideas/help would be highly appreciated :)

aomader commented Feb 14, 2017

@lballes I would say the most straightforward way would be to fall back to an inefficient method of computing the second moment, and use warnings.warn() to tell the user that the trick doesn't apply to the supplied objective function.
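A rough sketch of what that fallback could look like (the function and argument names below are hypothetical, not from the repo):

```python
import warnings
import numpy as np

def second_moment_fallback(per_example_grads):
    """Slow but general: mean of element-wise squared per-example gradients.

    `per_example_grads` is an illustrative name for a list of gradient arrays,
    one per training example, obtained by ordinary (inefficient) per-example
    back-propagation.
    """
    warnings.warn(
        "A variable is consumed by multiple operations, so the implicit "
        "second-moment trick does not apply to this objective; falling back "
        "to a slow per-example computation."
    )
    grads = np.stack(per_example_grads)
    return np.mean(grads ** 2, axis=0)
```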

@hughsalimbeni

Hi, have you tried a running average estimate like in ADAM:

m_t <- beta1 * m_{t-1} + (1 - beta1) * g
v_t <- beta2 * v_{t-1} + (1 - beta2) * g * g

(taken from here)

The requirement for all operations to be matmul, add, and conv2d seems rather restrictive.
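For reference, here is a small sketch of a running-average estimate following exactly the two update rules above, together with the variance estimate a line search could derive from it (generic code, not from this repo; Adam's bias correction is omitted because the equations above omit it):

```python
import numpy as np

def update_moments(m, v, g, beta1=0.9, beta2=0.999):
    """One Adam-style step of the exponential moving averages of the gradient
    (first moment m) and the element-wise squared gradient (second moment v)."""
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    return m, v

# Illustrative usage with synthetic mini-batch "gradients".
rng = np.random.default_rng(0)
m = np.zeros(4)
v = np.zeros(4)
for _ in range(100):
    g = rng.normal(loc=1.0, scale=0.5, size=4)  # stand-in for a mini-batch gradient
    m, v = update_moments(m, v, g)

# v - m**2 then serves as a (biased) running estimate of the gradient variance.
variance_estimate = v - m ** 2
```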

lballes commented Jul 27, 2017

Hi @hughsalimbeni, we did some general experiments comparing running-average estimates to mini-batch estimates and found the former to be quite poor. That's why we went with the mini-batch estimate for this implementation, which is of course very restrictive for practical use in its current form. It would definitely be worth exploring how the line search actually performs with running-average estimates, though I don't know whether I personally will find time to do that anytime soon.
