This repository has been archived by the owner on Sep 27, 2020. It is now read-only.

Peephole connections (Wci, Wcf, Wco) gradient update #20

Open
Pozimek opened this issue Oct 31, 2019 · 1 comment

Comments


Pozimek commented Oct 31, 2019

The LSTM paper defines a specific rule for gradient updates of the 'peephole' connections. Specifically:

[...] during learning no error signals are propagated back from gates via peephole connections to CEC

Based on my understanding of the code, the way these three variables are initialized (as asked in Issue 17) is an attempt at implementing this update rule, but I don't see how initializing them as Variables helps. From my reading of the quoted part of the LSTM paper, the peephole connections should be updated, but the gradient that updates them should stop there and not flow any further back. If that is the case, then this implementation is incorrect, although it might be that PyTorch does not support such an operation, since .detach() is not suitable for the job.
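A minimal toy example (my own sketch, not the repository's code) of the concern: making the peephole weight a trainable Variable does not by itself stop the error signal from flowing back into the previous cell state through the gate.

import torch

c = torch.ones(3, requires_grad=True)    # stand-in for the previous cell state (the CEC)
Wci = torch.ones(3, requires_grad=True)  # stand-in for the peephole weight

gate = torch.sigmoid(c * Wci)            # peephole term as currently written
gate.sum().backward()

print(Wci.grad)  # non-zero: the peephole weight receives its gradient update, as intended
print(c.grad)    # also non-zero: an error signal reaches the cell state via the peephole,
                 # which is exactly what the quoted passage says should not happen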


Pozimek commented Nov 1, 2019

I've come to think that changing L33, L34 and L36 to use c.detach() should fix this issue, but I'm not very confident about this.

ci = torch.sigmoid(self.Wxi(x) + self.Whi(h) + c.detach() * self.Wci)

IMO the gradient should flow through c only via the operations in L35 and L37.
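For concreteness, here is a self-contained sketch of a peephole ConvLSTM cell with that change applied. This is my own minimal reconstruction for illustration, not a verified patch: the class name and all weight names other than Wxi, Whi and Wci are assumptions following the same naming pattern, and I use the standard peephole formulation in which the output gate reads the updated cell state.

import torch
import torch.nn as nn

class PeepholeConvLSTMCell(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        p = k // 2
        self.Wxi = nn.Conv2d(in_ch, hid_ch, k, padding=p)
        self.Whi = nn.Conv2d(hid_ch, hid_ch, k, padding=p, bias=False)
        self.Wxf = nn.Conv2d(in_ch, hid_ch, k, padding=p)
        self.Whf = nn.Conv2d(hid_ch, hid_ch, k, padding=p, bias=False)
        self.Wxc = nn.Conv2d(in_ch, hid_ch, k, padding=p)
        self.Whc = nn.Conv2d(hid_ch, hid_ch, k, padding=p, bias=False)
        self.Wxo = nn.Conv2d(in_ch, hid_ch, k, padding=p)
        self.Who = nn.Conv2d(hid_ch, hid_ch, k, padding=p, bias=False)
        # Peephole weights: trainable, applied element-wise to the cell state
        self.Wci = nn.Parameter(torch.zeros(1, hid_ch, 1, 1))
        self.Wcf = nn.Parameter(torch.zeros(1, hid_ch, 1, 1))
        self.Wco = nn.Parameter(torch.zeros(1, hid_ch, 1, 1))

    def forward(self, x, h, c):
        # Input and forget gates (cf. L33/L34): detach c inside the peephole term only,
        # so Wci/Wcf still receive gradients but no error signal reaches c via the gates.
        ci = torch.sigmoid(self.Wxi(x) + self.Whi(h) + c.detach() * self.Wci)
        cf = torch.sigmoid(self.Wxf(x) + self.Whf(h) + c.detach() * self.Wcf)
        # Cell update (cf. L35): gradient still flows through c here.
        cc = cf * c + ci * torch.tanh(self.Wxc(x) + self.Whc(h))
        # Output gate (cf. L36): same change, applied to the updated cell state.
        co = torch.sigmoid(self.Wxo(x) + self.Who(h) + cc.detach() * self.Wco)
        # Hidden state (cf. L37): gradient reaches the cell state through tanh(cc).
        ch = co * torch.tanh(cc)
        return ch, cc

With this wiring, a backward pass through ch (or cc) still updates Wci, Wcf and Wco, while the only paths carrying error back into the cell state are the L35/L37 ones, which is how I read the quoted truncation rule.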
