This repository has been archived by the owner on Sep 27, 2020. It is now read-only.

Peephole connections (Wci, Wcf, Wco) gradient update #20

Open
Pozimek opened this issue Oct 31, 2019 · 1 comment

Comments


Pozimek commented Oct 31, 2019

The LSTM paper defines a specific rule for gradient updates of the 'peephole' connections. Specifically:

[...] during learning no error signals are propagated back from gates via peephole connections to CEC

Based on my understanding of the code, the way these three variables are initialized (as asked in Issue 17) is an attempt at implementing this update rule, but I don't see how initializing them as Variables helps. From my reading of the quoted part of the LSTM paper, the peephole connections should be updated, but the gradient that updates them should stop there and not flow any further back. If that is the case, then this implementation is incorrect, although it might be that PyTorch does not support such an operation, since .detach() is not suitable for the job.
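A minimal toy example (my own sketch, not the repository's code) of the concern: making the peephole weight a trainable Variable does not by itself stop the error signal from flowing back into the previous cell state through the gate.

import torch

c = torch.ones(3, requires_grad=True)    # stand-in for the previous cell state (the CEC)
Wci = torch.ones(3, requires_grad=True)  # stand-in for the peephole weight

gate = torch.sigmoid(c * Wci)            # peephole term as currently written
gate.sum().backward()

print(Wci.grad)  # non-zero: the peephole weight receives its gradient update, as intended
print(c.grad)    # also non-zero: an error signal reaches the cell state via the peephole,
                 # which is exactly what the quoted passage says should not happen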


Pozimek commented Nov 1, 2019

I've come to think that changing L33, L34 and L36 to use c.detach() should fix this issue, but I'm not very confident about this.

ci = torch.sigmoid(self.Wxi(x) + self.Whi(h) + c.detach() * self.Wci)

IMO the gradient should flow through c only via the operations in L35 and L37.
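For concreteness, here is a self-contained sketch of a peephole ConvLSTM cell with that change applied. This is my own minimal reconstruction for illustration, not a verified patch: the class name and all weight names other than Wxi, Whi and Wci are assumptions following the same naming pattern, and I use the standard peephole formulation in which the output gate reads the updated cell state.

import torch
import torch.nn as nn

class PeepholeConvLSTMCell(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        p = k // 2
        self.Wxi = nn.Conv2d(in_ch, hid_ch, k, padding=p)
        self.Whi = nn.Conv2d(hid_ch, hid_ch, k, padding=p, bias=False)
        self.Wxf = nn.Conv2d(in_ch, hid_ch, k, padding=p)
        self.Whf = nn.Conv2d(hid_ch, hid_ch, k, padding=p, bias=False)
        self.Wxc = nn.Conv2d(in_ch, hid_ch, k, padding=p)
        self.Whc = nn.Conv2d(hid_ch, hid_ch, k, padding=p, bias=False)
        self.Wxo = nn.Conv2d(in_ch, hid_ch, k, padding=p)
        self.Who = nn.Conv2d(hid_ch, hid_ch, k, padding=p, bias=False)
        # Peephole weights: trainable, applied element-wise to the cell state
        self.Wci = nn.Parameter(torch.zeros(1, hid_ch, 1, 1))
        self.Wcf = nn.Parameter(torch.zeros(1, hid_ch, 1, 1))
        self.Wco = nn.Parameter(torch.zeros(1, hid_ch, 1, 1))

    def forward(self, x, h, c):
        # Input and forget gates (cf. L33/L34): detach c inside the peephole term only,
        # so Wci/Wcf still receive gradients but no error signal reaches c via the gates.
        ci = torch.sigmoid(self.Wxi(x) + self.Whi(h) + c.detach() * self.Wci)
        cf = torch.sigmoid(self.Wxf(x) + self.Whf(h) + c.detach() * self.Wcf)
        # Cell update (cf. L35): gradient still flows through c here.
        cc = cf * c + ci * torch.tanh(self.Wxc(x) + self.Whc(h))
        # Output gate (cf. L36): same change, applied to the updated cell state.
        co = torch.sigmoid(self.Wxo(x) + self.Who(h) + cc.detach() * self.Wco)
        # Hidden state (cf. L37): gradient reaches the cell state through tanh(cc).
        ch = co * torch.tanh(cc)
        return ch, cc

With this wiring, a backward pass through ch (or cc) still updates Wci, Wcf and Wco, while the only paths carrying error back into the cell state are the L35/L37 ones, which is how I read the quoted truncation rule.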
