Incorrect topological parsing with memory-layer referencing. #2061
Comments
I also tried adding one line of code (as suggested by qingqing01 and jacquesqiao), but I still get the same error.
I encountered the same problem with scheduled sampling.
paddle.layer.max_id is only used to update the memory and does not appear in the topological graph. To work around this, I used the softmax output as the memory layer and then applied max_id to extract the generated word. Since the softmax output is used to compute the cost, it appears in the final topological graph. This approach works.
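Roughly, the workaround looks like this (a minimal sketch; the layer names, sizes, and the embedding of the extracted word id are placeholders of mine, not taken from an actual demo):

```python
import paddle.v2 as paddle

VOCAB_SIZE = 10000   # placeholder sizes
EMB_DIM = 128
HIDDEN_DIM = 256

def step(cur_word_emb):
    # remember the previous step's softmax output by referring to its name;
    # since that layer also feeds the cost, the topology parser keeps it
    prev_prob = paddle.layer.memory(name='prob', size=VOCAB_SIZE)
    # recover the previously generated word id from that distribution
    prev_word = paddle.layer.max_id(input=prev_prob)
    prev_word_emb = paddle.layer.embedding(input=prev_word, size=EMB_DIM)
    hidden = paddle.layer.fc(
        input=[cur_word_emb, prev_word_emb],
        size=HIDDEN_DIM,
        act=paddle.activation.Tanh())
    # naming this layer 'prob' is what the memory above refers to
    prob = paddle.layer.fc(
        name='prob',
        input=hidden,
        size=VOCAB_SIZE,
        act=paddle.activation.Softmax())
    return prob
```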
It seems that the PaddlePaddle V2 APIs only consider explicit layer connections (via the "input" argument) when parsing the network topology, neglecting the fact that memory-layer referencing (via the "name" argument of paddle.layer.memory) should also be treated as an implicit connection. As a result, a layer whose output is only referenced by a memory layer, and is not explicitly connected to any final cost/output layer, will not be created at all during the backward traversal of the topological graph.
Here is a simple example:
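The pattern is roughly the following (a minimal sketch; the sizes, data layers, and classification head are placeholders):

```python
import paddle.v2 as paddle

DICT_SIZE = 1000   # placeholder sizes
EMB_DIM = 32
HIDDEN_DIM = 64
LABEL_DIM = 10

def step(x):
    # reads the previous-step output of the layer named 'memory'
    last_memory = paddle.layer.memory(name='memory', size=HIDDEN_DIM)
    # updates the state; naming it 'memory' is what links it back to
    # last_memory at the next time step (an implicit connection only)
    memory_update = paddle.layer.fc(
        name='memory',
        input=[x, last_memory],
        size=HIDDEN_DIM,
        act=paddle.activation.Tanh())
    # the per-step output depends only on the *previous* memory, so
    # memory_update never appears on the path to the cost
    return paddle.layer.fc(
        input=last_memory,
        size=LABEL_DIM,
        act=paddle.activation.Softmax())

word = paddle.layer.data(
    name='word', type=paddle.data_type.integer_value_sequence(DICT_SIZE))
emb = paddle.layer.embedding(input=word, size=EMB_DIM)
out = paddle.layer.recurrent_group(step=step, input=emb)
label = paddle.layer.data(
    name='label', type=paddle.data_type.integer_value(LABEL_DIM))
cost = paddle.layer.classification_cost(
    input=paddle.layer.last_seq(input=out), label=label)
```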
With error:
I think this is because the memory_update layer is not created at all, so PaddlePaddle cannot find any layer matching the name "memory" when creating the last_memory layer. The reason might be that memory_update is not explicitly connected to the cost layer, which misleads PaddlePaddle into ignoring it when creating layers.
However, it is actually connected (in an indirect, implicit manner) to the cost layer of the next time step through the paddle.layer.memory component, and of course it should never be ignored.
I guess any recurrent model whose cost layer depends on the previous-step memory rather than the just-updated current-step memory will hit the same problem, because the layer that updates the memory then has no connection to the cost layer within the current time step.
To prove it, I changed only a single line of code, making the cost layer depend on the current-step memory instead of the previous-step memory, and the model then works just fine.
I changed last_memory to memory_update as below (so that memory_update is explicitly connected to the final cost), and the code works just fine.
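Concretely, the change is just the input of the per-step output layer (reusing the sketch and placeholder names above):

```python
# before: the per-step output depends only on the previous-step memory,
# so memory_update gets pruned from the parsed topology
return paddle.layer.fc(
    input=last_memory, size=LABEL_DIM, act=paddle.activation.Softmax())

# after: using the just-updated memory puts memory_update on the cost path
return paddle.layer.fc(
    input=memory_update, size=LABEL_DIM, act=paddle.activation.Softmax())
```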
A Neural Turing Machine model that reads first and writes next (rather than the reverse) will also run into this problem. However, demos such as the vanilla LSTM/GRU do not, because their cost (or softmax output distribution) luckily depends on the just-updated memory (hidden state or cell state) instead of the previous memory.
Besides, this problem did not exist in the V1 APIs.
Is this a bug? Could anyone help resolve this issue?