Hi Dr. Chu, why does `hidden` need grad? The next seq only needs the values stored in `hidden`, not its gradient.
During training this is mainly to keep memory usage under control when the sequence is very long and backprop would have to reach too far back. If you truncate the gradient at `hidden`, backprop no longer propagates all the way back through earlier chunks.
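A minimal sketch of this truncated-BPTT pattern (the model, shapes, and variable names here are my own illustration, not taken from the repo's code):

```python
import torch
import torch.nn as nn

# Hypothetical model and data; names and sizes are illustrative.
rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
optimizer = torch.optim.SGD(rnn.parameters(), lr=0.1)
criterion = nn.MSELoss()

hidden = torch.zeros(1, 4, 20)  # (num_layers, batch, hidden_size)
for _ in range(5):  # one chunk of the long sequence per step
    x = torch.randn(4, 8, 10)
    target = torch.randn(4, 8, 20)
    # Cut the graph at the chunk boundary: backprop for this chunk
    # stops at `hidden` instead of reaching back through all past chunks.
    hidden = hidden.detach()
    output, hidden = rnn(x, hidden)
    loss = criterion(output, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```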
What I mean is that `detach` alone already stops gradients from back-propagating into the previous sequence, even when `hidden` has `requires_grad=False`. So why does the code specify `requires_grad=True` here?
I think the `requires_grad=True` argument here is redundant; `hidden` does not need grad.
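A quick standalone check of this claim (my own sketch, not from the repo): `detach()` cuts the autograd graph regardless of the detached tensor's `requires_grad` flag, and re-enabling grad afterwards does not reconnect it.

```python
import torch

w = torch.ones(3, requires_grad=True)

# detach() cuts the graph; the result has requires_grad=False by default.
hidden = (w * 2).detach()
loss = (hidden * w).sum()
loss.backward()
print(w.grad)  # tensor([2., 2., 2.]) — only the direct use of w contributes;
               # nothing flows back through the detached path.

# Even re-enabling grad on the detached tensor does not reconnect the graph:
w.grad = None
hidden = (w * 2).detach().requires_grad_(True)
loss = (hidden * w).sum()
loss.backward()
print(w.grad)       # still tensor([2., 2., 2.]) — the cut remains.
print(hidden.grad)  # tensor([1., 1., 1.]) — hidden is now a leaf with its own
                    # grad, but nothing flows back past the detach point.
```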