
Question about init_hidden(self, bsz, requires_grad=True) #4

Open
SuMarsss opened this issue Dec 18, 2019 · 3 comments

Comments

@SuMarsss

SuMarsss commented Dec 18, 2019

Hello Dr. Chu,
Why does hidden need a gradient? The next sequence only needs the values in hidden, not the gradient of hidden.

@ZeweiChu
Owner

During training this is mainly to avoid running out of memory when the sequence is too long and backprop has to reach too far back. If you truncate the gradient at the hidden state, backprop no longer propagates all the way back through earlier chunks.
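A minimal sketch of the truncation pattern described above (truncated backpropagation through time). The LSTM, shapes, and the helper name `repackage_hidden` are illustrative assumptions, not necessarily what this repository uses:

```python
import torch
import torch.nn as nn

def repackage_hidden(h):
    """Detach hidden states from the graph of the previous chunk so that
    backward() stops here instead of propagating through all earlier chunks."""
    if isinstance(h, torch.Tensor):
        return h.detach()
    return tuple(repackage_hidden(v) for v in h)

lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(lstm.parameters(), lr=0.1)

# Initial hidden/cell state, analogous to init_hidden(bsz)
hidden = (torch.zeros(1, 4, 20), torch.zeros(1, 4, 20))

for step in range(3):            # each step is one chunk of a long sequence
    x = torch.randn(4, 5, 10)    # dummy chunk (batch=4, seq_len=5)
    target = torch.randn(4, 5, 20)
    hidden = repackage_hidden(hidden)   # cut the graph between chunks
    output, hidden = lstm(x, hidden)
    loss = loss_fn(output, target)
    optimizer.zero_grad()
    loss.backward()                     # only backprops through this chunk
    optimizer.step()
```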

@SuMarsss
Author

What I mean is that detach by itself already stops the gradient from backpropagating into the previous sequence, even when hidden has requires_grad=False. So why specify requires_grad=True here?

@SuMarsss
Author

SuMarsss commented Dec 30, 2019

I think the requires_grad=True argument here is redundant; hidden does not need a gradient.

  1. requires_grad=True does not prevent backprop; only detach does. During training, the code still runs correctly with hidden's requires_grad set to False (see the sketch after this list).
  2. Looking at the code as a whole, hidden is not a weight and should not need a gradient; in PyTorch, normally only weight tensors need requires_grad=True.
  3. Even if backprop needs to trace all the way back to the initial hidden of the first sequence, loss.backward(retain_graph=True) is enough; setting requires_grad=True on hidden is not required.
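A minimal sketch of point 1 above: whatever value requires_grad takes on the initial hidden state, training runs, and what actually truncates backprop between chunks is detach(). The tiny RNN and shapes here are illustrative assumptions only:

```python
import torch

rnn = torch.nn.RNN(input_size=3, hidden_size=4, batch_first=True)

for flag in (True, False):
    # Equivalent of init_hidden(bsz, requires_grad=flag)
    hidden = torch.zeros(1, 2, 4, requires_grad=flag)
    x = torch.randn(2, 6, 3)
    out, hidden = rnn(x, hidden)
    hidden = hidden.detach()        # truncation happens here, for either flag
    out2, _ = rnn(torch.randn(2, 6, 3), hidden)
    out2.sum().backward()           # runs fine; gradients stop at detach()
    print(flag, rnn.weight_ih_l0.grad is not None)  # weights get grads in both cases
```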
