[Bug Fixed] fix batch norm when fix_gamma is True #18492

Conversation
Hey @wkcn, thanks for submitting the PR.
CI supported jobs: [unix-cpu, clang, unix-gpu, centos-cpu, miscellaneous, website, edge, centos-gpu, windows-cpu, windows-gpu, sanity]
Hi @sxjscience, could you please help review this?
@@ -228,7 +228,7 @@ class CuDNNBatchNormOp {
              &a,
              &b,
              &a,
-             req[cudnnbatchnorm::kGamma] == kWriteTo ? &b : &b_add,
+             req[cudnnbatchnorm::kGamma] == kAddTo ? &b_add : &b,
Here, we just test with req[cudnnbatchnorm::kGamma] == kAddTo. We won't be able to set the gradient accumulation appropriately if req[cudnnbatchnorm::kGamma] == kWriteTo && req[cudnnbatchnorm::kBeta] == kAddTo.
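For reference, that mixed combination is something a user can actually request through Gluon parameter attributes. The snippet below is illustrative only (it is not code from this PR) and just shows how the case could arise:

import mxnet as mx
from mxnet.gluon import nn

block = nn.BatchNorm()
block.initialize()
block.gamma.grad_req = 'write'  # maps to kWriteTo for kGamma
block.beta.grad_req = 'add'     # maps to kAddTo for kBeta: the single
                                # accumulation flag passed to cuDNN above
                                # cannot express this combination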
@zhreshold I believe there are some grad_req = add issues in BN. Would you know if there are cases where gamma.grad_req = write and beta.grad_req = add? If not, we may just raise an error in the OP if it happens.
I can't imagine any such use case; raising an exception is acceptable.
@sxjscience @zhreshold Thanks for your review! I will add the grad_req check and a unit test for gradient accumulation.
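A rough sketch of the kind of gradient-accumulation test meant here, assuming Gluon's nn.BatchNorm and mxnet.test_utils.assert_almost_equal (this is not the test added by the PR): run backward twice with grad_req='add' and expect twice the single-pass gradients.

import mxnet as mx
from mxnet.gluon import nn
from mxnet.test_utils import assert_almost_equal

def check_bn_grad_accumulation(ctx=mx.cpu()):
    block = nn.BatchNorm()
    block.initialize(ctx=ctx)
    x = mx.nd.random.normal(shape=(1, 3, 2, 2), ctx=ctx)

    def grads(n_iters, grad_req):
        # Reset parameter and input gradients, then run backward n_iters times.
        block.collect_params().setattr('grad_req', grad_req)
        block.collect_params().zero_grad()
        x.attach_grad(grad_req=grad_req)   # re-allocates x.grad as zeros
        for _ in range(n_iters):
            with mx.autograd.record():
                loss = (block(x) ** 2).sum()
            loss.backward()
        return [g.copy() for g in (x.grad, block.gamma.grad(), block.beta.grad())]

    ref = grads(1, 'write')
    acc = grads(2, 'add')
    # Accumulating over two identical backward passes should give 2x the
    # single-pass gradients for the input, gamma, and beta.
    for r, a in zip(ref, acc):
        assert_almost_equal(a.asnumpy(), 2 * r.asnumpy())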
gamma.grad_req = kAddTo is not supported in the naive CPU implementation, the naive GPU one, or the MKLDNN one.
I believe this points to a larger problem with the BN layer...
The gradients of the input, gamma, and beta on CPU are wrong when grad_req is 'add': the gradient of the input is not accumulated, and the gradients of gamma and beta are both zero.
import mxnet as mx
from mxnet.gluon import nn

N = 1
C = 3
H = W = 2

block = nn.BatchNorm()
block.collect_params().initialize()
block.collect_params().setattr('grad_req', 'add')

x = mx.nd.arange(N*C*H*W).reshape((N, C, H, W))
x.attach_grad()

for i in range(2):
    with mx.autograd.record():
        y = block(x)
        loss = (y * y).sum()
    loss.backward()

print(x.grad, block.gamma.grad(), block.beta.grad())
I guess this is somewhat out of the scope of this PR. How about creating an issue, and we can have another PR to fix the problem?
I have created issue #18499.
Closing this PR to focus on PR #18500, which contains this one.
Description

Fix issues #16297 and #18475.

When fix_gamma is True, batch norm sets the grad_req of gamma to null. CuDNNBatchNorm is invoked when cudnn_off=False and axis=1, and the gradient of beta is then accumulated by mistake.
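As an illustration of the scenario above (a minimal sketch, not part of this PR; it assumes a GPU build so the cuDNN kernel is selected, and that Gluon's nn.BatchNorm(scale=False) maps to fix_gamma=True):

import mxnet as mx
from mxnet.gluon import nn

ctx = mx.gpu()                      # cuDNN path requires a GPU context
block = nn.BatchNorm(scale=False)   # scale=False -> fix_gamma=True, so gamma.grad_req is 'null'
block.initialize(ctx=ctx)

x = mx.nd.arange(1 * 3 * 2 * 2, ctx=ctx).reshape((1, 3, 2, 2))
for _ in range(2):
    with mx.autograd.record():
        loss = (block(x) ** 2).sum()
    loss.backward()
    # Before this fix, beta's gradient keeps growing across iterations even
    # though beta.grad_req is the default 'write'; after the fix it is
    # overwritten each time.
    print(block.beta.grad())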
Checklist

Essentials

Please feel free to remove inapplicable items for your PR.
Changes

Fix batch norm when fix_gamma=True.