Functionality difference between pytorch batchnorm and synchronised batchnorm #16
Comments
Hi, thanks for your interest. BTW, could you specify where the difference is, i.e. where the absolute value of the weights is taken and eps is added?
Hi Juntang, I was referring to this part of the code in inplace_abn_cuda.cu, around line 114, in the templated forward kernel:

T _mean = mean[plane];
T mul = rsqrt(_var + eps) * _weight;
for (int batch = 0; batch < num; ++batch) { ... }

Here, _weight is not the raw affine weight: the kernel uses the absolute value of the weight plus eps. This is quite different from pytorch's batchnorm implementation, where the weight is used without any modification. With pytorch's batchnorm, if I perform the exact same operation on the weights before calling batchnorm, or modify the weights while loading, I am able to replicate the accuracy.

Regards - Debapriya
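For concreteness, the "modify the weights while loading" option could look roughly like the sketch below. It assumes the synchronized/inplace batch-norm layers have already been swapped for plain nn.BatchNorm2d and the original checkpoint has been loaded; the helper name is made up, and eps should match whatever the original ABN layers used (1e-5 is only an assumed default).

import torch
import torch.nn as nn

def fold_abn_weight_transform(model, eps=1e-5):
    # After the ABN layers have been replaced by plain nn.BatchNorm2d and the
    # original state dict has been loaded, rewrite each affine weight as
    # |w| + eps so the standard batch-norm forward pass reproduces what the
    # inplace-ABN kernel computes at inference time.
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.weight.copy_(m.weight.abs() + eps)
    return model

Since the transform only touches the stored weights, it can be applied once after loading and the adjusted model exported or evaluated as usual.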
@debapriyamaji, I want to run inference of the model on CPU, and replacing the inplace-ABN batchnorm with torch batchnorm may be the way to go. Could you please share your modified script so that it can help me run on CPU? Also, did you try to run the model on CPU?
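One possible shape for that replacement, as a rough sketch rather than a tested script: walk the model, swap every ABN module for nn.BatchNorm2d, and fold the |w| + eps transform into the copied weights. The class names checked and the attribute names (num_features, eps, weight, bias, running_mean, running_var) are assumptions about the ABN implementation, and inplace-ABN typically fuses an activation such as leaky ReLU, which would need to be re-applied separately after the new batch-norm layer.

import torch
import torch.nn as nn

def replace_abn_with_bn(module):
    # Recursively swap each inplace-ABN layer for a plain nn.BatchNorm2d so
    # the model can run on CPU. Class names are matched by string so the ABN
    # package does not need to be importable in the inference environment.
    for name, child in module.named_children():
        if type(child).__name__ in ("InPlaceABN", "InPlaceABNSync", "ABN"):
            bn = nn.BatchNorm2d(child.num_features, eps=child.eps)
            with torch.no_grad():
                # Mirror the |w| + eps transform applied inside the CUDA kernel.
                bn.weight.copy_(child.weight.abs() + child.eps)
                bn.bias.copy_(child.bias)
                bn.running_mean.copy_(child.running_mean)
                bn.running_var.copy_(child.running_var)
            setattr(module, name, bn)
        else:
            replace_abn_with_bn(child)
    return module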
Hi,
Thanks a lot for sharing the code.
I wanted to export an ONNX model, so I replaced all the synchronized batch-norm layers with pytorch's batch-norm. However, I observed a huge drop in accuracy (~20%). When I dug deeper, I realized that inside the batch-norm kernel you are taking the absolute value of the weights and adding eps to it. This is functionally different from pytorch's batch-norm.
What is the reason behind this slightly different implementation of batch-norm? Does it help in training or something else?
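To make the functional difference concrete, a small self-contained check along the following lines (hypothetical tensors, not taken from the repo) shows that the two formulations disagree whenever a channel weight is negative or very small, which is consistent with an accuracy drop of this size when the raw weights are reused unmodified.

import torch
import torch.nn.functional as F

# Hypothetical shapes and values, purely to illustrate the difference.
x = torch.randn(2, 3, 4, 4)
mean, var = torch.zeros(3), torch.ones(3)
weight, bias = torch.randn(3), torch.zeros(3)   # weights may be negative
eps = 1e-5

# Standard pytorch batch-norm: uses the weight as-is.
y_pytorch = F.batch_norm(x, mean, var, weight, bias, training=False, eps=eps)

# ABN-style scaling: absolute value of the weight plus eps.
y_abnlike = F.batch_norm(x, mean, var, weight.abs() + eps, bias,
                         training=False, eps=eps)

# Large whenever a channel weight is negative (sign flip) or close to zero.
print((y_pytorch - y_abnlike).abs().max())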