Functionality difference between pytorch batchnorm and synchronised batchnorm #16

Open
debapriyamaji opened this issue Apr 14, 2020 · 3 comments

debapriyamaji commented Apr 14, 2020

Hi,
Thanks a lot for sharing the code.
I wanted to export an ONNX model, so I replaced all the synchronized batchnorm layers with PyTorch's batchnorm. However, I observed a huge drop in accuracy (~20%). When I dug deeper, I realized that inside the batchnorm kernel you are taking the absolute value of the weights and adding eps to it. This is functionally different from PyTorch's batchnorm.

What is the reason behind this slightly different implementation of batchnorm? Does it help with training, or is there another reason?

juntang-zhuang (Owner) commented Apr 14, 2020

Hi, thanks for your interest.
The BN layer is forked from other repos; please see the README for each branch. I used two versions of syncbn from two repos: the syncbn in the "citys" branch is from https://github.com/zhanghang1989/PyTorch-Encoding, and the syncbn in "citys-lw" is from https://github.com/CoinCheung/BiSeNet. I'm not sure about the implementation details; I guess it follows https://hangzhang.org/PyTorch-Encoding/notes/syncbn.html. Maybe you can ask in the original repos that provide the syncbn.

BTW, could you point to where in the code the difference ("taking the absolute value of the weights and adding eps to it") appears?
My guess is that sign(w) * (abs(w) + eps) is more numerically stable than w + eps. When w is negative (assuming w can be negative, with leaky-relu for example), w + eps could push w closer to 0, or even change its sign.
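As a quick illustration (a toy check in plain PyTorch, not code from this repo):

import torch

eps = 1e-5
w = torch.tensor([-3e-6, 2e-6, 0.5])  # weights near zero, one of them negative

# Naively adding eps can flip the sign of a small negative weight.
naive = w + eps
# sign(w) * (abs(w) + eps) keeps the sign and only pushes the magnitude away from zero.
stable = torch.sign(w) * (w.abs() + eps)

print(naive)   # first entry becomes positive
print(stable)  # first entry stays negative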

debapriyamaji (Author) commented Apr 15, 2020

Hi Juntang,
Thanks for the quick response.

I was referring to this part of the code in inplace_abn_cuda.cu, line 114:

template <typename T>
__global__ void forward_kernel(T *x, const T *mean, const T *var, const T *weight, const T *bias,
                               bool affine, float eps, int num, int chn, int sp) {
  int plane = blockIdx.x;

  T _mean = mean[plane];
  T _var = var[plane];
  T _weight = affine ? abs(weight[plane]) + eps : T(1);
  T _bias = affine ? bias[plane] : T(0);

  T mul = rsqrt(_var + eps) * _weight;

  for (int batch = 0; batch < num; ++batch) {
    for (int n = threadIdx.x; n < sp; n += blockDim.x) {
      T _x = x[(batch * chn + plane) * sp + n];
      T _y = (_x - _mean) * mul + _bias;
      x[(batch * chn + plane) * sp + n] = _y;
    }
  }
}

Here,
T _weight = affine ? abs(weight[plane]) + eps : T(1);

This is quite different from PyTorch's batchnorm implementation, where the weight is used without any modification.

When using PyTorch's batchnorm, if I apply the exact same operation to the weights before calling batchnorm, or modify the weights while loading, I am able to replicate the accuracy.
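
Roughly, the remapping looks like this (a sketch rather than my exact script; it assumes the sync-BN module exposes weight, bias, running_mean and running_var like a regular BN layer):

import torch
import torch.nn as nn

def abn_to_batchnorm(abn, eps=1e-5):
    # Copy the statistics of one sync-BN / inplace-ABN layer into a plain BatchNorm2d,
    # applying the same abs(weight) + eps transform that the CUDA kernel applies at runtime.
    bn = nn.BatchNorm2d(abn.weight.numel(), eps=eps)
    with torch.no_grad():
        bn.weight.copy_(abn.weight.abs() + eps)  # mimic: affine ? abs(weight[plane]) + eps : T(1)
        bn.bias.copy_(abn.bias)
        bn.running_mean.copy_(abn.running_mean)
        bn.running_var.copy_(abn.running_var)
    return bn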

Regards - Debapriya

poornimajd commented Sep 4, 2020

@debapriyamaji, I want to run inference of the model on CPU, and maybe replacing the inplace-ABN batchnorm with torch batchnorm is the way to go. Could you please share your modified script so that it can help me run on CPU? Also, did you try to run the model on CPU?
Thanks in advance.
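
In case it is useful, this is roughly what I was planning to try (an untested sketch; the abn_cls argument would be whatever the repo's sync-BN / inplace-ABN class is called):

import torch
import torch.nn as nn

def swap_abn_for_batchnorm(module, abn_cls, eps=1e-5):
    # Recursively replace every sync-BN / inplace-ABN layer with a plain BatchNorm2d,
    # copying its statistics and applying the abs(weight) + eps transform discussed above,
    # so that inference can run on CPU.
    for name, child in module.named_children():
        if isinstance(child, abn_cls):
            bn = nn.BatchNorm2d(child.weight.numel(), eps=eps)
            with torch.no_grad():
                bn.weight.copy_(child.weight.abs() + eps)
                bn.bias.copy_(child.bias)
                bn.running_mean.copy_(child.running_mean)
                bn.running_var.copy_(child.running_var)
            setattr(module, name, bn)
        else:
            swap_abn_for_batchnorm(child, abn_cls, eps)

Would this be enough, or did your script need anything else?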
