Functionality difference between pytorch batchnorm and synchronised batchnorm #16

Open
debapriyamaji opened this issue Apr 14, 2020 · 3 comments

debapriyamaji commented Apr 14, 2020

Hi,
Thanks a lot for sharing the code.
I wanted to export an ONNX model, so I replaced all the synchronized batchnorm layers with PyTorch's batchnorm. However, I observed a huge drop in accuracy (~20%). When I dug deeper, I realized that inside the batchnorm kernel you are taking the absolute value of the weights and adding eps to it. This is functionally different from PyTorch's batchnorm.

What is the reason behind this slightly different implementation of batchnorm? Does it help with training, or is there another reason?

juntang-zhuang (Owner) commented Apr 14, 2020

Hi, thanks for your interest.
The BN layer is forked from other repos; please see the README for each branch. I used two versions of syncbn from two repos: the syncbn in the "citys" branch is from https://github.com/zhanghang1989/PyTorch-Encoding, and the syncbn in "citys-lw" is from https://github.com/CoinCheung/BiSeNet. I'm not sure about the implementation details; I guess it follows https://hangzhang.org/PyTorch-Encoding/notes/syncbn.html. Maybe you can ask in the original repos that provide the syncbn.

BTW, could you point to where in the code the difference ("taking the absolute value of the weights and adding eps to it") appears?
My guess is that sign(w) * (abs(w) + eps) is more numerically stable than w + eps. When w is negative (assuming w can be negative, with leaky-relu for example), w + eps could push w closer to 0, or even change its sign.
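As a quick illustration (a toy check in plain PyTorch, not code from this repo):

import torch

eps = 1e-5
w = torch.tensor([-3e-6, 2e-6, 0.5])  # weights near zero, one of them negative

# Naively adding eps can flip the sign of a small negative weight.
naive = w + eps
# sign(w) * (abs(w) + eps) keeps the sign and only pushes the magnitude away from zero.
stable = torch.sign(w) * (w.abs() + eps)

print(naive)   # first entry becomes positive
print(stable)  # first entry stays negative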

debapriyamaji (Author) commented Apr 15, 2020

Hi Juntang,
Thanks for the quick response.

I was referring to this part of the code in inplace_abn_cuda.cu, line 114:

template <typename T>
__global__ void forward_kernel(T *x, const T *mean, const T *var, const T *weight, const T *bias,
                               bool affine, float eps, int num, int chn, int sp) {
  int plane = blockIdx.x;

  T _mean = mean[plane];
  T _var = var[plane];
  T _weight = affine ? abs(weight[plane]) + eps : T(1);
  T _bias = affine ? bias[plane] : T(0);

  T mul = rsqrt(_var + eps) * _weight;

  for (int batch = 0; batch < num; ++batch) {
    for (int n = threadIdx.x; n < sp; n += blockDim.x) {
      T _x = x[(batch * chn + plane) * sp + n];
      T _y = (_x - _mean) * mul + _bias;
      x[(batch * chn + plane) * sp + n] = _y;
    }
  }
}

Here,
T _weight = affine ? abs(weight[plane]) + eps : T(1);

This is quite different from PyTorch's batchnorm implementation, where the weight is used without any modification.

When using PyTorch's batchnorm, if I apply the exact same operation to the weights before calling batchnorm, or modify the weights while loading, I am able to replicate the accuracy.
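
Roughly, the remapping looks like this (a sketch rather than my exact script; it assumes the sync-BN module exposes weight, bias, running_mean and running_var like a regular BN layer):

import torch
import torch.nn as nn

def abn_to_batchnorm(abn, eps=1e-5):
    # Copy the statistics of one sync-BN / inplace-ABN layer into a plain BatchNorm2d,
    # applying the same abs(weight) + eps transform that the CUDA kernel applies at runtime.
    bn = nn.BatchNorm2d(abn.weight.numel(), eps=eps)
    with torch.no_grad():
        bn.weight.copy_(abn.weight.abs() + eps)  # mimic: affine ? abs(weight[plane]) + eps : T(1)
        bn.bias.copy_(abn.bias)
        bn.running_mean.copy_(abn.running_mean)
        bn.running_var.copy_(abn.running_var)
    return bn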

Regards - Debapriya

poornimajd commented Sep 4, 2020

@debapriyamaji, I want to run inference of the model on CPU, and maybe replacing the inplace-ABN batchnorm with torch batchnorm is the way to go. Could you please share your modified script so that it can help me run on CPU? Also, did you try to run the model on CPU?
Thanks in advance.
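
In case it is useful, this is roughly what I was planning to try (an untested sketch; the abn_cls argument would be whatever the repo's sync-BN / inplace-ABN class is called):

import torch
import torch.nn as nn

def swap_abn_for_batchnorm(module, abn_cls, eps=1e-5):
    # Recursively replace every sync-BN / inplace-ABN layer with a plain BatchNorm2d,
    # copying its statistics and applying the abs(weight) + eps transform discussed above,
    # so that inference can run on CPU.
    for name, child in module.named_children():
        if isinstance(child, abn_cls):
            bn = nn.BatchNorm2d(child.weight.numel(), eps=eps)
            with torch.no_grad():
                bn.weight.copy_(child.weight.abs() + eps)
                bn.bias.copy_(child.bias)
                bn.running_mean.copy_(child.running_mean)
                bn.running_var.copy_(child.running_var)
            setattr(module, name, bn)
        else:
            swap_abn_for_batchnorm(child, abn_cls, eps)

Would this be enough, or did your script need anything else?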
