BatchNorm after ReLU #5
Oh, interesting! I'll add a link to this issue in the README, if you don't mind. What is the 'scale&bias layer'? In Torch, batch normalization layers have learnable parameters built in.
Yes, β and γ. In Caffe, BatchNorm is split into a batch-normalization layer and a separate layer holding the learnable affine parameters.
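For readers unfamiliar with that split, here is a minimal NumPy sketch (not actual Caffe or Torch code; the function and variable names are illustrative only) of how the same computation decomposes into a whitening stage and a learnable affine stage. In Caffe these correspond to the BatchNorm and Scale layers; in Torch both stages live inside one batch-normalization module.

```python
import numpy as np

def batchnorm_two_stage(x, gamma, beta, eps=1e-5):
    """Batch normalization written as two explicit stages.

    Stage 1 corresponds to Caffe's BatchNorm layer (whitening only),
    stage 2 to Caffe's Scale layer (learnable gamma, plus a bias).
    Torch keeps both stages in a single module.
    """
    # Stage 1: normalize each channel over the batch dimension.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Stage 2: learnable affine transform (scale & bias).
    return gamma * x_hat + beta

# Toy usage: a batch of 128 samples with 64 channels.
x = np.random.randn(128, 64)
gamma, beta = np.ones(64), np.zeros(64)
y = batchnorm_two_stage(x, gamma, beta)
```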
That is not correct: I have done batchnorm experiments only on plain, non-residual nets so far :) The batchnorm ResNets are still training. And the "ThinResNet-101" from my benchmark does not use batchnorm at all, as a baseline.
Oh, I guess I misunderstood, pardon. So this experiment was on an ordinary CaffeNet, not a residual network?
Yes.
Thanks, that makes sense. It's interesting because it challenges the commonly held assumption that batch norm before ReLU is better than after. I'd be interested to see how much of an impact the residual network architecture has on ImageNet; the harder the task, the more of an effect different architectures seem to have.
I never understood this from the original paper, because the point of data whitening is to normalize a layer's input, and the ReLU output is usually the input to the next layer.
@ducha-aiki The paper reads:
"We add the BN transform immediately before the nonlinearity, by normalizing x = Wu + b. We could have also normalized the layer inputs u, but since u is likely the output of another nonlinearity, the shape of its distribution is likely to change during training, and constraining its first and second moments would not eliminate the covariate shift. In contrast, Wu + b is more likely to have a symmetric, non-sparse distribution, that is 'more Gaussian'; normalizing it is likely to produce activations with a stable distribution."
I get from this that it's better to batch-normalize the output of the linear function, since it is more likely to behave like a normal distribution (from which the method is derived), especially compared with an asymmetric nonlinearity like ReLU.
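As a rough illustration of the two placements under discussion, here is a hedged NumPy sketch (not code from either benchmark; W, b, and the shapes are made up): the paper normalizes the pre-activation x = Wu + b, while the alternative normalizes the ReLU output that actually feeds the next layer.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def bn(x, eps=1e-5):
    # Plain per-channel normalization over the batch; gamma/beta omitted for brevity.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
b = np.zeros(64)
u = rng.standard_normal((128, 64))   # output of the previous layer

# Placement from the paper: normalize the (roughly symmetric) pre-activation
# x = W u + b, then apply the nonlinearity.
y_pre = relu(bn(u @ W.T + b))

# Placement reported better in the caffenet128 benchmark: apply ReLU first,
# then normalize its asymmetric, half-rectified output before the next layer.
y_post = bn(relu(u @ W.T + b))
```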
Hi,
I am running a somewhat similar benchmark, but on caffenet128 (and moving to ResNets now) on ImageNet.
One thing that I have found: the best position of BN in a non-residual net is after ReLU and without the scale+bias layer (https://github.com/ducha-aiki/caffenet-benchmark/blob/master/batchnorm.md); see the sketch at the end of this comment.
Maybe it is worth testing too.
Second, results on CIFAR-10 often contradict results on ImageNet. For example, leaky ReLU > ReLU on CIFAR-10, but worse on ImageNet.
P.S. We could cooperate on ImageNet testing, if you agree.
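To make that configuration concrete, here is a hedged NumPy sketch of a small stack in the "linear → ReLU → BN without scale+bias" arrangement; the layer sizes and names are invented, and this is not the benchmark's actual Caffe prototxt.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def bn_no_affine(x, eps=1e-5):
    # Whitening only: no learnable gamma/beta, i.e. no separate Scale layer.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 64)) * 0.1 for _ in range(3)]

# Reported-best arrangement for a plain (non-residual) net:
# each layer's ReLU output is normalized before it feeds the next layer.
h = rng.standard_normal((128, 64))
for W in weights:
    h = bn_no_affine(relu(h @ W))
```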