This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
BatchNorm can not converge with scale=False #18475
@sxjscience I'm sorry, I don't have a machine with a GPU to check this at the moment. I read the batch norm code and its unit test, but there is no output check when
I tried to test the batch norm with
Here are the failure cases when
I'm fixing the bug and will submit a PR later.
@wkcn Thanks! I will check it in the next pip package release.
Description
Training does not converge when a BatchNorm operator with scale=False is used.
Error Message
There is no error message, but the loss value and training accuracy are abnormal compared with a BatchNorm that uses scale=True.
To Reproduce
Use https://github.com/nttstar/arcface.np to train arcface, and add one BatchNorm op with scale=False after the final embedding layer.
What have you tried to solve it?
Setting scale=True works, but gives slightly worse test accuracy.
Environment
----------Python Info----------
Version : 3.6.9
Compiler : GCC 7.3.0
Build : ('default', 'Jul 30 2019 19:07:31')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 19.3.1
Directory : /root/anaconda2/envs/py36/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 2.0.0
Directory : /root/anaconda2/envs/py36/lib/python3.6/site-packages/mxnet
Num GPUs : 8
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform : Linux-3.10.0-327.el7.x86_64-x86_64-with-centos-7.5.1804-Core
system : Linux
node : gpu06
release : 3.10.0-327.el7.x86_64
version : #1 SMP Thu Nov 19 22:10:57 UTC 2015
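For anyone who wants an output check for the scale=False case, here is a minimal NumPy sketch of what a training-mode batch norm is expected to compute when the learnable scale is disabled. This is not MXNet's implementation. The function name `batch_norm_train`, the `(batch, features)` layout, and the `eps` value are assumptions chosen for illustration. With `gamma=None` (standing in for scale=False) the normalized activations are only shifted by `beta`, never rescaled.

```python
import numpy as np


def batch_norm_train(x, beta, gamma=None, eps=1e-5):
    """Reference training-mode batch norm for a (batch, features) array.

    gamma=None models scale=False: the normalized output is not rescaled,
    which is equivalent to a gamma fixed at 1.
    """
    mean = x.mean(axis=0)            # per-feature batch mean
    var = x.var(axis=0)              # per-feature biased batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    if gamma is not None:            # scale=True path
        x_hat = x_hat * gamma
    return x_hat + beta


# Hypothetical stand-in for the final embedding layer's output.
rng = np.random.default_rng(0)
emb = rng.normal(loc=3.0, scale=2.0, size=(64, 8))

out_no_scale = batch_norm_train(emb, beta=np.zeros(8))
out_unit_gamma = batch_norm_train(emb, beta=np.zeros(8), gamma=np.ones(8))
```

Comparing a framework's forward output against `out_no_scale` on the same input is one way to check the scale=False path. The sketch covers only the forward pass and says nothing about where the reported bug lies.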