
Compare DeepSpeech (w/Dropout) vs DeepSpeech (w/o Dropout) + BatchNorm #373

Closed
kdavis-mozilla opened this issue Feb 11, 2017 · 6 comments

Comments

@kdavis-mozilla
Contributor

A bonus: no more optimizing dropout rates.

@ghost

ghost commented Apr 10, 2017

@kdavis-mozilla any progress on this? What is the reason behind the dropout of the current layer being (1 - dropout) of the previous layer?

@kdavis-mozilla
Contributor Author

The minus one has nothing to do with this issue. TensorFlow uses keep probabilities, not dropout rates, hence the minus one.

This issue asks how performance changes when dropout is exchanged for batch norm.
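
For illustration, a minimal TensorFlow 1.x sketch of the keep-probability point above; the dropout rate here is a placeholder value, not one of DeepSpeech's configured rates:

```python
# Minimal TF 1.x sketch: tf.nn.dropout takes a *keep* probability,
# so a configured dropout rate is passed as (1 - rate).
import tensorflow as tf

dropout_rate = 0.05                 # illustrative value only
keep_prob = 1.0 - dropout_rate      # hence the "minus one"

x = tf.placeholder(tf.float32, [None, 2048])
h = tf.nn.dropout(x, keep_prob=keep_prob)  # drops units with probability dropout_rate
```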

@andi4191
Contributor

andi4191 commented Jul 6, 2017

@reuben: I have a few doubts about this one.

  1. Is it expected to use the BatchNorm layer after every layer, or only after the h1, h2, h3 and h5 layers?
  2. If my understanding is correct, BatchNorm has to behave differently in the training and testing phases. For training, the mean and variance would be those of the current batch, whereas for testing they would be the moving averages accumulated during training. Hence, for the training phase an update_op needs to be added as a dependency so that the moving mean and variance are updated before every training step (referenced from the TensorFlow documentation). A sketch of this pattern follows this comment.

Please correct me if my understanding is incorrect.
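
For reference, a minimal TensorFlow 1.x sketch of the update_op dependency pattern described in point 2; it uses the generic tf.layers API, not the actual DeepSpeech model code, and the layer sizes are purely illustrative:

```python
# Minimal TF 1.x sketch (generic tf.layers API, not the DeepSpeech code):
# batch statistics at training time, moving averages at inference time,
# with the UPDATE_OPS dependency so the moving stats are refreshed each step.
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 494])        # illustrative feature size
labels = tf.placeholder(tf.int32, [None])
is_training = tf.placeholder(tf.bool, [])

h = tf.layers.dense(x, 2048)
# training=True -> normalize with the batch's mean/variance;
# training=False -> use the accumulated moving averages.
h = tf.layers.batch_normalization(h, training=is_training)
h = tf.nn.relu(h)
logits = tf.layers.dense(h, 29)

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# The moving mean/variance updates live in the UPDATE_OPS collection,
# so the train step must depend on them.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)
```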

@reuben
Contributor

reuben commented Jul 7, 2017

  1. We've seen approaches using BN after every hidden layer, but also approaches that use it only before layer-type changes (e.g. convolution -> fully connected). I guess the most direct comparison would be to replace all the cases of dropout with BatchNorm.
  2. Yes, your understanding is correct. TensorFlow's BatchNorm is sometimes tricky to get right. I'm currently experimenting with defining the model using Keras, which has its own implementation of BatchNorm and is probably easier to get right. I'll let you know what I find. (A Keras sketch of the dropout-for-BatchNorm swap follows this comment.)
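
For illustration, a minimal standalone-Keras sketch of replacing dropout with BatchNorm on a single hidden layer; the layer sizes and dropout rate are illustrative, not DeepSpeech's actual configuration:

```python
# Minimal standalone-Keras sketch: the same hidden layer with dropout vs.
# with batch normalization. Sizes are illustrative, not DeepSpeech's.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, BatchNormalization

# Dropout variant
with_dropout = Sequential([
    Dense(2048, input_shape=(494,)),
    Activation('relu'),
    Dropout(0.05),
])

# BatchNorm variant: Keras tracks the moving mean/variance itself and
# switches between batch and moving statistics for train vs. inference.
with_batchnorm = Sequential([
    Dense(2048, input_shape=(494,)),
    BatchNormalization(),
    Activation('relu'),
])
```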

andi4191 added a commit to andi4191/DeepSpeech that referenced this issue Aug 4, 2017
@kdavis-mozilla
Contributor Author

Closing for lack of activity.

@lock

lock bot commented Feb 9, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Feb 9, 2020