Conversation
@DickJC123 @ptrendx could you help take a look again? Thanks
@DickJC123 @ptrendx Can you please review this PR once again? Thanks!
@roywei Could you please rebase and update this PR? Thanks!
@roywei with the CI issue fixed, let's rebase to include the fix. Thanks!
Restarted CI jobs. @roywei can you get your PR merged once they pass CI?
@access2rohit without rebasing, the fix #17962 won't be included [for windows-gpu]
@ChaiBapchya @access2rohit Rebased! Thanks!
@@ -1495,7 +1500,7 @@ class RNNOp {
  cudnnRNNInputMode_t input_mode_;
  cudnnDropoutDescriptor_t dropout_desc_;
  Storage::Handle reserve_space_;
  uint64_t seed_ = 17 + rand() % 4096;  // NOLINT(runtime/threadsafe_fn)
This seed_ is used for rand_r in the forward implementations. That's very bad practice, not portable, and should be fixed. As this PR already passes CI, it's fine to fix it in a follow-up PR. In fact, #17984 is blocked by the usage of rand_r, so I may fix it there.
More info: https://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful
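A minimal sketch, assuming a standalone helper (the name DefaultDropoutSeed is hypothetical), of how a one-off seed could be generated portably with the standard C++ <random> facilities instead of rand()/rand_r; this only illustrates the suggestion above, not the fix that eventually landed:

#include <cstdint>
#include <random>

// One-off seeding without rand()/rand_r: std::random_device draws
// non-deterministic entropy on most platforms and has no shared state.
static uint64_t DefaultDropoutSeed() {
  std::random_device rd;
  std::uniform_int_distribution<uint64_t> dist(0, 4095);
  return 17 + dist(rd);  // same range as the original 17 + rand() % 4096
}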
Thanks!
Random<xpu, unsigned> *prnd = ctx.requested[1].get_random<xpu, unsigned>(s);
uint64_t rng_seed = prnd->GetSeed();
// reset dropout descriptor if rng seed changed.
bool reset = seed_ != rng_seed;
seed_ = rng_seed;
ctx.requested[0].get_cudnn_dropout_desc(&dropout_desc_, s, 1.0f - this->pkeep_,
                                        seed_, reset);
@roywei there is more than one random number generator in the resources (4 by default) and they can have different seeds. The result of rotating random number generator seeds here is that the cudnn dropout state is reinitialized very often. @sxjscience observed that there is a significant performance impact because of this.
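To illustrate the effect described above, a small sketch with made-up seed values (not MXNet's actual resource code): when consecutive forward calls can draw from different random resources with different seeds, a reset-when-the-seed-changes check fires on almost every call.

#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical seeds of 4 parallel random resources.
  const uint64_t resource_seeds[4] = {11, 22, 33, 44};
  uint64_t cached_seed = 0;
  for (int call = 0; call < 8; ++call) {
    uint64_t rng_seed = resource_seeds[call % 4];  // a different resource each call
    bool reset = (cached_seed != rng_seed);        // same check as in the diff above
    cached_seed = rng_seed;
    if (reset) {
      std::printf("call %d: re-init cudnn dropout desc (expensive)\n", call);
    }
  }
  return 0;
}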
I believe @roywei is already working on this issue, since this regression was caught by our benchmarks last week.
I spent some time looking into it, and the problem is actually that dropout's forward enters the re-init cudnn dropout desc logic on every call, even without my change. This is true for nd/np and Gluon dropout but not for symbolic dropout, so it was already reinitializing on every forward. The reason this did not cause any performance regression before is that it always used the seed_ defined as uint64_t seed_ = 17 + rand() % 4096; and never changed it, so it did not follow MXNet's PRNG (which is the problem this PR is trying to fix). If the seed doesn't change, re-initializing the cudnn dropout descriptor takes almost no time. My PR changed the seed, so it was actually re-initializing on every forward, hence the regression.
However, it works fine in the symbolic case. My guess is that for symbolic execution every forward uses the same Dropout node, so the state check takes effect and the code does not go into the re-init logic; in the imperative case the state check always comes up empty and the code goes into the re-init logic.
Compare the following two snippets, one Gluon and one Symbol: I print a log line during reinitialization in the original code without my change. It prints on every forward in ND/NP/Gluon, but not in Symbol.
Gluon:
import mxnet as mx
data = mx.nd.ones((10, 200, 300, 500), ctx=mx.gpu(0))
dropout = mx.gluon.nn.Dropout(0.5)
# with or without hybridize is the same result
dropout.hybridize()
with mx.autograd.record():
    result1 = dropout(data)
    result2 = dropout(result1)
This prints 2 times:
re-init dropout desc
re-init dropout desc
Symbol:
import mxnet as mx
data = mx.nd.ones((10, 200, 300, 500), ctx=mx.gpu(0))
net = mx.sym.Variable("data")
net = mx.sym.Dropout(data=net, p=0.5, cudnn_off=False)
exe = net.simple_bind(mx.gpu(0), data=data.shape)
result1 = exe.forward(is_train=True, data=data)
result2 = exe.forward(is_train=True, data=result1[0])
This prints 1 time:
re-init dropout desc
Given this situation, I don't have a good solution: checking the state handle size does not work in imperative mode, and checking the PRNG seed will also not work. I will revert this PR for now, as it's causing regressions for models using dropout.
@roywei the links in your comment don't seem to point to what they refer to so I'm a bit confused. Where did you insert the 're-init dropout desc' comment?
This reverts commit 249b9a1.
* Revert "Fix cudnn Dropout reproducibility (apache#17547)" This reverts commit 249b9a1. * fix conflict
* Revert "Fix cudnn Dropout reproducibility (apache#17547)" This reverts commit 249b9a1. * fix conflict
Fix #15662
This will replace #16532 with an alternative solution.
Please refer to the discussion in #16532 (comment)
Added GetSeed() in the GPU random resource; it's similar to the CPU version.
Benefit: avoids calling cudnnDropoutGetStatesSize.
Drawback: will be affected by #7410; by default mxnet's random seed is fixed.
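As an illustration of the GetSeed() idea, a generic sketch (not MXNet's actual Random resource class; the RandomResource name and layout are made up): the resource remembers the seed it was initialized with, so a consumer such as the cudnn dropout path can cache the last seen value and rebuild its expensive state only when the seed actually changes, which is the pattern shown in the diff above.

#include <cstdint>
#include <random>

// Generic sketch of a random resource that exposes its seed.
template <typename Engine = std::mt19937_64>
class RandomResource {
 public:
  explicit RandomResource(uint64_t seed) : seed_(seed), engine_(seed) {}
  void Seed(uint64_t seed) { seed_ = seed; engine_.seed(seed); }
  uint64_t GetSeed() const { return seed_; }  // analogous to the accessor added by this PR
  Engine& engine() { return engine_; }
 private:
  uint64_t seed_;
  Engine engine_;
};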