convergence problem #9
Hi @yangpc615, thanks for the report. I have a few questions to understand the case.
Thanks for your reply. Do you know mmdetection?

I would also like to know how to update the network in torchgpipe using plain computation rather than checkpointing.
@yangpc615 Did you mean that your network fails to converge both with and without checkpointing? In any case, if the network relies heavily on BatchNorm, a large number of micro-batches may affect training just as DataParallel does. See the trade-off on the number of micro-batches. GPipe has an option for this case; see "Deferred Batch Normalization" for more details.
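To make the BatchNorm point concrete, here is a small standalone Python sketch (not torchgpipe code; the helper name and the synthetic data are illustrative only). Each micro-batch normalizes with its own statistics, and the per-micro-batch mean drifts further from the full-batch mean as micro-batches get smaller, which is the noise that deferred batch normalization avoids by accumulating statistics over the whole mini-batch.

```python
import random
import statistics

random.seed(0)
# One feature channel's activations for a full mini-batch of 256 samples.
batch = [random.gauss(0.0, 1.0) for _ in range(256)]

def micro_batch_means(data, chunks):
    """Per-micro-batch means, as BatchNorm would compute them
    when the mini-batch is split into `chunks` micro-batches."""
    size = len(data) // chunks
    return [statistics.mean(data[i * size:(i + 1) * size]) for i in range(chunks)]

# With one chunk, BatchNorm sees the true full-batch mean; with many
# small chunks, the per-chunk means scatter around it.
for chunks in (1, 4, 32):
    means = micro_batch_means(batch, chunks)
    spread = max(means) - min(means)
    print(f"chunks={chunks:3d}  spread of per-chunk means={spread:.3f}")
```

The spread is zero with a single chunk and grows as the mini-batch is split more finely, which matches the observation that training quality degrades with the number of micro-batches when BatchNorm statistics matter.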
Thank you. In addition, I don't understand the following code: what do these functions do, and how do they relate to the code below?
@yangpc615 That is a good question. However, I recommend opening a separate issue for a new question that isn't related to the convergence problem.
When I used the second method (plain computation, without checkpointing), I found that my network's performance became worse, and the degradation was proportional to the number of divisions.