[FEATURE] Integrate multi-gpu training for semi-supervised learning and self-supervised learning #1534
Conversation
Codecov Report
Patch coverage diff:

@@ Coverage Diff @@
## develop #1534 +/- ##
===========================================
- Coverage   80.68%   80.68%   -0.01%
===========================================
  Files         486      486
  Lines       33249    33253       +4
===========================================
+ Hits        26826    26829       +3
- Misses       6423     6424       +1
===========================================

... and 3 files with indirect coverage changes.
Generally LGTM; I left one question as a comment.
Could you add a test case for this with a skip flag? I'll re-enable the multi-GPU test cases after checking.
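A skipped multi-GPU test case, as requested above, might look like the following sketch. The test name, reason string, and body are illustrative assumptions, not the PR's actual tests; the point is that a skip marker keeps the TC in the suite so it can be re-enabled later by removing the decorator.

```python
import pytest

# Hypothetical skipped multi-GPU TC (name and reason are assumptions).
# Remove the marker to re-enable once the multi-GPU environment is verified.
@pytest.mark.skip(reason="multi-GPU TCs disabled until re-checked")
def test_semisl_multi_gpu_training():
    # Placeholder body; the real test would launch distributed training.
    assert True
```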
@jaegukhyun @eunwoosh could you check again?
Summary
This PR enables multi-GPU training for semi-supervised learning and self-supervised learning.
Main Issues
RuntimeError: storage has wrong size

This error occurs because two models need to be built. After the first model (online_model) is loaded and its pretrained checkpoint is saved to the cache, the second model (target_model) fetches the checkpoint from the cache, since both models share the same pretrained URL. In multi-GPU training, this cache access can happen concurrently across processes, corrupting the checkpoint file during serialization.

How to test
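One common way to avoid this kind of corruption is to let only local rank 0 populate the shared checkpoint cache, with a barrier before the other ranks read it. The sketch below illustrates that pattern; the function names, cache layout, and `barrier` argument are assumptions for illustration, not the PR's actual code (in real multi-GPU training the barrier would be `torch.distributed.barrier()`).

```python
import os
import tempfile

def cached_path(url: str, cache_dir: str) -> str:
    """Map a pretrained-weight URL to its on-disk cache file."""
    return os.path.join(cache_dir, os.path.basename(url))

def fetch_checkpoint(url: str, cache_dir: str, local_rank: int, barrier) -> str:
    """Populate the cache on rank 0 only, then synchronize all ranks.

    Because only one process ever writes the file, concurrent builds of
    online_model and target_model cannot corrupt it mid-serialization.
    """
    path = cached_path(url, cache_dir)
    if local_rank == 0 and not os.path.exists(path):
        with open(path, "wb") as f:  # stand-in for the real download
            f.write(b"fake-checkpoint-bytes")
    barrier()  # all ranks wait here until the cache file exists
    return path

# Single-process demo: both "models" resolve to the same cached file.
cache = tempfile.mkdtemp()
url = "https://example.com/weights/backbone.pth"
online = fetch_checkpoint(url, cache, local_rank=0, barrier=lambda: None)
target = fetch_checkpoint(url, cache, local_rank=1, barrier=lambda: None)
```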
Checklist
- The PR targets the develop branch.

License
Feel free to contact the maintainers if that's a concern.