[FEATURE] Integrate multi-gpu training for semi-supervised learning and self-supervised learning #1534

Merged
merged 13 commits into from
Mar 16, 2023

Contributor

@supersoob supersoob commented Jan 17, 2023

Summary

This PR enables multi-GPU training for semi-supervised and self-supervised learning in the following tasks:

  • Detection SEMI
  • Segmentation SEMI/SELF
  • Classification SEMI/SELF/SUPCON

Main Issues

  • Every tensor used in a loss must be placed on the device of its local rank, because these tensors are broadcast or ring-scattered to the other GPUs. In detection Semi-SL, some tensors (ps_recall, ps_) were created without a device index.
  • In Self-SL classification, checkpoint loading fails with RuntimeError: storage has wrong size. Two models have to be built. After the first model (online_model) is loaded and its checkpoint is saved to the cache, the second model (target_model) reads the checkpoint from the cache, because both are the same model and share the same pretrained URL. With multiple processes, these cache accesses can happen concurrently, so one process can read the file while another is still serializing it, and the load sees a corrupted checkpoint.
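
As a rough illustration of the cache race in the second issue, here is a minimal sketch of one common way to keep concurrent readers from seeing a half-written cache file: write to a temporary file in the same directory and then rename it into place. This is not necessarily the fix applied in this PR; `write_cache_atomically`, `load_cached`, and the `fetch` callback are hypothetical names used only for illustration, and the sketch uses plain bytes rather than a real checkpoint.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def write_cache_atomically(cache_path: Path, payload: bytes) -> None:
    """Write payload so that cache_path is either absent or complete."""
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=cache_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the file in a single step, so a concurrent
        # reader sees either no file or the full file, never a partial one.
        os.replace(tmp_name, cache_path)
    except BaseException:
        # Clean up the temp file if writing or renaming failed.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


def load_cached(cache_path: Path, fetch: Callable[[], bytes]) -> bytes:
    """Return cached bytes, calling fetch() only when the cache is empty."""
    if not cache_path.exists():
        write_cache_atomically(cache_path, fetch())
    return cache_path.read_bytes()
```

With this pattern two processes may both call `fetch()` if they start at the same time, but neither can read a truncated file. Another option in distributed training is to let a single rank populate the cache while the other ranks wait; which approach fits here depends on how the checkpoint loader is invoked.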

How to test

  • Run the e2e or integration tests for each task

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT

@github-actions github-actions bot added the CLI (Any changes in OTE CLI) and dependencies labels Feb 2, 2023
@supersoob supersoob changed the base branch from feature/otx to develop February 2, 2023 05:58
@github-actions github-actions bot added the ALGO (Any changes in OTX Algo Tasks implementation) and TEST (Any changes in tests) labels and removed the CLI (Any changes in OTE CLI) label Mar 8, 2023
@supersoob supersoob marked this pull request as ready for review March 8, 2023 09:13
@supersoob supersoob requested a review from a team as a code owner March 8, 2023 09:13
@supersoob supersoob requested a review from eunwoosh March 8, 2023 09:13
@supersoob supersoob changed the title from "Integrate multi-gpu training for semi-supervised learning and self-supervised learning" to "[FEATURE] Integrate multi-gpu training for semi-supervised learning and self-supervised learning" Mar 8, 2023
@codecov-commenter

codecov-commenter commented Mar 8, 2023

Codecov Report

Patch coverage: 44.44% and project coverage change: -0.01 ⚠️

Comparison is base (6116639) 80.68% compared to head (1c21c29) 80.68%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1534      +/-   ##
===========================================
- Coverage    80.68%   80.68%   -0.01%     
===========================================
  Files          486      486              
  Lines        33249    33253       +4     
===========================================
+ Hits         26826    26829       +3     
- Misses        6423     6424       +1     
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...x/mpa/modules/models/detectors/unbiased_teacher.py | 19.84% <0.00%> (-0.16%) | ⬇️ |
| ...fication/adapters/mmcls/models/classifiers/byol.py | 89.24% <100.00%> (+0.35%) | ⬆️ |

... and 3 files with indirect coverage changes


Contributor

@jaegukhyun jaegukhyun left a comment

Generally LGTM. I left one comment with a question.

@eunwoosh
Contributor

eunwoosh commented Mar 10, 2023

Could you add a test case for this with a skip flag? I'll re-enable the multi-GPU TCs after checking.

@github-actions github-actions bot added the DOC (Improvements or additions to documentation) label Mar 15, 2023
@supersoob
Contributor Author

@jaegukhyun @eunwoosh could you check again?

@supersoob supersoob requested a review from sungmanc March 16, 2023 04:49
@supersoob supersoob enabled auto-merge (squash) March 16, 2023 08:15
@supersoob supersoob merged commit c54ba73 into develop Mar 16, 2023
@supersoob supersoob deleted the multigpu_semisl branch March 16, 2023 08:39
Labels
ALGO (Any changes in OTX Algo Tasks implementation), DOC (Improvements or additions to documentation), TEST (Any changes in tests)
6 participants