[FEATURE] Integrate multi-gpu training for semi-supervised learning and self-supervised learning #1534
Conversation
Codecov Report
Patch coverage diff:

@@ Coverage Diff @@
## develop #1534 +/- ##
===========================================
- Coverage   80.68%   80.68%   -0.01%
===========================================
  Files         486      486
  Lines       33249    33253       +4
===========================================
+ Hits        26826    26829       +3
- Misses       6423     6424       +1
===========================================

... and 3 files with indirect coverage changes.
Generally LGTM; I left one question as a comment.
Could you add a test case for this with a skip flag? I'll re-enable the multi-GPU test cases after checking.
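A skipped multi-GPU test case, as requested above, might look like the following sketch. The test name, reason string, and body are illustrative assumptions, not the PR's actual tests; the point is that a skip marker keeps the TC in the suite so it can be re-enabled later by removing the decorator.

```python
import pytest

# Hypothetical skipped multi-GPU TC (name and reason are assumptions).
# Remove the marker to re-enable once the multi-GPU environment is verified.
@pytest.mark.skip(reason="multi-GPU TCs disabled until re-checked")
def test_semisl_multi_gpu_training():
    # Placeholder body; the real test would launch distributed training.
    assert True
```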
@jaegukhyun @eunwoosh could you check again?
Summary
This PR enables multi-GPU training for semi-supervised learning and self-supervised learning.
Main Issues
RuntimeError: storage has wrong size

This error occurs because two models need to be built. After the first model (online_model) is loaded and its pretrained checkpoint is saved to the cache, the second model (target_model) fetches the checkpoint from the cache, since both models share the same pretrained URL. In multi-GPU training, this cache access can happen concurrently across processes, corrupting the checkpoint file during serialization.

How to test
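One common way to avoid this kind of corruption is to let only local rank 0 populate the shared checkpoint cache, with a barrier before the other ranks read it. The sketch below illustrates that pattern; the function names, cache layout, and `barrier` argument are assumptions for illustration, not the PR's actual code (in real multi-GPU training the barrier would be `torch.distributed.barrier()`).

```python
import os
import tempfile

def cached_path(url: str, cache_dir: str) -> str:
    """Map a pretrained-weight URL to its on-disk cache file."""
    return os.path.join(cache_dir, os.path.basename(url))

def fetch_checkpoint(url: str, cache_dir: str, local_rank: int, barrier) -> str:
    """Populate the cache on rank 0 only, then synchronize all ranks.

    Because only one process ever writes the file, concurrent builds of
    online_model and target_model cannot corrupt it mid-serialization.
    """
    path = cached_path(url, cache_dir)
    if local_rank == 0 and not os.path.exists(path):
        with open(path, "wb") as f:  # stand-in for the real download
            f.write(b"fake-checkpoint-bytes")
    barrier()  # all ranks wait here until the cache file exists
    return path

# Single-process demo: both "models" resolve to the same cached file.
cache = tempfile.mkdtemp()
url = "https://example.com/weights/backbone.pth"
online = fetch_checkpoint(url, cache, local_rank=0, barrier=lambda: None)
target = fetch_checkpoint(url, cache, local_rank=1, barrier=lambda: None)
```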
Checklist
- The PR targets the develop branch.

License
Feel free to contact the maintainers if that's a concern.