
[Enhance] Continue to speed up training. #6974

Merged: 2 commits merged into open-mmlab:dev from save_life_v2 on Jan 17, 2022

Conversation

RangiLyu (Member) commented on Jan 10, 2022

Motivation

This PR continues the work in #6867.

In addition to limiting OpenCV's internal threading, it sets the OMP and MKL thread counts to 1 when they are not already set in the environment, and switches the multiprocessing start method from spawn to fork to reduce startup time.
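For reference, here is a minimal sketch of the kind of setup described above. The helper name and default values are illustrative only; the actual mmdetection implementation and its config keys may differ.

```python
import os
import platform

import cv2
import torch.multiprocessing as mp


def limit_worker_threads(mp_start_method='fork', opencv_num_threads=0):
    """Restrict per-process threading and pick a faster start method (illustrative)."""
    # fork is only available on POSIX systems; keep the platform default elsewhere.
    if platform.system() != 'Windows':
        mp.set_start_method(mp_start_method, force=True)

    # Disable OpenCV's internal thread pool so that dataloader workers do not
    # each spawn their own threads and oversubscribe the CPU.
    cv2.setNumThreads(opencv_num_threads)

    # Cap OMP and MKL to one thread each unless the user set them explicitly.
    os.environ.setdefault('OMP_NUM_THREADS', '1')
    os.environ.setdefault('MKL_NUM_THREADS', '1')
```

Using fork also avoids re-importing and re-initializing Python in each worker, which is why the start time drops from minutes to seconds in the tables below.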

Comparison

V100 x8

system info:

sys.platform: linux
Python: 3.8.11 (default, Aug  3 2021, 15:09:35) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-32GB
CUDA_HOME: /mnt/lustre/share/cuda-11.2
NVCC: Build cuda_11.2.r11.2/compiler.29618528_0
GCC: gcc (GCC) 5.4.0
PyTorch: 1.9.0
TorchVision: 0.10.0
OpenCV: 4.5.3
MMCV: 1.4.0
MMCV Compiler: GCC 5.4
MMCV CUDA Compiler: 11.2
MMDetection: 2.20.0+ff9bc39

YOLOX-s

launcher: slurm
workers per GPU: 8
file client: s3

| OMP & MKL threads | OpenCV threads | MP start method | Start time | data_time | time | ETA @ epoch 3, iter 1400 |
| --- | --- | --- | --- | --- | --- | --- |
| default | default | spawn | 10 min | 0.030 | 0.334 | 2 days, 11:10:22 |
| default | 0 | spawn | 10 min | 0.034 | 0.275 | 2 days, 3:03:15 |
| 1 | default | spawn | 9 min | 0.027 | 0.297 | 2 days, 5:27:36 |
| default | default | fork | 36 s | 0.027 | 0.327 | 2 days, 1:39:53 |
| 1 | 0 | fork | 24 s | 0.025 | 0.268 | 1 day, 18:31:02 |

Faster R-CNN

launcher: slurm
workers per GPU: 2
file client: s3

| OMP & MKL threads | OpenCV threads | MP start method | Start time | data_time | time | ETA @ epoch 1, iter 4000 |
| --- | --- | --- | --- | --- | --- | --- |
| default | default | spawn | 1 min 19 s | 0.011 | 0.483 | 11:40:39 |
| 1 | 0 | fork | 12 s | 0.010 | 0.232 | 5:32:23 |

A100 x8

sys.platform: linux
Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: A100-SXM-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.2.r11.2/compiler.29618528_0
GCC: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
PyTorch: 1.10.1
TorchVision: 0.11.2
OpenCV: 4.5.5
MMCV: 1.4.2
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.20.0+ff9bc39

YOLOX-s

launcher: torch
workers per GPU: 8
file client: hard disk

| OMP & MKL threads | OpenCV threads | MP start method | Start time | data_time | time | ETA @ epoch 3, iter 1400 |
| --- | --- | --- | --- | --- | --- | --- |
| num CPU cores | default | spawn | 7 min | 0.039 | 0.317 | 2 days, 5:46:18 |
| 1 | default | spawn | 5 min | 0.036 | 0.308 | 2 days, 3:07:02 |
| 1 | default | fork | 10 s | 0.035 | 0.298 | 2 days, 1:57:50 |
| 1 | 0 | fork | 10 s | 0.016 | 0.189 | 1 day, 5:08:11 |

RangiLyu added the enhancement label on Jan 10, 2022
RangiLyu self-assigned this on Jan 10, 2022
Review comment on tools/train.py (outdated, resolved)
codecov bot commented on Jan 14, 2022

Codecov Report

Merging #6974 (114d089) into dev (6f2e6d1) will decrease coverage by 0.00%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##              dev    #6974      +/-   ##
==========================================
- Coverage   62.34%   62.34%   -0.01%     
==========================================
  Files         327      327              
  Lines       26129    26129              
  Branches     4424     4424              
==========================================
- Hits        16290    16289       -1     
- Misses       8970     8971       +1     
  Partials      869      869              
| Flag | Coverage Δ |
| --- | --- |
| unittests | 62.32% <ø> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| mmdet/models/dense_heads/base_dense_head.py | 88.70% <0.00%> (-1.70%) ⬇️ |
| mmdet/core/bbox/samplers/random_sampler.py | 80.55% <0.00%> (+5.55%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

shinya7y (Contributor) commented:

Does setup_multi_processes also work for tools/test.py?

RangiLyu (Member, Author) replied:

> Does setup_multi_processes also work for tools/test.py?

Yes, it works. I'll add it to both train.py and test.py.
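A hedged sketch of how such a helper could be wired into the entry points, reusing the limit_worker_threads sketch from the description above; the real function and config keys in mmdetection may differ (the settings were later made configurable, see the "set in cfg" commits below).

```python
def main(cfg):
    # Apply the thread and start-method settings before datasets, dataloaders
    # or the runner are built, so they take effect in every worker process.
    limit_worker_threads(
        mp_start_method=cfg.get('mp_start_method', 'fork'),
        opencv_num_threads=cfg.get('opencv_num_threads', 0),
    )
    # ... build the model and datasets, then run training or testing as usual ...
```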

ZwwWayne merged commit 4b87ddc into open-mmlab:dev on Jan 17, 2022
chhluo pushed a commit to chhluo/mmdetection that referenced this pull request Feb 21, 2022
* [Enhance] Speed up training time.

* set in cfg
ZwwWayne pushed a commit that referenced this pull request Jul 18, 2022
* [Enhance] Speed up training time.

* set in cfg
ZwwWayne pushed a commit to ZwwWayne/mmdetection that referenced this pull request Jul 19, 2022
* [Enhance] Speed up training time.

* set in cfg
RangiLyu deleted the save_life_v2 branch on December 17, 2022