
support for fp16 training #18

Closed
baibaidj opened this issue Jul 14, 2020 · 4 comments
@baibaidj
Describe the feature
FP16 training

Motivation
FP16 training speeds up training; Apex is recommended.
With the default FP32, the feasible batch size is small, which leads to slower training and possibly suboptimal performance.

Related resources
https://github.com/NVIDIA/apex

Additional context
None.

This was referenced Jul 15, 2020
@xvjiarui xvjiarui added the WIP Work in process label Jul 17, 2020
@hellock hellock removed the WIP Work in process label Jul 20, 2020
@hellock (Member) commented Jul 20, 2020

FP16 has been supported.

@hellock hellock closed this as completed Jul 20, 2020
@baibaidj (Author) commented:

I cannot find the Fp16OptimizerHook class in mmseg. Is it included in the newest mmcv?
When I add it to optimizer_config, the system reports that the class is not registered.
Thank you.
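For context, here is a minimal config sketch of the two ways FP16 is typically switched on in mmseg-style configs once a recent enough MMCV is installed; the loss_scale value is an assumed example, not taken from this thread:

```python
# Hypothetical mmseg/mmcv config fragment (assumed values, not from this issue).
# Option 1: register mmcv's Fp16OptimizerHook explicitly in optimizer_config.
optimizer_config = dict(type='Fp16OptimizerHook', loss_scale=512.0)

# Option 2: newer mmseg configs enable FP16 via a top-level fp16 dict.
fp16 = dict(loss_scale=512.0)
```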

@xvjiarui (Collaborator) commented:

Hi @baibaidj
You may use the latest MMCV (1.0.3).

@baibaidj (Author) commented Jul 27, 2020

> Hi @baibaidj
> You may use the latest MMCV (1.0.3).

Thanks, it worked.
However, I observed a performance degradation when using FP16 training.
The training log is as follows. Could you please take a look and see what's going wrong?
Thank you again.

With some effort, I finally located the bug: I had not added self.fp16_enabled = False to the BaseDecodeHead class.
After inspection, the auto_fp16 decorator in mmcv seems to check only for the presence of the fp16_enabled attribute, not its value, when deciding whether to enable FP16 training.
Honestly, this is confusing at first sight; self.fp16_enabled = True would read more clearly.
The fp16_enabled attribute is also set in the BaseSegmentor class. I was wondering why it appears twice. Thank you.
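The marker behavior described above can be sketched as follows. This is a minimal illustration with hypothetical class names, not mmcv's actual code: the FP16 wrapper flips fp16_enabled to True on every module that merely declares the attribute, so the initial value of False only marks the module as FP16-capable and is never read as "disabled".

```python
class FakeDecodeHead:
    """Stand-in for a decode head that declares the FP16 marker attribute."""
    def __init__(self):
        # Marker only: presence of the attribute is what matters,
        # the wrapper below overwrites the value with True.
        self.fp16_enabled = False

class PlainHead:
    """Stand-in for a module without the marker; it is left untouched."""
    pass

def wrap_fp16_model(modules):
    # Mimic the wrapper's behavior: enable FP16 on every module
    # that declares the fp16_enabled marker attribute.
    for m in modules:
        if hasattr(m, 'fp16_enabled'):
            m.fp16_enabled = True

heads = [FakeDecodeHead(), PlainHead()]
wrap_fp16_model(heads)
```

This explains why setting self.fp16_enabled = False still enables FP16 training: the attribute advertises capability, and the wrapper supplies the value.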

aravind-h-v pushed a commit to aravind-h-v/mmsegmentation that referenced this issue Mar 27, 2023
…diffusers

changes comments and env vars in `utils/logging.py`
wjkim81 pushed a commit to wjkim81/mmsegmentation that referenced this issue Dec 3, 2023
sibozhang pushed a commit to sibozhang/mmsegmentation that referenced this issue Mar 22, 2024