
support for fp16 training #18

Closed
baibaidj opened this issue Jul 14, 2020 · 4 comments
@baibaidj
Describe the feature
FP16 training

Motivation
FP16 training speeds up training; Apex is recommended.
With the default FP32, the feasible batch size is small, which leads to slower training and possibly suboptimal performance.

Related resources
https://github.com/NVIDIA/apex

Additional context
None.

This was referenced Jul 15, 2020
@xvjiarui xvjiarui added the WIP Work in process label Jul 17, 2020
@hellock hellock removed the WIP Work in process label Jul 20, 2020
@hellock (Member) commented Jul 20, 2020

FP16 has been supported.

@hellock hellock closed this as completed Jul 20, 2020
@baibaidj (Author) commented:

I cannot find the Fp16OptimizerHook class in mmseg. Is it included in the newest mmcv?
When I add it to optimizer_config, the system reports that the class is not registered.
Thank you.
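For context, here is a minimal config sketch of the two ways FP16 is typically switched on in mmseg-style configs once a recent enough MMCV is installed; the loss_scale value is an assumed example, not taken from this thread:

```python
# Hypothetical mmseg/mmcv config fragment (assumed values, not from this issue).
# Option 1: register mmcv's Fp16OptimizerHook explicitly in optimizer_config.
optimizer_config = dict(type='Fp16OptimizerHook', loss_scale=512.0)

# Option 2: newer mmseg configs enable FP16 via a top-level fp16 dict.
fp16 = dict(loss_scale=512.0)
```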

@xvjiarui (Collaborator) commented:

Hi @baibaidj
You may use the latest MMCV (1.0.3).

@baibaidj (Author) commented Jul 27, 2020

> Hi @baibaidj
> You may use the latest MMCV (1.0.3).

Thanks, it worked.
However, I observed a performance degradation when using FP16 training.
The training log is as follows. Could you please take a look and see what's going wrong?
Thank you again.

With some effort, I finally located the bug: I had not added self.fp16_enabled = False to the BaseDecodeHead class.
After inspection, the auto_fp16 decorator in mmcv seems to check only for the presence of the fp16_enabled attribute, not its value, when deciding whether to enable FP16 training.
Honestly, this is confusing at first sight; self.fp16_enabled = True would read more clearly.
The fp16_enabled attribute is also set in the BaseSegmentor class. I was wondering why it appears twice. Thank you.
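The marker behavior described above can be sketched as follows. This is a minimal illustration with hypothetical class names, not mmcv's actual code: the FP16 wrapper flips fp16_enabled to True on every module that merely declares the attribute, so the initial value of False only marks the module as FP16-capable and is never read as "disabled".

```python
class FakeDecodeHead:
    """Stand-in for a decode head that declares the FP16 marker attribute."""
    def __init__(self):
        # Marker only: presence of the attribute is what matters,
        # the wrapper below overwrites the value with True.
        self.fp16_enabled = False

class PlainHead:
    """Stand-in for a module without the marker; it is left untouched."""
    pass

def wrap_fp16_model(modules):
    # Mimic the wrapper's behavior: enable FP16 on every module
    # that declares the fp16_enabled marker attribute.
    for m in modules:
        if hasattr(m, 'fp16_enabled'):
            m.fp16_enabled = True

heads = [FakeDecodeHead(), PlainHead()]
wrap_fp16_model(heads)
```

This explains why setting self.fp16_enabled = False still enables FP16 training: the attribute advertises capability, and the wrapper supplies the value.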

aravind-h-v pushed a commit to aravind-h-v/mmsegmentation that referenced this issue Mar 27, 2023
…diffusers

changes comments and env vars in `utils/logging.py`
wjkim81 pushed a commit to wjkim81/mmsegmentation that referenced this issue Dec 3, 2023
sibozhang pushed a commit to sibozhang/mmsegmentation that referenced this issue Mar 22, 2024