Support `ddp_fork` strategy with native AMP by attempting NVML-based CUDA availability assessment #14981
Labels: bug (Something isn't working), precision: amp (Automatic Mixed Precision), strategy: ddp (DistributedDataParallel), 🚀 Feature
`ddp_fork` (and its associated alias strategies) cannot currently be used with native AMP because constructing the `GradScaler` in `NativeMixedPrecisionPlugin` invokes the CUDA Runtime API (https://github.com/Lightning-AI/lightning/blob/c059db446e7bfea03fba91e598ad503f0d1c6581/src/pytorch_lightning/plugins/precision/native_amp.py#L53), which initializes CUDA in the main process and poisons all subsequent forks.
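For illustration (a minimal sketch of mine, not part of the original report), the failure mode looks roughly like this: once the parent process has touched the CUDA Runtime API, a fork-started child can no longer initialize CUDA.

```python
import multiprocessing as mp

import torch


def _child_uses_cuda() -> None:
    # In a forked child of a CUDA-initialized parent this raises
    # "RuntimeError: Cannot re-initialize CUDA in forked subprocess".
    torch.zeros(1, device="cuda")


# GradScaler's constructor calls torch.cuda.is_available(), which goes through
# the CUDA Runtime API and initializes CUDA in this (parent) process.
scaler = torch.cuda.amp.GradScaler()

ctx = mp.get_context("fork")  # the start method that ddp_fork relies on
proc = ctx.Process(target=_child_uses_cuda)
proc.start()
proc.join()
```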
It may be possible, with a future version of PyTorch, to change the default behavior of `torch.cuda.is_available()` to use an NVML-based CUDA assessment throughout Lightning. In the meantime, patching `torch.cuda.is_available()` with Lightning's implementation of the upstream NVML-based assessment can unlock this functionality.
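A rough sketch of what the NVML-based assessment and the local patching could look like (my illustration; the helper name and the direct `pynvml` usage are assumptions, and the actual Lightning/PyTorch implementations differ in detail):

```python
from unittest import mock

import torch

# Keep a handle to the original check so the fallback below cannot recurse
# once torch.cuda.is_available has been patched.
_original_is_available = torch.cuda.is_available


def _nvml_based_cuda_available() -> bool:
    """Answer "is CUDA available?" via NVML, without initializing the CUDA runtime."""
    try:
        import pynvml  # NVML bindings, e.g. from the nvidia-ml-py package (assumption)
    except ImportError:
        # No NVML bindings -- fall back to the regular (fork-poisoning) check.
        return _original_is_available()
    try:
        pynvml.nvmlInit()
        try:
            return pynvml.nvmlDeviceGetCount() > 0
        finally:
            pynvml.nvmlShutdown()
    except pynvml.NVMLError:
        return False


# Patch only around the GradScaler construction so that building the precision
# plugin does not initialize CUDA in the main process and later forks stay safe.
with mock.patch("torch.cuda.is_available", _nvml_based_cuda_available):
    scaler = torch.cuda.amp.GradScaler()
```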
I'll be opening a PR shortly that patches `torch.cuda.is_available()` within `NativeMixedPrecisionPlugin` (both the Lite and PL versions) and adds a standalone test for the `ddp_fork` strategy in a CUDA and AMP context (adding a standalone test only for PL given how expensive the standalone multi-GPU tests can be).

Motivation
Many users run AMP inside Jupyter notebooks, where `ddp_fork` will be important to support when training on multiple GPUs.

Pitch
Allow native AMP to be used with the `ddp_fork` strategy so that multi-GPU AMP training works from Jupyter notebooks. I will open a small PR shortly that makes this available.
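For concreteness, a minimal notebook-style example of the configuration this would unlock (my illustration; `ToyModel` and the random data are placeholders, not from the issue):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    """Tiny placeholder module, only to demonstrate the Trainer configuration."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp_fork",  # fork-based DDP, usable from an interactive session
    precision=16,         # native AMP -- the combination this issue is about
    max_epochs=1,
)
trainer.fit(ToyModel(), DataLoader(dataset, batch_size=8))
```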
Additional context

There is a related PR currently open in PyTorch that may allow the requested modification of `torch.cuda.is_available()` throughout Lightning without needing to patch the function or add Lightning's own NVML-based assessment (once the relevant PyTorch version becomes the minimum supported one).

cc @justusschock @awaelchli @carmocca