Support `ddp_fork` strategy with native AMP by attempting NVML-based CUDA availability assessment #14981
Labels: bug (Something isn't working), precision: amp (Automatic Mixed Precision), strategy: ddp (DistributedDataParallel), 🚀 Feature
`ddp_fork` (and its associated alias strategies) cannot currently be used with native AMP because constructing the `GradScaler` in `NativeMixedPrecisionPlugin` invokes the CUDA Runtime API (https://github.com/Lightning-AI/lightning/blob/c059db446e7bfea03fba91e598ad503f0d1c6581/src/pytorch_lightning/plugins/precision/native_amp.py#L53), which initializes CUDA in the main process and poisons all subsequent forks.
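For illustration (a minimal sketch of mine, not part of the original report), the failure mode looks roughly like this: once the parent process has touched the CUDA Runtime API, a fork-started child can no longer initialize CUDA.

```python
import multiprocessing as mp

import torch


def _child_uses_cuda() -> None:
    # In a forked child of a CUDA-initialized parent this raises
    # "RuntimeError: Cannot re-initialize CUDA in forked subprocess".
    torch.zeros(1, device="cuda")


# GradScaler's constructor calls torch.cuda.is_available(), which goes through
# the CUDA Runtime API and initializes CUDA in this (parent) process.
scaler = torch.cuda.amp.GradScaler()

ctx = mp.get_context("fork")  # the start method that ddp_fork relies on
proc = ctx.Process(target=_child_uses_cuda)
proc.start()
proc.join()
```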
It may be possible, with a future version of PyTorch, to change the default behavior of `torch.cuda.is_available()` to use an NVML-based CUDA assessment throughout Lightning. In the meantime, patching `torch.cuda.is_available()` with Lightning's implementation of the upstream NVML-based assessment can unlock this functionality.
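A rough sketch of what the NVML-based assessment and the local patching could look like (my illustration; the helper name and the direct `pynvml` usage are assumptions, and the actual Lightning/PyTorch implementations differ in detail):

```python
from unittest import mock

import torch

# Keep a handle to the original check so the fallback below cannot recurse
# once torch.cuda.is_available has been patched.
_original_is_available = torch.cuda.is_available


def _nvml_based_cuda_available() -> bool:
    """Answer "is CUDA available?" via NVML, without initializing the CUDA runtime."""
    try:
        import pynvml  # NVML bindings, e.g. from the nvidia-ml-py package (assumption)
    except ImportError:
        # No NVML bindings -- fall back to the regular (fork-poisoning) check.
        return _original_is_available()
    try:
        pynvml.nvmlInit()
        try:
            return pynvml.nvmlDeviceGetCount() > 0
        finally:
            pynvml.nvmlShutdown()
    except pynvml.NVMLError:
        return False


# Patch only around the GradScaler construction so that building the precision
# plugin does not initialize CUDA in the main process and later forks stay safe.
with mock.patch("torch.cuda.is_available", _nvml_based_cuda_available):
    scaler = torch.cuda.amp.GradScaler()
```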
I'll be opening a PR shortly that patches `torch.cuda.is_available()` within `NativeMixedPrecisionPlugin` (both the Lite and PL versions) and adds a standalone test for the `ddp_fork` strategy in a CUDA and AMP context (adding a standalone test only for PL given how expensive the standalone multi-GPU tests can be).

Motivation
Many users run AMP inside Jupyter notebooks, where `ddp_fork` will be important to support when training on multiple GPUs.

Pitch
Allow native AMP to be used with the `ddp_fork` strategy so that multi-GPU AMP training works from Jupyter notebooks. I will open a small PR shortly that makes this available.
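For concreteness, a minimal notebook-style example of the configuration this would unlock (my illustration; `ToyModel` and the random data are placeholders, not from the issue):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    """Tiny placeholder module, only to demonstrate the Trainer configuration."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    strategy="ddp_fork",  # fork-based DDP, usable from an interactive session
    precision=16,         # native AMP -- the combination this issue is about
    max_epochs=1,
)
trainer.fit(ToyModel(), DataLoader(dataset, batch_size=8))
```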
Additional context

There is a related PR currently open in PyTorch that may allow the requested modification of `torch.cuda.is_available()` throughout Lightning without needing to patch the function or add Lightning's own NVML-based assessment (once the relevant PyTorch version becomes the minimum supported one).

cc @justusschock @awaelchli @carmocca