
Support ddp_fork strategy with native AMP by attempting NVML-based CUDA availability assessment #14984

Conversation

speediedan
Contributor

@speediedan commented Oct 3, 2022

What does this PR do?

Fixes #14981

ddp_fork (and its associated alias strategies) cannot currently be used with native AMP because constructing the GradScaler in the NativeMixedPrecisionPlugin invokes the CUDA Runtime API:

https://github.com/Lightning-AI/lightning/blob/c059db446e7bfea03fba91e598ad503f0d1c6581/src/pytorch_lightning/plugins/precision/native_amp.py#L53

which in turn initializes CUDA and poisons subsequent forks.
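
To make the failure mode concrete, here is a minimal sketch (illustrative only, not code from this PR; in Lightning the fork happens inside the ddp_fork strategy rather than via a direct torch.multiprocessing call):

```python
# Minimal sketch of the fork-poisoning problem (illustrative, not from this PR).
# Once the parent process touches the CUDA Runtime API, CUDA is initialized and
# forked children can no longer use it.
import torch
import torch.multiprocessing as mp


def worker(rank: int) -> None:
    # In a forked child of a CUDA-initialized parent, this typically fails with
    # "Cannot re-initialize CUDA in forked subprocess ...".
    torch.zeros(1, device="cuda")


if __name__ == "__main__":
    torch.cuda.is_available()  # initializes the CUDA runtime (pre-fix behavior)
    mp.start_processes(worker, nprocs=1, start_method="fork")
```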

A future version of PyTorch may allow the default behavior of torch.cuda.is_available() to be switched to an NVML-based CUDA assessment throughout Lightning. In the meantime, patching torch.cuda.is_available() with Lightning's implementation of the upstream NVML-based assessment unlocks this functionality.
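
For reference, a minimal sketch of what an NVML-based availability check looks like, assuming the pynvml package (Lightning's actual helper lives in lightning_lite.accelerators.cuda and may differ in detail):

```python
# Hedged sketch of an NVML-based CUDA availability check (assumes pynvml is
# installed; Lightning's real implementation may differ). NVML queries the
# driver directly and never initializes the CUDA runtime, so calling it in the
# parent process keeps subsequent forks usable.
import pynvml


def nvml_cuda_available() -> bool:
    try:
        pynvml.nvmlInit()
        try:
            return pynvml.nvmlDeviceGetCount() > 0
        finally:
            pynvml.nvmlShutdown()
    except pynvml.NVMLError:
        # No NVIDIA driver/NVML library present: treat as "no CUDA".
        return False
```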

This PR patches torch.cuda.is_available() within the NativeMixedPrecisionPlugin (both the Lite and PL versions) and adds a standalone test exercising the ddp_fork strategy in a CUDA + native AMP context (only for PL, given how expensive standalone multi-GPU tests can be).
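
The patching approach can be sketched as follows (illustrative, not the PR's exact code; my_nvml_utils and nvml_cuda_available are hypothetical names standing in for Lightning's helper):

```python
# Hedged sketch of the patching approach (not the PR's exact code). While the
# patch is active, GradScaler's availability probe hits the fork-safe NVML check
# instead of the CUDA Runtime API, so CUDA is never initialized in the parent.
from unittest import mock

import torch

from my_nvml_utils import nvml_cuda_available  # hypothetical module: the sketch above

with mock.patch("torch.cuda.is_available", nvml_cuda_available):
    scaler = torch.cuda.amp.GradScaler()
```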

After this PR, ddp_fork (and its associated alias strategies) can be used with native AMP in interactive contexts (e.g., Jupyter notebooks). Note that this PR does not address or pertain to bf16-based AMP support.
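
Concretely, a configuration like the following should work (values are illustrative; `model` is a LightningModule defined elsewhere):

```python
# Illustrative Trainer configuration unlocked by this PR (values are examples).
# With strategy="ddp_fork", enabling native AMP (precision=16) no longer
# initializes CUDA in the parent process, so forked workers can use the GPUs.
from pytorch_lightning import Trainer

trainer = Trainer(accelerator="gpu", devices=2, strategy="ddp_fork", precision=16)
# trainer.fit(model)  # model: a LightningModule defined elsewhere
```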

Additional context

There is a related PR in PyTorch that may allow modifying torch.cuda.is_available() throughout Lightning without patching the function or maintaining Lightning's own NVML-based assessment (once the relevant PyTorch version becomes Lightning's minimum supported version).

Does your PR introduce any breaking changes? If yes, please list them.

None

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@github-actions bot added the "pl" (Generic label for PyTorch Lightning package) label Oct 3, 2022
@speediedan marked this pull request as ready for review October 4, 2022 00:33
@speediedan marked this pull request as draft October 4, 2022 00:40
@speediedan force-pushed the enable_ddp_fork_strategy_with_native_amp branch 2 times, most recently from 51f7669 to 09a42e5 on October 4, 2022 03:14
@speediedan marked this pull request as ready for review October 4, 2022 03:40

codecov bot commented Oct 4, 2022

Codecov Report

Merging #14984 (bba3e82) into master (7fed7a1) will increase coverage by 1%.
The diff coverage is 72%.

Additional details and impacted files
@@            Coverage Diff            @@
##           master   #14984     +/-   ##
=========================================
+ Coverage      83%      84%     +1%     
=========================================
  Files         396      276    -120     
  Lines       29089    21245   -7844     
=========================================
- Hits        24074    17842   -6232     
+ Misses       5015     3403   -1612     

tests/tests_pytorch/helpers/runif.py (outdated review thread, resolved)
tests/tests_pytorch/models/test_ddp_fork_amp.py (outdated review thread, resolved)
tests/tests_pytorch/models/test_ddp_fork_amp.py (outdated review thread, resolved)
src/lightning_lite/accelerators/cuda.py (outdated review thread, resolved)
@mergify bot added the "has conflicts" label Oct 4, 2022
tests/tests_lite/helpers/runif.py (outdated review thread, resolved)
tests/tests_pytorch/models/test_amp.py (outdated review thread, resolved)
@awaelchli added the "community" (This PR is from the community) label Oct 4, 2022
@awaelchli added this to the pl:1.8 milestone Oct 4, 2022
@awaelchli self-assigned this Oct 4, 2022
@awaelchli added the "accelerator: cuda" (Compute Unified Device Architecture GPU) and "precision: amp" (Automatic Mixed Precision) labels Oct 4, 2022
src/lightning_lite/accelerators/cuda.py (outdated review thread, resolved)
src/lightning_lite/accelerators/cuda.py (outdated review thread, resolved)
src/lightning_lite/accelerators/cuda.py (outdated review thread, resolved)
src/lightning_lite/accelerators/cuda.py (outdated review thread, resolved)
@speediedan force-pushed the enable_ddp_fork_strategy_with_native_amp branch from f818027 to 8441726 on October 5, 2022 01:05
@mergify bot removed the "has conflicts" label Oct 5, 2022
@speediedan force-pushed the enable_ddp_fork_strategy_with_native_amp branch from 867f7c7 to 6cc8e44 on October 5, 2022 01:15
@mergify bot added the "has conflicts" label and removed the "ready" (PRs ready to be merged) label Oct 5, 2022
@mergify bot added the "ready" (PRs ready to be merged) label and removed the "has conflicts" label Oct 5, 2022
@awaelchli merged commit 3b75c52 into Lightning-AI:master Oct 5, 2022
@otaj mentioned this pull request Oct 10, 2022
nicolai86 pushed a commit that referenced this pull request Oct 13, 2022
Support ddp_fork strategy with native AMP by attempting NVML-based CUDA availability assessment (#14984)

Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <[email protected]>
Labels
"accelerator: cuda" (Compute Unified Device Architecture GPU), "community" (This PR is from the community), "pl" (Generic label for PyTorch Lightning package), "precision: amp" (Automatic Mixed Precision), "ready" (PRs ready to be merged)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support ddp_fork strategy with native AMP by attempting NVML-based CUDA availability assessment
5 participants