-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable non blocking to device with MPS #14368
Disable non blocking to device with MPS #14368
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somebody will need to manually run the CI on an MPS device
Codecov Report
@@ Coverage Diff @@
## master #14368 +/- ##
==========================================
- Coverage 79% 76% -3%
==========================================
Files 111 332 +221
Lines 7258 26894 +19636
==========================================
+ Hits 5740 20432 +14692
- Misses 1518 6462 +4944 |
@j0rd1smit seems like you fixed it. I think it's reasonable to update the type, but I'd do so in another PR as this might require further typing changes for mypy. If you want to tackle that as well, feel free to do so :) |
@justusschock agreed that should be a separate PR. So guess this PR is ready. |
* disable non-blocking for mps due to race condition bug * fixed typo * fixed: unknown mps device for non arm systems * Removed unrobust test case * moved _MPS_DEVICES such that we used in apply_func * Resolve circular dependencies * Comment rewording * changed torchElasticEnvironment to a global import * simplified if statement to blocking device type * Added change to CHANGELOG * Update src/pytorch_lightning/utilities/apply_func.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed mypy not detecting casting of device * Moved check into if statement to mainain original behavior Co-authored-by: Carlos Mocholí <[email protected]> Co-authored-by: Justus Schock <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jirka Borovec <[email protected]>
* disable non-blocking for mps due to race condition bug * fixed typo * fixed: unknown mps device for non arm systems * Removed unrobust test case * moved _MPS_DEVICES such that we used in apply_func * Resolve circular dependencies * Comment rewording * changed torchElasticEnvironment to a global import * simplified if statement to blocking device type * Added change to CHANGELOG * Update src/pytorch_lightning/utilities/apply_func.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed mypy not detecting casting of device * Moved check into if statement to mainain original behavior Co-authored-by: Carlos Mocholí <[email protected]> Co-authored-by: Justus Schock <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jirka Borovec <[email protected]>
What does this PR do?
This PR ensures that the race condition bug in Pytorch (pytorch/pytorch#83015) does not affect lightning. As discussed with @justusschock in #13285 this PR disables non-blocking moves to MPS devices since it can result in tensors with different values after the moves to the MPS device. To verify it works now, the test case
test_data_is_not_changed_after_move_to_mps_device
has been added.Fixes #13285
Does your PR introduce any breaking changes? If yes, please list them.
No.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:
Did you have fun?
Yes, this was my first ever PR 😀. It was quite fun to do. However, since it is my first time, I hope I did everything OK. If not, let me know. I'm happy to learn.