Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataLoader] Locking lower ranks seed recepients #81071

Closed

Conversation

VitalyFedyunin
Copy link
Contributor

@VitalyFedyunin VitalyFedyunin commented Jul 7, 2022

Exit seed receiving section only when all ranks received seed, otherwise we are at risk that current rank
will reach same section of the code again while rank zero still in the previous iteration

Stack from ghstack (oldest at bottom):

Fixes: #80845

Differential Revision: D37702557

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Jul 7, 2022

🔗 Helpful links

❌ 1 New Failures

As of commit 8761195 (more details on the Dr. CI page):

Expand to see more
  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build Lint / lintrunner (1/1)

Step: "Run lintrunner on all files" (full log | diagnosis details | 🔁 rerun)

2022-07-07T23:04:23.9410282Z ##[error]This line has trailing spaces; please remove them.
2022-07-07T23:04:23.7817797Z ##[group]Run # Use jq to massage the JSON lint output into GitHub Actions workflow commands.
2022-07-07T23:04:23.7818186Z �[36;1m# Use jq to massage the JSON lint output into GitHub Actions workflow commands.�[0m
2022-07-07T23:04:23.7818461Z �[36;1mjq --raw-output \�[0m
2022-07-07T23:04:23.7818851Z �[36;1m  '"::\(if .severity == "advice" or .severity == "disabled" then "warning" else .severity end) file=\(.path),line=\(.line),col=\(.char),title=\(.code) \(.name)::" + (.description | gsub("\\n"; "%0A"))' \�[0m
2022-07-07T23:04:23.7819194Z �[36;1m  lint.json�[0m
2022-07-07T23:04:23.7864813Z shell: /usr/bin/bash -e {0}
2022-07-07T23:04:23.7865010Z env:
2022-07-07T23:04:23.7865244Z   pythonLocation: /opt/hostedtoolcache/Python/3.8.13/x64
2022-07-07T23:04:23.7865542Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.8.13/x64/lib
2022-07-07T23:04:23.7865786Z ##[endgroup]
2022-07-07T23:04:23.9410282Z ##[error]This line has trailing spaces; please remove them.
2022-07-07T23:04:23.9449542Z Post job cleanup.
2022-07-07T23:04:23.9481919Z Post job cleanup.
2022-07-07T23:04:24.0551195Z [command]/usr/bin/git version
2022-07-07T23:04:24.0600876Z git version 2.36.1
2022-07-07T23:04:24.0647614Z Temporarily overriding HOME='/home/runner/actions-runner/_work/_temp/a7821e5d-a730-4069-a436-2f96ffb5195e' before making global git config changes
2022-07-07T23:04:24.0648216Z Adding repository directory to the temporary git global config as a safe directory
2022-07-07T23:04:24.0653818Z [command]/usr/bin/git config --global --add safe.directory /home/runner/actions-runner/_work/pytorch/pytorch
2022-07-07T23:04:24.0699101Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2022-07-07T23:04:24.0737232Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :
2022-07-07T23:04:24.0991639Z Entering 'android/libs/fbjni'

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

VitalyFedyunin added a commit that referenced this pull request Jul 7, 2022
ghstack-source-id: 32ab45be78eb2695600c5b46a7a3286340ba5bc8
Pull Request resolved: #81071
@VitalyFedyunin
Copy link
Contributor Author

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! LGTM

Exit seed receiving section only when all ranks received seed, otherwise we are at risk that current rank 
will reach same section of the code again while rank zero still in the previous iteration


Fixes: #80845

Differential Revision: [D37702557](https://our.internmc.facebook.com/intern/diff/D37702557)

[ghstack-poisoned]
VitalyFedyunin added a commit that referenced this pull request Jul 8, 2022
ghstack-source-id: aceebd9b63eb9875f7d8f2313372fb7a4e9dd9d1
Pull Request resolved: #81071
@VitalyFedyunin
Copy link
Contributor Author

@pytorchbot merge -g

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a merge job. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Merge failed due to This PR has internal changes and must be landed via Phabricator
Raised by https://github.com/pytorch/pytorch/actions/runs/2636979414

@VitalyFedyunin
Copy link
Contributor Author

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@VitalyFedyunin
Copy link
Contributor Author

@pytorchbot merge -g

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a merge job. Check the current status here

@github-actions
Copy link
Contributor

github-actions bot commented Jul 8, 2022

Hey @VitalyFedyunin.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@facebook-github-bot facebook-github-bot deleted the gh/VitalyFedyunin/183/head branch July 12, 2022 14:17
facebook-github-bot pushed a commit that referenced this pull request Jul 12, 2022
Summary:
Exit seed receiving section only when all ranks received seed, otherwise we are at risk that current rank
will reach same section of the code again while rank zero still in the previous iteration

Fixes: #80845

Pull Request resolved: #81071
Approved by: https://github.com/msaroufim, https://github.com/ejguan

Test Plan:
contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/e9b3bc2eadb8ffe10c002abcd5a34a5b7d36f390

Original Phabricator Test Plan:
Imported from OSS

Reviewed By: mehtanirav, ejguan

Differential Revision: D37702557

Pulled By: VitalyFedyunin

fbshipit-source-id: 51dd950e1bfc2c984a4ddbe6481e225023b0a202
@ejguan ejguan added this to the 1.12.1 milestone Jul 12, 2022
atalman pushed a commit to atalman/pytorch that referenced this pull request Jul 21, 2022
…orch#81071)

Summary:
Exit seed receiving section only when all ranks received seed, otherwise we are at risk that current rank
will reach same section of the code again while rank zero still in the previous iteration

Fixes: pytorch#80845

Pull Request resolved: pytorch#81071
Approved by: https://github.com/msaroufim, https://github.com/ejguan

Test Plan:
contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/e9b3bc2eadb8ffe10c002abcd5a34a5b7d36f390

Original Phabricator Test Plan:
Imported from OSS

Reviewed By: mehtanirav, ejguan

Differential Revision: D37702557

Pulled By: VitalyFedyunin

fbshipit-source-id: 51dd950e1bfc2c984a4ddbe6481e225023b0a202
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants