Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump onnx from 1.16.1 to 1.17.0 in /.ci/docker #2

Closed
wants to merge 1 commit into from

Conversation

dependabot[bot]
Copy link

@dependabot dependabot bot commented on behalf of github Oct 23, 2024

Bumps onnx from 1.16.1 to 1.17.0.

Release notes

Sourced from onnx's releases.

v1.17.0

ONNX v1.17.0 is now available with exciting new features! We would like to thank everyone who contributed to this release! Please visit onnx.ai to learn more about ONNX and associated projects.

Key Updates

ai.onnx Opset 22

Python Changes

  • Support for numpy >= 2.0

Bug fixes and infrastructure improvements

  • Fix Check URLs errors 5972
  • Use CMAKE_PREFIX_PATH in finding libprotobuf 5975
  • Bump main VERSION_NUMBER to 1.17.0 5968
  • Fix source and pip tar.gz builds on s390x systems 5984
  • Fix unique_name 5992
  • Fix SegFault bug in shape inference 5990
  • Fix onnx.compose when connecting subgraphs 5991
  • Fix conversion from split 11 to split 18 6020
  • Update error messages for NegativeLogLikelihoodLoss inference function 6021
  • Generalize input/output number check in shape inference 6005
  • Replace rank inference with shape inference for Einsum op 6010
  • build from source instruction with latest cmake change 6038
  • Handle OneHot's depth value during shape inference 5963
  • Not to install cmake in pyproject.toml on Windows 6045
  • fix a skipped shape infer code 6049
  • Include the ".onnxtext" extension in supported serialization format 6051
  • Allow ReferenceEvaluator to return intermediate results 6066
  • Fix 1 typo in numpy_helper.py 6041
  • Remove benchmarking code 6076
  • Prevent crash on import after GCC 8 builds 6048
  • Check graph outputs are defined 6083
  • Enable additional ruff rules 6032
  • Add missing shape inference check for DequantizeLinear 6080
  • Add bfloat16 to all relevant ops 6099
  • fix(ci): install python dependencies with --only-binary :all: in manylinux 6120
  • fix: install google-re2 with --only-binary option 6129
  • Specify axis parameter for DequantizeLinear when input rank is 1 6095
  • Pin onnxruntime to 1.17.3 for release CIs 6143
  • Fix INT4 TensorProto byte size is 5x larger than expected with negative values 6161
  • Mitigate tarball directory traversal risks 6164
  • Fix reference implementation for ScatterND with 4D tensors 6174
  • Addition of group > 1 in test and in backend for ConvTranspose 6175
  • Support for bfloat16 for binary, unary operators in reference implementation 6166
  • Refactor windows workflow to work on standard windows 6190
  • Fix a few crashes while running shape inference 6195
  • Update onnx to work with numpy>=2.0 6196
  • Use sets to improve performance of dfs search 6213

... (truncated)

Commits

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    You can disable automated security fix PRs for this repo from the Security Alerts page.

Bumps [onnx](https://github.com/onnx/onnx) from 1.16.1 to 1.17.0.
- [Release notes](https://github.com/onnx/onnx/releases)
- [Changelog](https://github.com/onnx/onnx/blob/main/docs/Changelog-ml.md)
- [Commits](onnx/onnx@v1.16.1...v1.17.0)

---
updated-dependencies:
- dependency-name: onnx
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot added the dependencies Pull requests that update a dependency file label Oct 23, 2024
jyono pushed a commit that referenced this pull request Nov 12, 2024
…ytorch#139659)

### Motivation
Today, watchdog only reports that it found a collective timeout:
```
[rank1]:[E1104 14:02:18.767594328 ProcessGroupNCCL.cpp:688] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=ALLREDUCE, NumelIn=200, NumelOut=200, Timeout(ms)=5000) ran for 5096 milliseconds before timing out.
```
While this is nice, it is hard to associate the error with user's program or library stack.

### This PR
This PR gives watchdog the ability to report the call-time stack of the collective, so that it would be easier to track the error back to the program's behavior.

The call-time stack was recorded by Flight Recorder with minimal overhead (for details, please read this [doc](https://dev-discuss.pytorch.org/t/fast-combined-c-python-torchscript-inductor-tracebacks/1158) written by @zdevito ). In `ProcessGroupNCCL`, we are only tracking / reporting the python part so that it fits most PyTorch users.

### Demo
[stack_demo.py](https://gist.github.com/kwen2501/6758e18d305d67fc6f3f926217825c09).

```
TORCH_NCCL_TRACE_BUFFER_SIZE=100 torchrun --nproc-per-node 2 stack_demo.py
```
`TORCH_NCCL_TRACE_BUFFER_SIZE` is for turning on the Flight Recorder.

Output:
```
[rank0]:[E1104 14:19:27.591610653 ProcessGroupNCCL.cpp:695] Stack trace of the timedout collective operation:
#0 all_reduce from /data/users/kw2501/pytorch/torch/distributed/distributed_c10d.py:2696
#1 wrapper from /data/users/kw2501/pytorch/torch/distributed/c10d_logger.py:83
#2 bar from /data/users/kw2501/sync_async/repro.py:15
#3 foo from /data/users/kw2501/sync_async/repro.py:24
pytorch#4 main from /data/users/kw2501/sync_async/repro.py:34
pytorch#5 <module> from /data/users/kw2501/sync_async/repro.py:40

[rank1]:[E1104 14:19:27.771430164 ProcessGroupNCCL.cpp:695] Stack trace of the timedout collective operation:
#0 all_gather_into_tensor from /data/users/kw2501/pytorch/torch/distributed/distributed_c10d.py:3630
#1 wrapper from /data/users/kw2501/pytorch/torch/distributed/c10d_logger.py:83
#2 baz from /data/users/kw2501/sync_async/repro.py:20
#3 foo from /data/users/kw2501/sync_async/repro.py:26
pytorch#4 main from /data/users/kw2501/sync_async/repro.py:34
pytorch#5 <module> from /data/users/kw2501/sync_async/repro.py:40
```

From the log above, we can tell that `bar()` and `baz()` are the places where the two ranks divert.

Pull Request resolved: pytorch#139659
Approved by: https://github.com/wconstab, https://github.com/fduwjj
Copy link
Author

dependabot bot commented on behalf of github Nov 12, 2024

Looks like onnx is up-to-date now, so this is no longer needed.

@dependabot dependabot bot closed this Nov 12, 2024
@dependabot dependabot bot deleted the dependabot/pip/dot-ci/docker/onnx-1.17.0 branch November 12, 2024 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

0 participants