Fix build #1479

yinghai · 2022-11-24T22:40:40Z

How to debug

Follow https://circleci.com/docs/ssh-access-jobs/ to get into the host where the test failed (e.g. https://app.circleci.com/pipelines/github/pytorch/TensorRT/1461/workflows/877d58af-c765-48df-b6c0-e6328723a7fd/jobs/6968)
Repeat the failed command on the CircleCI host: python3 -c "import torch_tensorrt; torch_tensorrt.dump_build_info()" and confirm that the issue can be reproduced.
Repeat with LD_DEBUG=libs python3 -c "import torch_tensorrt; torch_tensorrt.dump_build_info()" and locate the torch libraries at /opt/circleci/.pyenv/versions/3.9.4/lib/python3.9/site-packages/torch/lib/libc10_cuda.so.
Run nm -C /opt/circleci/.pyenv/versions/3.9.4/lib/python3.9/site-packages/torch/lib/libc10_cuda.so | grep CUDACachingAllocator | grep T and notice that the output does not contain the symbol c10::cuda::CUDACachingAllocator::allocator. This suggests that the libs are not from the torch 1.14 nightly.
Check the torch version with python3 -c "import torch; print(torch.__version__)"; it shows 1.13.0+cu117. This is unexpected, as we are supposed to install 1.14.0+cu116.
Check the CI pipeline and notice that during the Install torch-tensorrt stage, we are actually uninstalling the previously installed torch 1.14:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 0.14.0.dev20221114+cu116 requires torch==1.14.0.dev20221114, but you have torch 1.13.0 which is incompatible.

Check py/setup.py and find that it pins the torch requirement to "torch>=1.13.0.dev0,<1.14.0", which excludes the 1.14 nightly and makes pip replace it with torch 1.13.0. Fix that.
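The version mismatch found in the steps above can be illustrated with a small sketch. It compares only the numeric release segment of a version string against the bounds from py/setup.py, which is a deliberate simplification of the PEP 440 rules that pip applies; the helper names release_tuple and in_pinned_range are hypothetical and not part of torch_tensorrt.

```python
import re


def release_tuple(version: str) -> tuple:
    """Return the numeric release segment of a version string.

    Example: '1.14.0.dev20221114+cu116' -> (1, 14, 0).
    Simplification: dev/pre/post markers and the local label are ignored.
    """
    public = version.split("+", 1)[0]  # drop a local label such as +cu116
    match = re.match(r"\d+(?:\.\d+)*", public)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def in_pinned_range(version: str,
                    lower: str = "1.13.0.dev0",
                    upper: str = "1.14.0") -> bool:
    """Check lower <= version < upper on release segments only.

    The default bounds mirror the requirement string quoted from py/setup.py.
    """
    return release_tuple(lower) <= release_tuple(version) < release_tuple(upper)


if __name__ == "__main__":
    # The nightly that CI installed first falls outside the pinned range ...
    print(in_pinned_range("1.14.0.dev20221114+cu116"))  # False
    # ... while the release that pip swapped in falls inside it.
    print(in_pinned_range("1.13.0+cu117"))  # True
```

For the two versions in this thread the result is consistent with the pip output above: the 1.14 nightly does not satisfy the requirement, so installing torch-tensorrt pulls in torch 1.13.0 instead.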

Type of change

Please delete options that are not relevant and/or add your own.

Bug fix (non-breaking change which fixes an issue)

Checklist:

My code follows the style guidelines of this project (You can use the linters)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes
I have added the relevant labels to my PR so that relevant reviewers are notified

yinghai · 2022-11-27T04:53:31Z

@narendasan //tests/core/conversion/converters:test_einsum failed. Please help check. Thanks.

frank-wei

Thanks @yinghai for taking the time to dive deep into this issue. It helps us a lot in unblocking the 1.3 release!

frank-wei · 2022-11-27T04:54:58Z

core/BUILD

-        ":use_pre_cxx11_abi": ["@libtorch_pre_cxx11_abi//:libtorch"],
-        "//conditions:default": ["@libtorch//:libtorch"],
+        ":use_pre_cxx11_abi": ["@libtorch_pre_cxx11_abi//:libtorch", "@libtorch_pre_cxx11_abi//:c10_cuda"],
+        "//conditions:default": ["@libtorch//:libtorch", "@libtorch//:c10_cuda"],


Did you forget to mention in the diagnosis that we also need to link with c10_cuda? :-)

We probably don't need that. But it doesn't hurt.
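Whether a given libc10_cuda.so actually defines the allocator symbol named in the diagnosis can be checked with a scripted version of the nm step. This is a minimal sketch: the helper names are hypothetical, it assumes nm prints lines in the usual "address type name" layout, and unlike the coarse grep T in the original step it keeps every symbol type except undefined (U).

```python
import re
import subprocess

# One nm output line: optional hex address, one type letter, then the name.
_NM_LINE = re.compile(r"^([0-9a-fA-F]*)\s+([A-Za-z?])\s+(.+)$")


def run_nm(library_path: str) -> str:
    """Return the demangled symbol listing of a library (requires nm on PATH)."""
    result = subprocess.run(
        ["nm", "-C", library_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def defined_symbols(nm_output: str, needle: str) -> list:
    """Return (type, name) pairs for defined symbols whose name contains needle."""
    found = []
    for line in nm_output.splitlines():
        match = _NM_LINE.match(line.rstrip())
        if match is None:
            continue
        _, sym_type, name = match.groups()
        if sym_type == "U":  # undefined here: only referenced, not provided
            continue
        if needle in name:
            found.append((sym_type, name))
    return found


def has_symbol(nm_output: str, symbol: str) -> bool:
    """True if the listing defines a symbol with exactly this demangled name."""
    return any(name == symbol for _, name in defined_symbols(nm_output, symbol))
```

A possible use on the CI host would be has_symbol(run_nm(path), "c10::cuda::CUDACachingAllocator::allocator"), where path is the libc10_cuda.so location reported by LD_DEBUG=libs.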

Fix build

a925039

facebook-github-bot added the cla signed label Nov 24, 2022

github-actions bot added the component: api [C++] Issues re: C++ API label Nov 24, 2022

github-actions bot requested a review from narendasan November 24, 2022 22:41

yinghai requested a review from frank-wei November 24, 2022 22:42

different nightly torch

f04424b

github-actions bot added the component: api [Python] Issues re: Python API label Nov 24, 2022

fix lint

2701dd7

github-actions bot added component: core Issues re: The core compiler component: runtime labels Nov 24, 2022

Yinghai Lu added 2 commits November 24, 2022 15:24

c10_cuda

c195ae7

update WORKSPACE

d972267

github-actions bot added the component: build system Issues re: Build system label Nov 27, 2022

Yinghai Lu added 2 commits November 26, 2022 19:44

print torch version

c580f9e

Update setup.py requirement

e951062

yinghai mentioned this pull request Nov 27, 2022

[FX] Changes done internally at Facebook #1456

Merged

7 tasks

frank-wei approved these changes Nov 27, 2022

fix TRTModuleNext

92e9a07

yinghai mentioned this pull request Nov 27, 2022

🐛 [Bug] No matching distribution found for torch==1.13.0.dev20220921+cu116 #1478

Closed

frank-wei mentioned this pull request Nov 27, 2022

make padding layer converter more efficient #1470

Merged

7 tasks

yinghai merged commit e3cb32d into pytorch:master Nov 27, 2022

yinghai deleted the fix branch November 27, 2022 05:39
