Inconsistency detected by ld.so: dl-version.c: 224: _dl_check_map_versions: Assertion `needed != NULL' failed! #9754
Comments
I have searched the internet and this appears to be relevant. |
Other results suggest that it may be due to a missing library. For that you would have to use |
It was caused by the patchelf tool we use. The warnings generated by ldd don't impact usage, except that you can't use ldd with it. |
I'm also running into the same issue with onnxruntime-training (1.9.0) and onnxruntime-gpu (1.9.0) wheels installed from PyPI, trying to train a simple model using the CUDA EP. @snnn, the suggested fix above is not clear to me; can you elaborate on it? |
First, the onnxruntime Python packages, "onnxruntime" and "onnxruntime-gpu", follow the manylinux2014 (PEP 599) standard. But the GPU one, onnxruntime-gpu, isn't fully compliant. The PEP 599 policy says: "The wheel's binary executables or shared objects may not link against externally-provided libraries except those in the following list"
But we need CUDA, and CUDA isn't in the list. BTW, if you run ldd against onnxruntime's CPU-only package, "onnxruntime", you won't see the error. The policy was designed so that any external dependency gets packed into the wheel file. However, we can't do that, because
So we did a dirty hack. Before packing the wheel, we patch the .so file to pretend it doesn't depend on CUDA, to cheat manylinux's auditwheel tool. Then we pack the wheel and manually load the CUDA libraries at runtime. The error message you saw is caused by the tool we use for patching *.so files: patchelf. If we didn't use the tool, we wouldn't have this issue. Alternatively, we could modify the policy: patch the auditwheel tool and add a custom policy file to whitelist the CUDA libraries. The file is: https://github.com/pypa/auditwheel/blob/main/src/auditwheel/policy/manylinux-policy.json . See #144 for more information. (Hi @adk9, the above answer is only for the onnxruntime inference packages. The onnxruntime-training package is built in a special way that I'm not familiar with.) |
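To make the "manually load the CUDA libraries" part concrete, here is a minimal sketch of the idea (not onnxruntime's actual loader code; the library SONAMEs are illustrative and must match the installed CUDA/cuDNN):

```python
# Sketch only: preload CUDA libraries whose DT_NEEDED entries were stripped by
# patchelf, so the onnxruntime shared object can resolve them at import time.
import ctypes

# Illustrative SONAMEs; use the versions that are actually installed.
for lib in ("libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8"):
    try:
        ctypes.CDLL(lib, mode=ctypes.RTLD_GLOBAL)
    except OSError as err:
        print(f"could not preload {lib}: {err}")

import onnxruntime  # should now be able to find its CUDA dependencies
print(onnxruntime.get_available_providers())
```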
Hi @snnn thank you for the detailed explanation of the problem. If I understand correctly, this is basically a problem that stems from not being able to specify CUDA as a pip dependency? I am afraid that, as a small potato not familiar with the deeper workings of pip and auditwheel, I am unable to implement the fix you are describing. Could the problem be fixed by manually installing a compatible version of CUDA? Otherwise, could you give a more line-by-line set of instructions on what files to edit and what actions to take after the edit (I take it I need to build the wheel after editing the rules? I have never done that before)? |
Yes. Then ldd will still not work, but the onnxruntime python package should be good. |
In my specific case, after looking at the onnx runtime requirements again, I noticed that I might be missing |
@GuillaumeTong @snnn I am facing the same issue. My CUDA version is 10.2, and I followed the same cuDNN installation instructions for CUDA 10.2, but I am still getting the same error. My OS is Ubuntu 18 on AWS with a T4 GPU. What could be the reason? I also have CUDA 11.6 on my system. In my case the problem occurs with the onnxruntime-training installation. Below is the code on whose execution the error occurs: from onnxruntime import OrtValue. It works for: |
I'm not familiar with the onnxruntime-training package. @askhade, could you please help? I think there might be some code in pytorch that ran this "ldd" command. Do you know how to reproduce it? |
Upgrading CUDA to a version compatible with onnxruntime-gpu also solved my issue. Thanks a lot. |
I'm in a bit of a similar pickle here, though it might be one outside the scope of this issue or project. Environment includes Ubuntu 18.04, CUDA 11.4, CUDNN 8.2.4, Python 3.6, onnxruntime-gpu coming from pip. I'm packing up the project as a onedir executable using PyInstaller. |
My apologies for the ramble. Desperation tends to do that. I resolved the issue on my own: it turns out that PyInstaller was not including all of the necessary CUDA libraries. Including them manually allowed onnxruntime to start up (and then crash when it couldn't find the
I will say though that it is incredibly frustrating to have spent this much time on what ended up being a fairly simple issue. I understand that the import hack is done to avoid the eyes of the auditor, but this same hack made it much harder to realize that it was just a matter of a missing dependency. I know I'm barking up the wrong tree here since I could've just not used Python, but this type of error would have surfaced much sooner in the pipeline, and likely with a more useful error message, in any compiled language. |
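For anyone hitting the same PyInstaller problem, a sketch of the spec-file fragment that bundles the CUDA shared libraries explicitly (paths and SONAMEs are assumptions for a CUDA 11.4 install; adjust to your system):

```python
# Fragment of a PyInstaller .spec file -- only the binaries argument is shown,
# the rest of the spec (PYZ, EXE, COLLECT) stays as generated.
# Paths and library names are illustrative.
cuda_lib_dir = '/usr/local/cuda-11.4/lib64'
cuda_libs = ['libcudart.so.11.0', 'libcublas.so.11', 'libcublasLt.so.11',
             'libcufft.so.10', 'libcurand.so.10', 'libcudnn.so.8']

a = Analysis(
    ['main.py'],
    binaries=[(f'{cuda_lib_dir}/{name}', '.') for name in cuda_libs],
    # ... other Analysis arguments unchanged ...
)
```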
If you are running with
For cuda-11.4 and libnvinfer 8.2.5.1:
For cuda-11.6 and libnvinfer 8.4.3 (tested also with cuda-11.8):
Hope it helps. |
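Once the versions match, a quick sanity check (a sketch, assuming a local model.onnx) can confirm whether the CUDA execution provider actually loads instead of silently falling back to CPU:

```python
import onnxruntime as ort

print(ort.__version__, ort.get_device())
print(ort.get_available_providers())   # should include 'CUDAExecutionProvider'

# "model.onnx" is a placeholder for your own model file.
sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())            # CPU-only here means the CUDA libs did not load
```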
I hit a similar issue. |
Can you provide a bit more help about how to patch the auditwheel tool and create a custom policy file to whitelist the CUDA libraries? What are the steps? I have the same issue; I downgraded my CUDA version from 11.8 to 11.6 but the problem still remains. I am trying to run inference on some images, but it works only for CPU and unfortunately not for CUDA. |
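Not an officially supported workflow, but roughly the idea is to edit the policy JSON that ships with your auditwheel installation before running `auditwheel repair`. A sketch (the file location, key names, and SONAMEs are assumptions that depend on the auditwheel version and your CUDA install):

```python
import json
from pathlib import Path

import auditwheel.policy

# Locate the bundled policy file (path is version-dependent).
policy_path = Path(auditwheel.policy.__file__).parent / "manylinux-policy.json"
policies = json.loads(policy_path.read_text())

# Illustrative CUDA SONAMEs to allow as external dependencies.
extra_libs = ["libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8"]
for policy in policies:
    whitelist = policy.get("lib_whitelist")
    if whitelist is not None and policy["name"].startswith("manylinux"):
        whitelist.extend(lib for lib in extra_libs if lib not in whitelist)

policy_path.write_text(json.dumps(policies, indent=2))
```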
I have the same issue; I'm using a Docker environment. |
Stumbled across this issue. FWIW, in newer auditwheel there is an option to exclude shared objects that will be provided in a different manner. This is the PR that added the option
|
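If that option is available in your auditwheel version, the repair step can leave the CUDA libraries out of the wheel. A sketch (the wheel path and SONAMEs are placeholders), shown here as a Python wrapper around the CLI call:

```python
import subprocess

# Equivalent to running `auditwheel repair --exclude <soname> ... <wheel>` directly.
subprocess.run(
    [
        "auditwheel", "repair",
        "--exclude", "libcudart.so.11.0",
        "--exclude", "libcublas.so.11",
        "--exclude", "libcudnn.so.8",
        "dist/your_gpu_package-linux_x86_64.whl",   # placeholder wheel path
    ],
    check=True,
)
```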
I will need to rework PR #1282. |
### Description
1. Delete Prefast tasks (#17522)
2. Disable yum update (#17551)
3. Avoid calling patchelf (#17365 and #17562), so that we can validate the above fixes

The main problem I'm trying to solve is: our GPU package depends on both CUDA 11.x and CUDA 12.x. However, it's not easy to see this information because ldd doesn't work with the shared libraries we generate (see issue #9754). So the patchelf change is useful for me to validate that "Disabling yum update" was successful. As you can see, we call "yum update" from multiple places; without some kind of validation it's hard to say whether I have covered all of them. The Prefast change is needed because I'm going to update the VM images in the next few weeks, in case we need to publish a patch release after that.

### Motivation and Context
Without this fix we will mix CUDA 11.x and CUDA 12.x, and it will crash every time we use TensorRT.
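As an aside, when ldd fails on the patched shared object, the DT_NEEDED entries can still be read directly from the ELF dynamic section. A sketch, assuming the pyelftools package is installed (the .so path is illustrative):

```python
from elftools.elf.elffile import ELFFile  # pip install pyelftools

so_path = "onnxruntime/capi/onnxruntime_pybind11_state.so"  # illustrative path
with open(so_path, "rb") as f:
    dynamic = ELFFile(f).get_section_by_name(".dynamic")
    for tag in dynamic.iter_tags():
        if tag.entry.d_tag == "DT_NEEDED":
            print(tag.needed)
```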
### Description
Resolve microsoft#9754
Describe the bug
I use onnxruntime-gpu to run inference on my own ONNX model. It works well when the data is on the CPU device.
But an error is thrown when the data is on the GPU device.
It works with this code:
ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(img.numpy())
It fails with this:
ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(img_lq.numpy(), device_type="cuda", device_id=0)
The error is:
Urgency
If there are particularly important use cases blocked by this or strict project-related timelines, please share more information and dates. If there are no hard deadlines, please specify none.
System information
To Reproduce
Expected behavior
Any help on using the GPU version to run inference on an ONNX model?
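For reference, a minimal sketch of the intended GPU-side flow with IO binding (the model path, input/output names, and shape are placeholders):

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
img = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Copy the input to GPU memory up front instead of letting the session do it.
ortvalue = ort.OrtValue.ortvalue_from_numpy(img, device_type="cuda", device_id=0)

binding = sess.io_binding()
binding.bind_ortvalue_input("input", ortvalue)
binding.bind_output("output", device_type="cuda", device_id=0)
sess.run_with_iobinding(binding)
outputs = binding.copy_outputs_to_cpu()
```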
Screenshots