
Caused by: java.lang.UnsatisfiedLinkError: #183

Closed
CensorKo opened this issue Nov 2, 2021 · 6 comments

CensorKo commented Nov 2, 2021

@zachgk @frankfliu @lanking520 @stu1130 @roywei

We deployed a yolov5 TorchScript model on an AWS Inferentia instance, but DJL can't load the libneuron_op.so file on startup.

First, libneuron_op.so exists on the OS, and the PYTORCH_EXTRA_LIBRARY_PATH environment variable is set.

Caused by: java.lang.UnsatisfiedLinkError: /home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/lib/libneuron_op.so: /home/ubuntu/anaconda3/envs/aws_neuron_pytorch_p36/lib/python3.6/site-packages/torch_neuron/lib/libneuron_op.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKSs
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1934)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1817)
at java.lang.Runtime.load0(Runtime.java:810)
at java.lang.System.load(System.java:1088)
at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:72)
... 44 more
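A small side check on the symbol named in the error (assuming binutils' c++filt is available on the instance), which shows what libneuron_op.so expects from libtorch:

```python
# Demangle the undefined symbol from the UnsatisfiedLinkError above.
# Requires the c++filt tool from binutils on PATH.
import subprocess

sym = "_ZN5torch3jit17parseSchemaOrNameERKSs"
print(subprocess.check_output(["c++filt", sym]).decode().strip())
# -> torch::jit::parseSchemaOrName(std::string const&)
# The "Ss" (plain std::string) mangling indicates the pre-cxx11 C++ ABI, i.e. the
# library was built against a pre-cxx11-ABI libtorch, which is why the precxx11
# PyTorch native build comes up later in this thread.
```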


CensorKo commented Nov 2, 2021

Another question: it seems that libneuron_op.so was removed in torch-neuron==1.8.1 because neuron-rtd was removed. So how do we use DJL with aws-neuron-dkms in Neuron Runtime 2.x (libnrt.so)? Do you have any samples for it?

Confusing...


frankfliu commented Nov 2, 2021

@CensorKo a few things you need to check:

  1. The example currently only works with DJL 0.12.0 and torch-neuron 1.8.1
  2. You have to use the PyTorch precxx11 version: https://github.com/deepjavalibrary/djl-demo/blob/master/aws/inferentia/build.gradle#L21
  3. You have to install Neuron SDK <= 1.15 and use the old Neuron runtime.

We are working on 0.14.0 to make DJL work with Neuron SDK 1.16.0. If you want, you can try our 0.14.0-SNAPSHOT version. Documentation is still WIP.
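A minimal sketch for verifying which of these pieces is installed in the active environment (the pip package names "torch-neuron" and "neuron-cc" are assumptions here, not taken from the thread):

```python
# Print the installed versions of the Neuron-related pip packages so they can be
# compared against the DJL / torch-neuron / Neuron SDK requirements listed above.
import pkg_resources

for pkg in ("torch", "torch-neuron", "neuron-cc"):
    try:
        print(pkg, pkg_resources.get_distribution(pkg).version)
    except pkg_resources.DistributionNotFound:
        print(pkg, "not installed")
```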


CensorKo commented Nov 3, 2021

> @CensorKo a few things you need to check:
>
>   1. The example currently only works with DJL 0.12.0 and torch-neuron 1.8.1
>   2. You have to use the PyTorch precxx11 version: https://github.com/deepjavalibrary/djl-demo/blob/master/aws/inferentia/build.gradle#L21
>   3. You have to install Neuron SDK <= 1.15 and use the old Neuron runtime.
>
> We are working on 0.14.0 to make DJL work with Neuron SDK 1.16.0. If you want, you can try our 0.14.0-SNAPSHOT version. Documentation is still WIP.

Thanks, but how do I check the Neuron SDK version? I have checked all the documentation and only found the neuron-rtd version: 1.5.0.0.
Should I assume that neuron-rtd 1.5.0.0 corresponds to Neuron SDK 1.15?
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-runtime/v1/nrt_start.html


CensorKo commented Nov 3, 2021

@frankfliu

Our model was trained with yolov5 and already exported to torchscript.pt files. Now we want to run it on an Inferentia instance. How do we trace our yolov5 torchscript.pt model using trace.py, or do we need to trace it before running?

https://github.com/deepjavalibrary/djl-demo/blob/ce41d826890b768aa5d86ebec80efa46571ff12d/aws/inferentia/trace.py

frankfliu commented

@CensorKo
I just created a demo for a Huggingface model: #184

frankfliu commented

> @frankfliu
>
> Our model was trained with yolov5 and already exported to torchscript.pt files. Now we want to run it on an Inferentia instance. How do we trace our yolov5 torchscript.pt model using trace.py, or do we need to trace it before running?
>
> https://github.com/deepjavalibrary/djl-demo/blob/ce41d826890b768aa5d86ebec80efa46571ff12d/aws/inferentia/trace.py

You have to trace it with neuron-cc; regular TorchScript won't work with Inferentia.
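A minimal tracing sketch along those lines, assuming torch-neuron 1.x is installed and the model takes a single 1x3x640x640 image tensor (the file names and input shape are placeholders, not from the thread):

```python
import torch
import torch_neuron  # registers the torch.neuron namespace and the neuron-cc backend

# Load the existing TorchScript export; if tracing a loaded ScriptModule fails,
# re-trace from the original yolov5 nn.Module instead.
model = torch.jit.load("torchscript.pt").eval()
example = torch.zeros(1, 3, 640, 640)

# Compile the supported operators for Inferentia via neuron-cc and save the result,
# which is the artifact to load with DJL on the Inf1 instance.
model_neuron = torch.neuron.trace(model, example_inputs=[example])
model_neuron.save("yolov5_neuron.pt")
```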

CensorKo closed this as completed Nov 4, 2021