ARM64 CentOS7 compatibility issues with djl/pytorch due to glibc requirements #2563
Comments
http://djl.ai/engines/pytorch/pytorch-engine/#for-pre-cxx11-build Thanks.
Hi @ylwu-amzn, could you help check whether this is a test-time-only issue, or also a runtime issue when deploying actual models? Thanks!
After extensive debugging with the DJL team and ML team, we have discovered a potential workaround: DJL version 0.28 allows overriding the libstdc++ path with
The solution is to have DJL lock to 0.28 but tokenizer lock to 0.21:
Thanks.
The downside of the workaround is that we would use mixed versions, which adds maintenance effort. Considering that this issue has existed for a long time without anyone reporting it, and that CentOS 7 will be deprecated, I would suggest not adding such a workaround. According to @peterzhuamazon, CentOS7 x64 passed; only ARM64 failed.
We are having issues in ml-commons on arm64, where a PyTorch-related library requires glibc >= 2.18:
https://ci.opensearch.org/ci/dbc/integ-test/2.15.0/9970/linux/arm64/tar/test-results/8297/integ-test/neural-search/without-security/local-cluster-logs/id-1/stderr.txt
https://ci.opensearch.org/ci/dbc/integ-test/2.15.0/9970/linux/arm64/tar/test-results/8297/integ-test/neural-search/without-security/stderr.txt
Note that we build and test OpenSearch plugins on CentOS7, which ships glibc 2.17.
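To make the mismatch concrete, here is a minimal Python sketch of a pre-flight check that compares the host's glibc version against the 2.18 minimum reported above. This is only an illustration and not part of the OpenSearch or DJL tooling; the helper names `parse_version` and `glibc_satisfies` are hypothetical.

```python
import platform

# Minimum glibc version reported in this issue for the PyTorch-related library.
REQUIRED_GLIBC = (2, 18)


def parse_version(text):
    """Turn a dotted version string such as '2.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_satisfies(found, required=REQUIRED_GLIBC):
    """Return True if the version string `found` is at least `required`."""
    return parse_version(found) >= required


if __name__ == "__main__":
    # platform.libc_ver() returns a (lib, version) pair; both are empty
    # strings when the C library cannot be identified.
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        print("Could not detect glibc on this host")
    else:
        status = "OK" if glibc_satisfies(version) else "too old"
        print(f"glibc {version}: {status}")
```

On a CentOS7 host this check would be expected to report the version as too old, since 2.17 is below the 2.18 minimum.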
This issue causes the cluster to crash, which leaves integTest stuck midway with a connection reset.
As a result, ML and ML-related plugins such as ml-commons, neural-search, and flow-framework fail their tests.
Tracing the logs all the way back, this has been an issue on the arm64 TAR since 2.12.
CentOS7 will be deprecated on 06/30, and this should not be a problem for AL2, since AL2 has glibc 2.28.
We will switch to AL2 in 2.16 anyway because of k-NN. opensearch-project/opensearch-build#4379
Note: This has affected ML, Flow-Framework, Neural-Search.
Thanks.