[Bug] Pulsar 4.0.1 image crashed when loading the native SSL library #23717
Comments
The Netty JNI is incorrectly linked:
cf43f09c95f1:/tmp$ ldd libnetty_tcnative_linux_aarch_6412030080574647118807.so
/lib/ld-musl-aarch64.so.1 (0xffff9501c000)
librt.so.1 => /lib/ld-musl-aarch64.so.1 (0xffff9501c000)
libdl.so.2 => /lib/ld-musl-aarch64.so.1 (0xffff9501c000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0xffff94dbc000)
libc.so.6 => /lib/ld-musl-aarch64.so.1 (0xffff9501c000)
Error relocating libnetty_tcnative_linux_aarch_6412030080574647118807.so: __getauxval: symbol not found
From here,
That might be a netty bug. netty/netty#14479
Yes, it's a Netty bug, but I'm not sure it's the same issue. It looks like an issue with the JNI library of netty-tcnative-boringssl-static. It can be verified by running the following commands after you start a container:
cd /tmp
unzip -q /pulsar/lib/io.netty-netty-tcnative-boringssl-static-*-aarch_64.jar
ldd META-INF/native/libnetty_tcnative_linux_aarch_64.so
With 4.0.0, the JAR version is 2.0.66 and the link is good:
However, with 4.0.1, the JAR version is 2.0.69 and the link is broken so it crashes on musl Linux:
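The manual check above can be wrapped in a small helper. This is only a minimal sketch and not part of the original report: the function name `ldd_output_is_clean` is hypothetical, and it assumes a broken link shows up in the `ldd` output as an `Error relocating` or `symbol not found` line, as in the `__getauxval` message quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): reads ldd output on stdin and
# succeeds only when no relocation or missing-symbol errors are reported.
ldd_output_is_clean() {
  # grep -q exits 0 when a pattern matches, so a match means "broken".
  if grep -Eq 'Error relocating|symbol not found'; then
    return 1
  fi
  return 0
}

# Example usage inside the container (path taken from the commands above):
#   ldd META-INF/native/libnetty_tcnative_linux_aarch_64.so 2>&1 | ldd_output_is_clean && echo "link looks good"
```

The helper only inspects text, so it can also be run against a saved copy of the `ldd` output.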
I tried to reproduce this issue inside an
@BewareMyPower We don't have a pure musl base image. The glibc package in https://github.com/apache/pulsar/tree/master/docker/glibc-package gets added to the Alpine base image, making it a mixed musl and glibc environment, which is not generally recommended. https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/24647#note_176723 states it clearly: sgerrand/alpine-pkg-glibc#194 (comment) mentions:
This gist contains a Dockerfile for building a minideb-based Pulsar 4.0.1 image by copying the content from the Pulsar 4.0.1 image: https://gist.github.com/lhotari/3ffef8117743f7044e6bbdc3933bc029 . That could be useful for validating whether the problem reproduces without the mixed musl+glibc environment.
I opened an issue upstream: netty/netty-tcnative#907, which includes minimal reproduction steps on It seems that the missed
For Alpine base images, I think that it's necessary to have
These extra dependencies don't work. Actually for 2.0.66, I only need to install
I'm trying to build this Dockerfile but it failed: https://gist.github.com/lhotari/3ffef8117743f7044e6bbdc3933bc029
Yes, the problem reproduces. I agree that this is a netty-tcnative issue with musl compatibility. By the way, I guess 2.0.66 can instead be loaded with gcompat and libgcc:
apk update
apk add gcompat libgcc
wget https://repo1.maven.org/maven2/io/netty/netty-tcnative-boringssl-static/2.0.66.Final/netty-tcnative-boringssl-static-2.0.66.Final-linux-aarch_64.jar
unzip netty-tcnative-boringssl-static-2.0.66.Final-linux-aarch_64.jar
ldd META-INF/native/libnetty_tcnative_linux_aarch_64.so
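To compare versions side by side, the same steps could be parameterized. The sketch below is a convenience under stated assumptions, not something from the thread: `tcnative_jar_url` and `check_tcnative` are hypothetical names, and it assumes other versions follow the same Maven Central path layout and `.Final` suffix as the 2.0.66 URL above.

```shell
#!/bin/sh
# Hypothetical helper: build the Maven Central URL for a given
# netty-tcnative-boringssl-static version and classifier.
# Assumption: all versions share the path layout of the 2.0.66.Final URL.
tcnative_jar_url() {
  version="$1"                       # e.g. 2.0.66.Final
  classifier="${2:-linux-aarch_64}"  # classifier used in the URL above
  base="https://repo1.maven.org/maven2/io/netty/netty-tcnative-boringssl-static"
  echo "${base}/${version}/netty-tcnative-boringssl-static-${version}-${classifier}.jar"
}

# Hypothetical helper: download the JAR into a scratch directory, extract it,
# and run ldd on the bundled native library. The body runs in a subshell so
# the caller's working directory is left unchanged.
check_tcnative() (
  version="$1"
  workdir="/tmp/tcnative-${version}"
  mkdir -p "$workdir" && cd "$workdir" || exit 1
  wget -q -O tcnative.jar "$(tcnative_jar_url "$version")" || exit 1
  unzip -q -o tcnative.jar || exit 1
  ldd META-INF/native/libnetty_tcnative_linux_aarch_64.so
)
```

For example, running `check_tcnative 2.0.66.Final` and then `check_tcnative 2.0.69.Final` in an Alpine container would print both `ldd` results, provided both artifacts are published under that naming.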
I accidentally introduced a typo while updating the version to 4.0.1. It's fixed now.
@lhotari Your Dockerfile works so I think we can apply your fix for now.
Though the image size is much bigger now.
@BewareMyPower The example Dockerfile is for the pulsar-all image. There's a comment on how to pass the Pulsar image as a build arg. Passing
@BewareMyPower It looks like it's necessary to preload gcompat on Alpine (for example with I doubt that this would play nicely with the glibc solution we have in Pulsar.
@BewareMyPower I found out that adding
Search before asking
Read release policy
Version
Minimal reproduce step
The script gets stuck waiting for port 8080.
What did you expect to see?
The script should succeed.
What did you see instead?
The Pulsar process in the container crashed, and the log from /pulsar/logs inside the container stopped at:
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java
Lines 608 to 612 in f571aa1
As you can see, it didn't log the listening port.
Anything else?
hs_err_pid296.log
It crashed when loading the native SSL library:
Are you willing to submit a PR?