Python auto-instrumentation: handle musl based containers #3332
6e778ef to c9d17f7
@@ -0,0 +1,22 @@
apiVersion: opentelemetry.io/v1alpha1
Adding specific e2e tests is not needed because the e2e-test-app-python Docker image is already based on Alpine, right? It looks like tests are failing on main.
> looks like tests are failing on main
Can you elaborate on this a bit more?
> Adding specific e2e tests is not needed because the e2e-test-app-python docker image is already based on alpine right?
Maybe this is where I went wrong. Our idea is to add, at some point, checks that verify the libraries were injected properly and are emitting data.
At the moment the e2e-test-app-python image is based on Alpine, but the Python instrumentation image is glibc based. This is a problem because binary extensions are not portable between different C libraries (among other incompatibilities). So this PR builds them for both and copies either the musl or the glibc one depending on the configuration.
An example of a failure in CI is this:
https://github.com/open-telemetry/opentelemetry-operator/actions/runs/11237912151/job/31241432422?pr=3330#step:8:1330
My guess is that the metrics thread kicks in and the system metrics package fails to load the psutil binary module because it was built against glibc and not musl.
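To check which C library a given container's interpreter is running on, something like the following could be run inside it. This is only a rough sketch: `detect_libc` is a hypothetical helper, and the musl check relies on the assumption that musl systems such as Alpine install their dynamic loader as `/lib/ld-musl-<arch>.so.1`.

```python
import glob
import platform


def detect_libc() -> str:
    """Best-effort guess of the C library used by the running interpreter."""
    # platform.libc_ver() reports ("glibc", "<version>") on glibc systems.
    name, _version = platform.libc_ver()
    if name == "glibc":
        return "glibc"
    # Heuristic (assumption): musl-based systems ship their loader
    # under /lib/ld-musl-<arch>.so.1.
    if glob.glob("/lib/ld-musl-*.so.1"):
        return "musl"
    return "unknown"


if __name__ == "__main__":
    print(detect_libc())
```

A binary extension such as the psutil one has to be built against the library this reports for the application container, not for the image the instrumentation was built in.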
BTW, the other thing that should be kept in sync is the Python version of the two images, because the ABI changes between Python versions.
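One way to compare the two images on this point is to print what the interpreter in each of them accepts as an extension module file name. The sketch below uses only the standard library; `extension_abi_info` is a hypothetical helper name. On a glibc Linux build of CPython 3.11 the first suffix typically looks like `.cpython-311-x86_64-linux-gnu.so`, so both the Python version and the platform are visible in it.

```python
import importlib.machinery
import sys


def extension_abi_info() -> dict:
    """Collect the details that decide whether a binary extension can load."""
    return {
        # Major.minor version of the running interpreter, e.g. "3.11".
        "python": "{}.{}".format(*sys.version_info[:2]),
        # File name suffixes this interpreter will import as extensions.
        "suffixes": list(importlib.machinery.EXTENSION_SUFFIXES),
    }


if __name__ == "__main__":
    info = extension_abi_info()
    print(info["python"])
    for suffix in info["suffixes"]:
        print(suffix)
```

If the output differs between the instrumentation image and the application image, version-specific extension modules built in one are unlikely to be importable in the other.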
> At the moment the e2e-test-app-python is based on alpine but the python instrumentation image is glibc based. This is a problem because binary extensions are not portable between different C libraries (among other incompatibilities). So this PR builds them and copies one for musl or glibc depending on the configuration.
I didn't notice this when I added the images. As mentioned in the previous comment, the idea is to add real E2E checks that the instrumentation is generating real data. Since we are not checking this, issues like the one you saw happen. This is something we need to fix with that image.
You can reuse that image (since it is musl based) for your E2E test. We need to add a new one for glibc.
> You can reuse that image (since it is musl based) for your E2E test. We need to add a new one for glibc.
I'm already using that image in the musl e2e 👍
Missing changelog.
e8640b7 to b6d469b
Could you have a look at the e2e test failures?
b6d469b to c6ce177
AFAICS the e2e tests are failing because the Python auto-instrumentation Docker image does not have the musl based installation. Is that because the image used in tests is not built from git?
Autoinstrumentation tests use the default image; we don't build all the autoinstrumentation images from source for E2E tests, though we probably should. What I would suggest here:
I know it's a hassle, but it's simpler than all the alternatives.
Other than fixing the tests, though, this breaks backward compatibility with older images; maybe we can avoid that?
b60ddfe to 42b4ad4
OK, finally tested this manually by deploying an operator from this branch and a custom image for the auto-instrumentation, and was able to get metrics (which depend on psutil, which uses a binary extension) out of a Python Alpine container with the instrumentation.opentelemetry.io/otel-python-platform: "musl" annotation. Also tested that the glibc based build is copied to the container if the instrumentation image does not have the musl one.
I'm not an approver here but this looks great @xrmx. Do we have a place where we can document any of this for users of this functionality?
Thanks for reviewing, will update https://opentelemetry.io/docs/kubernetes/operator/automatic/ and https://opentelemetry.io/docs/zero-code/python/operator/
8521ae3 to bdf97ef
> Missing changelog.
It's the first hunk in the diff.
The changes look good to me, thanks for being patient with our feedback on this PR! One small thing I think is still missing is documenting this new annotation in the README here: https://github.com/open-telemetry/opentelemetry-operator?tab=readme-ov-file#opentelemetry-auto-instrumentation-injection.
Build and inject musl based python auto-instrumentation if proper annotation is configured:
instrumentation.opentelemetry.io/otel-python-platform: "musl"
Refs open-telemetry#2264
cf7c491 to 727b82b
Updated README and rebased, thanks! I think you have been more patient with me than the other way around 😅
Description:
Build and inject musl based Python auto-instrumentation if the proper annotation is configured:
instrumentation.opentelemetry.io/otel-python-platform: "musl"
This takes a different approach than the stale PR at #2266:
Link to tracking Issue(s):
Testing: unit tests and e2e are green. Tested locally on minikube: by deploying an operator from this branch and a custom image for the auto-instrumentation, I was able to get metrics (which depend on psutil, which uses a binary extension) out of a Python Alpine container with the
instrumentation.opentelemetry.io/otel-python-platform: "musl"
annotation, and got a stack trace without it. Also tested that the glibc based build is copied to the container if the instrumentation image does not have the musl one.
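The selection and fallback behaviour described in the testing notes can be pictured roughly as below. This is an illustrative sketch only, not the operator's actual code: `pick_instrumentation_dir` and the directory names `autoinstrumentation` and `autoinstrumentation-musl` are assumptions made up for the example.

```python
from pathlib import Path


def pick_instrumentation_dir(image_root: Path, platform: str = "glibc") -> Path:
    """Choose which build of the instrumentation to copy into the container."""
    # Hypothetical layout of the instrumentation image (assumed names).
    glibc_dir = image_root / "autoinstrumentation"
    musl_dir = image_root / "autoinstrumentation-musl"
    # Use the musl build only when it was requested and the image ships it.
    if platform == "musl" and musl_dir.is_dir():
        return musl_dir
    # Otherwise fall back to the glibc build, which older images also contain.
    return glibc_dir
```

Falling back to the glibc build when the musl one is absent is what keeps older instrumentation images usable, at the cost of a stack trace at runtime if the application container is actually musl based.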
Documentation: Will add docs to opentelemetry.io