Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python auto-instrumentation: handle musl based containers #3332

Merged
merged 5 commits into from
Nov 4, 2024

Conversation

xrmx
Copy link
Contributor

@xrmx xrmx commented Oct 7, 2024

Description:

Build and and inject musl based python auto-instrumentation if proper annotation is configured:

instrumentation.opentelemetry.io/otel-python-platform: "musl"

This takes a different approach that the stale PR at #2266:

  • does not change directory where python sdk is installed in the docker container for default distribution (glibc)

Link to tracking Issue(s):

Testing: unit tests and e2e are green. Tested locally on minikube that by deploying an operator from this branch, a custom image for the auto-instrumentation was able to get metrics (that depends on psutil that uses a binary extension) out of a python alpine container with the instrumentation.opentelemetry.io/otel-python-platform: "musl" annotation and a stacktrace without it.
Also tested that the glibc based is copied to the container if the instrumentation image does not have the musl one.

Documentation: Will add docs to opentelemetry.io

@xrmx xrmx requested a review from a team as a code owner October 7, 2024 13:55
@xrmx xrmx force-pushed the python-multi-libc-distribution-2 branch 2 times, most recently from 6e778ef to c9d17f7 Compare October 7, 2024 14:03
@@ -0,0 +1,22 @@
apiVersion: opentelemetry.io/v1alpha1
Copy link
Contributor Author

@xrmx xrmx Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding specific e2e tests is not needed because the e2e-test-app-python docker image is already based on alpine right? looks like tests are failing on main

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like tests are failing on main

Can you elaborate this a bit more?

Adding specific e2e tests is not needed because the e2e-test-app-python docker image is already based on alpine right?

Maybe this was something where I failed. Our idea is, at some point, add verifications to know if the libraries were injected properly and verify they are emitting data.

Copy link
Contributor Author

@xrmx xrmx Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment the e2e-test-app-python is based on alpine but the python instrumentation image is glibc based. This is a problem because binary extensions are not portable between different C libraries (among other incompatibilities). So this PR builds them and copies one for musl or glibc depending on the configuration.

An example of failure in CI is this:
https://github.com/open-telemetry/opentelemetry-operator/actions/runs/11237912151/job/31241432422?pr=3330#step:8:1330

Where I guess the metrics thread kicks in and the system metrics package fails to load psutil binary module because it has been built on glibc and not musl.

Copy link
Contributor Author

@xrmx xrmx Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW the other thing that should be kept in sync is the Python version of the two images because the ABI changes between python versions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment the e2e-test-app-python is based on alpine but the python instrumentation image is glibc based. This is a problem because binary extensions are not portable between different C libraries (among other incompatibilities). So this PR builds them and copies one for musl or glibc depending on the configuration.

Didn't notice this when I added the images. As mentioned in the previous comment, the idea is to add real E2E checking if the instrumentation is generating real data. Since we are not checking this, issues like the one you saw are happening. This is something we need to fix with that image.

You can reuse that image (since it is musl based) for your E2E test. We need to add a new one for glibc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can reuse that image (since it is musl based) for your E2E test. We need to add a new one for glibc.

I'm already using that image in the musl e2e 👍

Copy link
Contributor

@iblancasa iblancasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing changelog.

@@ -0,0 +1,22 @@
apiVersion: opentelemetry.io/v1alpha1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like tests are failing on main

Can you elaborate this a bit more?

Adding specific e2e tests is not needed because the e2e-test-app-python docker image is already based on alpine right?

Maybe this was something where I failed. Our idea is, at some point, add verifications to know if the libraries were injected properly and verify they are emitting data.

@xrmx xrmx force-pushed the python-multi-libc-distribution-2 branch from e8640b7 to b6d469b Compare October 10, 2024 13:17
Copy link
Contributor

@swiatekm swiatekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you have a look at the e2e test failures?

pkg/instrumentation/annotation.go Outdated Show resolved Hide resolved
@xrmx xrmx force-pushed the python-multi-libc-distribution-2 branch from b6d469b to c6ce177 Compare October 15, 2024 09:59
@xrmx
Copy link
Contributor Author

xrmx commented Oct 15, 2024

AFAICS e2e tests are failing because the python autoinstrumentation docker image does not have the musl based installation, is it the case because the image used in tests is not built from git?

@swiatekm
Copy link
Contributor

AFAICS e2e tests are failing because the python autoinstrumentation docker image does not have the musl based installation, is it the case because the image used in tests is not built from git?

Autoinstrumentation tests use the default image, we don't build all the autoinstrumentation images from source for E2E tests, though we probably should.

What I would suggest here:

  1. Verify locally that your current tests pass with the new image
  2. Modify tests in this PR to only check if the container manifest is correct (and not that it's running)
  3. Merge this PR
  4. Operator release happens
  5. Open a new PR, changing the tests back to their current state

I know it's a hassle, but it's simpler than all the alternatives.

@xrmx
Copy link
Contributor Author

xrmx commented Oct 15, 2024

AFAICS e2e tests are failing because the python autoinstrumentation docker image does not have the musl based installation, is it the case because the image used in tests is not built from git?

Autoinstrumentation tests use the default image, we don't build all the autoinstrumentation images from source for E2E tests, though we probably should.

What I would suggest here:

1. Verify locally that your current tests pass with the new image

2. Modify tests in this PR to only check if the container manifest is correct (and not that it's running)

3. Merge this PR

4. Operator release happens

5. Open a new PR, changing the tests back to their current state

I know it's a hassle, but it's simpler than all the alternatives.

Other than fixing the tests though this is breaking backward compatibility with older images and maybe we can avoid that?

@xrmx xrmx marked this pull request as draft October 17, 2024 12:21
@xrmx xrmx force-pushed the python-multi-libc-distribution-2 branch from b60ddfe to 42b4ad4 Compare October 18, 2024 07:30
@xrmx xrmx marked this pull request as ready for review October 18, 2024 09:19
@xrmx
Copy link
Contributor Author

xrmx commented Oct 18, 2024

Ok finally tested this manually by deploying an operator from this branch, a custom image for the auto-instrumentation and was able to get metrics (that depends on psutil that uses a binary extension) out of a python alpine container with the instrumentation.opentelemetry.io/otel-python-platform: "musl" annotation and a stacktrace without it.

Also tested that the glibc based is copied to the container if the instrumentation image does not have the musl one.

@xrmx xrmx requested review from swiatekm and iblancasa October 18, 2024 09:24
Copy link
Member

@pmcollins pmcollins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an approver here but this looks great @xrmx. Do we have a place where we can document any of this for users of this functionality?

@xrmx
Copy link
Contributor Author

xrmx commented Oct 23, 2024

I'm not an approver here but this looks great @xrmx. Do we have a place where we can document any of this for users of this functionality?

Thanks for reviewing, will update https://opentelemetry.io/docs/kubernetes/operator/automatic/ and https://opentelemetry.io/docs/zero-code/python/operator/

@xrmx xrmx marked this pull request as draft October 23, 2024 12:45
@xrmx xrmx force-pushed the python-multi-libc-distribution-2 branch 3 times, most recently from 8521ae3 to bdf97ef Compare October 29, 2024 08:11
@xrmx xrmx marked this pull request as ready for review October 29, 2024 08:12
@xrmx xrmx requested a review from swiatekm October 29, 2024 08:23
Copy link
Contributor

@iblancasa iblancasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing changelog.

@xrmx
Copy link
Contributor Author

xrmx commented Oct 29, 2024

Missing changelog.

It's the first hunk in the diff

Copy link
Contributor

@swiatekm swiatekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me, thanks for being patient with our feedback on this PR! One small thing I think is still missing is documenting this new annotation in the README here: https://github.com/open-telemetry/opentelemetry-operator?tab=readme-ov-file#opentelemetry-auto-instrumentation-injection.

xrmx added 5 commits November 4, 2024 13:58
Build and and inject musl based python auto-instrumentation if proper
annotation is configured:

instrumentation.opentelemetry.io/otel-python-platform: "musl"

Refs open-telemetry#2264
@xrmx xrmx force-pushed the python-multi-libc-distribution-2 branch from cf7c491 to 727b82b Compare November 4, 2024 13:02
@xrmx
Copy link
Contributor Author

xrmx commented Nov 4, 2024

The changes look good to me, thanks for being patient with our feedback on this PR! One small thing I think is still missing is documenting this new annotation in the README here: https://github.com/open-telemetry/opentelemetry-operator?tab=readme-ov-file#opentelemetry-auto-instrumentation-injection.

Updated README and rebased, thanks!

I think you have been more patient with me than the other way around 😅

@swiatekm swiatekm enabled auto-merge (squash) November 4, 2024 13:06
@swiatekm swiatekm merged commit 2389f94 into open-telemetry:main Nov 4, 2024
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python autoinstrumentation for musl libc based application containers
4 participants