Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add auto-detection of container information to resource if cgroupv2 is used #6694

Closed
svrnm opened this issue Sep 14, 2022 · 10 comments · Fixed by #7167
Closed

Add auto-detection of container information to resource if cgroupv2 is used #6694

svrnm opened this issue Sep 14, 2022 · 10 comments · Fixed by #7167
Labels
contribution welcome Request makes sense, maintainers probably won't have time, contribution would be welcome enhancement New feature or request

Comments

@svrnm
Copy link
Member

svrnm commented Sep 14, 2022

Follow-up to open-telemetry/opentelemetry-java#3308 & related ticket with js: open-telemetry/opentelemetry-js-contrib#1173

Is your feature request related to a problem? Please describe.

Detection of the container.id in ContainerResource.java currently works by reading from /proc/self/cgroup. That does not work when cgroupv2 is turned on. However, based on these two sources there's another place to read from:

In short, the container.id could be read from /proc/self/mountinfo as well. This also works for containerd and not only docker.

Eventually it would be good to have a standard for that, but unfortunately this is a long-standing non-moving OCI issue, until then a combination of both approaches seems to be reliable enough.

Describe the solution you'd like
Add code that reads the container id from /proc/self/mountinfo if reading from /proc/self/cgroup failed.

Describe alternatives you've considered
As stated in the description, eventually it would be great to have this standardized, but until then this is the best way to go.

cc: @lo-jason , @PeterF778

@svrnm
Copy link
Member Author

svrnm commented Sep 20, 2022

JavaScript PR for reference: open-telemetry/opentelemetry-js-contrib#1181

@jkwatson jkwatson added the contribution welcome Request makes sense, maintainers probably won't have time, contribution would be welcome label Sep 20, 2022
@mateuszrzeszutek
Copy link
Member

mateuszrzeszutek commented Sep 21, 2022

I think this issue belongs in the instrumentation repo now

@mateuszrzeszutek mateuszrzeszutek transferred this issue from open-telemetry/opentelemetry-java Sep 21, 2022
@mateuszrzeszutek mateuszrzeszutek added enhancement New feature or request and removed Feature Request labels Sep 21, 2022
@svrnm
Copy link
Member Author

svrnm commented Oct 5, 2022

@mateuszrzeszutek
Copy link
Member

It is, but that's the deprecated copy -- we've recently decided to move all resource providers to this repo (cause they kind of are instrumentations), and deprecate the old ones in otel-java. Here's the same code from this repo: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/resources/library/src/main/java/io/opentelemetry/instrumentation/resources/ContainerResource.java

@svrnm
Copy link
Member Author

svrnm commented Oct 5, 2022

Ah, ok, that's good to know!

edysli added a commit to edysli/opentelemetry-java-instrumentation that referenced this issue Oct 15, 2022
Refactor class `ContainerResource` so that its methods return
`Optional`s instead of `null`. This makes for safer code that doesn't
rely on `null` to signal the absence of result. Moreover,
`Optional.map()` and `Optional.orElseGet()` use the same functional
style as `Stream`, which is well readable.

Issue-Id: open-telemetry#6694
Issue-Id: open-telemetry/opentelemetry-java#2337
edysli added a commit to edysli/opentelemetry-java-instrumentation that referenced this issue Oct 15, 2022
Several test cases in `ContainerResourceTest` have the same outcome,
so instead of duplicating the assertions inside each test method, take
advantage of `@ParameterizedTest` to inject inputs and expected
outputs multiple times into the same test. This improves test code
readability by reducing test method length and making identical cases
more obvious.

Also rename test methods to better express the tested code's expected
behaviour.

Issue-Id: open-telemetry#6694
Issue-Id: open-telemetry/opentelemetry-java#2337
edysli added a commit to edysli/opentelemetry-java-instrumentation that referenced this issue Oct 15, 2022
Refactor class `ContainerResource` so that its methods return
`Optional`s instead of `null`. This makes for safer code that doesn't
rely on `null` to signal the absence of result. Moreover,
`Optional.map()` and `Optional.orElseGet()` use the same functional
style as `Stream`, which is well readable.

Issue-Id: open-telemetry#6694
Issue-Id: open-telemetry/opentelemetry-java#2337
edysli added a commit to edysli/opentelemetry-java-instrumentation that referenced this issue Oct 15, 2022
Several test cases in `ContainerResourceTest` have the same outcome,
so instead of duplicating the assertions inside each test method, take
advantage of `@ParameterizedTest` to inject inputs and expected
outputs multiple times into the same test. This improves test code
readability by reducing test method length and making identical cases
more obvious.

Also rename test methods to better express the tested code's expected
behaviour.

Issue-Id: open-telemetry#6694
Issue-Id: open-telemetry/opentelemetry-java#2337
edysli added a commit to edysli/opentelemetry-java-instrumentation that referenced this issue Oct 17, 2022
Several test cases in `ContainerResourceTest` have the same outcome,
so instead of duplicating the assertions inside each test method, take
advantage of `@ParameterizedTest` to inject inputs and expected
outputs multiple times into the same test. This improves test code
readability by reducing test method length and making identical cases
more obvious.

Also rename test methods to better express the tested code's expected
behaviour.

Issue-Id: open-telemetry#6694
Issue-Id: open-telemetry/opentelemetry-java#2337
trask added a commit that referenced this issue Oct 19, 2022
…e` to avoid using null (#6889)

While I was looking at issues
#6694 and
open-telemetry/opentelemetry-java#2337, I saw that the code in
`io.opentelemetry.instrumentation.resources.ContainerResource` used
`null` several times as return value which isn't safe. Nowadays,
`Optional` is better suited to signal the absence of a result, so I
refactored `ContainerResource` to use `Optional`s instead of null.

On the way, I also refactored this class's unit tests into parameterised
tests to reduce test code duplication. These improvements should help
implementing a solution to
#6694.

Co-authored-by: Trask Stalnaker <[email protected]>
dmarkwat pushed a commit to dmarkwat/opentelemetry-java-instrumentation that referenced this issue Oct 22, 2022
…e` to avoid using null (open-telemetry#6889)

While I was looking at issues
open-telemetry#6694 and
open-telemetry/opentelemetry-java#2337, I saw that the code in
`io.opentelemetry.instrumentation.resources.ContainerResource` used
`null` several times as return value which isn't safe. Nowadays,
`Optional` is better suited to signal the absence of a result, so I
refactored `ContainerResource` to use `Optional`s instead of null.

On the way, I also refactored this class's unit tests into parameterised
tests to reduce test code duplication. These improvements should help
implementing a solution to
open-telemetry#6694.

Co-authored-by: Trask Stalnaker <[email protected]>
LironKS pushed a commit to helios/opentelemetry-java-instrumentation that referenced this issue Oct 23, 2022
…e` to avoid using null (open-telemetry#6889)

While I was looking at issues
open-telemetry#6694 and
open-telemetry/opentelemetry-java#2337, I saw that the code in
`io.opentelemetry.instrumentation.resources.ContainerResource` used
`null` several times as return value which isn't safe. Nowadays,
`Optional` is better suited to signal the absence of a result, so I
refactored `ContainerResource` to use `Optional`s instead of null.

On the way, I also refactored this class's unit tests into parameterised
tests to reduce test code duplication. These improvements should help
implementing a solution to
open-telemetry#6694.

Co-authored-by: Trask Stalnaker <[email protected]>
LironKS pushed a commit to helios/opentelemetry-java-instrumentation that referenced this issue Oct 31, 2022
…e` to avoid using null (open-telemetry#6889)

While I was looking at issues
open-telemetry#6694 and
open-telemetry/opentelemetry-java#2337, I saw that the code in
`io.opentelemetry.instrumentation.resources.ContainerResource` used
`null` several times as return value which isn't safe. Nowadays,
`Optional` is better suited to signal the absence of a result, so I
refactored `ContainerResource` to use `Optional`s instead of null.

On the way, I also refactored this class's unit tests into parameterised
tests to reduce test code duplication. These improvements should help
implementing a solution to
open-telemetry#6694.

Co-authored-by: Trask Stalnaker <[email protected]>
trask pushed a commit that referenced this issue Nov 15, 2022
This resolves #6694.

We've been tracking the update to cgroup version support and want to get
ahead of the widespread usage. The surface of the existing
`ContainerResource` has not changed, but its internals have been
factored out to two "extractor" utilities -- one that understands cgroup
v1 and another for v2. v1 is attempted and, if successful, the result is
used. If v1 fails, then the `ContainerResource` will fall back to v2.

As mentioned in #6694, the approach taken in this PR is borrowed from
[this SO
post](https://stackoverflow.com/questions/68816329/how-to-get-docker-container-id-from-within-the-container-with-cgroup-v2)
combined with local experimentation on docker desktop on a Mac, which
already uses cgroup2 v2.
@svrnm
Copy link
Member Author

svrnm commented Nov 15, 2022

thanks @breedx-splk for fixing this :-)

LironKS pushed a commit to helios/opentelemetry-java-instrumentation that referenced this issue Dec 4, 2022
…e` to avoid using null (open-telemetry#6889)

While I was looking at issues
open-telemetry#6694 and
open-telemetry/opentelemetry-java#2337, I saw that the code in
`io.opentelemetry.instrumentation.resources.ContainerResource` used
`null` several times as return value which isn't safe. Nowadays,
`Optional` is better suited to signal the absence of a result, so I
refactored `ContainerResource` to use `Optional`s instead of null.

On the way, I also refactored this class's unit tests into parameterised
tests to reduce test code duplication. These improvements should help
implementing a solution to
open-telemetry#6694.

Co-authored-by: Trask Stalnaker <[email protected]>
@Sprakhar97
Copy link

@svrnm I wanted to highlight that this approach will not work for k8s as the container Id in k8s pod spec is different from the one we are extracting from proc/self/mouninfo and hence any correlation with k8s pod will break.

@breedx-splk
Copy link
Contributor

k8s pod spec

@Sprakhar97 Can you link to the part of the spec that you're referring to? Are you saying that the parsing will still work, but the ID is semantically something different?

Any thoughts on how to fix this? Do you know of a better mechanism for finding the current pod id in a compatible (correlatable) way?

@svrnm
Copy link
Member Author

svrnm commented Jun 6, 2023

I assume this comment from @biswajit-nanda on #8462 is related.

@XSAM reported the same issue a while back for go, this is not unique to java.

Overall, there is a major issue with container id detection, since none of the approaches taken is reliable, see also containerd/containerd#8185

@XSAM
Copy link
Member

XSAM commented Jun 10, 2023

this is not unique to java.

Yes, this could happen to every language's implementation, as the id in the cgroup v2 file may not be the current container's id, it could be the k8s busybox container's id.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contribution welcome Request makes sense, maintainers probably won't have time, contribution would be welcome enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants