-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cli: add otel sdk tracing and metric providers to the core cli #4889
cli: add otel sdk tracing and metric providers to the core cli #4889
Conversation
I've created this as a draft, but the code itself is mostly complete and could be merged as-is. The code in this PR is already part of buildx and there's a version this was based off of in compose for the telemetry. I thought it might be a good idea to start working on integrating these two so buildx and compose could be using the same code and could benefit from the same changes. @milas I'm really interested in feedback from you on this too since you have worked on the compose side of this. |
f5bc6da
to
e809aba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general i like the approach.
I do have a few thoughts though, I'll leave them here so we can discuss (maybe i'm missing some details):
- I think integrating otel things into the cli like this could cause some issues down the road, since we'd be using otel structs and methods directly even in projects that import the cli to use this common code. Maybe it could be worth it to decouple our usage of telemetry things from our actual implementation using otel, also for keeping things more easily standardized across projects;
- Given this ^, it might be a good idea to separate these bits into a dedicated repo/pkg that only exports the necessary functionality and abstracts otel resources away from the projects using it. Since the CLI is already doing double duty as both an application and a "library" for plugins, I'd try to avoid adding more responsibilities here. Separating things out, we can also collaborate on improvements to the required functionality more easily. Do you see any issues with this?;
- Shall we also include some functionality for adding Attributes, span Events, etc so clients can do so without depending directly on otel pkgs?
@krissetto I agree with some of what you're saying on principle, but I do worry a bit about overabstracting in the case of the OTEL libraries.
I think there's two components of otel that need to be treated separately. There's the library portion, which is mostly just interfaces, and the SDK, which is the implementation of the SDK. For most code, they should use the library portion and we shouldn't really worry about the actual implementation. The purpose of these interfaces is to create this separation and adding our own separation would probably not work very well. For the SDK, the only part that's really supposed to use the SDK is the outermost section of the program. The part that configures the SDK and passes the In particular, there is one area of this PR that's a leaky abstraction when it comes to the SDK. // TelemetryClient provides the methods for using OTEL tracing or metrics.
type TelemetryClient interface {
// Resource returns the OTEL Resource configured with this TelemetryClient.
// This resource may be created lazily, but the resource should be the same
// each time this function is invoked.
Resource() *resource.Resource
// TracerProvider returns a TracerProvider. This TracerProvider will be configured
// with the default tracing components for a CLI program along with any options given
// for the SDK.
TracerProvider(opts ...sdktrace.TracerProviderOption) TracerProvider
// MeterProvider returns a MeterProvider. This MeterProvider will be configured
// with the default metric components for a CLI program along with any options given
// for the SDK.
MeterProvider(opts ...sdkmetric.Option) MeterProvider
} The options to While I'm not thrilled with this, the alternative is wrapping each option in a separate library. But, these options get fairly involved and you suddenly end up creating an interface for most of the structs in the SDK. At the end of the day, no application or plugin is required to use the elements provided by the CLI. It's just convenient and takes away the hassle of figuring out how to do specific CLI-specific things with it. And I can't think of a situation where a CLI application that is using tracing or metrics wouldn't just use the official OTEL SDK or provide their own custom implementation of these methods.
Mentioned above, but my primary concern is that the abstraction won't yield better code. The code in this PR is mostly low/no-cost if the functions aren't invoked. If you're integrating tracing or metrics into your docker cli plugin, they're pretty much exactly what you would want with options to replace certain aspects (such as the default resource).
Attributes, span events, and other things are included as part of the OTEL library and not part of the SDK so I don't think there's much value in abstracting these away. The otel libraries themselves should be safe to use directly and are considered stable for tracing and metrics. OTEL logs are mostly an internal abstraction inside of the collector and can mostly be ignored from an application point of view. |
e809aba
to
0f9e229
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #4889 +/- ##
==========================================
- Coverage 61.43% 61.01% -0.43%
==========================================
Files 289 291 +2
Lines 20237 20374 +137
==========================================
- Hits 12433 12431 -2
- Misses 6903 7041 +138
- Partials 901 902 +1 |
Yeah, I agree with your reasoning here. I got a bit confused between sdk and lib things being new to otel things.
Totally agree with not wrapping most of the SDK structs, I was initially thinking about only a core set of basic functionality that we expect most apps/plugins to use. Given the comments above, i see this makes less sense than I initially thought.
This is true, even if I'd personally push for standardizing our approaches to use these common bits as much as possible.
My main concern here was with our dependencies, and keeping the various projects all on the same version of the sdk/libs. I generally always try to avoid having many projs on potentially different versions of the same libs (even though SemVer should theoretically give us some guarantees, i've had too many bad experiences 😂). No hard feelings though. Let's see what others think, and let's keep this PR moving! |
My personal hope is that, with this in the CLI, it'll mostly standardize the version the various CLI programs use. I do think a follow up thing after that is maybe creating a separate repository for OTEL stuff in general so server and cli components are synced on the versions. |
4fceb95
to
f7e1640
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks really good to me, thank you @jsternberg! I'd like to get these basic bits merged in quickly so we can start making use of them and moving forward with a proper OTEL implementation on the CLI side.
Just a couple of questions –
f7e1640
to
a41fbee
Compare
a41fbee
to
b484bc0
Compare
vendor/github.com/grpc-ecosystem/grpc-gateway/v2/runtime/marshal_jsonpb.go
Fixed
Show fixed
Hide fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me at least this looks good! :)
@neersighted @thaJeztah can you TAL when you get a sec? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some (mostly Go-level/without a strong OTel background) thoughts; but overall the direction looks good.
cli/command/telemetry_docker.go
Outdated
|
||
// dockerExporterOTLPEndpoint retrieves the OTLP endpoint used for the docker reporter | ||
// from the current context. | ||
func dockerExporterOTLPEndpoint(cli Cli) (endpoint string, secure bool, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To elaborate on the above, this is more-or-less the code I want to see moved into the context itself (making "OTEL" a first class property/component of the context).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you expand a bit on this? I'm not really sure I understand what you mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see this eventually become part of the Context struct/have the lifecycle handled there; this is really just a note for a future refactor.
This adds the code used by buildx and compose into the default CLI program to help normalize the usage of these APIs and allow code reuse between projects. It also allows these projects to benefit from improvements or changes that may be made by another team. At the moment, these APIs are a pretty thin layer on the OTEL SDK. It configures an additional exporter to a docker endpoint that's used for usage collection and is only active if the option is configured in docker desktop. This also upgrades the OTEL version to v1.19 which is the one being used by buildkit, buildx, compose, etc. Signed-off-by: Jonathan A. Sternberg <[email protected]>
b484bc0
to
89db01e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🥳
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [docker/docker](https://togithub.com/docker/docker) | minor | `26.0.2` -> `26.1.0` | --- ### Release Notes <details> <summary>docker/docker (docker/docker)</summary> ### [`v26.1.0`](https://togithub.com/moby/moby/releases/tag/v26.1.0) [Compare Source](https://togithub.com/docker/docker/compare/v26.0.2...v26.1.0) #### 26.1.0 For a full list of pull requests and changes in this release, refer to the relevant GitHub milestones: - [docker/cli, 26.1.0 milestone](https://togithub.com/docker/cli/issues?q=is%3Aclosed+milestone%3A26.1.0) - [moby/moby, 26.1.0 milestone](https://togithub.com/moby/moby/issues?q=is%3Aclosed+milestone%3A26.1.0) - Deprecated and removed features, see [Deprecated Features](https://togithub.com/docker/cli/blob/v26.1.0/docs/deprecated.md). - Changes to the Engine API, see [API version history](https://togithub.com/moby/moby/blob/v26.1.0/docs/api/version-history.md). ##### New - Add configurable OpenTelemetry utilities and basic instrumentation to commands. For more information, see [OpenTelemetry for the Docker CLI](https://docs.docker.com/config/otel). [docker/cli#4889](https://togithub.com/docker/cli/pull/4889) ##### Bug fixes and enhancements - Native Windows containers are configured with an internal DNS server for container name resolution, and external DNS servers for other lookups. Not all resolvers, including `nslookup`, fall back to the external resolvers when they get a `SERVFAIL` answer from the internal server. So, the internal DNS server can now be configured to forward requests to the external resolvers, by setting `"features": {"windows-dns-proxy": true }` in the `daemon.json` file. [moby/moby#47584](https://togithub.com/moby/moby/pull/47584) > \[!NOTE] > This will be the new default behavior in Docker Engine 27.0. > \[!WARNING] > The `windows-dns-proxy` feature flag will be removed in a future release. - Swarm: Fix `Subpath` not being passed to the container config. [moby/moby#47711](https://togithub.com/moby/moby/pull/47711) - Classic builder: Fix cache miss on `WORKDIR <directory>/` build step (directory with a trailing slash). [moby/moby#47723](https://togithub.com/moby/moby/pull/47723) - containerd image store: Fix `docker images` failing when any image in the store has unexpected target. [moby/moby#47738](https://togithub.com/moby/moby/pull/47738) </details> --- ### Configuration 📅 **Schedule**: Branch creation - "after 6am on monday" (UTC), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/earthly/dind). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zMjEuMiIsInVwZGF0ZWRJblZlciI6IjM3LjMyMS4yIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJyZW5vdmF0ZSJdfQ==--> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: idodod <[email protected]>
- What I did
This adds the code used by buildx and compose into the default CLI program
to help normalize the usage of these APIs and allow code reuse between
projects. It also allows these projects to benefit from improvements or
changes that may be made by another team.
At the moment, these APIs are a pretty thin layer on the OTEL SDK. It
configures an additional exporter to a docker endpoint that's used for
usage collection and is only active if the option is configured in docker
desktop.
This also upgrades the OTEL version to v1.19 which is the one being used by
buildkit, buildx, compose, etc.
- How I did it
I added a new interface to the DockerCli and implemented that interface using
code that is mostly identical to the same code in buildx which was adapted from
compose. It's missing the user-facing OTEL environment components that buildx
has though.
- How to verify it
Not done yet.
- Description for the changelog
Added configurable OTEL utilities and basic instrumentation to commands.
- A picture of a cute animal (not mandatory but encouraged)