[Terminology] Provide a term for a language-specific solution that adds otel to an application without the need of changing the code of the application itself. #4129

Open
svrnm opened this issue Jul 4, 2024 · 6 comments
Labels
triage:accepted:needs-sponsor Ready to be implemented, but does not yet have a specification sponsor

Comments

@svrnm
Member

svrnm commented Jul 4, 2024

As a follow-up to open-telemetry/community#2165 I want to give it another try to discuss and clarify the terminology around a language-specific solution that adds otel to an application without the need of changing the code of the application itself.

Goal

For the OpenTelemetry documentation we need a term that describes any solution that an end user can add to their application, without changing application code, to make that application emit telemetry otel-style. Such a solution adds the SDK, auxiliary pieces (exporters, sampler, resource detectors, language-specific config helpers, etc.) and instrumentation libraries to the application. This term will be the umbrella term in navigation and title for this page, so it includes existing solutions for .NET, Python, Go, Java, PHP and JavaScript and any future solutions that will be added. Some of those solutions add otel at runtime (like the Java agent or the eBPF Go solution), others at compile time (like the Spring Boot Starter or Go instrgen).

This term should be easy for an end user to understand and should not be used for anything else. So the final definition should also contain examples and counter-examples of how to use that term. For example, this term should not be used to describe instrumentation libraries or the process by which the OpenTelemetry Operator injects OpenTelemetry into an application, so it might be necessary to define some of those related terms as well.

Current Situation

open-telemetry/community#2165 and others contain a lot of history on that, but I'll try to provide a summary. We have five terms used across the ecosystem: "Automatic Instrumentation", "Instrumentation", "Zero-Code Instrumentation", "(Instrumentation) Agent" and "Distro". There might be more, but those are the ones I am aware of. All of them have their downsides, which I summarize for each one below.

Automatic Instrumentation

This, in its capitalized form, has so far been the commonly used term to describe what we are looking for. So on paper there is a difference between "Automatic Instrumentation" and "automatically instrumenting" something. We have plenty of examples where we use the lowercase version when not referring to a language-specific solution to add otel to an application, even in the spec, e.g.:

We can provide auto-instrumentation for most popular logging libraries. The

A typical template for an auto-instrumentation implementing this semantic convention

There are many more examples, but the two most problematic are when one talks about an instrumentation library or when one uses it to describe a mechanism to accomplish automatic instrumentation:

  • This instrumentation library automatically instruments library
  • This instrumentation library provides automatic instrumentation for library
  • This code is automatically instrumented via monkey patching or bytecode instrumentation

Related to the last one, there is an official definition for automatic instrumentation in the spec already, but it talks about telemetry collection methods:

Refers to telemetry collection methods that do not require the end-user to modify application's source code. Methods vary by programming language, and examples include code manipulation (during compilation or at runtime), monkey patching, or running eBPF programs.

For me, this means that "code manipulation", "monkey patching" or "running ebpf programs" are "telemetry collection methods" and by that are what should be called "automatic instrumentation". This is not saying that "Automatic Instrumentation refers to language-specific solutions that add telemetry collection to an application without requiring the end user to modify source code."
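
To make the "telemetry collection method" reading concrete, here is a minimal sketch of monkey patching in Python. It is an illustration only and is not taken from the spec or from any OpenTelemetry package: `HttpClient`, `instrument` and `recorded_spans` are hypothetical names, and a plain list stands in for a real SDK and exporter. The point is that the patching step is the mechanism, while the application function stays untouched.

```python
import functools
import time


# Hypothetical stand-in for a third-party library the application calls.
class HttpClient:
    def get(self, url):
        return f"response from {url}"


# Hypothetical telemetry sink: a plain list stands in for an SDK and exporter.
recorded_spans = []


def instrument(cls, method_name):
    """Monkey patch cls.method_name so every call records a span-like dict."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            # Record the call even if the original method raised.
            recorded_spans.append({
                "name": f"{cls.__name__}.{method_name}",
                "duration_s": time.perf_counter() - start,
            })

    setattr(cls, method_name, wrapper)


# Applied once at startup, outside the application's own code.
instrument(HttpClient, "get")


# Unmodified "application code": it has no knowledge of the patch.
def application():
    return HttpClient().get("https://example.com")
```

Calling `application()` returns the same response as before the patch and appends one entry to `recorded_spans`. In the vocabulary of this issue, the `instrument` function is the mechanism, and a packaged solution that applies such patches for the end user would be the thing that still needs a name.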

Instrumentation

We host multiple language-specific solutions to add otel to an application in repositories that we call opentelemetry-<language>-instrumentation. This is highly confusing for end users exploring our repositories, since they might expect to find instrumentation libraries in those repos (which they do for Java, but in most cases those libraries are hosted in "contrib").

Zero-Code Instrumentation

Following this discussion and this issue we decided to go with "Zero-Code Instrumentation" for the documentation to solve the problem stated above (the need for an umbrella term). The main driver behind this was that "automatic instrumentation" has the problems outlined above.

Interestingly (and unfortunately) I recently saw a few examples where it was used, similarly to automatic instrumentation, to describe something different, e.g. "This instrumentation library allows you to add opentelemetry to your library zero-code-instrumentation-style".

(Instrumentation) Agent

There is a long history of people objecting to the use of "Agent"; I can dig up some of that history if required. But a major blocker for it is that not all solutions are "agents" in the sense of an "APM Agent", e.g. instrgen or the Spring Boot Starter are not agents.

Distro

Python uses the term "distro" to describe their solution to instrument applications without code changes. Distro/Distribution is indeed a term that also will need to be defined more clearly, but this is out of scope.

Next Steps

This discussion is at high risk of turning into bikeshedding without any proper outcome:

Important

We (the people writing documentation) and others (like the people doing presentations, blog posts, trainings or certifications) need that term, and the longer we wait to fix it, the worse this problem gets.

So, I kindly ask for the following:

  • Review this issue, also review the existing solutions and how they might be different
  • Help to delineate the terms we want to mean different things before engaging in deep discussions about whether X is a better word than Y
  • Let's have a discussion on those terms but let's make sure that we eventually find a solution even if some people are not happy with it

So, without fixing a term, here is what I propose:

  1. We provide <term a>, which is an umbrella term that describes a language-specific solution to add all components needed for emitting telemetry to an application without the need to change the code of the application itself. The components needed are at least the SDK, but auxiliary pieces (exporters, sampler, resource detectors, language-specific config helpers, etc.) and instrumentation libraries can be included as well.
  2. Such a solution <term a> may ask the end user to write configuration (like the spring boot starter) or other "code-like" additions, but the key piece is that the application code itself remains untouched.
  3. <term a> cannot be used in other contexts, such as for instrumentation libraries, to describe a mechanism that is used to accomplish the goal (like eBPF, bytecode injection, etc.), or for the process of injecting such a solution through a k8s operator (or other tools)
  4. There are ways to distinguish certain kinds of <term a>, e.g. "compile time" (spring boot starter, instrgen, code injection in general) and "runtime" (java agent, ebpf)
  5. If we fix <term a> with something that is used in the ecosystem already, we make sure that proper replacements can be provided.
  6. Another <term b> is required to describe telemetry collection methods that are used to build instrumentation libraries. This term should be clearly different from <term a>
@theletterf
Member

theletterf commented Jul 5, 2024

Thanks for the excellent overview, @svrnm !

I'd like to understand first what's the scenario in the OTel roadmap for all future instrumentation: Are all instrumentations going toward automatic or zero-code? Is that even possible for all?

If we're going toward zero-code as default

This scenario requires that all OpenTelemetry instrumentation is an automatic/autonomous/zero-code experience by default. If that is the direction OTel is heading in, no special term would be required. We should instead treat the opposite case, that is, instrumentation that requires writing custom code, as the exception and come up with a term for it. I know we've recently gone away from that in the docs, so I guess that's not the future scenario?

If zero-code will always be a plus limited to runtimes

In that case, I'd rather go with automatic instrumentation, for two main reasons: 1) as a term, it's prevalent among vendors, and 2) it's quite clear semantically, even though the actual mechanism can differ. I see two issues with zero-code: 1) it sounds a bit like a marketing term, and 2) it's not always accurate (for example, one might need to at least edit a configuration file or a require statement somewhere, as is the case for PHP).

Just my two cents.

@svrnm
Member Author

svrnm commented Jul 8, 2024

@open-telemetry/technical-committee can you help steer this discussion, please? The goal is to eventually have the terms defined in the glossary so that we can use them in the docs and other community writing.

@svrnm added the triage:deciding:tc-inbox (Needs attention from the TC in order to move forward) label Jul 8, 2024
@jsuereth
Contributor

jsuereth commented Aug 7, 2024

TC Triage: This is important to decide and needs to be driven into the Specification. We recommend bringing this issue to the Specification meeting to have a broad discussion and see if we can make progress before escalating to a private discussion.

This will need a sponsor, is that @svrnm ?

@jsuereth added the triage:accepted:needs-sponsor (Ready to be implemented, but does not yet have a specification sponsor) label Aug 7, 2024
@jsuereth removed the triage:deciding:tc-inbox (Needs attention from the TC in order to move forward) label Aug 7, 2024
@svrnm
Member Author

svrnm commented Aug 9, 2024

TC Triage: This is important to decide and needs to be driven into the Specification. We recommend bringing this issue to the Specification meeting to have a broad discussion and see if we can make progress before escalating to a private discussion.

Thanks. I will try to attend a Spec Meeting, but they unfortunately collide with an internal meeting that I have a hard time skipping.

This will need a sponsor, is that @svrnm ?

Not sure if a GC-member is in the list of potential sponsors? I am OK with driving this by providing a proposal in a PR, but I guess a spec sponsor needs to sponsor that?

@austinlparker
Member

A discussion point/question:

How much do we, as a project/community, want to drive 'instrumentation agents' as a desirable thing? I think there are plenty of examples where 'config-driven instrumentation' would be useful even in a world where instrumentation is natively available in a runtime or framework or library, but I think that we can't really make a decision about the name of this without also addressing our overall positioning on using instrumentation agents in the first place.

@svrnm
Member Author

svrnm commented Aug 13, 2024

How much do we, as a project/community, want to drive 'instrumentation agents' as a desirable thing?

Eventually they are the desirable solution for legacy software, where you don't want to or can't do code-based instrumentation. While the number of applications is shrinking, there will be "a lot" remaining in the future.

Another use case we have as of today is "getting started": we recommend automatic instrumentation to people as a quick way to have their application emit telemetry. But if "manually" instrumenting my application were easy, because all my dependencies are natively instrumented AND setting up the SDK is almost effortless, this would no longer be the case.

Projects
Status: Spec - Accepted
4 participants