How can we encourage consistent terminology across the OpenTelemetry project? #2165

Closed
svrnm opened this issue Jun 19, 2024 · 12 comments

@svrnm
Member

svrnm commented Jun 19, 2024

This issue is as old as the project, and I come back to it again and again. To capture it in a central place, I am creating this issue:

While writing documentation for different sub-projects of the OpenTelemetry project I ran into inconsistencies in terminology again and again. Sometimes two SIGs created the same thing independently outside the specification and gave it different names, sometimes names are used slightly differently because definitions are not precise, and sometimes terminology is used that has a different definition in the spec. A few examples:

  • Automatic Instrumentation: We talked about this one in many issues, up to the point that we started to avoid using it in the documentation. In the spec it is defined as telemetry collection methods that do not require the end-user to modify application’s source code, and it is used as a synonym for a bundle of instrumentation libraries, "zero-code solutions"1 (.NET, Go), in a sentence "the process of executing an automatically instrumented logs", or as a synonym for a single instrumentation library
  • Instrumentation: This term is also used to describe a lot of the things mentioned above, e.g. Instrumentation Libraries are often named opentelemetry-instrumentation-<framework>, it is used to name automatic instrumentation (see the repository names for opentelemetry-java-instrumentation, opentelemetry-go-instrumentation, opentelemetry-dotnet-instrumentation), and to name tools to auto instrument (Python has a CLI tool called opentelemetry-instrument). In docs we now use it as top level title for what previously was called "manual instrumentation", e.g. "Language APIs & SDKs > PHP > Instrumentation"
  • Agent: We do an OK job of avoiding this term these days where it's not accurate, so what remains is the "Java agent", the "agent mode" for the collector, and Python using the term "agent" for automatic instrumentation. OpAMP has "agent" in it of course and we hopefully avoid "agent" for the recently donated profiler.
  • Distribution: There is an ongoing discussion in the collector SIG around "collector distributions", we have a concept page on what a distro is and we have the Python distro

There are probably more examples of this, but the purpose of this issue is not to discuss those terms once again. We tried this before, but rarely found consensus.

So this issue is about

  • highlighting this problem
  • discussing its impact
  • a process for coming up with terminology that is used consistently across the OpenTelemetry project
  • a process for encouraging consistent use of that terminology across the OpenTelemetry project

Since the text above already highlighted the problem, let me dive into the other points.

Impact

I think that the biggest issue is that we confuse end users and contributors alike. It gets really hard to talk with people about certain OpenTelemetry concepts without misunderstandings due to those words meaning different things to different people.

Beyond that we have already created a lot of (almost) irreversible instances, as some of the examples above highlight.

Finally, we have a documentation issue (and this is mostly where I am coming from): We (SIG Comms) have a hard time documenting certain elements of the project because the terminology is unclear, and if we ask for feedback on terminology our discussions mostly remain between us (open-telemetry/opentelemetry.io#3808, open-telemetry/opentelemetry.io#3809), so either we make a judgement call on our own OR we are stuck and cannot move certain changes forward.

Process for consistent terminology

This process implicitly exists already: the spec provides a glossary that lists terms and their meaning. Some of the terms above are listed there already, some are not, some have a definition that is not clear enough and then there is also a separate docs glossary.

What is missing is making that process explicit. There should be key requirements, likely

  • when such a term needs to be added,
  • what the definition of that term should contain (a sentence to describe it, references where it is used, example sentences how to use it, examples where it is not applicable)

and most importantly how such a term is finalized without the community getting lost in bikeshedding, e.g. a term is proposed, there is a fixed time window to discuss it, then there is voting and then the "candidate" with the most votes is elected, maybe TC has a "veto".

If we could get to this point, this would be a huge help for writing better documentation, since we (SIG Comms) can come to the spec, ask for a term and can expect it to be available in a certain time.

Process for encouraging consistent terminology

This is where things get much more complicated. "Enforcement" is probably too strong and as mentioned above, there may be some irreversible decisions where we have to live with ambiguity, especially when packages, repos, etc. have been named already.

However, there are a few things we can do nevertheless:

  • For docs we will make sure that the right term is used and in the case of historical ambiguity we can call that out
  • Maintainers of Implementation SIGs know where to look (glossary) and can use that for naming things.

Final Words

I know that this is not an easy issue to resolve, but I just wanted to start by capturing it and maybe we can make incremental progress.


Footnotes

  1. A "zero code solution" is what I try to refer to when I speak about what the Java agent or the .NET Automatic Instrumentation provide: a solution that not only automatically instruments an application, but also handles configuration, exporting, etc.

@ocelotl

ocelotl commented Jun 20, 2024

Yes, please. So many names for the same thing, it makes it super hard, especially for people who are trying to understand or learn about this project.

@svrnm
Member Author

svrnm commented Jun 20, 2024

Based on an ask by @trask during the GC meeting here is a list of things and sentences we (@open-telemetry/docs-approvers) try to describe. I will focus on "Automatic Instrumentation" in this first run down (and it's probably making up 80% to 90% of the problem):

  1. Any solution that packages the SDK + instrumentation libraries + exporter + configuration + ... and provides the end-user with a way to instrument their application with no or minimal code changes. These include the Java Agent, the Java Spring Boot Starter, the .NET Auto-Instrumentation, the Go Auto Instrumentation, Go InstrGen, the Python Distribution, the PHP OTel Module, Lambda Auto-Instrumentation and others that will come in the future.
  2. A way to consistently express how the OpenTelemetry Operator (or any other wrapper around the wrapper) applies the solutions described in 1. to applications running inside a cluster, instead of "the operator auto-instruments the java application" (the term "injection" is available as an alternative)
  3. A way to express the following sentence without using "automatically instrument" because it implies that instrumentation libraries are automatic instrumentation:
    1. To automatically instrument the libraries/dependencies of your application you can find a list of available instrumentation libraries here.
    2. By installing instrumentation libraries your dependencies will be automatically instrumented
  4. Untangle the following sentences, where I want to distinguish the solution (as in 1.) from the used mechanism:
    1. The Automatic Instrumentation for Java uses byte code instrumentation to automatically instrument your code
    2. The Automatic Instrumentation for JavaScript uses monkey-patching to automatically instrument your code
  5. Related to that a few sentences where "automatic instrumentation" is not referring to the solution (as in 1.) but to the mechanism:
    1. We can provide auto-instrumentation for most popular logging libraries. The auto-instrumented logging statements will do the following
    2. automatic instrumentation mechanisms without code change will often not be able to instrument the processing of the individual messages
  6. Why are all the Automatic Instrumentation/Zero Code Solution repositories called opentelemetry-<language>-instrumentation?
  7. How is a bundle of instrumentation libraries called? Node.js has https://www.npmjs.com/package/@opentelemetry/auto-instrumentations-node

There are probably more examples, but those are the ones available to my late night brain right now.

Note: There are also a lot of sentences that contain "automatic" in a different meaning, which are fine but can be confusing, examples:

  • OpenTelemetry Meta Packages for Node automatically loads instrumentation [libraries] for Node builtin modules and common packages.
  • There is automatic configuration for manual instrumentation

@svrnm svrnm changed the title from "How can we enforce consistent terminology across the OpenTelemetry project?" to "How can we encourage consistent terminology across the OpenTelemetry project?" Jun 21, 2024
@svrnm
Member Author

svrnm commented Jun 21, 2024

Following up on this based on my comment for the OTel Collector Distro discussion and after I had some time to think about the input provided by @open-telemetry/governance-committee during a meeting yesterday:

This issue is about the need of having a process to define key terms in as clear and as precise ways as possible. Those key terms have validity within the OpenTelemetry project and because of that they should be easily available to every contributor and known to (and encouraged/enforced by) maintainers in their subprojects.

Here is a proposal:

  • For every key term that is relevant for the specification the Spec Glossary is authoritative. Because of that the @open-telemetry/technical-committee owns it.
  • For every key term that is relevant outside the Spec, but within the project (e.g. definition of "Collector" or "Distribution") we create a Community Glossary in the community repo, which is owned by the @open-telemetry/governance-committee.
  • A key term should only live in one of these 2 glossaries, but if necessary it can move (e.g. we might define a term outside of the spec first but later find it necessary to be moved)
  • A key term is binding across the OpenTelemetry project, @open-telemetry/docs-approvers will make sure that it is used consistently on the opentelemetry.io website and SIG maintainers are responsible to know about them and use them accordingly as well.
  • Process
    • A key term should have a definition and provide examples how to use the term in a sentence, examples how to not use it and a list of alternative terms that should be avoided (e.g. "Instrumentation Library" and not "Instrumentation Package")
    • To create a new key term an OpenTelemetry member raises a PR against the Spec or the Community repo. They provide a description of the term they are looking for, they provide at least one name suggestion, one sentence how to use it. They also need to provide a rationale why it is necessary to fix that term as a key term (e.g. multiple terms describing the same thing are floating around which leads to confusion)
    • SIG Comms maintainers and approvers are notified about this PR and asked to provide their opinion and insights on that certain term and definition.
    • The pull request will be merged when the following condition is met: all committee members have approved the pull request OR a majority of committee members have approved the pull request and no member has objected by requesting changes on the pull request
    • By rejecting the pull request or not merging it within 3 months, the owning committee signals that they do not want to have this term fixed and the initial author (and everybody else) can use whatever term they deem to be correct.
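
To make the proposal above easier to discuss, here is a minimal Python sketch of what a key term entry and the merge rule could look like. It is only an illustration: the names `KeyTerm`, `can_merge`, `is_expired` and `discouraged_terms` are hypothetical, the committee size in the usage example is made up, and "3 months" is approximated as 90 days.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class KeyTerm:
    """Hypothetical shape of a glossary entry, following the proposal above."""
    name: str
    definition: str
    usage_examples: list[str] = field(default_factory=list)
    misuse_examples: list[str] = field(default_factory=list)
    avoid: list[str] = field(default_factory=list)  # alternative terms to avoid


def can_merge(committee_size: int, approvals: int, change_requests: int) -> bool:
    """All members approved, OR a majority approved and nobody requested changes."""
    if approvals == committee_size:
        return True
    return approvals > committee_size / 2 and change_requests == 0


def is_expired(opened: date, today: date, window_days: int = 90) -> bool:
    """Assumption: the 3-month window is approximated as 90 days."""
    return today - opened > timedelta(days=window_days)


def discouraged_terms(text: str, term: KeyTerm) -> list[str]:
    """Return the discouraged alternatives of `term` that appear in `text`."""
    lowered = text.lower()
    return [alt for alt in term.avoid if alt.lower() in lowered]


# Usage with made-up values (definition text and committee size are placeholders).
term = KeyTerm(
    name="Instrumentation Library",
    definition="(placeholder definition)",
    avoid=["Instrumentation Package"],
)
print(discouraged_terms("Install the instrumentation package first.", term))
print(can_merge(committee_size=9, approvals=5, change_requests=0))
```

A check like `discouraged_terms` could in principle run on docs PRs, but whether and where to automate this is outside the scope of this proposal.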

Note that I specifically say key term: the goal of this is not to define the heck out of every word that we use, but to make sure that certain (few) words mean the same thing no matter which piece of the OpenTelemetry project a user is looking at.

@codefromthecrypt
Contributor

I would love to see this capture citations of concrete instances, as that allows folks like me to help contribute to this discussion without the tacit knowledge about this accumulated over the years

  • end users - slack links, quotes, from end users who conflate terms or misuse them
  • products - 3rd party misunderstanding (e.g. calling something an X when it is a Y)
  • legal - any occurrence where we've run into a real trademark or otherwise problem, even if it isn't fully disclosable

Ack this could make the comment count high, but I believe it is a great way to make things transparent, especially for those in inconvenient timezones.

@svrnm
Member Author

svrnm commented Jun 24, 2024

@codefromthecrypt that's a valid request and @trask asked for something similar that's why #2165 (comment) exists.

A very recent example is this PR for our documentation:

open-telemetry/opentelemetry.io#4727

I am less worried about "instrumentation package" and "instrumentation module" since they are easy to understand as synonyms for "instrumentation library", however there are many cases of "auto(matic) instrumentation library/package/module", which gets even more common when we move outside docs but stay within the project:

Note, that I am not saying that those sentences are wrong and all those instances should be replaced with something different. If we look at the current definition of automatic instrumentation many of them are valid: the definition talks about methods and lists monkey patching and code manipulation as examples, and many of the instrumentation libraries above use those methods to instrument libraries. However we push(ed) for "Automatic Instrumentation" to be equivalent with language-level solutions to instrument an application without touching (or barely touching) the code.

A few quotes from the ecosystem:

Again, based on the current definition there is nothing wrong with that. But it makes writing documentation and especially structuring documentation really hard. That's why we replaced "Automatic" with "Zero-Code Instrumentation" in our Information Architecture, because we needed to distinguish between "automatically instrumenting libraries" and "Automatic Instrumentation for Applications"

@codefromthecrypt
Contributor

Thanks for the notes and feel free to tell me to fork this to a different issue.

I've heard customer asks in the past (sorry no quotes at the moment, but I'm sure we can find some) to "enable tracing", and sometimes they are surprised if "zero code" means recompiling and packaging, or re-linking. That or they have no idea which spring boot modules to depend on in order to get the "zero code" result, as that first requires them to know the inventory of what is in the app.

So, I suggest we strengthen the terminology about agent (no code or packaging change) vs no code/quasi automatic (possibly a significant amount of work including understanding what's in the app and choosing dependencies)

If driven by the requesting user, it is more about black box vs grey box. If they only know about the app in terms of a black box, something called "automatic" will not help them, and possibly cause more confusion as automatic implies you don't need to do (and possibly know) anything

my 2p

@svrnm
Member Author

svrnm commented Jun 25, 2024

thanks @codefromthecrypt, you bring up some good points and also helped me to re-think a few points. You are absolutely right that this is about the end-user and not confusing them, that's why I think we need some process to find good terminology as a community, because with whatever term we come up, if it is then not used or used differently by different parts of the project we can throw them out once again and begin again.

To get back to @trask's and your request that you'd like to understand what I want to describe instead of talking about the words, let me try a different perspective with the risk of yet another lengthy comment:

If we focus on the language-specific implementation SIGs we have the following "order" of things:

  • Layer 0: The spec creates the language-agnostic requirements for API, SDK and "auxiliary pieces"1, you can find it grouped in the spec repo and on the website
  • Layer 1: The language SIGs create language-specific implementations of the API, SDK and those "auxiliary pieces" in their core repositories and we have this documented under the umbrella Language APIs & SDKs. From an end-user perspective this is all "manual/code-based/direct...", so they say "I used the OpenTelemetry SDK to manually instrument my code"
  • Layer 2: Some of the language SIGs and 3rd parties create instrumentation libraries for frameworks available in their language ecosystem. We have this documented under Language APIs & SDKs > $lang > Libraries. From an end-user perspective this is already "automatic/(zero-code or low-code)", so they say "I used instrumentation libraries to automatically instrument the dependencies of my application", although they manually added those libraries to their code.
  • Layer 3: Some of the language SIGs (and 3rd parties) create (insert name here) to bundle instrumentation libraries and (!) auxiliary pieces1 to provide a solution that allows adding OpenTelemetry to an application "from the outside", which goes beyond instrumentation. We have (currently) documented those under the umbrella Zero-code instrumentation, but also commonly refer to them as "Automatic Instrumentation" in title case. From an end-user perspective these (insert name here) are all "automatic/zero-code" and they similarly can say "I used the java agent to automatically instrument my application" or "I used the Spring Boot Starter to automatically instrument my application"
    • Layer 3a: Some of those (insert name here) do not require me to re-compile/re-build my application because they use mechanisms like byte-code injection, monkey-patching or ebpf to accomplish adding otel to that code.
    • Layer 3b: Some of those (insert name here) depend on the application being re-compiled or re-built because they use a mechanism like adding additional dependencies (spring boot starter), code weaving (go instrgen) and others.
  • Layer 4: There is tooling (especially the K8s operator) that can take (insert name here) from different languages and inject it into applications automatically(sic!), this is called Injecting (insert name here)

To emphasize this, we need a good name for layer 3, where I wrote (insert name here), so that when someone speaks about that it is clear what they mean, e.g. in a presentation, but especially in our documentation. That term needs to fulfill the following criteria:

  1. The term says "this is a bundle of SDK, auxiliary pieces and instrumentation libraries you can add to your application to make it emit telemetry otel-style"
  2. The term is an umbrella for Layer 3a & Layer 3b, so it includes Java agent, Spring Boot Starter, Go ebpf-based auto instrumentation, go instrgen and others at the same time.
  3. The term can not be used in any other of the Layers described above (see examples below), otherwise it will create confusion again
  4. (bonus) The term is not entirely "made up"
  5. (bonus) Adding a descriptor (adjective?) to the term easily allows to distinguish 3a and 3b
  6. (bonus) It verbs.

Here is what we have today:

  • Automatic Instrumentation
    1. 🚫 -- this only says "instrumentation" nothing about all the other things it does
    2. ✅ -- As per the spec definition this includes runtime and compile time "automation"
    3. 🚫 -- See previous comments. An instrumentation library is also automatically instrumenting something, it is even using the mechanisms from the spec definition
    4. ✅ -- Not made up
    5. ✅ -- We could say "runtime automatic instrumentation" and "compile-time automatic instrumentation"
    6. ✅ -- "The java application was automatically instrumented with the Java agent" or "To automatically instrument your java application at runtime ..."
  • Zero-Code Instrumentation/Solution
    1. 🆗 -- if we stick with "Zero-Code Solution" it kind of says that this does more than instrumentation, but often it is called "Zero-Code Instrumentation"
    2. 🆗 -- it is used synonymous with "Automatic Instrumentation", although for some of the solutions it's more low-code than zero-code (we can argue about writing config files for the spring boot starter counts as "code")
    3. 🚫 -- I just yesterday read someone say "zero-code style approach for instrumentation", which is exactly the same problem that we have with "Automatic Instrumentation"
    4. 🆗 -- it's a little bit made up ;-)
    5. 🆗 -- You can say "Runtime zero-code" and "Compile-time zero-code" (which sounds made up...)
    6. 🆗 -- You can say "I zero-code instrumented my application" (which also sounds made up...)
  • Agent
    1. ✅ -- This is a common term from the APM world that says "this is a bundle of components I use on your behalf to make your application create and emit telemetry, oh, it also samples"
    2. 🚫 -- For me "agent" implies the runtime, also calling the "Spring Boot Starter" an agent would be a terrible idea, we just move the problem from one place to another
    3. 🆗 -- An agent can not be used to describe an instrumentation library or any other layer. Maybe one could say that the Operator is an agent to inject otel, but I never heard someone saying that. There is also the "agent-mode" for the collector
    4. ✅ -- Not made up, a word that means something to a lot of people
    5. 🚫 -- See 2, that doesn't work, a "compile time agent"??
    6. 🚫 -- Doesn't work

So none of these satisfies all of criteria 1.-3. (even if it does well on 4.-6.), so any ideas what to use?
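
As a compact restatement of the evaluation above, the ratings can be written down and checked mechanically. This is just a sketch: the labels "yes", "partial" and "no" stand in for ✅, 🆗 and 🚫, and the helper names `meets_required` and `failed_required` are made up for illustration.

```python
# Ratings transcribed from the evaluation above, one entry per criterion 1-6.
# "yes" = ✅, "partial" = 🆗, "no" = 🚫
RATINGS = {
    "Automatic Instrumentation": ["no", "yes", "no", "yes", "yes", "yes"],
    "Zero-Code Instrumentation/Solution": [
        "partial", "partial", "no", "partial", "partial", "partial",
    ],
    "Agent": ["yes", "no", "partial", "yes", "no", "no"],
}


def meets_required(ratings: list[str]) -> bool:
    """True only if criteria 1-3 (the non-bonus ones) are all rated 'yes'."""
    return all(rating == "yes" for rating in ratings[:3])


def failed_required(ratings: list[str]) -> list[int]:
    """Numbers of the required criteria (1-3) that are rated 'no'."""
    return [i + 1 for i, rating in enumerate(ratings[:3]) if rating == "no"]


candidates = [term for term, ratings in RATINGS.items() if meets_required(ratings)]

for term, ratings in RATINGS.items():
    print(f"{term}: fails required criteria {failed_required(ratings)}")
print("Terms meeting criteria 1-3:", candidates)
```

Each of the three terms fails at least one of the required criteria outright, which is why the list of candidates comes out empty.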

Footnotes

  1. This includes exporter, resource detectors, sampler, configuration, ..., this excludes "instrumentation libraries"

@codefromthecrypt
Contributor

Thanks for the evaluation of the layers. In tracing, I used to think and discuss only the following:

tracers - layer 0
instrumentation - use this directly and you have library lock-in - layer 1
auto-configuration (spring boot) - use this and you can get library conflicts or dep problems. - layer 2 a
agents - completely no lock in, user has no code or config dependencies - layer 2b
proxy (sidecar) - completely no lock in but you are limited to instrumentation of your ins and outs - layer not discussed here

ps I think tools or operators that install agents are a subtopic of agents and do not really fit in these categories as peers with them.

So, basically I discussed auto-configuration as an alternative to, not a layer below agent. Possibly there is nuance where they are stacked, but I think people more often are thinking the right direction first on this, as APM agents have been around for a very very long time.

So ramble aside, I think we very much should keep terminology for instrumentation and agents, and focus on trying not to use the word agent for pipeline stuff like collectors.

This leaves things that are called "zero code" here, but I think it is misrepresentative and feels too buzzy as well. Basically autoconfiguration/bootstrap tactics which have a side effect on your binary. Spring boot as an example, is not zero code most of the time, as often people add annotations or other things at the bootstrap side. Certainly you have dependency issues that can affect your app functionality. Plus configuration is sometimes needed. Other languages have tactics that require changing how an application starts or its library dependencies. Compile or runtime solutions already exist in auto-configuration (e.g. spring boot and graalvm, spring vs dagger etc), so I would try to focus less on if something is compile or runtime and explain nuance after someone is at a more advanced level to the point they care. So, subtyping is really more the advanced explanation, not the start of how we use the bucketed term. The bucketed terms should be few and mostly correct.

So, I think we should drop the "zero code" and stick with "automatic instrumentation", even if warts remain. Basically that automatic typically requires a change to how your application starts and may require adding new dependencies or configuration. Application service code requires no code changes, but automatic instrumentation frameworks typically configure tracers etc such that you can opt-into using them.

I think this indeed leaves some "other" section, and that's possibly a more honest way to put it than trying to make a term that matches everything. Possibly there are some link time things that aren't quite agents or automatic things that are post-compilation toolchain which aren't quite automatic instrumentation, but have a similar effect. I think most new people won't run into these first, so the main thing is to categorize the things that fit and focus on those, while having an advanced, appendix, FAQ for the things harder to pin down.

My 2p

@svrnm
Member Author

svrnm commented Jun 27, 2024

Thanks for engaging in this conversation with me @codefromthecrypt! You made a few really good points, I appreciate that!

Note, that the purpose of this issue was to request a process, and less about the specifics of the terminology. Per discussion with the @open-telemetry/technical-committee and @open-telemetry/governance-committee these terms should live in the Spec Glossary and we (@open-telemetry/docs-approvers) will raise issues/PRs against the spec to discuss them.

I will close this issue for now, but I will raise follow up issues (soon!) to discuss this further, I will tag you (@codefromthecrypt) accordingly.

Thanks!

@codefromthecrypt
Contributor

thanks for clarifying, I did indeed lose track of this being focused on process not specific outcomes.

@svrnm
Member Author

svrnm commented Jun 28, 2024

thanks for clarifying, I did indeed lose track of this being focused on process not specific outcomes.

me too 😆 ... it's hard to distinguish the two!

@tiffany76

Hi @svrnm, I know I'm late to the party and you closed this issue, but I had a quick thought about the process of encouraging the use of the consistent terminology. You said:

A key term is binding across the OpenTelemetry project, @open-telemetry/docs-approvers will make sure that it is used consistently on the opentelemetry.io website and SIG maintainers are responsible to know about them and use them accordingly as well.

As a new approver, may I suggest that each SIG adds a step to its onboarding process for new approvers to review and become familiar with the spec and community glossaries? We can act as a first line of defense against unsanctioned terminology in PRs and hopefully cut down on the enforcement burden for maintainers.
