Add a section in Components about "agents" & what to use instead of "agent"? #1689
Comments
@martinjt have a look here :-) |
Here is my suggestion:
|
I am ok with avoiding the word "agent", but I don't see "auto-instrumentation" being the exact equivalent for that today:
To make a long story short, and in combination with your ask at @ #1777, what we need in order to avoid confusing the otel end-users is the following:
What I don't know is how to go about that: is this something the GC/TC needs to pick up? Do we need a spec change or an OTEP? |
Just a nit about .NET - it's explicitly not an agent (@pellared can confirm) but rather a custom profiler & assorted components that inject the instrumentation. So we'd want to use a different name for that. |
I would maintain that it is simply "automatic-instrumentation". Note that for Python auto-instrumentation you do not need to write code, as you can use env vars to configure things (see here). Maybe the same is true for Node.js and Ruby? It is possible that the term "automatic instrumentation" is abused there, but that is something that should be addressed in their projects. I like the current description that I see here and here. Probably this issue can be closed. |
This blog post says: "now with the .NET Automatic Instrumentation project, developers can run the agent side-by-side with their application and get telemetry automatically." We confuse our end-users, and that's why I raised this issue. This adds to the constant complaint of end-users that it is so complicated to get started with OpenTelemetry. And yes, the term automatic instrumentation is abused in some places, but my point of view is that this comes from the fact that there is no better term for it if we avoid "agent". My question is: what do we call the piece of software that is attached to an application (ideally without touching the code) and then does the following as an all-in-one solution:
|
I see your concern... But right now I have no good answer (nor opinion) for it 😢
"auto-instrumentation" (sic!)? I mean, it is hard to draw hard boundaries here, as how and what gets automatically instrumented is very language/technology specific. Some of the things you mentioned are provided by "auto-instrumentation", some by the SDKs, and it depends on the implementation. I think it is even possible that the auto-instrumentation has different defaults than the SDK. My only idea is "auto-instrumentation distribution". I do not think it violates the term described here. However, I am not sure it won't bring even more confusion... |
That makes two of us :D
I hear you, but I am not happy with that: based on the conversation we have, I learned that "auto-instrumentation" is well-defined in the spec as telemetry collection methods that do not require the end-user to modify application's source code, so it's only about the part where instrumentation libraries are injected into the service without touching the code. I like that very much.
Yes, I think that's probably the term I am looking for. I mean, it follows the description of Distribution from what we have already, and Python is doing exactly that. Also, doing some checks on vendors, I see many having a " Distribution of OpenTelemetry (for) Language" and that includes the things I called out above, so maybe it's something like
Is this the right direction??? |
I don't have full commentary available for this yet, but I wanted to briefly discuss the 'distribution' concept and why it's been bothering me. We spent a significant amount of time in the planning, ideation, and initial implementation of OpenTelemetry to preserve the OpenTracing goal of decomposition between interface and implementation. The goal behind this was to reduce the chance that a full-SDK wrap would be used for end-user implementations of OpenTelemetry. This, however, isn't necessarily how things have played out. I think we need to find wording that talks about extensions to OpenTelemetry, and perhaps that can be the sword that cuts this Gordian knot. If we assume that the OpenTelemetry API and SDK are, respectively, the interface and the implementation, then how do we classify instrumentation libraries today? We don't do a great job at it: we toss them in contrib and colloquially refer to them as 'instrumentation libraries'. This is a distinction without difference; the API itself is the 'instrumentation library', as it's the interface to the instrumentation methods that produce telemetry data. My suggestion is that we classify components somewhat as such:
|
The term "agent" is already super confusing (see https://en.wikipedia.org/wiki/Software_agent); I hope we can avoid adding more confusion. |
The more I think about that topic, the more I get back to thinking that "agent" is a good term:
If we decide not to use it, we need to provide an alternative to describe that bundle of "things" |
To clarify that a little bit, what's the difference for you between AGENT and DISTRIBUTION? |
A third one on "auto-instrumentation" based on our slack discussion at #otel-comms yesterday & this existing discussion on that topic via @cartermp. I start to understand that using the term "auto-instrumentation" for different things is OK, because it's a broad term that's just saying "I didn't instrument that manually". As @cartermp said: "[...] the presence of many different ways to get some degree of instrumentation created for you makes it better to just call it “automatic instrumentation”. Since there’s several mechanisms by which you can get this instrumentation, some offering more instrumentation than others, it’s important to distinguish those mechanisms, but I don’t think we should be in the business of only calling something automatic instrumentation if it comes from some “agent-like” thingy (which we do today)". So, you can say "I auto-instrumented my application with an agent" or "I auto-instrumented my dependencies by loading instrumentation libraries" or "I use library x which has otel natively, so I get instrumentation automatically", etc. What we (aka the docs team) need to do eventually is make sure that when an end user reads up on "automatic instrumentation" for Java, for Python, for Ruby, for Node.JS, they are presented with something that gives them details on the different "automatic instrumentations":
If this makes sense, I would like to split this out from this discussion into a separate issue and we can keep the discussion here going on agents, instrumentation extensions & distributions :D |
Writing my thoughts before I forget. I think that also each "auto-instrumentation" tool/agent/library/whatever should provide a high-level description of what this auto-instrumentation is and how it works. Here is how we try to do it for .NET. |
The primary distinction between an agent and a distribution in my mind is who is providing it. I would suggest that the OpenTelemetry project will never release a distribution of OpenTelemetry. However, I would speculate that eventually a company like GCP or AWS will release a distribution of OpenTelemetry for their clouds. Similarly, I could see point monitoring solutions creating a distribution of OpenTelemetry for their tool, or serverless frameworks, etc. etc. An agent, meanwhile, could be distributed by OpenTelemetry or third parties; the distinction there is that an agent must provide instrumentation of a system or service with no code changes.
As someone that originally was a strong proponent of grouping agents and instrumentation extensions into the catch-all term of 'automatic instrumentation', my mind has been changed by speaking to some users who are very confused by our usage of the word. Effectively, the "auto-instrumentation == instrumentation libraries == agents" comes from the concept that using a noun to describe a verb is sus. However, this isn't something that's readily apparent to the lay community. People hear 'auto instrumentation' as a noun; it describes a class of thing, not an action.
Instrumentation library is problematic as a class descriptor for libraries that perform automatic instrumentation because OpenTelemetry SDK is also an instrumentation library. To illustrate this point, I'm going to write the same statement twice but drop proper nouns in the second.
See the problem? If we rework this, though...
|
@svrnm I don't see how your examples of Java, Ruby, Python and Javascript are different meanings for "automatic instrumentation". All of them require no code added for a particular instrumentation, the only difference I see is (maybe, this isn't clear) whether the SDK requires code changes to be initialized or not? (And yes, please do not use "agent" for anything :) |
This is how I understand the main difference. It should be possible to set up without making ANY changes in the application code 😉 E.g. I would say that JavaScript has instrumentation libraries (but no auto-instrumentation). I would say that Python has auto-instrumentation. Ruby at first glance looks like library instrumentation. |
@austinlparker thanks for the clarification on AGENT vs DISTRIBUTION, makes sense to me, a few comments nevertheless:
Those exist already, right? We have ADOT from Amazon and a bunch of vendors having their "distribution"
Agreed.
Now I get it, and I agree with that one as well. I see 2 problems:
Similar to what @austinlparker said, they are all "automatic instrumentation", they actually do not have different meaning, but the difference in mechanisms (using @cartermp's word here:) ) is still relevant to the end user. At the end we want them to understand that they have multiple ways of accomplishing "auto instrumentation", and all of them are valid based on where they are coming from:
If I understand you correctly, you would say that "auto-instrumentation" is only given if no code is touched at all? Which comes down to "auto instrumentation == agent". I have my issues with that, for the same reasons @austinlparker gave above: saying/writing things can get confusing:
Sorry for this flippant comment: give me a better word and I am going to use it -- that has been the purpose of this ticket all along, right? And "auto instrumentation" is not a good replacement: seeing the discussion we have on that word, I think agent is the lesser of two evils. (A combination of both, "auto instrumentation agent", is something I use a lot lately as well.) I understand that agent as a term is ambiguous (and disliked by many in the otel community), but I have tried for a year now within my circles to steer people away from "agent" and everybody (including myself) gets back to using that word, because (a) we do not have a better word and (b) it has been around for 15+ years now, used by APM vendors (AppDynamics, Dynatrace, NewRelic ...) & OSS projects (SkyWalking, PinPoint) alike. Changing that in the mind of the end-users is a gigantic task. Note: There are some vendors using different terms, because they reserve "agent" for that out-of-process piece (DataDog with tracers, Instana with sensors, if I checked both correctly), so those are alternatives, we just have to agree on one :-) (I don't like tracer & sensor for a variety of reasons...) |
I am missing something. Where is auto-instrumentation used as a verb? For me it is a noun. Personally, I understand auto-instrumentation as "a method of getting the application instrumented without touching the application's source code". Using a .NET Profiler is a method to instrument C# apps, using a JVM Agent is a method to instrument Java apps, using eBPF uprobes is a method to instrument Go apps. The term "agent" suggests that it is a "separate process". I personally think that e.g. providing a compiler that would build an application with instrumentation is also auto-instrumentation. Would you call it an agent? EDIT: I do not have a better word than "auto-instrumentation". I think we should clarify the word and maybe give more examples in the definition. I already tried it (open-telemetry/opentelemetry-specification#2700), but maybe someone else could do it a lot better than me 😉 EDIT 2: Another try open-telemetry/opentelemetry-specification#2853 😄 PS. OTel Collector is an agent 😄 |
Agent is overloaded (as "component"), old, and not great, but it does the job, in my opinion. It also bears nasty negative SecOps connotations, but everyone seems to understand what an agent is in the context of automatic instrumentation. I've asked users and they understood what agent implied. So I wrote this definition for the Splexicon:
On the other hand, auto-instrumentation is long, not entirely true, and has a hyphen that causes lots of trouble to documentarians. I'd stick with agent whenever we're talking about automatic or semi-automatic layers that help apply instrumentation to software. Another option is going the AWS Lambda route and calling them "Layers". But it'll take years of promotion and user education to achieve a change like that. Back to the original question: if we don't have an alternative to agent or layer, the problem cannot be solved right now and we should continue with what we have, that is, agent or layer. |
I agree that instrumentation achieved via byte code instrumentation, monkey-patching, eBPF uprobes or any similar means is "auto-instrumentation"; for me it's the overarching term. But what I want a term for is that layer of software that is applied at runtime to code so that instrumentation (+everything else) is accomplished for me automatically.
I disagree, a JVM Agent for example has never been a "separate process".
No. Here's where I would like to use agent: for an in-process, at-runtime, code-changing layer that is injected into your application without you touching your code yourself -> the agent is acting on your behalf to accomplish auto instrumentation.
I am fine with "auto-instrumentation" as the overarching term; what I want is to say that there are different building blocks/mechanisms and one of them is an "agent".
👍
Yes, but a different kind of agent;-) |
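For readers less familiar with the mechanisms named in this exchange, monkey-patching can be sketched in a few lines of plain Python. This is only an illustration of the idea of "instrumenting without touching the application's source code"; it does not use any OpenTelemetry API, and `HttpClient`, `instrument` and `RECORDED_SPANS` are hypothetical names invented for the example.

```python
import functools
import time


# Hypothetical stand-in for a third-party library the application already uses.
class HttpClient:
    def get(self, url):
        return f"response from {url}"


# Collected "spans"; a real setup would hand these to an SDK and an exporter.
RECORDED_SPANS = []


def instrument(cls, method_name):
    """Replace cls.method_name with a wrapper that records a timing span."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            RECORDED_SPANS.append({
                "name": f"{cls.__name__}.{method_name}",
                "duration_s": time.perf_counter() - start,
            })

    setattr(cls, method_name, wrapper)


# The "injection" step: performed once at startup, outside the application code.
instrument(HttpClient, "get")


# Unmodified application code: it knows nothing about the instrumentation.
def application():
    return HttpClient().get("https://example.com")


result = application()
```

After `application()` runs, `RECORDED_SPANS` holds one entry named `HttpClient.get`. Whether the component that performs the `instrument(...)` call (and, in a real setup, also configures the SDK and exporters) is called an "agent", a "distribution" or simply "auto-instrumentation" is exactly the naming question this thread is about.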
💯
That's not a bug, it's a feature: agents ARE a security issue and I had many customers who were not keen on having self-installing self-updating agents ...
👍
I think that's what @austinlparker meant when he said "a noun vs a verb": The agent is doing auto-instrumentation, aka the agent is the acting noun, the auto-instrumentation is the thing the noun is doing (the verb), even if it's not used as a verb in that sentence.
And layer is close to "thing"...
:-) |
How do we describe the following scenario:
Many existing solutions call Process B "agent"; if we also call the "auto-instrumentation mechanism" an "agent", it'll be very confusing. In addition, if the auto-instrumentation technology is something like code weaving, do we still call it "agent"?
|
That may be unique to the JVM. I always think of an agent as a separate process -- though I may be the "unique" one :) |
@tsloughter You're definitely not the "unique" one, I also feel that JVM Agent is a special case as I've suggested here #1689 (comment) |
@reyang @tsloughter I suspect we might be falling into a "curse of knowledge" issue here, in that "agent" has very specific meanings for certain software development environments or languages. But so do other terms, like "component" or "method" or "layer". We have to compromise a bit. The diagram by @reyang is a good starting point for a language conversation. In my opinion, it's not a big issue to have more than one agent in the picture: one is an APM agent, the other is an infrastructure / forwarding agent. Pardon the frivolity here, but: Let there be agents. :-) The picture is actually quite related to @svrnm 's excellent remark, which is a benefit of using the word agent:
|
@tsloughter you are unique 🤩 but not because you're thinking that "agent" has to be a separate process. I read & hear this a lot, although I disagree. I used the JVM Agent just as an example. It's common among APM vendors to call the ".NET Profiler" or the equivalent in-process layer doing instrumentation for Python, PHP, Ruby, Node.JS an "agent" as well. Thanks @theletterf for bringing up that point that there can be many agents (like on that picture) and this has been the case in monitoring forever (Infra Agents, Forwarding Agents, Synthetic Agents, Browser Agents, APM Agents, Log Collection Agent, etc.) I am OK with getting rid of the word "agent", if we have an alternative word for a software layer that is injected into an application to modify code at runtime to accomplish not only auto-instrumentation, but also initialization, exporting, runtime configuration, self-telemetry & some more. I am happy to brainstorm on that (some candidates so far are "auto-instrumentation", "profiler", "tracer" and "sensor"), but until then I will stick with agent :-) The purpose of this issue is that I want to write a few words in the documentation for end-users that look for their Java, .NET, Python, Ruby, PHP, Node.JS Agent, to say "there is no agent, but there is X". Right now I need to write: Hey dear APM user, who is used to "throw an agent against an application and what falls down into your backend are traces&metrics", in OpenTelemetry we don't have an agent for all the languages, what we have is a JVM Agent for Java, a CLR Profiler for .NET, a python tool called "opentelemetry-instrument", a tracing.js for nodejs you have to cobble together yourself, etc. |
Entering brainstorming mode... @svrnm Some alternatives to agent that might fit your description:
|
I think that thing is called auto-instrumentation? Here is my simple proposal: Hey dear APM user, if you want to instrument your applications without having to manually instrument your code, use the OpenTelemetry auto-instrumentation. It seems multiple implementation SIGs already kind of chose to use it in the repo URI:
|
I think we are nowhere near changing the name to "agent". Still, this issue uncovers some problems that we currently have (as a whole community). One of the problems is that the repository names for automatic instrumentation have I also want to point out that the term agent is not even mentioned in the spec's glossary. If we want to change the terminology, then the Specification SIG is the place to proceed. Note that some OTel components like https://opentelemetry.io/docs/instrumentation/java/ and https://opentelemetry.io/docs/collector/ are indeed agents, and this is explained in their descriptions. As a community, we should also pay more attention to properly naming things according to our defined and agreed terminology. For example, if someone says something about an OTel .NET Agent, then we must tell them that there is no such thing. We should describe that in OTel we use the term "auto-instrumentation" to stress that it is not a standalone process but a way of instrumenting the application without touching the app's source code. Lastly, the word "auto-instrumentation" is more precisely defined than "agent". One of the goals of OpenTelemetry is to establish standards that help in communication. For me, the name "agent" would not help with that in the long term. |
I can live with that
Agreed
💯 That's the thing we are circling right now! And I called it out before, that different projects use the term "auto instrumentation" for things that are similar but not the same:
Adding to that: there is no consistency what you can find in an opentelemetry--<core|contrib|instrumentation> repository
Again, I agree that "agent" is a shitty term, but I disagree that "auto-instrumentation" is more precisely defined. Copying a little bit from what I wrote above: does auto instrumentation only mean instrumentation of my code via instrumentation libraries? Or does it also include do-not-touch-my-code for everything else the end-user needs (SDK initialization, exporter setup, sampling setup, resource detection, runtime configuration, self-telemetry, extensions, control plane client)? Right now we have no common ground for that. I see two options now:
|
We try to be clear that this isn't auto-instrumentation with the following sentence:
Maybe it should explicitly say it is "not auto-instrumentation". |
@svrnm Based on what you described Ruby io docs should be changed from "Automatic" to "Libraries" (or at least "Instrumentation"). @open-telemetry/ruby-maintainers Do you agree? 👆 @svrnm Regarding @svrnm Regarding
I think all current (Java, .NET, Python) auto-instrumentation does all of that, right? |
You found the name "instrumentation libraries bundle" 🎉 |
@tsloughter I don't think that you need to change anything in the erlang doc here, I just raised it in comparison to ruby/nodejs.
From an end-user perspective I find this problematic: people are looking for "Automatic", and would not expect to find what they are looking for in a page called "Libraries" ... "Instrumentation" might work (I think we have it with some languages) but it's still confusing because we also have "Manual Instrumentation". So
We should at least raise it with the communities.
Java, .NET: yes |
The initial purpose of this ticket is accomplished: the concepts now mention that someone who is looking for an APM agent should look for Automatic Instrumentation. The rest of the discussion remains with the spec issue (2866). |
This is a follow-up to @trask's question (#1661):
Sub-projects like opentelemetry-java-instrumentation, opentelemetry-dotnet-instrumentation & python's opentelemetry-distro provide a solution that is described as "layer that adds OpenTelemetry instrumentation to a service without modifying the source code for that service." in OTEPS-0001.
I would like to add a section to Components that calls out the existence of those, what value they bring to the end-user ("zero source code modification.", auto instrumentation for all my libraries, packaging of SDK, exporters, resource detectors & other building blocks, extensibility & some), when to use them and when to go for a manual instrumentation, etc.
Coming from APM I would like to call them "Agents" but as stated in the OTEP "that term is overloaded and ambiguous", so throughout our docs we have tried to avoid it so far. I'm okay with that, but to add that section to the docs, I first need a canonical term that describes that layer properly. There are a few alternatives throughout the docs & code:
<Language>
(plus Javaagent of course)<Language>
Automatic Instrumentation<Language>
based on what is described here: Distributions
So, what is that "layer that adds OpenTelemetry instrumentation to a service without modifying the source code for that service." called?
cc @open-telemetry/docs-approvers