-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding engine definition under resources semantic conventions #1293
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "engine" is a bit arbitrary. IMHO what we are really missing is some concept of listing technologies used in a process and their relationships to each other (who runs on top of what, etc).
|
||
A resource can be attributed to at most one engine. To illustrate, let's look at a Python application using Apache HTTP Server with mod_wsgi as the server and Django as the web framework. In this case: | ||
|
||
* Apache HTTP Server would be set as `process.runtime` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the process.runtime would describe the Python interpreter. Runtimes are always "language runtimes". We have no way to describe Apache here. Or maybe Apache would be the engine?
What about adding (parallel) array attributes e.g. process.runtime_stack.*? Then you would have the first entry for apache, the second for mod_wsgi, the 3rd for Python (Python runs inside mod_wsgi which runs on Apache).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, I agree with @Oberon00 about process.runtime
and that engine
will probably be Apache HTTP if we want to be strict. If we want to be lenient, then both HTTP server and mod_wsgi
can be described as engine.
Second, as usual :), I propose to keep scope small. Can runtime_stack
be a separate discussion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if that should be a separate discussion. If we can define engine
precise and useful, then we can discuss it separately. But I wonder if it is possible to find a good and reasonably unambiguous definition. This "engine" stuff is inherently a list of technologies that can run on top or maybe sometimes alongside each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"list of technologies" is too vague IMO. I mean, in case of Spring MVC application using Jackson Json parser, running on Tomcat inside JVM, what will you include in that list? Everything? Some part? How to decide?
In case of java this "engine" is very simple: it is application server and similar things, a la servlet containers. It may be "arbitrary", but this is exactly the information that we want to collect because we think it is useful.
During las specification meeting there was general agreement that this kind of information is relevant. Perhaps people from other language SIGs may provide rules specific for their language ecosystem? Or say that they are not interested in that.
And again, perfection is the enemy of the good :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe in this case it would be better to use the term app_server
instead of engine
? I feel like this PR should fix #1143. I see now that this issue was indeed created by the same author as this PR.
@ivomagi please link related issues in your PR description, otherwise we might end up discussing the same things twice 😃 In this case, you can maybe even use it in the Fixes #
line you left empty so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was explicitly decided that we don't want any "map"-like structures in the data model for traces.
@Oberon00 I believe back then we decided that we don't want them "for now" unless there is a clear need. We may need to revisit this decision if the data at the source clearly dictates/requires more complex data structure. We may not necessarily need to do it now, but let's keep the possibility open.
-
What is the reason we are aiming for an array of value? Do we believe that the concept that we want to express is multi-dimensional with variable number of dimensions and thus it is not possible to capture with a fixed set of attributes?
How about using the approach we use elsewhere, e.g. use attributesengine.java.name
,engine.java.version
. If there is a second engine dimensionfoo
then its name should be inengine.foo.name
attribute. If the reader needs to find all engines they can enumerate allengine.*.name
attributes.
However, this approach is only necessary if we need to record multiple engines simultaneously and I did not see examples of that in this PR.
Finally, I assume there is never a need to record more than one name of an engine of the same dimension (i.e. you never have 2 Java engines simultaneously in one resource/span). If there ever is such a need my proposal wouldn't work. -
It may be worth also placing the
engine
in theprocess
namespace. We would then haveprocess.engine.java.name
. This would work unless we think there are engines that are not associated with a process (is there a such a thing?).
P.S. I am not sure "engine" is the right term to use, but I don't have a better suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Oberon00 indeed, this workaround was what I was aiming to fall back for but wanted to verify with smarter guys whether or not we do have the "map-like" structures or not.
- The array of values is not a "must have", it was just a solution I was able to come up with. The fixed set of attributes as you refer might have some issues though. Lets look at the same example used in original pull request which in time was designed with just a single engine in mind (but it seems that the community looks at things differently and multiple engines might be required). Lets see a Python application using Apache HTTP Server with mod_wsgi as the server and Django as the web framework. Now all of these three could be set as engines and all of these seem to want to reside under
engine.python
namespace. - In here I do not have a bias. Maybe more eyes with more experience will share thoughts about whether the
process
namespace would be better or worse location.
And about the term, indeed, naming stuff is hard. Open for suggestions but all the alternatives I was able to come up with (appserver, container, framework, library, ...) were either used for different purpose or felt overloaded or vague.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The array of values is not a "must have", it was just a solution I was able to come up with. The fixed set of attributes as you refer might have some issues though. Lets look at the same example used in original pull request which in time was designed with just a single engine in mind (but it seems that the community looks at things differently and multiple engines might be required). Lets see a Python application using Apache HTTP Server with mod_wsgi as the server and Django as the web framework. Now all of these three could be set as engines and all of these seem to want to reside under engine.python namespace.
We could do this:
apache.version=0.1.2
apache.mod_wsgi.version=1.2.3
python.django.version=2.3.4
I know this means we cannot enumerate engines without knowing what attributes to look for. Is it a requirement to be able to enumerate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Being able to enumerate will definitely make life at backend easier, if one would like to list all engines from a trace (which sounds like a reasonable wish).
Having engine name as a part of a path in the tree - if an engine reports itself via its own public API which instrumentation can use like Apache Tomcat
or Eclipse Jetty
. How that should be encoded then? I think that having a name, which is potentially returned by 3rd party and can consist of anything is better suited to be represented as a value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vovencij OK, if that's a requirement then what I suggested won't work.
We either do what @Oberon00 proposed earlier with parallel arrays or with using array indices in attributes names (but neither looks nice to me) or we use an array of maps, which requires us to define how such arrays are translated to formats that are not capable of representing such data natively.
I do not have a strong opinion about what approach to use and I do not know of any better alternate.
required: always | ||
brief: > | ||
The name of the engine. | ||
examples: ['FildFly'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
examples: ['FildFly'] | |
examples: ['WildFly'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice this was marked as resolved but not applied. Intentional?
|
||
A resource can be attributed to at most one engine. To illustrate, let's look at a Python application using Apache HTTP Server with mod_wsgi as the server and Django as the web framework. In this case: | ||
|
||
* Apache HTTP Server would be set as `process.runtime` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, I agree with @Oberon00 about process.runtime
and that engine
will probably be Apache HTTP if we want to be strict. If we want to be lenient, then both HTTP server and mod_wsgi
can be described as engine.
Second, as usual :), I propose to keep scope small. Can runtime_stack
be a separate discussion?
|
||
| Name | `engine.name` | | ||
|---|---| | ||
| Apache Tomcat | tomcat | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we set tomcat if using a spring-boot application which starts up embedded tomcat? I guess not since the user isn't even really aware of the version of tomcat, and it's sort of a spring-boot implementation detail (in such a case, it's spring-boot doing classloader trickery to e.g., load embedded JARs, and tomcat doesn't do anything).
If this case doesn't apply, then I'm not sure if ServletContext.getServerInfo()
will be a precise check, and we may need more description about the caveats with regard to embedded container runtimes (effectively just libraries, not app servers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in case of embedded Tomcat/Jetty/Undertow we still want to report it.
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Co-authored-by: Nikita Salnikov-Tarnovski <[email protected]>
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Was autoclosed due to holidays. Reopening for discussion. |
Honestly, I think we are making this PR harder/deeper than it should be. As both #1143 and #1143 (comment) say, there is very useful debugging information that current telemetry lacks. The exact meaning of that information may be language specific and I don't think specification really has to dictate it. E.g. the aforementioned example of Django application running on Apache httpd via This PR, at least in my eyes, says very simple thing: if you (application operator or auto-instrumentation) want to provide this runtime information, this is semantic attribute for it. Which leaves two question:
The "single value or array" question, IMO, depends solely on our ability to have good structure for that array. I don't like two parallel arrays for names and versions. Map-like attribute would be the simplest option. |
@iNikem I agree that structurally map-like attribute would be best. The problem is that unlike OTLP most other telemetry formats are unable to represent such data. What are exporters supposed to do when they see such data? Do we want to define how map data is flattened/converted to fit the limitations of e.g. Jaeger's attributes type system? |
Also, the feature-frozen tracing API does not support such attributes, so it would not be possible to use such a semantic convention with current OpenTelemetry. |
@Oberon00 to be fair, it is possible to extend the API in backwards compatible manner to allow this, so in theory from API's perspective it is doable (though it is more work and won't be in the GA). However, I don't know how we will solve nicely the exporting of such data to non-OTLP formats. We could double down on what the spec already suggests for homogeneous arrays and make the same recommendation for maps. To remind, here is what the spec says today:
JSON obviously would also work for maps. But this is a slippery slope. We would be essentially making JSON attributes widespread and while for simple arrays it is very human-readable and probably does not require any special handling in the backend, the more complicated data structures we allow the more there will be a need for backends to be able to deal with such data as structures rather than as simple string. |
I propose to have this attribute as single valued. The majority of useful information proposed by @Oberon00 as "technologies" above can already be obtained from span's I propose to further stress that exact choice of what to report in this attribute (if any) is language specific and is NOT specification's concern. As name I like If @Oberon00 and @tigrannajaryan agree, I can work with @ivomagi on updating this PR. |
@Oberon00 , @tigrannajaryan do you agree with my proposal above? |
@iNikem I do not understand the problem well enough to agree. I do not object with going forward with the approach that makes best sense to you. Feel free to disregard my suggestions. I was only trying to show some alternates that could potentially help, but apparently they are not a good git. |
Relaxed the semantics about allowed engines and changed the wording allowing instrumentation library authors to make this decision themselves.
removed trailing spaces for markdownlint to pass
why? What is that dotnet specific attribute is better than |
The |
Python could use this to differentiate between web servers like uWSGI, Gunicorn, etc but I don't think |
I can't think of anything that'd be a better fit for Python at least other than |
@ivomagi Would using |
|
Webengine is the name of some libraries that render HTML (e.g. Qt). I would still OK with it, but I would like webserver better.
But isn't engine even more overloaded? It seems to me that webserver is exactly what this attribute seems to want to describe? |
Would you call WebLogic server a "webserver"? |
In the sense of this attribute, yes, close enough 😃 |
All right, let's go with |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
@carlosalberto @bogdandrutu - updated the PR to webengine. There seems to be one requested change from Bogdan, other than this we should be good to merge? |
I think we can dismiss @bogdandrutu's review as we the diligence was done (let me know otherwise). |
Indeed, it seems we captured this feedback, but the review requesting changes is still out there and blocking the merge, so how should we proceed - @carlosalberto @bogdandrutu |
Feedback has been gathered/addressed.
@bogdandrutu Dismissed your review - let me know if you think something needs to be addressed still, and I will work on a follow up ;) |
Fixes #1143
Changes
Splunk will be adding auto-instrumentation support to various Java application servers (Tomcat, Jetty, JBoss, Weblogic, Websphere, …) during the forthcoming months. During 2021, this support will be expanded for both Java application servers as well as engines for other runtimes besides JVM.
While implementing the support, we discovered that as of now there is no semantic convention about sending the information covering “this span was captured from Apache Tomcat version 9.0.39”.
Capturing and exposing this information would result in:
With other APM vendors also capturing & exposing this information; standardizing it in OpenTelemetry is a way to guarantee vendor-compatibility.
Related issues #
Related oteps #