-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Embed functionality v0.2 #1264
Comments
Thank you! Some questions/comments:
TL;DR: I think semantic conventions have to explicitly mention all attributes they want to embed and need to override notes/briefs to explain what these attributes mean in the context of that group. If we wanted to reuse attributes as is, we could always reference/extend a group instead of embedding it. I believe blindly embedding all is dangerous and would result in a lot of unused/misused attributes. I.e. I don't think embedding a namespace is something we should implement and without it we can design things to be much easier and more readable. |
it's described in the proposal.
We do have
We do need embed all, this is the main reason for this functionality. Embedding of dedicated attributes was just a nice add-on for this feature. |
For the single attributes we already have |
@MadVikingGod |
To reiterate some points from todays discussion here in written form:
On additional note I feel also we are mixing things within the registry and outside of it. As of now embed will be used only to define things within the registry, for example It means one might use only We might also decide to use Let's continue a discussion here to find the solution @jsuereth @lmolkova @joaopgrassi @lquerel |
Extracting the main requirement from the comment #1264 (comment) we should not define attributes that are not going to be used. In other words any changes to embeddable namespace need to be compatible to and meaningful in all the groups that embed them. I don't see good examples in existing semconv where we have namespaces satisfying those requirements. Our existing approach is to "describe everything about {namespace}". To leverage typed embedding, we'd need to change how we structure things. E.g.
The ways I see that can satisfy these requirements:
The solution I'm suggesting (p1) is a two-way door. It does not block p2 in the future. I'm suggesting the following litmus test before we introduce p2:
I.e. I want to have a real example where we embed the whole namespace which is merged and part of the semconv first before we introduce p2. |
I think we need to continue discussion on embedded namespaces but I would like to start with embedded attributes because I think there is no strong objection against it. I have a following usecase - Let's discuss 2 options to describe the embed attributes A. inside the group on the top level 🎉groups:
- id: registry.log
prefix: log
type: attribute_group
brief: "Describes log file"
embed:
- attributes: <-- embedded attributes
- ref: file.name
as: file.name <-- this is can/should be omitted because it's the same name as ref
brief: 'Filename of the log file'
example: 'application.log'
- ref: file.name
as: event.file.name
brief: "The file that the action was performed on."
example: 'foo.csv'
attributes: <-- normal attributes
- id: record.uid ... resolving into
another example with process groups:
- id: registry.process
prefix: process
type: attribute_group
brief: "Describes process"
embed:
- attributes: <-- embedded attributes
- ref: process.id
as: parent.id
brief: 'Id of parent process also know as ppid'
example: '123'
attributes: <-- normal attributes
- id: id
brief: Process id
note: Additional info <-- this will be used for both parent and process id resolving into
Pros:
Cons:
B. inside the attributes section 🚀Resolutions for these examples will be the same as above groups:
- id: registry.log
prefix: log
type: attribute_group
brief: "Describes log file"
attributes: <-- attributes section
- id: record.uid <-- usual attribute
- embed: file.name <-- embedded
as: file.name <-- this is can/should be omitted because it's the same name as ref
brief: 'Filename of the log file'
example: 'application.log'
- embed: file.name
as: event.file.name
brief: "The file that the action was performed on."
example: 'foo.csv' groups:
- id: registry.process
prefix: process
type: attribute_group
brief: "Describes process"
attributes: <-- attributes section
- id: id
brief: Process id
note: Additional info <-- this will be used for both parent and process id
- embed: process.id
as: parent.id
brief: 'Id of parent process also know as ppid'
example: '123' Pros:
Cons:
You can simply vote for the proposed option through emoji or put your alternatives in comments. |
If we don't pursue embedding namespaces, then option B seems super-clean and I'd be happy with it. I still have concerns about I believe the reason to consider option A as to make embedding namespaces and attributes consistent with each other. It'd be hard to decide on a single attributes without having some basic design for namespaces. |
could you elaborate? what's the problem? One suggestion is to do prefix: client
attributes:
- embed: client.geo.location.longitute
from: geo.location.longitute
brief: ... (parent is tricky - renaming attributes as a part of embedding seems controversial, so make be we can take it separately?). We can validate that |
the middle-ground: prefix: foo
embeded_namespaces:
- id: foo.file # the name of resulting namespace to generate
from: file # either a group id or a whole namespace which can in theory be defined across multiple groups
attributes:
- embed: foo.file.name
# we can try to figure it out, but it's better to be explicit and
# then we can validate if it's the same as `embed` without `prefix`
from: file.name
...
...
embeded_attributes: # we can as well do it inside regular `attributes`, I have no strong opinion here
- embed: foo.geo.location.longitute
from: geo.location.longitute
brief: ... produces:
Pros:
Cons
|
We can't remove it from initial scope, we need to differentiate different embeddings of the same object within the same namespace. I have provided 2 examples for this case, how do you propose to implement them without using alias?
This was a request from @jsuereth, he wanted to have embed attributes separate because of their meaning. I don't have strong opinion on it, I'm ok with both options, where A is a bit better in the long run, and B is better for only attributes. I am ok with the idea of using fully qualified name in embed as well. It aligns with idea to be explicit In your example I see almost the same structure that was already proposed but with different names, let me rename it for the same group.
But in this example I feel that
|
I'd like to add consideration for the inverse relationship for embed. E.g. Instead of embed living on the consumer, we do something like: groups:
id: registry.types.geo
type: embeddable_group
brief: 'The set of attributes which describe a geographcial location'
prefix: geo
attributes:
- id: lat
...
- id: lon
.... And then used like the following: groups:
id: some_user_of_this
type: attribtue_group
prefix: some_user_of_this
attributes:
- ref_group: registry.types.geo
name: location Leads to to the attributes |
@jsuereth in your example do you want to emphasise a need for embedded groups to have a special type i.e. Because otherwise your example looks pretty similar to what I propose, just different keywords And we also can have instead of new type a new property |
we don't before we discuss it in addition to embedding. I.e. it's a feature on top, not a core part.
I'd like to have full qualified name in the declaration. In this case, - as: process.parent.id
from: parent.id If you're ok renaming, we can combine things into groups:
- id: registry.types.geo.location
type: embeddable_attribute_group
brief: 'The set of attributes which describe a geographcial location'
prefix: geo.location
attributes:
- id: lat
...
- id: lon
....
- id: registry.foo
prefix: foo
embeded_namespaces:
- as: foo.geo.location # id seems ok here too
from: embeddable_attribute_group
attributes:
- as: foo.geo.location.lon
from: geo.location.lon
...
# or
embeded_attributes:
- as: foo.geo.location.lon
from: geo.location.lon
brief: ...
- as: foo.geo.location.lat
from: geo.location.lat
brief: ... Any objections? |
Oh I see your point, as with So you propose then to have different groups to support embedded namespaces and embedded attributes. Do you think it gives better clarity to the reader then to have Can I say that we agree on moving on with embedded attributes and we just discussing syntax? I see a couple of options proposed here: embeded_attributes:
- as: process.parent.id
from: process.id
brief: ...
attributes:
- id: ... embeded_attributes:
- id: client.geo.lon
from: geo.lon
brief: ...
attributes:
- id: ... embeded_attributes:
- name: client.geo.lon
from: geo.lon
brief: ...
attributes:
- id: ... I feel that when using it in other yaml files outside of registry groups:
id: mygroup
type: attribute_group
brief: ...
attributes:
- ref: process.id <------------} both names are taken from `id` declaration. One from normal attributes
- ref: process.parent.id < -----} another from embedded |
no strong opinion here, but I'd like to minimize nesting for what I consider to be the common case. groups:
- id: registry.foo
prefix: foo
embed:
namespaces:
- as: foo.geo.location # id seems ok here too
from: embeddable_attribute_group
attributes:
- as: foo.geo.location.lon
from: geo.location.lon
...
attributes:
- as: foo.geo.location.lon
from: geo.location.lon
brief: ...
- as: foo.geo.location.lat
from: geo.location.lat
brief: ... seems fine too.
do you mean
Let's figure out the syntax. |
Yes, I'm ok with moving only with embed attributes for now and discuss/implement namespaces later.
I thought a comment from your side I do like |
I'm strongly opposed to reuse |
Before moving to final syntax I have one issue with current approach I want to discuss: alias to the attribute i.e. Besides of it we have now 2 options discussed above for external structure groups:
- id: registry.foo
embedded_namespaces:
....
embeded_attributes:
... vs groups:
- id: registry.foo
embed:
namespaces:
....
attributes:
.... And 2 options for internal structure of attributes: groups:
- id: registry.foo
embeded_attributes:
- as: process.parent.id
from: process.id
brief: ... groups:
- id: registry.foo
embeded_attributes:
- name: process.parent.id
from: process.id
brief: ... |
Yes, but what I'm saying is if we were to add groups:
- id: client
prefix: client
attributes:
- ref: geo.lat
as: client.geo.lat which should be rendered as That keeps all of the current overrides, and it doesn't need some other field that does the same thing but different. |
@MadVikingGod we already have this option to use This PR is using ref with additional attribute |
What I'm saying is that attribute embedding, not the type discussion we had in the SIG, seems to me to be a more complicated ref+rename. It seems like the requirements for renaming fall under:
My point of view is that if the functionality is so close we probably should add the feature to ref not create another new config type in the semconv. I also think separating attributes into two different areas (attributes and embed.attributes) is also a usable problem, but that's a minor concern. This analysis is totally different for embedded types/namespaces. For that the concept is so different from referencing an attribute that I think it deserves it's own top level directive. |
@MadVikingGod I understand you and as I told before this was original idea that I have even implemented into weaver, you can use it now in semconv because it's not reverted yet But this is where @lmolkova requested changes and the whole discussion started from the scratch. |
@trisch-me Regarding the syntax options, I have a slight preference for |
Hey everyone, so here is a short recap of everything written above to better visibility for embed syntax: groups:
- id: registry.foo
embed:
namespaces:
- from: process
as: parent
brief: "Parent of the process is itself a process namespace"
attributes:
- from: file.name
as: file.name // can/should be omitted
brief: "Filename of the log file"
example: "file.log"
- from: file.name
as: event.file.name
brief: "The file that the action was performed on."
example: 'foo.csv'
I will start most probably with attributes implementation, and integration of embedded namespaces will be the next step. |
Follow up after discussion during last tooling meeting In our last meeting, we discussed several important concerns, and I would like to address them here in writing:
I prefer the first option, as it avoids redundancy in defining the same attributes for both a registry attribute group and an embedded group. It also simplifies the process of adding new embedded attributes — just define them once and toggle whether they should be embedded or not. However, I am open to further suggestions or thoughts on this.
This discussion relates to how we handle the entire namespace, which is necessary for a unified vision moving forward. However, I suggest that we start by focusing on embedding single attributes first. This will give us a clearer understanding of which fields should be customizable for embedded attributes, and it will also allow us to finalize the formatting of markdown and yaml files. Once that’s established, introducing namespaces later will simply become a convenient way to embed multiple attributes at once. Semantical meaning will be resolved through the Are there any additional concerns regarding the embedding of individual attributes? I would like to begin implementation as soon as possible and would appreciate it if we could resolve all open questions in our next meeting. |
This is why I was suggesting a new group type for embeddable things. I think this gives us a good set of properties:
I'm definitely opposed to option (1) of having every attribute having an Do you have examples of embed where there wouldn't be multiple attributes embedded for a namespace? (e.g. |
@jsuereth yes, I have mentioned one of those examples in my comment above. People should be able to concerns/questions:
|
Adding to some comments here, due to a conflux of PRs and discussions disparate.
Because when you "reference" individuals, it's hard to treat them as a group, you have to reverse engineer it. Ideally, e.g. for the package io.opentelemetry.semconv.types.geo;
interface Location {
/** Latitude of the geo location in [WGS84](https://wikipedia.org/wiki/World_Geodetic_System#WGS84). */
double getLat();
/** Longitude of the geo location in [WGS84](https://wikipedia.org/wiki/World_Geodetic_System#WGS84). */
double getLon();
// Helper method
static AttributesBuilder setLocation(AttributesBuilder builder, String prefix, Location location) {
return builder
.put(prefix + ".location.lat", location.getLat())
.put(prefix + ".location.lon", location.getLon());
}
} Then when we generate events, e.g. we could have something like: interface LoginEventBuilder {
LoginEventBuilder setUserId(String id);
...
LoginEventBuilder setLocation(Location location);
} This is really hard to do if you have to infer possible types from signal usage. E.g. I would need to know that If we erase/flatten, all of this becomes a huge exercise in rebuilding the semantics in codegen. |
If I correctly understand your concerns, the main feature we need to have is possibility to connect all attributes of the same embed instance together, especially if we have more than one instance of the same embed group such as I had a talk with @lquerel regarding possible solutions we might have in weaver to make possible a connection between fields of the same embed instance. So far we had following idea:
How it will look like: original attributes group groups:
- id: registry.geo <-- normal group
attributes:
- id: lat
brief: “Latitude of the geo location"
- id: lon
brief: “Longitude of the geo location”
- id: postal_code
brief: “Postal code of the location” embedded group with only attributes we are allow to embed groups:
- id: embed.geo
attributes:
- ref: geo.lat
- ref: geo.lon
groups:
- id: registry.client
embed:
namespaces:
- from: embed.geo
as: location
attributes:
- id: address
...
- id: registry.server
embed:
namespaces:
- from: embed.geo
as: geo
attributes:
- id: address
Resolved schema: groups:
- id: client
type: attribute_group
brief: |
General client attributes.
attributes:
- name: client.address
....
- name: client.location.lon
type: embed.geo
embed_name: client
brief: Longitude of the geo location
- name: client.location.lat
type: embed.geo
embed_name: client
brief: Latitude of the geo location
lineage:
...
- id: server
type: attribute_group
brief: |
General server attributes.
attributes:
- name: server.address
....
- name: server.geo.lon
type: embed.geo
embed_name: server
brief: Longitude of the geo location
- name: server.geo.lat
type: embed.geo
embed_name: server
brief: Latitude of the geo location
lineage:
... |
After several discussions in comments and during our semconv and tooling meetings we came to conclusion to rewrite embed functionality to make embed more prominent and visible because it has potential to influence and change not only the schema but also other parts of Open Telemetry such as APIs etc.
So the proposal is to make an embed a part of the group as opposed to the part of attributes before. Both embedded namespaces and embed attributes will be visible in separate block, so the developer would know that those are treated differently
Embedded namespaces
attribute_groups
. They might have an optional parameteras
. If omitted it is equal to the embedded namespace, for example resolved schema will beclient.geo.*
. If provided, the name of the embedded namespace will be equal to the value ofas
, such asclient.location.*
where*
has allgeo
attributes.attributes
block withinattribute_groups
. Theref
attribute is mandatory,brief
is optional. If omitted, the original values will be used. In example abovegeo.lat
will have a different brief.as
is needed, resolved schema will beprocess.parent.*
:Embedded singular attributes
attributes
. They might have an optional parameteras
. If omitted, name is equal to the embedded attribute, for example resolved schema will beclient.geo.lat
. If provided, the name of the embedded attribute will be equal to the value ofas
, such asclient.my.latitude
wheremy.latitude
is the alias forgeo.lat
.file.owner
andfile.creator
. It might be important if embedded attribute has some restrictions or requirements, so we don't redefine them everywhere multiple times but only reference them in embed section:Questions:
implementation details
extend
keyword.ref
keyword.ref
s insideembed
have different meaning than refs from normal attributes. I could call themid
s as well, but I thinkref
is more descriptive because essentially this is a reference to another existing attributes_groups/attributes.This issue is replacing open-telemetry/build-tools#240 due to number of comments there which makes it a bit harder to follow up on final solution. I will update this message if there will be changes required.
The text was updated successfully, but these errors were encountered: