Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embed functionality v0.2 #1264

Open
trisch-me opened this issue Jul 18, 2024 · 31 comments
Open

Embed functionality v0.2 #1264

trisch-me opened this issue Jul 18, 2024 · 31 comments
Assignees

Comments

@trisch-me
Copy link
Contributor

trisch-me commented Jul 18, 2024

After several discussions in comments and during our semconv and tooling meetings we came to conclusion to rewrite embed functionality to make embed more prominent and visible because it has potential to influence and change not only the schema but also other parts of Open Telemetry such as APIs etc.

So the proposal is to make an embed a part of the group as opposed to the part of attributes before. Both embedded namespaces and embed attributes will be visible in separate block, so the developer would know that those are treated differently

groups:
   - id: registry.client
     prefix: client
     type: attribute_group
     brief: "Describes client"

     embed:
     - attribute_groups: 
        - ref: geo
          as: location (if omitted, the name is equal as namespace prefix, in this example geo)
          brief: "My own embedded geo space"

          attributes: 
          - ref: geo.lat (override only those attributes from the group which needs to be overridden. 
                             Other attributes are added AS IS)
            brief: "This attribute is required"


      - attributes: 
         - ref: geo.lat
           as: my.latitude
           brief: "My own embedded latitude"
         - ref: geo.lon

Embedded namespaces

  • Embedded namespaces will be listed under attribute_groups . They might have an optional parameter as. If omitted it is equal to the embedded namespace, for example resolved schema will be client.geo.*. If provided, the name of the embedded namespace will be equal to the value of as, such as client.location.* where * has all geo attributes.
  • Embedded namespace might have it's own brief
  • If there is a need to override specific attributes within embedded namespace, it will be possible to do so by listing those attributes under attributes block within attribute_groups. The ref attribute is mandatory, brief is optional. If omitted, the original values will be used. In example above geo.lat will have a different brief.
  • Example of embedded namespaces where as is needed, resolved schema will be process.parent.*:
groups:
   - id: registry.process
     prefix: process
     type: attribute_group
     brief: "Describes process"

     embed:
     - attribute_groups: 
        - ref: process
          as: parent
          requirement_level: Required
          brief: "Parent of the process is itself a process namespace"

          attributes: 
          - ref: process.id
            requirement_level: Required
            brief: "This attribute is required, all other process attributes are optional"

Embedded singular attributes

  • Embedded singular attributes will be listed under attributes. They might have an optional parameter as. If omitted, name is equal to the embedded attribute, for example resolved schema will be client.geo.lat. If provided, the name of the embedded attribute will be equal to the value of as, such as client.my.latitude where my.latitude is the alias for geo.lat.
  • Embedded singular attributes might have it's own brief.
  • Example of embedded singular attributes which resolves to file.owner and file.creator. It might be important if embedded attribute has some restrictions or requirements, so we don't redefine them everywhere multiple times but only reference them in embed section:
groups:
   - id: registry.file
     prefix: file
     type: attribute_group
     brief: "Describes file"

     embed:
     - attributes: 
         - ref: user.name <-- embedded attribute
           as: owner
           brief: "Owner of the file"
         - ref: user.name <-- embedded attribute
           as: creator
           brief: "Creator of the file"

Questions:

  • should we restrict embed to only registry?

implementation details

  • resolution of embedded namespaces most probably will be similar to the extend keyword.
  • resolution of embedded attributes will be similar to the ref keyword.
  • refs inside embed have different meaning than refs from normal attributes. I could call them ids as well, but I think ref is more descriptive because essentially this is a reference to another existing attributes_groups/attributes.

This issue is replacing open-telemetry/build-tools#240 due to number of comments there which makes it a bit harder to follow up on final solution. I will update this message if there will be changes required.

@trisch-me trisch-me self-assigned this Jul 18, 2024
@trisch-me
Copy link
Contributor Author

@jsuereth @lmolkova @lquerel please take a look. I might have missed something but wanted to have discussion started so I will start with implementation sooner

@trisch-me trisch-me moved this to In Progress in Security SemConv Jul 19, 2024
@lmolkova
Copy link
Contributor

lmolkova commented Jul 23, 2024

Thank you!

Some questions/comments:

  1. what would be the scenario for as (i.e. embedding geo as location) ? Do we really need it? Would it replace the first segment or all segments?
    One of the reasons for embeddings is to have reusable concepts and renaming attributes would defeat this purpose. I'd prefer to avoid having this feature until we know if/why it's needed.

  2. embed the whole namespace: I don't see any candidate for embed-namespace today and see plenty of cases for embedding a single attribute. By embedding the whole namespace:

    • I'm auto-subscribing to all future changes in that namespace whether they are applicable or useful in my namespace or not.
    • I'm defaulting to generic attribute descriptions and notes for attributes I don't mention explicitly. It might be ok for experimental specs, but really any high-quality semconv should update briefs/notes and specify what this attribute means in the context of this semconv.

    E.g. geo defines a lot of things such as continent name and also continent code, same with url or user - one might be applicable in one place, but not in the other. I believe explicitly declaring each attribute to be embedded would result in clearer spec and fewer attributes we don't have a use-case for.

  3. If we don't need embed-all, things for common case (embed-a-few) become much easier:

      groups:
       - id: registry.foo
         prefix: foo
         ....
    
         attributes:
           - embed: user.id # or `embed_one`
             brief: Foo-specific definition of user.id
           - id: bar # regular declaration   

    It we need to introduce embed-all, it can still be done with

      groups:
        - id: registry.foo
          prefix: foo
          ....
          attributes:
            - embed_namespaces: [user, url] # you can't redefine briefs/notes/etc when embedding all. If you want to redefine, import attributes one-by-one
            - id: bar # regular declaration   

    or

      groups:
        - id: registry.foo
          prefix: foo
          ....
          embed_namespaces: [user, url] # you can't redefine briefs/notes/etc
          attributes:
            - id: bar # regular declaration   

    It seems the limitation of not being able to update briefs/etc of individual attributes when importing namespace is inevitable
    (i.e. you either need to mention every attribute explicitly or the fact that all unmentioned attributes are embedded too becomes implicit).

  4. Re "should we restrict embed to only registry?"

    Embedding (in my view) is a new attribute definition (with some restrictions - same type and name suffix). It can only be done in the registry. Also, there should be no requirement level properties as we don't provide requirement levels in the registry.

    Somebody should be able to reference such attributes in a non-registry group, e.g.

    groups:
      - id: registry.foo
        prefix: foo
        ....
    
       attributes:
         - embed: user.id # or `embed_one`
           brief: Foo-specific definition of user.id
         - id: bar # regular declaration   
    
      - id: traces.foo
        ....
    
        attributes:
          - ref: foo.user.id
            requirement_level: required  

TL;DR: I think semantic conventions have to explicitly mention all attributes they want to embed and need to override notes/briefs to explain what these attributes mean in the context of that group. If we wanted to reuse attributes as is, we could always reference/extend a group instead of embedding it.

I believe blindly embedding all is dangerous and would result in a lot of unused/misused attributes.

I.e. I don't think embedding a namespace is something we should implement and without it we can design things to be much easier and more readable.

@trisch-me
Copy link
Contributor Author

what would be the scenario for as (i.e. embedding geo as location) ? Do we really need it? Would it replace the first segment or all segments?

it's described in the proposal. process.parent field is of type process itself.

I don't see any candidate for embed-namespace today

We do have geo namespace waiting for this feature to be implemented in order to be used inside client and server namespace. As well as mentioned process.parent. And those are not only usecases, but most prominent and needed right now

If we don't need embed-all...

We do need embed all, this is the main reason for this functionality. Embedding of dedicated attributes was just a nice add-on for this feature.

@MadVikingGod
Copy link
Contributor

For the single attributes we already have ref and after first look it seems the only feature added here is the as field. Could this rename feature be strong enough on its own to be added as a feature to the current attribute model, e.g. add as to all attributes?

@trisch-me
Copy link
Contributor Author

@MadVikingGod ref has another meaning. Embed not only reference attribute but also adds groups prefix to it.
So during embedding of geo.lat into client the resolved attribute name will be client.geo.lat while during referencing it it will stay the same geo.lat

@trisch-me
Copy link
Contributor Author

To reiterate some points from todays discussion here in written form:

  • Embedding the whole namespace is a nice way of embedding all the fields from a namespace without a need to reiterate them one by one
  • Separate attributes from the embedded namespace could be overwritten, for example briefs, examples etc, see original proposal
  • We could add a new property and define some namespaces to be embeddable to allow only embedding of the namespaces which represent some sort of entity in it. For example user or geo or process are good namespaces for such case where http is rather bag of attributes
  • Having as field, i.e. possibility to rename referenced attribute seems to be an interesting one even without context of embed functionality

On additional note I feel also we are mixing things within the registry and outside of it.

As of now embed will be used only to define things within the registry, for example client.geo.* means that client might use all the fields from geo. But we will use those fields outside of the registry through the usual ref, it means one can use only those attributes we have defined already in registry, maybe all of them, maybe only some.

It means one might use only ref: client.geo.lat or only ref: process.parent.id in yaml definitions outside of the registry. I think this might answer some questions @lmolkova had.

We might also decide to use embed outside of the registry but this might bring even more confusion so I propose to refrain from it until we will have a use case.

Let's continue a discussion here to find the solution @jsuereth @lmolkova @joaopgrassi @lquerel

@lmolkova
Copy link
Contributor

lmolkova commented Jul 29, 2024

Extracting the main requirement from the comment #1264 (comment)

we should not define attributes that are not going to be used. In other words any changes to embeddable namespace need to be compatible to and meaningful in all the groups that embed them.

I don't see good examples in existing semconv where we have namespaces satisfying those requirements. Our existing approach is to "describe everything about {namespace}". To leverage typed embedding, we'd need to change how we structure things.

E.g.

  • geo.description (not yet in semcomv) is a free-form text that would not be applicable in many cases, there are other attributes that seems to be dependent on the context, see Add geo fields into attribute registry #1116
  • we have some process attributes that don't seem to be relevant on process.parent (e.g. process.context_switch_type)
  • user.full_name is not applicable in most cases (e.g. browser telemetry)
  • we could make thread embeddable (it only has name and id), but it'd mean that all future attributes we ever add to the thread would be as generic. It probably means there will be none (a few).

The ways I see that can satisfy these requirements:

  1. do not allow namespace embedding (i.e. typed flattened structure represented with a set of attributes).
  2. use namespace embedding only for small, well-scoped, complete, clear definitions of something
  3. maybe it's possible to leverage requirement levels and say that certain properties of embeddable group are optional and are not embedded

The solution I'm suggesting (p1) is a two-way door. It does not block p2 in the future.

I'm suggesting the following litmus test before we introduce p2:

  • we have an example of embeddable namespace
  • we manually (copy-paste or embed single) embed it all

I.e. I want to have a real example where we embed the whole namespace which is merged and part of the semconv first before we introduce p2.

@trisch-me
Copy link
Contributor Author

I think we need to continue discussion on embedded namespaces but I would like to start with embedded attributes because I think there is no strong objection against it.

I have a following usecase - an application needs to access foo.csv and may write a log describing this interaction to a log file which is called something else, say application.log

Let's discuss 2 options to describe the embed attributes

A. inside the group on the top level 🎉

groups:
   - id: registry.log
     prefix: log
     type: attribute_group
     brief: "Describes log file"

     embed:
        - attributes: <-- embedded attributes
          - ref: file.name
            as: file.name <-- this is can/should be omitted because it's the same name as ref
            brief: 'Filename of the log file'
            example: 'application.log'
          - ref: file.name
            as: event.file.name
            brief: "The file that the action was performed on."
            example: 'foo.csv'

    attributes:  <-- normal attributes
      - id: record.uid ...

resolving into

log.file.name - Filename of the log file
log.event.file.name - The file that the action was performed on.
log.record.uid - ...

another example with process

groups:
   - id: registry.process
     prefix: process
     type: attribute_group
     brief: "Describes process"

     embed:
        - attributes: <-- embedded attributes
          - ref: process.id
            as: parent.id
            brief: 'Id of parent process also know as ppid'
            example: '123'

    attributes:  <-- normal attributes
      - id: id
        brief: Process id
        note: Additional info <-- this will be used for both parent and process id

resolving into

process.id - Process id
process.parent.id - Id of parent process also know as ppid

Pros:

  • This option makes embed attributes visible and prominent
  • Makes it easier later to expand embed to support not only attributes but also namespaces

Cons:

  • it feels a bit unnatural to have some attributes defined above in embed section and some inside normal attributes section

B. inside the attributes section 🚀

Resolutions for these examples will be the same as above

groups:
   - id: registry.log
     prefix: log
     type: attribute_group
     brief: "Describes log file"

    attributes:  <-- attributes section
      - id: record.uid <-- usual attribute
      - embed: file.name <-- embedded
        as: file.name <-- this is can/should be omitted because it's the same name as ref
        brief: 'Filename of the log file'
        example: 'application.log'
      - embed: file.name
        as: event.file.name
        brief: "The file that the action was performed on."
        example: 'foo.csv'
groups:
   - id: registry.process
     prefix: process
     type: attribute_group
     brief: "Describes process"

    attributes:  <-- attributes section
      - id: id
        brief: Process id
        note: Additional info <-- this will be used for both parent and process id
      - embed: process.id
        as: parent.id
        brief: 'Id of parent process also know as ppid'
        example: '123'

Pros:

  • On the same level as other attributes
  • Simpler structure - just embed keyword

Cons:

  • Need to put attention to embed attributes as they could be mixed with other id attributes
  • Makes it harder later to have unified structure together with embedded namespaces

You can simply vote for the proposed option through emoji or put your alternatives in comments.

@lmolkova
Copy link
Contributor

If we don't pursue embedding namespaces, then option B seems super-clean and I'd be happy with it. I still have concerns about as and would suggest to remove it from the initial scope.

I believe the reason to consider option A as to make embedding namespaces and attributes consistent with each other. It'd be hard to decide on a single attributes without having some basic design for namespaces.

@lmolkova
Copy link
Contributor

lmolkova commented Jul 29, 2024

Need to put attention to embed attributes as they could be mixed with other id attributes

could you elaborate? what's the problem?

One suggestion is to do

prefix: client
attributes:
  - embed: client.geo.location.longitute
    from: geo.location.longitute
    brief: ...

(parent is tricky - renaming attributes as a part of embedding seems controversial, so make be we can take it separately?).

We can validate that client.geo.location.longitute is a prefix + from. This approach (using fully qualified attribute name in the embed) would also help with issues we already have today with id and prefix #1292

@lmolkova
Copy link
Contributor

lmolkova commented Jul 29, 2024

the middle-ground:

prefix: foo
embeded_namespaces:
  - id: foo.file # the name of resulting namespace to generate
    from: file # either a group id or a whole namespace which can in theory be defined across multiple groups
    attributes: 
      - embed: foo.file.name
        # we can try to figure it out, but it's better to be explicit and 
        # then we can validate if it's the same as `embed` without `prefix`
        from: file.name 
        ...
      ...
embeded_attributes: # we can as well do it inside regular `attributes`, I have no strong opinion here
  - embed: foo.geo.location.longitute
    from: geo.location.longitute
    brief: ...

produces:

  • foo.file.name + all other attributes in file with foo prefix.
  • foo.geo.location.longitute

Pros:

  • defining embedded attribute is the same regardless if it's embedded as a single thing or as a part of namespace. I.e. easier to learn, implement and maintain than two different things.
  • the name of the new attribute is explicit. if we need a thing like as, the name of the new attribute is there, you don't need to guess how as works and what it'd produce.
  • we don't abuse existing concepts (attribute ref, id) that are different from embedding
  • we can evolve these features independently

Cons

  • ?

@trisch-me
Copy link
Contributor Author

I still have #1264 (comment) and would suggest to remove it from the initial scope.

We can't remove it from initial scope, we need to differentiate different embeddings of the same object within the same namespace. I have provided 2 examples for this case, how do you propose to implement them without using alias?

Need to put attention to embed attributes as they could be mixed with other id attributes
could you elaborate? what's the problem?

This was a request from @jsuereth, he wanted to have embed attributes separate because of their meaning. I don't have strong opinion on it, I'm ok with both options, where A is a bit better in the long run, and B is better for only attributes.

I am ok with the idea of using fully qualified name in embed as well. It aligns with idea to be explicit

In your example I see almost the same structure that was already proposed but with different names, let me rename it for the same group.

groups:
   - id: registry.process
     prefix: process
     type: attribute_group
     brief: "Describes process"

    attributes:  <-- attributes section
      - id: id
        brief: Process id
        note: Additional info
      - embed: process.parent.id
        from: process.id
        brief: 'Id of parent process also know as ppid'
        example: '123'

But in this example I feel that as gives better meaning

groups:
   - id: registry.process
     prefix: process
     type: attribute_group
     brief: "Describes process"

    attributes:  <-- attributes section
      - id: id
        brief: Process id
        note: Additional info <-- this will be used for both parent and process id
      - embed: process.id
        as: process.parent.id <-- I have added here prefix for better clarity
        brief: 'Id of parent process also know as ppid'
        example: '123'

@jsuereth
Copy link
Contributor

jsuereth commented Jul 30, 2024

I'd like to add consideration for the inverse relationship for embed.

E.g. Instead of embed living on the consumer, we do something like:

groups:
  id: registry.types.geo
  type: embeddable_group
  brief: 'The set of attributes which describe a geographcial location'
  prefix: geo
  attributes:
     - id: lat
        ...
     - id: lon
       ....

And then used like the following:

groups:
  id: some_user_of_this
  type: attribtue_group
  prefix: some_user_of_this
  attributes:
    - ref_group: registry.types.geo
      name: location

Leads to to the attributes some_user_of_this.location.lat and some_user_of_this.location.lon.

@trisch-me
Copy link
Contributor Author

@jsuereth in your example do you want to emphasise a need for embedded groups to have a special type i.e. embeddable_group?

Because otherwise your example looks pretty similar to what I propose, just different keywords ref_group for embed and name for as

And we also can have instead of new type a new property allow_embedding: true

@lmolkova
Copy link
Contributor

lmolkova commented Jul 30, 2024

We can't remove it from initial scope, we need to differentiate different embeddings of the same object within the same namespace. I have provided 2 examples for this case, how do you propose to implement them without using alias?

we don't before we discuss it in addition to embedding. I.e. it's a feature on top, not a core part.

  - embed: process.id
   as: process.parent.id <-- I have added here prefix for better clarity
   brief: 'Id of parent process also know as ppid'

I'd like to have full qualified name in the declaration. In this case, as should be in the declaration.

- as: process.parent.id 
  from: parent.id

If you're ok renaming, we can combine things into

groups:
  - id: registry.types.geo.location
    type: embeddable_attribute_group
    brief: 'The set of attributes which describe a geographcial location'
    prefix: geo.location
    attributes:
       - id: lat
          ...
       - id: lon
         ....

  - id: registry.foo
    prefix: foo
    embeded_namespaces:
      - as: foo.geo.location # id seems ok here too
        from: embeddable_attribute_group
        attributes: 
          - as: foo.geo.location.lon
            from: geo.location.lon
            ...
   # or
    embeded_attributes: 
      - as: foo.geo.location.lon
        from: geo.location.lon
        brief: ...
      - as: foo.geo.location.lat
        from: geo.location.lat
        brief: ...

Any objections?

@trisch-me
Copy link
Contributor Author

Oh I see your point, as with id declaration you want embed also to have full name in the beginning to emphasise it? I think it's ok too.

So you propose then to have different groups to support embedded namespaces and embedded attributes. Do you think it gives better clarity to the reader then to have embed block with namespaces and attributes inside?

Can I say that we agree on moving on with embedded attributes and we just discussing syntax?

I see a couple of options proposed here:

    embeded_attributes: 
      - as: process.parent.id
        from: process.id
        brief: ...
    attributes:
      - id: ...
    embeded_attributes: 
      - id: client.geo.lon
        from: geo.lon
        brief: ...
    attributes:
      - id: ...
    embeded_attributes: 
      - name: client.geo.lon
        from: geo.lon
        brief: ...
    attributes:
      - id: ...

I feel that when using it in other yaml files outside of registry id seems to be better option because it is similar to what we have used before with ref

groups:
  id: mygroup
  type: attribute_group
  brief: ...
  attributes:
    - ref: process.id <------------} both names are taken from `id` declaration. One from normal attributes
    - ref: process.parent.id < -----}                     another from embedded

@lmolkova
Copy link
Contributor

lmolkova commented Jul 30, 2024

So you propose then to have different groups to support embedded namespaces and embedded attributes.

no strong opinion here, but I'd like to minimize nesting for what I consider to be the common case.

groups:
  - id: registry.foo
    prefix: foo
    embed:
      namespaces:
        - as: foo.geo.location # id seems ok here too
          from: embeddable_attribute_group
          attributes: 
            - as: foo.geo.location.lon
              from: geo.location.lon
              ...
      attributes: 
        - as: foo.geo.location.lon
          from: geo.location.lon
          brief: ...
        - as: foo.geo.location.lat
          from: geo.location.lat
          brief: ...

seems fine too.

I see a couple of options proposed here:

do you mean as vs id vs name? I feel strongly against id (or ref) because it would conflict with existing id on attributes.
name looks great.

Can I say that we agree on moving on with embedded attributes and we just discussing syntax?

Let's figure out the syntax.
Also, once we finalize the syntax, I still suggest to not go forward with namespaces right away - see #1264 (comment). We shoud be able to add it incrementally after we have a namespace that's embeddable.

@trisch-me
Copy link
Contributor Author

Yes, I'm ok with moving only with embed attributes for now and discuss/implement namespaces later.

do you mean as vs id vs name? I feel strongly against id (or ref) because it would conflict with existing id on attributes.

I thought a comment from your side - as: foo.geo.location # id seems ok here too means you propose to have here id instead of as

I do like id, because it is the definition of the attributes inside the embed but other than this it should be very similar to id in normal attributes. So while referencing later this attribute a developer can just take always id. I'm ok with name as well, but id seems to be more applicable here.

@lmolkova
Copy link
Contributor

attribute:id means attribute definition. Someone can define a type for this attribute.
Embedding an attribute means defining an attribute, but reusing some properties of another attribute (type, stability?, deprecation?, part of the name?).

I'm strongly opposed to reuse id to embed an attribute, in the same way as I opposed to reuse ref for it.
My comment on allowing id was for the namespace, not attribute. We can polish the details for the namespace later once we get to it. It seems we have a general design figured out.

@trisch-me
Copy link
Contributor Author

Before moving to final syntax I have one issue with current approach I want to discuss: alias to the attribute i.e. process.parent.id for process.id should be possible, but will be most probably rare case.
With current approach we will need to define it every time which leads to repetition i.e. geo.lon will be used as geo.lon. I know we try to be explicit, just want to raise this concern.

Besides of it we have now 2 options discussed above for external structure

groups:
  - id: registry.foo
    embedded_namespaces:
    ....
    embeded_attributes: 
    ...

vs

groups:
  - id: registry.foo
    embed:
      namespaces:
      ....
      attributes: 
      ....

And 2 options for internal structure of attributes:

groups:
  - id: registry.foo
    embeded_attributes: 
      - as: process.parent.id
        from: process.id
        brief: ...
groups:
  - id: registry.foo
    embeded_attributes: 
      - name: process.parent.id
        from: process.id
        brief: ...

@MadVikingGod
Copy link
Contributor

@MadVikingGod ref has another meaning. Embed not only reference attribute but also adds groups prefix to it. So during embedding of geo.lat into client the resolved attribute name will be client.geo.lat while during referencing it it will stay the same geo.lat

Yes, but what I'm saying is if we were to add as to the current attribute struct, it could interact like so:

groups:
  - id: client
    prefix: client
    attributes:
    - ref: geo.lat
      as: client.geo.lat

which should be rendered as client.geo.lat

That keeps all of the current overrides, and it doesn't need some other field that does the same thing but different.

@trisch-me
Copy link
Contributor Author

@MadVikingGod we already have this option to use ref as embed (I have implemented it in weaver) but this is exactly why we were having new discussion about it. Because semantically reference and embedding are different and they should also be treated differently: not to mix usual ref and embed ref, make embed more visible, currently proposition to have embed not in the regular attributes but on the top level of the group.

This PR is using ref with additional attribute prefix: true and it generates from ref: user.id a fully qualified name process.user.id

@MadVikingGod
Copy link
Contributor

What I'm saying is that attribute embedding, not the type discussion we had in the SIG, seems to me to be a more complicated ref+rename.

It seems like the requirements for renaming fall under:

My point of view is that if the functionality is so close we probably should add the feature to ref not create another new config type in the semconv.

I also think separating attributes into two different areas (attributes and embed.attributes) is also a usable problem, but that's a minor concern.

This analysis is totally different for embedded types/namespaces. For that the concept is so different from referencing an attribute that I think it deserves it's own top level directive.

@trisch-me
Copy link
Contributor Author

@MadVikingGod I understand you and as I told before this was original idea that I have even implemented into weaver, you can use it now in semconv because it's not reverted yet

But this is where @lmolkova requested changes and the whole discussion started from the scratch.

@lquerel
Copy link
Contributor

lquerel commented Aug 12, 2024

@trisch-me Regarding the syntax options, I have a slight preference for embed -> namespaces and attributes. As for as vs name, I don’t really have a preference.

@trisch-me
Copy link
Contributor Author

trisch-me commented Sep 25, 2024

Hey everyone, so here is a short recap of everything written above to better visibility for embed syntax:

groups:
  - id: registry.foo
    embed:
      namespaces:
        - from: process
          as: parent
          brief: "Parent of the process is itself a process namespace"

      attributes: 
        - from: file.name
          as: file.name // can/should be omitted
          brief: "Filename of the log file"
          example: "file.log"
        - from: file.name
          as: event.file.name
          brief: "The file that the action was performed on."
          example: 'foo.csv'
        

I will start most probably with attributes implementation, and integration of embedded namespaces will be the next step.

@trisch-me
Copy link
Contributor Author

Follow up after discussion during last tooling meeting

In our last meeting, we discussed several important concerns, and I would like to address them here in writing:

  • Semantic Meaning: One concern raised was that the same field might carry different semantic meanings across different use cases. To resolve this, I propose to introduce a new name/alias (as) for the field, which will handle its semantics. Additionally, we should allow for updates to brief, example, and possibly others, similar to how we currently handle the ref field. However, other elements like additional notes or restrictions should still be inherited from the original field.

  • Too broad definition of attribute groups: Another concern was the possibility of attribute groups being too large to embed efficiently at the namespace level. We need to limit the number of fields that will be automatically embedded in such cases. I see two potential solutions here:

  1. Add an embeddable field: we could add a new embeddable type to every attribute, with a default value of false. This would specify whether an attribute should be taken out into a new namespace during the embedding of the whole namespace.

  2. Create an embeddable group: alternatively, we could introduce a new type of group in addition to the registry groups called an embeddable/embedded group, which would only contain an extracted set of attributes that are embeddable.

I prefer the first option, as it avoids redundancy in defining the same attributes for both a registry attribute group and an embedded group. It also simplifies the process of adding new embedded attributes — just define them once and toggle whether they should be embedded or not. However, I am open to further suggestions or thoughts on this.

  • Propose new group type: This overlaps with the first option mentioned above.

This discussion relates to how we handle the entire namespace, which is necessary for a unified vision moving forward.

However, I suggest that we start by focusing on embedding single attributes first. This will give us a clearer understanding of which fields should be customizable for embedded attributes, and it will also allow us to finalize the formatting of markdown and yaml files. Once that’s established, introducing namespaces later will simply become a convenient way to embed multiple attributes at once. Semantical meaning will be resolved through the as naming and updated brief, and concerns regarding broad definitions are not applicable for a single attribute embedding.

Are there any additional concerns regarding the embedding of individual attributes? I would like to begin implementation as soon as possible and would appreciate it if we could resolve all open questions in our next meeting.

@jsuereth
Copy link
Contributor

jsuereth commented Nov 8, 2024

Too broad definition of attribute groups

This is why I was suggesting a new group type for embeddable things. I think this gives us a good set of properties:

  • We can enforce different stability requirements on emeddable groups, e.g. we may not want to allow change of any type or treat change as 'higher friction' due to the nature of embedding
  • We can very carefully control which attributes will be duplicate or not across the attribute registry.
  • We can have different codegen for embedded groups. It's possibly/likely we could treat them as a "unit". Embeddable groups can very cleaning be iterated over and have codegen product appropriate adapters, which then can be leveraged when generating keys, spans, metrics etc.

I'm definitely opposed to option (1) of having every attribute having an embeddable flag. To me this may be simple to implement the markdown generation but complicates codegen. Additionally, I suspect this will be more verbose in the long term. TL;DR; I feel that option may be easy to implement but more limiting in the long run / harder to maintain.

Do you have examples of embed where there wouldn't be multiple attributes embedded for a namespace? (e.g. geo.*) Do we have different views of what the problem is?

@trisch-me
Copy link
Contributor Author

trisch-me commented Nov 20, 2024

@jsuereth yes, I have mentioned one of those examples in my comment above. People should be able to embed any attribute as we do now with ref. I’m not opposed to have attribute groups, we could extract all embeddable fields into it and use them as a group or use just single fields out of it.
Let’s make a final discussion today on this topic

concerns/questions:

  • are we ok with duplication of attributes? Should then embedded groups have only refs inside to the original attribute_group?
  • why do you think it makes codegen complicated?

@jsuereth
Copy link
Contributor

jsuereth commented Dec 3, 2024

Adding to some comments here, due to a conflux of PRs and discussions disparate.

why do you think it makes codegen complicated?

Because when you "reference" individuals, it's hard to treat them as a group, you have to reverse engineer it.

Ideally, e.g. for the geo.location namespace, I'd like to have something like the following in codegen:

package io.opentelemetry.semconv.types.geo;

interface Location {
  /** Latitude of the geo location in [WGS84](https://wikipedia.org/wiki/World_Geodetic_System#WGS84). */
  double getLat();
  /** Longitude of the geo location in [WGS84](https://wikipedia.org/wiki/World_Geodetic_System#WGS84). */
  double getLon();
  
  // Helper method
  static AttributesBuilder setLocation(AttributesBuilder builder, String prefix, Location location) {
    return builder
    .put(prefix + ".location.lat", location.getLat())
    .put(prefix + ".location.lon", location.getLon());
  }
}

Then when we generate events, e.g. we could have something like:

interface LoginEventBuilder {
  LoginEventBuilder setUserId(String id);
  ...
  LoginEventBuilder setLocation(Location location);
  
}

This is really hard to do if you have to infer possible types from signal usage. E.g. I would need to know that login events use both geo.location.lat and geo.location.lon to know that I should generate a Location type. However if it's a first class group, we can have dedicated templates to generate them. Additionally when generating Span/Metric/Event/Resource signal code, we can leverage the fact that reference is to a group to generate appropriate code.

If we erase/flatten, all of this becomes a huge exercise in rebuilding the semantics in codegen.

@trisch-me
Copy link
Contributor Author

trisch-me commented Dec 4, 2024

If I correctly understand your concerns, the main feature we need to have is possibility to connect all attributes of the same embed instance together, especially if we have more than one instance of the same embed group such as client.get.* and server.geo.*. While those might be visible from visual representation when we resolve the schema that connection would be lost and we can’t rely on prefix in order to make sure that particular pair of lat and lon is connected.

I had a talk with @lquerel regarding possible solutions we might have in weaver to make possible a connection between fields of the same embed instance.

So far we had following idea:

  1. We introduce a new type, embedded group, which would be most probably a template type
  2. This group should have an attribute type - this is a formal definition of a group to understand that attributes of this groups are not just usual attributes but inferred attributes. I think we can just use the namespace name as a type, in our example above the type = geo.
  3. This group should have a hidden attribute embed_name - this would be autogenerated by weaver while resolving the schema. I propose to have name = prefix. In example above it would be name=client or name=server

How it will look like:

original attributes group

groups:
  - id: registry.geo <-- normal group
    attributes:
      - id: lat
         brief: “Latitude of the geo location"
      - id: lon
         brief: “Longitude of the geo location”
      - id: postal_code
         brief: “Postal code of the location”

embedded group with only attributes we are allow to embed

groups:
  - id: embed.geo
     attributes:
        - ref: geo.lat
        - ref: geo.lon
groups:
  - id: registry.client
    embed:
      namespaces:
        - from: embed.geo
          as: location
   attributes:
      - id: address

...

  - id: registry.server
    embed:
      namespaces:
        - from: embed.geo
          as: geo
    attributes:
      - id: address

Resolved schema:

groups:
- id: client
  type: attribute_group
  brief: |
    General client attributes.
  attributes:
  - name: client.address
    ....
  - name: client.location.lon
    type: embed.geo
    embed_name: client
    brief: Longitude of the geo location

  - name: client.location.lat
    type: embed.geo
    embed_name: client
    brief: Latitude of the geo location

  lineage:
     ...

- id: server
  type: attribute_group
  brief: |
    General server attributes.
  attributes:
  - name: server.address
    ....
  - name: server.geo.lon
    type: embed.geo
    embed_name: server
    brief: Longitude of the geo location

  - name: server.geo.lat
    type: embed.geo
    embed_name: server
    brief: Latitude of the geo location

  lineage:
     ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

No branches or pull requests

6 participants