From 1eba75ba493c0388802af76fca2b7104ae7aef3b Mon Sep 17 00:00:00 2001
From: Josh Suereth
Date: Tue, 13 Aug 2024 13:17:16 -0400
Subject: [PATCH 01/51] Initial Entity and Resource proposal.

---
 text/entities/????-resource-and-entities.md | 276 ++++++++++++++++++++
 1 file changed, 276 insertions(+)
 create mode 100644 text/entities/????-resource-and-entities.md

diff --git a/text/entities/????-resource-and-entities.md b/text/entities/????-resource-and-entities.md
new file mode 100644
index 000000000..d75336f8f
--- /dev/null
+++ b/text/entities/????-resource-and-entities.md
@@ -0,0 +1,276 @@
+# Resource and Entities - Data Model Part 2
+
+This is a proposal to address Resource and Entity data model interactions,
+including a path forward to address immediate friction and issues in the
+current resource specification.
+
+## Motivation
+
+This proposal focuses on the following problems within OpenTelemetry, in order to unblock multiple working groups:
+
+- Allow mutating attributes to participate in Resource ([OTEP 208](https://github.com/open-telemetry/oteps/pull/208)).
+- Allow Resource to handle entities whose lifetimes don't match the SDK's lifetime ([OTEP 208](https://github.com/open-telemetry/oteps/pull/208)).
+- Provide support for async resource lookup ([spec#952](https://github.com/open-telemetry/opentelemetry-specification/issues/952)).
+- Fix current Resource merge rules in the specification, which most implementations violate ([oteps#208](https://github.com/open-telemetry/oteps/pull/208), [spec#3382](https://github.com/open-telemetry/opentelemetry-specification/issues/3382), [spec#3710](https://github.com/open-telemetry/opentelemetry-specification/issues/3710)).
+- Allow semantic convention resource modeling to progress ([spec#605](https://github.com/open-telemetry/opentelemetry-specification/issues/605), [spec#559](https://github.com/open-telemetry/opentelemetry-specification/issues/559), etc.).
+
+# Approach - Resource Improvements
+
+We start by outlining Entity detectors and Resource composition. This has a higher priority for fixing within OpenTelemetry and needs to be unblocked sooner. From there, we infer our way back to the data model and Collector use cases.
+
+We define the following SDK components:
+
+- **Resource Detectors (legacy)**: We preserve existing resource detectors. They have the same behavior and interfaces as today.
+- **Entity Detectors (new)**: Components that detect entities relevant to the current instance of the SDK. For example, this would detect a service entity for the current SDK, or its process. Every entity must have some relation to the current SDK.
+- **Resource Coordinator (new)**: A component responsible for taking Resource and Entity detectors and doing the following:
+  - Constructing a Resource for the SDK from detectors.
+  - Dealing with conflicts between detectors.
+  - Providing SDK-internal access to detected Resources for reporting via the Log signal on configured LogProviders.
+  - *(new) Managing Entity changes during the SDK lifetime, specifically dealing with entities that have lifetimes shorter than the SDK.*
+
+## Resource Container
+
+The SDK Resource coordinator is responsible for running all configured Resource and Entity Detectors. These run in a priority order that users can control, with an OpenTelemetry-provided default.
+
+- The resource coordinator will detect conflicts when multiple Entities of the same type are discovered, and choose one to use.
+- When using Entity Detectors and Resource detectors together, the following merge rules will be used:
+  - Entity merging will occur first, resulting in an "Entity Merged" Resource.
+    - Entities of different types will be merged into the resulting Resource.
+    - Entities of the same type will have one rejected and one accepted, based on priority.
+  - Resource detectors otherwise follow existing merge semantics.
+    - The Specification merge rules will be updated to account for violations prevalent in ALL implementations of resource detection.
+    - Specifically, the rules around merging Resources across schema URLs will be dropped. Instead, only conflicting attributes will be dropped.
+    - SchemaURL on Resource will need to be deprecated, with an entity-specific schema URL replacing it. Additionally, as no Resource semantic conventions have ever stabilized, SchemaURL usage on Resource cannot be in stable components of OpenTelemetry. Given how prevalent violations of the Resource merge specification are among implementations, we expect the impact of this deprecation to be minimal.
+- An out-of-the-box (OOTB) "Env Variable Entity Detector" will be specified and provided, rather than requiring SDK-wide ENV variables for resource detection.
+- *Additionally, the Resource Coordinator would be responsible for understanding Entity lifecycle events, for Entities whose lifetimes do not match or exceed the SDK's own lifetime (e.g. browser session).*
+
+## Entity Detector
+
+The Entity detector in the SDK is responsible for detecting possible entities that could identify the SDK. For example, if the SDK is running in a Kubernetes pod, it may provide an Entity for that pod. SDK Entity Detectors are only required to provide identifying attributes, but may provide descriptive attributes so that the combined Resource contains attributes similar to those produced by today's SDKs.
+
+An Entity Detector would have an API similar to:
+
+```rust
+trait EntityDetector {
+    fn detect_entities(&self /* , ... */) -> Result<Vec<Entity>, EntityDetectionError>;
+}
+```
+
+Here `Result` stands in for the error channel of the language of choice (e.g. in Go this would be `entities, err := e.detectEntities()`).
+
+## Entity Merging and Resource
+
+The most important aspect of this design is how Entities will be merged to construct a Resource. We provide a simple algorithm for this behavior:
+
+- Construct a set of detected entities, E.
+- Sort all entity detectors by priority.
+- For each entity detector:
+  - For each entity detected:
+    - If an entity of the same type already exists in E, ignore it.
+    - Otherwise, add the entity to E.
+- Construct a Resource from the set E.
+
+Any implementation that achieves the same result as this algorithm is acceptable.
+
+## Environment Variable Detector
+
+An Entity detector will be specified to allow a platform to inject entity identity information into workloads running on that platform. For example, the OpenTelemetry Operator could inject information about a Kubernetes Deployment and Container into the environment, which SDKs can elect to interact with (through configuration of the Environment Variable Entity Detector).
+
+While details of ENV variables will be subject to change, it would look something like the following:
+
+```bash
+export OTEL_DETECTED_ENTITIES='k8s.deployment[k8s.deployment.name=my-program],k8s.pod[k8s.pod.name=my-program-2314,k8s.namespace=default]'
+```
+
+The minimum requirements of this entity detector are:
+
+- The ENV variable can specify multiple entities (resource attribute bundles).
+- The ENV variable can be easily appended to or leveraged by multiple participating systems, if needed.
+- Entities discovered via the ENV variable can participate generically in the Resource Manager, i.e. in resolving conflicting definitions.
+
+The actual design for this ENV variable interaction would follow the approval of this OTEP.
+
+## Interactions with OpenTelemetry Collector
+
+The OpenTelemetry Collector can be updated to optionally interact with Entity on Resource. A new entity-focused resource detection process can be created which allows add/override behavior at the entity level, rather than at the individual attribute level.
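The environment-variable format and the priority merge described in "Entity Merging and Resource" can be combined into a small end-to-end sketch. This is a minimal, non-normative Rust sketch under stated assumptions: `Entity`, `parse_detected_entities`, `merge_entities` and `to_resource_attributes` are hypothetical names, not an existing OpenTelemetry SDK or Collector API; attribute values are simplified to strings; every attribute in the illustrative `OTEL_DETECTED_ENTITIES` value is treated as identifying; and "the entity exists in E" is read as "an entity of the same type is already in E".

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory entity for this sketch (not an OpenTelemetry SDK type).
/// Attribute values are simplified to strings.
#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub entity_type: String,
    pub identifying: BTreeMap<String, String>,
    pub descriptive: BTreeMap<String, String>,
}

/// Parses the illustrative OTEL_DETECTED_ENTITIES format, e.g.
/// `k8s.deployment[k8s.deployment.name=my-program],k8s.pod[k8s.pod.name=p,k8s.namespace=default]`.
/// Assumption: every attribute in the variable is treated as identifying.
pub fn parse_detected_entities(value: &str) -> Result<Vec<Entity>, String> {
    let mut entities = Vec::new();
    let mut rest = value.trim();
    while !rest.is_empty() {
        let open = rest.find('[').ok_or_else(|| format!("missing '[' in {rest:?}"))?;
        let close = rest[open..]
            .find(']')
            .map(|i| open + i)
            .ok_or_else(|| format!("missing ']' in {rest:?}"))?;
        let entity_type = rest[..open].trim().to_string();
        if entity_type.is_empty() {
            return Err("empty entity type".to_string());
        }
        // An entity needs at least one identifying attribute, so `type[]` is rejected.
        let mut identifying = BTreeMap::new();
        for pair in rest[open + 1..close].split(',') {
            let (key, val) = pair
                .split_once('=')
                .ok_or_else(|| format!("expected key=value, got {pair:?}"))?;
            identifying.insert(key.trim().to_string(), val.trim().to_string());
        }
        entities.push(Entity { entity_type, identifying, descriptive: BTreeMap::new() });
        // Skip the ',' that separates this entity from the next one.
        rest = rest[close + 1..].trim_start().trim_start_matches(',').trim_start();
    }
    Ok(entities)
}

/// Priority merge: `by_priority[0]` holds the output of the highest-priority detector.
/// The first entity seen for a given type wins; later ones of that type are ignored.
pub fn merge_entities(by_priority: Vec<Vec<Entity>>) -> Vec<Entity> {
    let mut merged: Vec<Entity> = Vec::new();
    for detected in by_priority {
        for entity in detected {
            if !merged.iter().any(|e| e.entity_type == entity.entity_type) {
                merged.push(entity);
            }
        }
    }
    merged
}

/// Builds flat Resource attributes from merged entities (identifying + descriptive).
/// Assumption: if two entities share a key, the higher-priority entity's value is kept.
pub fn to_resource_attributes(entities: &[Entity]) -> BTreeMap<String, String> {
    let mut attributes = BTreeMap::new();
    for entity in entities {
        for (key, val) in entity.identifying.iter().chain(entity.descriptive.iter()) {
            attributes.entry(key.clone()).or_insert_with(|| val.clone());
        }
    }
    attributes
}
```

In this sketch, the Collector's `override` flag could be expressed purely as ordering: with `override: false`, the entities already present on the incoming Resource would be passed first (so they win and detected entities of the same type are not appended), and with `override: true` they would be passed last. This is one possible reading of the proposal, not specified behavior.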
+
+For example, the existing resource detector looks like this:
+
+```yaml
+processors:
+  resourcedetection/docker:
+    detectors: [env, docker]
+    timeout: 2s
+    override: false
+```
+
+The future entity-based detector would look almost exactly the same, but interact with the entity model of resource:
+
+```yaml
+processors:
+  entityresourcedetection:
+    # Order determines override behavior
+    detectors: [env, docker]
+    # False means only append if entity doesn't already exist.
+    override: false
+```
+
+The list of detectors is given in priority order (first wins in the event of a tie, outside of override configuration). The processor may need to be updated to allow the override flag to apply to each individual detector.
+
+# Data Model Changes
+
+Given our desired design and algorithms for detecting, merging and manipulating Entities, we need the ability to denote how Entity and Resource relate. These changes must not break existing usage of Resource, therefore:
+
+- The Entity model must be *layered on top of* the Resource model. A system does not need to interact with entities for correct behavior.
+- Existing key usage of Resource must remain when using Entities, specifically navigationality (see: [OpenTelemetry Resources: Principles and Characteristics](https://docs.google.com/document/d/1Xd1JP7eNhRpdz1RIBLeA1_4UYPRJaouloAYqldCeNSc/edit)).
+- Downstream components should be able to engage with the Entity model in Resource.
+
+The following changes are made:
+
+## Resource
+
+| Field | Type | Description | Changes |
+| ----- | ---- | ----------- | ------- |
+| schema_url | string | The Schema URL, if known. This is the identifier of the Schema that the resource data is recorded in. This field is deprecated and should no longer be used. | Will be deprecated |
+| dropped_attributes_count | integer | dropped_attributes_count is the number of dropped attributes. If the value is 0, then no attributes were dropped. | Unchanged |
+| attributes | repeated KeyValue | Set of attributes that describe the resource.<br/><br/>Attribute keys MUST be unique (it is not allowed to have more than one attribute with the same key). | Unchanged |
+| entities | repeated ResourceEntityRef | Set of entities that participate in this Resource. | Added |
+
+The data model would ensure that attributes in Resource are produced from both the identifying and descriptive attributes of Entity. This does not mean the protocol needs to transmit duplicate data; that design is TBD.
+
+## ResourceEntityRef
+
+The EntityRef data model would have the following changes from the original [entity OTEP](https://github.com/open-telemetry/oteps/blob/main/text/entities/0256-entities-data-model.md) to denote references within Resource:
+
+| Field | Type | Description | Changes |
+| ----- | ---- | ----------- | ------- |
+| schema_url | string | The Schema URL, if known. This is the identifier of the Schema that the entity data is recorded in. To learn more about Schema URL, see https://opentelemetry.io/docs/specs/otel/schemas/#schema-url | added |
+| type | string | Defines the type of the entity. MUST not change during the lifetime of the entity. For example: "service" or "host". This field is required and MUST not be empty for valid entities. | unchanged |
+| identifying_attributes_keys | repeated string | Attribute Keys that identify the entity.<br/>MUST not change during the lifetime of the entity. The Id must contain at least one attribute.<br/><br/>These keys MUST exist in Resource.attributes.<br/><br/>Follows OpenTelemetry common attribute definition. SHOULD follow OpenTelemetry semantic conventions for attributes. | now a reference |
+| descriptive_attributes_keys | repeated string | Descriptive (non-identifying) attribute keys of the entity.<br/>MAY change over the lifetime of the entity. MAY be empty. These attribute keys are not part of the entity's identity.<br/><br/>These keys MUST exist in Resource.attributes.<br/><br/>Follows any value definition in the OpenTelemetry spec - it can be a scalar value, byte array, an array or map of values. Arbitrary deep nesting of values for arrays and maps is allowed.<br/><br/>SHOULD follow OpenTelemetry semantic conventions for attributes. | now a reference |
+
+# How this proposal solves the problems that motivated it
+
+Let's look at some motivating problems from the [Entities Proposal](https://docs.google.com/document/d/1VUdBRInLEhO_0ABAoiLEssB1CQO_IcD5zDnaMEha42w/edit#heading=h.atg5m85uw9w8):
+
+**Problem 1: Commingling of Entities**
+
+We embrace the need for commingling entities in Resource and allow downstream users to interact with the individual entities rather than erasing these details.
+
+**Problem 2: Lack of Precise Identity**
+
+Identity is now clearly delineated from description via the Entity portion of Resource. When Entity is used for Resource, only identifying attributes need to be interacted with to create resource identity.
+
+**Problem 3: Lack of Mutable Attributes**
+
+This proposal offers two ways forward:
+
+- Descriptive attributes may be mutated without violating Resource identity.
+- Entities whose lifetimes do not match the SDK's may be attached to or removed from Resource.
+
+**Problem 4: Metric Cardinality Problem**
+
+Via the solution to Problem 2, we can leverage an identity synthesized from identifying attributes on Entity. By directly modeling entity lifetimes, we guarantee that identity changes in Resource ONLY occur when the source of telemetry changes. This solves unintended metric cardinality problems (while leaving those that are necessary to deal with, e.g. collecting metrics from phones or browser instances where intrinsic cardinality is high).
+
+## Entity WG Rubric
+
+The Entities WG came up with a rubric to evaluate solutions based on shared
+beliefs and goals for the overall effort. Let's look at how each item is
+achieved:
+
+**Resource detectors (soon to be entity detectors) need to be composable / disjoint**
+Entity detection and Resource Manager now fulfill this need.
+
+**New entities added by extension should not break existing code**
+Users will need to configure a new Entity detector for new entities being modelled.
+
+**Navigational attributes need to exist and can be used to identify an entity but could be augmented with UUID or other aspects. - Having ONLY a UUID for entity identification is not good enough.**
+Resource will still be composed of identifying and descriptive attributes of Entity, allowing the baseline navigational attributes users already expect from resource.
+
+**Collector augmentation / enrichment (resource, e.g.) - Should be extensible and not hard-coded. We need a general algorithm not specific rulesets.**
+The Entity concept provides a new "bundle" mechanism to resource for the Collector to augment or enrich a group of attributes and better identify conflicts (or identity changes) caused therein.
+
+**Users are expected to provide / prioritize "detectors" and determine which entity is "producing" or most-important for a signal**
+The Resource Manager allows users to configure the priority of Entity Detectors.
+
+**For an SDK - ALL telemetry should be associated with the same set of entities (resource labels).**
+The Resource Manager is responsible for resolving entities into a cohesive Resource that meets the same demands as Resource today.
+
+# Open Questions
+
+The following remain open questions:
+
+## How to attach Entity "bundle" information in Resource?
+
+The protocol today requires a raw grab bag of Attributes on Resource. We cannot break this going forward. However, Entities represent a new mechanism of "bundling" attributes on Resource and interacting with these bundles. We do not want this to bloat the protocol, nor do we want it to cause oddities.
+
+Going forward, we have a set of options:
+
+- Duplicate attributes in `Entity` section of Resource.
+- Reference attributes of Resource in entity.
+- Only identify Entity id and keep attribute<->entity association out of band.
+- Extend Attribute on Resource so that we can track the entity type per Key-Value (across any attribute in OTLP).
+
+The third option prevents generic code from interacting with Resource and Entity without understanding the model of each. The first keeps all usage of entity simple at the expense of duplicating information, and the second is awkward to interact with from an OTLP usage perspective. The fourth violates our stability policy for OTLP.
+
+## How to deal with Resource/Entities whose lifecycle does not match the SDK?
+
+This proposal motivates a Resource Coordinator in the SDK whose job could include managing changes in entity lifetimes, but does not account for how these changes would be broadcast across TracerProvider, LogProvider, MeterProvider, etc. That would be addressed in a follow-on OTEP.
+
+## How to deal with Prometheus Compatibility for non-SDK telemetry?
+
+Today, Prometheus compatibility relies on two key attributes in Resource: service.name and service.instance.id. These are not guaranteed to exist outside of OpenTelemetry SDK generation. While this question is not fully answered, we believe outlining identity in all resources within OpenTelemetry allows us to define a solution in the future while preserving compatibility with what works today.
+
+## Should entities have a domain?
+
+Is it worth having a `domain` in addition to type for entity? We could force each entity to exist in one domain and leverage domain generically in resource management. Entity Detectors would be responsible for an entire domain, selecting only ONE to apply to a resource. Domains could be layered, e.g. a Cloud-specific domain may layer on top of a Kubernetes domain, where "GKE cluster entity" identifies *which* Kubernetes cluster a Kubernetes infra entity is part of. This layer would be done naively, via automatic join of participating entities or explicit relationships derived from GKE-specific hooks.
+
+It's unclear if this is needed initially, and we believe this could be layered in later.
+
+## Should resources have only one associated entity?
+
+Given the problems leading to the Entities working group, and the needs of existing Resource users today, we think it is infeasible and unscalable to limit a Resource to only one entity. This would place restrictions on modeling Entities that would require OpenTelemetry to be the sole source of entity definitions and hurt building an open and extensible ecosystem. Additionally, it would need careful definition of solutions for the following problems/rubrics:
+
+- New entities added by extension should not break existing code
+- Collector augmentation / enrichment (resource, e.g.) - Should be extensible and not hard-coded. We need a general algorithm not specific rulesets.
+
+## What identity should entities use (LID, UUID / GUID, or other)?
+
+One of the largest questions in the first entities OTEP was how to identify an entity. This was an attempt to unify the need for Navigational attributes with the notion that only identifying attributes of Entity would show up in Resource going forward. This restriction is no longer necessary in this proposal, and we should reconsider how to model identity for an Entity.
+
+This can be done in follow-up design / OTEPs.
+
+## What happens if existing Resource translation in the collector removes resource attributes an Entity relies on?
+
+While we expect the collector to be the first component to start engaging with Entities in an architecture, this could lead to data model violations. We have a few options to deal with this issue:
+
+- Consider this a bug and warn users not to do it.
+- Specify that missing attribute keys are acceptable for descriptive attributes.
+- Specify that missing attribute keys denote that entities are unsuable for that batch of telemetry, and treat the content as malformed.
+
+# Trade-offs and mitigations
+
+The design proposed here attempts to balance non-breaking (backwards and forwards compatible) changes with the need to improve problematic issues in the Specification. Given the inability of most SDKs to implement the current Resource merge specification, breaking this should have little effect on actual users. Instead, the proposed merge specification should allow impelmentation to match current behavior and expectation, while evolving for users who engage with the new model.
+
+# Prior art and alternatives
+
+Previously, there have been a few unaccepted OTEPs, e.g. [OTEP 208](https://github.com/open-telemetry/oteps/pull/208). Additionally, there are some alternatives that were considered in the Entities WG and rejected.
+
+Below is a brief discussion of some design decisions:
+
+- **Only associating one entity with a Resource.** This was rejected, as too high a friction point in evolving semantic conventions and allowing independent systems to coordinate identity + entities within the OpenTelemetry ecosystem. Eventually, this would force OpenTelemetry to model all possible entities in the world and understand their interaction, or otherwise prevent non-OpenTelemetry instrumentation from interacting with OpenTelemetry entities.
+- **Fully embedding Entity in Resource.** This was rejected because it makes it easy/trivial for Resource attributes and Entities to diverge. This would prevent the backwards/forwards compatibility goals and also require all participating OTLP users to leverage entities. Entity should be an opt-in / additional feature that may or may not be engaged with, depending on user need.
+- **Re-using resource detection as-is.** This was rejected as not having a viable compatibility path forward. Creating a new set of components that can preserve existing behavior while allowing users to adopt the new functionality means that users have better control of when they see / change system behavior, and adoption is more obvious across the ecosystem.
+
+# Future Possibilities
+
+This proposal opens the door for addressing issues where an Entity's lifetime does not match an SDK's lifetime, in addition to providing a data model where mutable (descriptive) attributes can be changed over the lifetime of a resource without affecting its identity. We expect a follow-on OTEP which directly handles this issue.
\ No newline at end of file

From fbf4b09896b12e527d2f3a359cab8e0611c9f3c0 Mon Sep 17 00:00:00 2001
From: Josh Suereth
Date: Tue, 13 Aug 2024 13:19:41 -0400
Subject: [PATCH 02/51] Rename OTEP to match PR reservation number.

---
 ...???-resource-and-entities.md => 0264-resource-and-entities.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename text/entities/{????-resource-and-entities.md => 0264-resource-and-entities.md} (100%)

diff --git a/text/entities/????-resource-and-entities.md b/text/entities/0264-resource-and-entities.md
similarity index 100%
rename from text/entities/????-resource-and-entities.md
rename to text/entities/0264-resource-and-entities.md

From 245b7aa601370b49043070afee763e70a32a6102 Mon Sep 17 00:00:00 2001
From: Josh Suereth
Date: Tue, 13 Aug 2024 13:20:24 -0400
Subject: [PATCH 03/51] Spellcheck.

---
 text/entities/0264-resource-and-entities.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md
index d75336f8f..9ee408094 100644
--- a/text/entities/0264-resource-and-entities.md
+++ b/text/entities/0264-resource-and-entities.md
@@ -255,11 +255,11 @@ While we expect the collector to be the first component to start engaging with E
 
 - Consider this a bug and warn users not to do it.
 - Specify that missing attribute keys are acceptable for descriptive attributes.
-- Specify that missing attribute keys denote that entities are unsuable for that batch of telemetry, and treat the content as malformed.
+- Specify that missing attribute keys denote that entities are unusable for that batch of telemetry, and treat the content as malformed.
 
 # Trade-offs and mitigations
 
-The design proposed here attempts to balance non-breaking (backwards and forwards compatible) changes with the need to improve problematic issues in the Specification. Given the inability of most SDKs to implement the current Resource merge specification, breaking this should have little effect on actual users. Instead, the proposed merge specification should allow impelmentation to match current behavior and expectation, while evolving for users who engage with the new model.
+The design proposed here attempts to balance non-breaking (backwards and forwards compatible) changes with the need to improve problematic issues in the Specification. Given the inability of most SDKs to implement the current Resource merge specification, breaking this should have little effect on actual users. Instead, the proposed merge specification should allow implementations to match current behavior and expectation, while evolving for users who engage with the new model.
 
 # Prior art and alternatives
 
From 72a67ad92860d970457fdbe9571a7d1222fe370a Mon Sep 17 00:00:00 2001
From: Josh Suereth
Date: Tue, 13 Aug 2024 13:22:13 -0400
Subject: [PATCH 04/51] More spellcheck.
---
 text/entities/????-resource-and-entities.md | 276 ++++++++++++++++++++
 1 file changed, 276 insertions(+)
 create mode 100644 text/entities/????-resource-and-entities.md

diff --git a/text/entities/????-resource-and-entities.md b/text/entities/????-resource-and-entities.md
new file mode 100644
index 000000000..870e303d1
--- /dev/null
+++ b/text/entities/????-resource-and-entities.md
@@ -0,0 +1,276 @@
+# Resource and Entities - Data Model Part 2
+
+This is a proposal to address Resource and Entity data model interactions,
+including a path forward to address immediate friction and issues in the
+current resource specification.
+
+## Motivation
+
+This proposal focuses on the following problems within OpenTelemetry, in order to unblock multiple working groups:
+
+- Allow mutating attributes to participate in Resource ([OTEP 208](https://github.com/open-telemetry/oteps/pull/208)).
+- Allow Resource to handle entities whose lifetimes don't match the SDK's lifetime ([OTEP 208](https://github.com/open-telemetry/oteps/pull/208)).
+- Provide support for async resource lookup ([spec#952](https://github.com/open-telemetry/opentelemetry-specification/issues/952)).
+- Fix current Resource merge rules in the specification, which most implementations violate ([oteps#208](https://github.com/open-telemetry/oteps/pull/208), [spec#3382](https://github.com/open-telemetry/opentelemetry-specification/issues/3382), [spec#3710](https://github.com/open-telemetry/opentelemetry-specification/issues/3710)).
+- Allow semantic convention resource modeling to progress ([spec#605](https://github.com/open-telemetry/opentelemetry-specification/issues/605), [spec#559](https://github.com/open-telemetry/opentelemetry-specification/issues/559), etc.).
+
+# Approach - Resource Improvements
+
+We start by outlining Entity detectors and Resource composition. This has a higher priority for fixing within OpenTelemetry and needs to be unblocked sooner. From there, we infer our way back to the data model and Collector use cases.
+
+We define the following SDK components:
+
+- **Resource Detectors (legacy)**: We preserve existing resource detectors. They have the same behavior and interfaces as today.
+- **Entity Detectors (new)**: Components that detect entities relevant to the current instance of the SDK. For example, this would detect a service entity for the current SDK, or its process. Every entity must have some relation to the current SDK.
+- **Resource Coordinator (new)**: A component responsible for taking Resource and Entity detectors and doing the following:
+  - Constructing a Resource for the SDK from detectors.
+  - Dealing with conflicts between detectors.
+  - Providing SDK-internal access to detected Resources for reporting via the Log signal on configured LogProviders.
+  - *(new) Managing Entity changes during the SDK lifetime, specifically dealing with entities that have lifetimes shorter than the SDK.*
+
+## Resource Container
+
+The SDK Resource coordinator is responsible for running all configured Resource and Entity Detectors. These run in a priority order that users can control, with an OpenTelemetry-provided default.
+
+- The resource coordinator will detect conflicts when multiple Entities of the same type are discovered, and choose one to use.
+- When using Entity Detectors and Resource detectors together, the following merge rules will be used:
+  - Entity merging will occur first, resulting in an "Entity Merged" Resource.
+    - Entities of different types will be merged into the resulting Resource.
+    - Entities of the same type will have one rejected and one accepted, based on priority.
+  - Resource detectors otherwise follow existing merge semantics.
+    - The Specification merge rules will be updated to account for violations prevalent in ALL implementations of resource detection.
+    - Specifically, the rules around merging Resources across schema URLs will be dropped. Instead, only conflicting attributes will be dropped.
+    - SchemaURL on Resource will need to be deprecated, with an entity-specific schema URL replacing it. Additionally, as no Resource semantic conventions have ever stabilized, SchemaURL usage on Resource cannot be in stable components of OpenTelemetry. Given how prevalent violations of the Resource merge specification are among implementations, we expect the impact of this deprecation to be minimal.
+- An out-of-the-box (OOTB) "Env Variable Entity Detector" will be specified and provided, rather than requiring SDK-wide ENV variables for resource detection.
+- *Additionally, the Resource Coordinator would be responsible for understanding Entity lifecycle events, for Entities whose lifetimes do not match or exceed the SDK's own lifetime (e.g. browser session).*
+
+## Entity Detector
+
+The Entity detector in the SDK is responsible for detecting possible entities that could identify the SDK. For example, if the SDK is running in a Kubernetes pod, it may provide an Entity for that pod. SDK Entity Detectors are only required to provide identifying attributes, but may provide descriptive attributes so that the combined Resource contains attributes similar to those produced by today's SDKs.
+
+An Entity Detector would have an API similar to:
+
+```rust
+trait EntityDetector {
+    fn detect_entities(&self /* , ... */) -> Result<Vec<Entity>, EntityDetectionError>;
+}
+```
+
+Here `Result` stands in for the error channel of the language of choice (e.g. in Go this would be `entities, err := e.detectEntities()`).
+
+## Entity Merging and Resource
+
+The most important aspect of this design is how Entities will be merged to construct a Resource. We provide a simple algorithm for this behavior:
+
+- Construct a set of detected entities, E.
+- Sort all entity detectors by priority.
+- For each entity detector:
+  - For each entity detected:
+    - If an entity of the same type already exists in E, ignore it.
+    - Otherwise, add the entity to E.
+- Construct a Resource from the set E.
+
+Any implementation that achieves the same result as this algorithm is acceptable.
+
+## Environment Variable Detector
+
+An Entity detector will be specified to allow a platform to inject entity identity information into workloads running on that platform. For example, the OpenTelemetry Operator could inject information about a Kubernetes Deployment and Container into the environment, which SDKs can elect to interact with (through configuration of the Environment Variable Entity Detector).
+
+While details of ENV variables will be subject to change, it would look something like the following:
+
+```bash
+export OTEL_DETECTED_ENTITIES='k8s.deployment[k8s.deployment.name=my-program],k8s.pod[k8s.pod.name=my-program-2314,k8s.namespace=default]'
+```
+
+The minimum requirements of this entity detector are:
+
+- The ENV variable can specify multiple entities (resource attribute bundles).
+- The ENV variable can be easily appended to or leveraged by multiple participating systems, if needed.
+- Entities discovered via the ENV variable can participate generically in the Resource Manager, i.e. in resolving conflicting definitions.
+
+The actual design for this ENV variable interaction would follow the approval of this OTEP.
+
+## Interactions with OpenTelemetry Collector
+
+The OpenTelemetry Collector can be updated to optionally interact with Entity on Resource. A new entity-focused resource detection process can be created which allows add/override behavior at the entity level, rather than at the individual attribute level.
+
+For example, the existing resource detector looks like this:
+
+```yaml
+processors:
+  resourcedetection/docker:
+    detectors: [env, docker]
+    timeout: 2s
+    override: false
+```
+
+The future entity-based detector would look almost exactly the same, but interact with the entity model of resource:
+
+```yaml
+processors:
+  entityresourcedetection:
+    # Order determines override behavior
+    detectors: [env, docker]
+    # False means only append if entity doesn't already exist.
+    override: false
+```
+
+The list of detectors is given in priority order (the first detector wins in the event of a tie, outside of the override configuration). The processor may need to be updated to allow the override flag to apply to each individual detector.
+
+# Datamodel Changes
+
+Given our desired design and algorithms for detecting, merging and manipulating Entities, we need the ability to denote how Entity and Resource relate. These changes must not break existing usage of Resource; therefore:
+
+- The Entity model must be *layered on top of* the Resource model. A system does not need to interact with entities for correct behavior.
+- Existing key usage of Resource must remain when using Entities, specifically navigationality (see: [OpenTelemetry Resources: Principles and Characteristics](https://docs.google.com/document/d/1Xd1JP7eNhRpdz1RIBLeA1_4UYPRJaouloAYqldCeNSc/edit)).
+- Downstream components should be able to engage with the Entity model in Resource.
+
+The following changes are made:
+
+## Resource
+
+| Field | Type | Description | Changes |
+| ----- | ---- | ----------- | ------- |
+| schema_url | string | The Schema URL, if known. This is the identifier of the Schema that the resource data is recorded in. This field is deprecated and should no longer be used. | Will be deprecated |
+| dropped_attributes_count | integer | dropped_attributes_count is the number of dropped attributes. If the value is 0, then no attributes were dropped. | Unchanged |
+| attributes | repeated KeyValue | Set of attributes that describe the resource.<br/><br/>Attribute keys MUST be unique (it is not allowed to have more than one attribute with the same key). | Unchanged |
+| entities | repeated ResourceEntityRef | Set of entities that participate in this Resource. | Added |
+
+The data model would ensure that the attributes in Resource are produced from both the identifying and descriptive attributes of each Entity. This does not mean the protocol needs to transmit duplicate data; that design is TBD.
+
+## ResourceEntityRef
+
+The EntityRef data model would have the following changes from the original [entity OTEP](https://github.com/open-telemetry/oteps/blob/main/text/entities/0256-entities-data-model.md) to denote references within Resource:
+
+| Field | Type | Description | Changes |
+| ----- | ---- | ----------- | ------- |
+| schema_url | string | The Schema URL, if known. This is the identifier of the Schema that the entity data is recorded in. To learn more about Schema URL, see https://opentelemetry.io/docs/specs/otel/schemas/#schema-url | added |
+| type | string | Defines the type of the entity. MUST NOT change during the lifetime of the entity. For example: "service" or "host". This field is required and MUST NOT be empty for valid entities. | unchanged |
+| identifying_attributes_keys | repeated string | Attribute keys that identify the entity.<br/>MUST NOT change during the lifetime of the entity. The list must contain at least one key.<br/><br/>These keys MUST exist in Resource.attributes.<br/><br/>Follows OpenTelemetry common attribute definition. SHOULD follow OpenTelemetry semantic conventions for attributes. | now a reference |
+| descriptive_attributes_keys | repeated string | Descriptive (non-identifying) attribute keys of the entity.<br/>MAY change over the lifetime of the entity. MAY be empty. These attribute keys are not part of the entity's identity.<br/><br/>These keys MUST exist in Resource.attributes.<br/><br/>The referenced attribute values follow any value definition in the OpenTelemetry spec: a scalar value, a byte array, or an array or map of values. Arbitrarily deep nesting of values for arrays and maps is allowed.<br/><br/>SHOULD follow OpenTelemetry semantic conventions for attributes. | now a reference |
+
+# How this proposal solves the problems that motivated it
+
+Let's look at some motivating problems from the [Entities Proposal](https://docs.google.com/document/d/1VUdBRInLEhO_0ABAoiLEssB1CQO_IcD5zDnaMEha42w/edit#heading=h.atg5m85uw9w8):
+
+**Problem 1: Commingling of Entities**
+
+We embrace the need for commingling entities in Resource and allow downstream users to interact with the individual entities rather than erasing these details.
+
+**Problem 2: Lack of Precise Identity**
+
+Identity is now clearly delineated from description via the Entity portion of Resource. When Entity is used for Resource, only the identifying attributes are needed to create resource identity.
+
+**Problem 3: Lack of Mutable Attributes**
+
+This proposal offers two solutions to this going forward:
+
+- Descriptive attributes may be mutated without violating Resource identity.
+- Entities whose lifetimes do not match the SDK's may be attached to or removed from Resource.
+
+**Problem 4: Metric Cardinality Problem**
+
+Via the solution to Problem 2, we can leverage an identity synthesized from the identifying attributes on Entity. By directly modeling entity lifetimes, we guarantee that identity changes in Resource ONLY occur when the source of telemetry changes. This solves unintended metric cardinality problems (while leaving in place those that are inherent and must still be dealt with, e.g. when collecting metrics from phones or browser instances, where intrinsic cardinality is high).
+
+## Entity WG Rubric
+
+The Entities WG came up with a rubric to evaluate solutions based on shared
+beliefs and goals for the overall effort. Let's look at how each item is
+achieved:
+
+**Resource detectors (soon to be entity detectors) need to be composable / disjoint**
+Entity detection and the Resource Coordinator now fulfill this need.
+
+**New entities added by extension should not break existing code**
+Users will need to configure a new Entity Detector for new entities being modelled.
+
+**Navigational attributes need to exist and can be used to identify an entity but could be augmented with UUID or other aspects. - Having ONLY a UUID for entity identification is not good enough.**
+Resource will still be composed of the identifying and descriptive attributes of Entities, preserving the baseline navigational attributes users already expect from Resource.
+
+**Collector augmentation / enrichment (resource, e.g.) - Should be extensible and not hard-coded. We need a general algorithm not specific rulesets.**
+The Entity concept gives Resource a new "bundle" mechanism, which lets the Collector augment or enrich a group of attributes and better identify conflicts (or identity changes) caused therein.
+
+**Users are expected to provide / prioritize "detectors" and determine which entity is "producing" or most-important for a signal**
+The Resource Coordinator allows users to configure the priority of Entity Detectors.
+
+**For an SDK - ALL telemetry should be associated with the same set of entities (resource labels).**
+The Resource Coordinator is responsible for resolving entities into a cohesive Resource that meets the same demands as Resource today.
+
+# Open Questions
+
+The following remain open questions:
+
+## How to attach Entity "bundle" information in Resource?
+
+The protocol today requires a raw grab bag of Attributes on Resource. We cannot break this going forward. However, Entities represent a new mechanism of "bundling" attributes on Resource and interacting with these bundles. We do not want this to bloat the protocol, nor do we want it to cause oddities.
+
+Going forward, we have a set of options:
+
+- Duplicate attributes in an `Entity` section of Resource.
+- Reference attributes of Resource in Entity.
+- Only identify the Entity id and keep the attribute<->entity association out of band.
+- Extend Attribute on Resource so that we can track the entity type per Key-Value (across any attribute in OTLP).
+
+The third option prevents generic code from interacting with Resource and Entity without understanding the model of each. The first keeps all usage of entity simple at the expense of duplicating information, and the second is awkward to interact with from an OTLP usage perspective. The fourth violates our stability policy for OTLP.
+
+## How to deal with Resource/Entities whose lifecycle does not match the SDK?
+
+This proposal motivates a Resource Coordinator in the SDK whose job could include managing changes in entity lifetimes, but does not account for how these changes would be broadcast across TracerProvider, LogProvider, MeterProvider, etc. That would be addressed in a follow-on OTEP.
+
+## How to deal with Prometheus Compatibility for non-SDK telemetry?
+
+Today, Prometheus compatibility relies on two key attributes in Resource: `service.name` and `service.instance.id`. These are not guaranteed to exist in telemetry that is not generated by an OpenTelemetry SDK. While this question is not fully answered, we believe outlining identity in all resources within OpenTelemetry allows us to define a solution in the future while preserving compatibility with what works today.
+
+## Should entities have a domain?
+
+Is it worth having a `domain` in addition to a type for an entity? We could force each entity to exist in one domain and leverage domain generically in resource management. Entity Detectors would be responsible for an entire domain, selecting only ONE to apply to a resource. Domains could be layered, e.g. a Cloud-specific domain may layer on top of a Kubernetes domain, where a "GKE cluster entity" identifies *which* Kubernetes cluster a Kubernetes infrastructure entity is part of. This layering would be done naively, via an automatic join of participating entities, or via explicit relationships derived from GKE-specific hooks.
+
+It's unclear if this is needed initially, and we believe this could be layered in later.
+
+## Should resources have only one associated entity?
+
+Given the problems leading to the Entities working group, and the needs of existing Resource users today, we think it is infeasible and unscalable to limit a Resource to only one entity. This would place restrictions on modeling Entities that would require OpenTelemetry to be the sole source of entity definitions and hinder building an open and extensible ecosystem. Additionally, it would require careful definition of solutions for the following problems/rubrics:
+
+- New entities added by extension should not break existing code
+- Collector augmentation / enrichment (resource, e.g.) - Should be extensible and not hard-coded. We need a general algorithm not specific rulesets.
+
+## What identity should entities use (LID, UUID / GUID, or other)?
+
+One of the largest questions in the first entities OTEP was how to identify an entity. That design was an attempt to unify the need for navigational attributes with the notion that only identifying attributes of Entity would show up in Resource going forward. This restriction is no longer necessary in this proposal, and we should reconsider how to model identity for an Entity.
+
+This can be done in follow-up designs / OTEPs.
+
+## What happens if existing Resource translation in the Collector removes resource attributes an Entity relies on?
+
+While we expect the Collector to be the first component in an architecture to engage with Entities, existing processors that remove Resource attributes could lead to data model violations. We have a few options to deal with this issue:
+
+- Consider this a bug and warn users not to do it.
+- Specify that missing attribute keys are acceptable for descriptive attributes.
+- Specify that missing attribute keys denote that entities are unusable for that batch of telemetry, and treat the content as malformed.
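The last two options above can be made concrete with a small validation check. The following Rust sketch is hypothetical: `ResourceEntityRef`, `RefStatus`, `missing_keys` and `check_ref` are illustrative names, and attribute values are simplified to strings. Under this sketch's assumptions:

- A missing descriptive key is tolerated (the second option).
- A missing identifying key, or an empty identifying key list, makes the entity unusable for that batch (the third option).

```rust
use std::collections::BTreeMap;

/// Hypothetical ResourceEntityRef: attribute keys that point into Resource.attributes.
pub struct ResourceEntityRef {
    pub entity_type: String,
    pub identifying_attributes_keys: Vec<String>,
    pub descriptive_attributes_keys: Vec<String>,
}

/// Outcome of checking one reference against a Resource's attributes.
#[derive(Debug, PartialEq)]
pub enum RefStatus {
    /// Every referenced key is present.
    Valid,
    /// Identity is intact, but some descriptive keys are missing (tolerated, option 2).
    MissingDescriptive(Vec<String>),
    /// Identifying keys are missing or empty: unusable for this batch (option 3).
    Unusable(Vec<String>),
}

/// Returns the keys from `keys` that are absent from `attributes`.
fn missing_keys(attributes: &BTreeMap<String, String>, keys: &[String]) -> Vec<String> {
    keys.iter().filter(|k| !attributes.contains_key(*k)).cloned().collect()
}

pub fn check_ref(attributes: &BTreeMap<String, String>, r: &ResourceEntityRef) -> RefStatus {
    let missing_id = missing_keys(attributes, &r.identifying_attributes_keys);
    // An entity needs at least one identifying key, and all of them must be present.
    if r.identifying_attributes_keys.is_empty() || !missing_id.is_empty() {
        return RefStatus::Unusable(missing_id);
    }
    let missing_desc = missing_keys(attributes, &r.descriptive_attributes_keys);
    if missing_desc.is_empty() {
        RefStatus::Valid
    } else {
        RefStatus::MissingDescriptive(missing_desc)
    }
}
```

Returning the missing keys, rather than a bare boolean, lets a Collector processor log exactly which attributes a transformation removed. That may help users with the first option, treating such removals as bugs.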
+
+# Trade-offs and mitigations
+
+The design proposed here attempts to balance non-breaking (backwards and forwards compatible) changes with the need to fix problematic issues in the Specification. Given the inability of most SDKs to implement the current Resource merge specification, breaking it should have little effect on actual users. Instead, the proposed merge specification should allow implementations to match current behavior and expectations, while evolving for users who engage with the new model.
+
+# Prior art and alternatives
+
+Previously, there have been a few unaccepted OTEPs, e.g. ([OTEP 208](https://github.com/open-telemetry/oteps/pull/208)). Additionally, some alternatives were considered in the Entities WG and rejected.
+
+Below is a brief discussion of some design decisions:
+
+- **Only associating one entity with a Resource.** This was rejected as too high a friction point in evolving semantic conventions and allowing independent systems to coordinate identity + entities within the OpenTelemetry ecosystem. Eventually, this would force OpenTelemetry to model all possible entities in the world and understand their interactions, or otherwise prevent non-OpenTelemetry instrumentation from interacting with OpenTelemetry entities.
+- **Fully embedding Entity in Resource.** This was rejected because it makes it easy/trivial for Resource attributes and Entities to diverge. This would prevent the backwards/forwards compatibility goals and also require all participating OTLP users to leverage entities. Entity should be an opt-in / additional feature that may or may not be engaged with, depending on user need.
+- **Re-using resource detection as-is.** This was rejected as not having a viable compatibility path forward. Creating a new set of components that can preserve existing behavior while allowing users to adopt the new functionality means that users have better control of when they see / change system behavior, and adoption is more obvious across the ecosystem.
+
+# Future Possibilities
+
+This proposal opens the door for addressing issues where an Entity's lifetime does not match an SDK's lifetime, in addition to providing a data model where mutable (descriptive) attributes can be changed over the lifetime of a resource without affecting its identity. We expect a follow-on OTEP that directly handles this issue.
\ No newline at end of file

From e3a3de04110ecd21eccf0472a59e1beadcd92bfe Mon Sep 17 00:00:00 2001
From: Josh Suereth
Date: Tue, 13 Aug 2024 13:23:54 -0400
Subject: [PATCH 05/51] Remove accidental readdition of template file.

---
 text/entities/????-resource-and-entities.md | 276 --------------------
 1 file changed, 276 deletions(-)
 delete mode 100644 text/entities/????-resource-and-entities.md

diff --git a/text/entities/????-resource-and-entities.md b/text/entities/????-resource-and-entities.md
deleted file mode 100644
index 870e303d1..000000000
--- a/text/entities/????-resource-and-entities.md
+++ /dev/null
@@ -1,276 +0,0 @@
-# Resource and Entities - Data Model Part 2
-
-This is a proposal to address Resource and Entity data model interactions,
-including a path forward to address immediate friction and issues in the
-current resource specification.
-
-
-
-
-
-
-## Motivation
-
-This proposal attempts to focus on the following problems within OpenTelemetry to unblock multiple working groups:
-
-- Allowing mutating attributes to participate in Resource ([OTEP 208](https://github.com/open-telemetry/oteps/pull/208)).
-- Allow Resource to handle entities whose lifetimes don't match the SDK's lifetime ([OTEP 208](https://github.com/open-telemetry/oteps/pull/208)).
-- Provide support for async resource lookup ([spec#952](https://github.com/open-telemetry/opentelemetry-specification/issues/952)).
-- Fix current Resource merge rules in the specification, which most implementations violate ([oteps#208](https://github.com/open-telemetry/oteps/pull/208), [spec#3382](https://github.com/open-telemetry/opentelemetry-specification/issues/3382), [spec#3710](https://github.com/open-telemetry/opentelemetry-specification/issues/3710)).
-- Allow semantic convention resource modeling to progress ([spec#605](https://github.com/open-telemetry/opentelemetry-specification/issues/605), [spec#559](https://github.com/open-telemetry/opentelemetry-specification/issues/559), etc).
-
-# Approach - Resource Improvements
-Start by outlining Entity Detectors and Resource composition. This has a higher priority for fixing within OpenTelemetry and needs to be unblocked sooner. From there, infer our way back to the data model and Collector use cases.
-
-We define the following SDK components:
-
-- **Resource Detectors (legacy)**: We preserve existing resource detectors. They have the same behavior and interfaces as today.
-- **Entity Detectors (new)**: Detect an entity that is relevant to the current instance of the SDK. For example, this would detect a service entity for the current SDK, or its process. Every entity must have some relation to the current SDK.
-- **Resource Coordinator (new)**: A component responsible for taking Resource and Entity detectors and doing the following:
-  - Constructing a Resource for the SDK from detectors.
-  - Dealing with conflicts between detectors.
-  - Providing SDK-internal access to detected Resources for reporting via the Log signal on configured LogProviders.
-  - *(new) Managing Entity changes during the SDK lifetime, specifically dealing with entities that have lifetimes shorter than the SDK's.*
-
-## Resource Container
-
-The SDK Resource Coordinator is responsible for running all configured Resource and Entity Detectors. There will be some (user-controlled, OTel default) priority order to these.
-
-- The Resource Coordinator will detect conflicts when Entities of the same type are discovered and choose one to use.
-- When using Entity Detectors and Resource detectors together, the following merge rules will be used:
-  - Entity merging will occur first, resulting in an "Entity Merged" Resource.
-    - Entities of different types will be merged into the resulting Resource.
-    - Entities of the same type will have one rejected and one accepted, based on priority.
-  - Resource detectors otherwise follow existing merge semantics.
-    - The Specification merge rules will be updated to account for violations prevalent in ALL implementations of resource detection.
-    - Specifically: This means the rules around merging Resource across schema-url will be dropped. Instead, only conflicting attributes will be dropped.
-    - SchemaURL on Resource will need to be deprecated, with an entity-specific schema-url replacing it. Additionally, as no Resource semantic conventions have ever stabilized, SchemaURL usage on Resource cannot be in stable components of OpenTelemetry. Given how widely implementations already violate the Resource merge specification, we expect the impact of this deprecation to be minimal.
-  - An OOTB "Env Variable Entity Detector" will be specified and provided, rather than requiring SDK-wide ENV variables for resource detection.
-- *Additionally, the Resource Coordinator would be responsible for understanding Entity lifecycle events for Entities whose lifetimes do not match or exceed the SDK's own lifetime (e.g. a browser session).*
-
-## Entity Detector
-
-The Entity Detector in the SDK is responsible for detecting possible entities that could identify the SDK. For example, if the SDK is running in a Kubernetes pod, it may provide an Entity for that pod. SDK Entity Detectors are only required to provide identifying attributes, but may provide descriptive attributes so that the combined Resource contains attributes similar to those of today's SDKs.
-
-An Entity Detector would have an API similar to:
-
-```rust
-trait EntityDetector {
-    // `Entity` and `EntityDetectionError` stand for the SDK's entity and error types.
-    fn detect_entities(&self /* , ... */) -> Result<Vec<Entity>, EntityDetectionError>;
-}
-```
-
-Where `Result` is the equivalent of the error channel in the language of choice (e.g. in Go this would be `entities, err := e.detectEntities()`).
-
-## Entity Merging and Resource
-
-The most important aspect of this design is how Entities are merged to construct a Resource. We provide a simple algorithm for this behavior:
-
-- Construct an empty set of detected entities, E.
-- Sort all entity detectors by priority.
-- For each entity detector, in priority order:
-  - For each entity detected:
-    - If an entity of the same type already exists in E, ignore it.
-    - Otherwise, add the entity to E.
-- Construct a Resource from the set E.
-
-Any implementation that achieves the same result as this algorithm is acceptable.
-
-## Environment Variable Detector
-
-An Entity Detector will be specified to allow a platform to inject entity identity information into workloads running on that platform. For example, the OpenTelemetry Operator could inject information about a Kubernetes Deployment and Container into the environment, which SDKs can elect to interact with (through configuration of the Environment Variable Entity Detector).
-
-While the details of the ENV variables are subject to change, it would look something like the following:
-
-```bash
-export OTEL_DETECTED_ENTITIES='k8s.deployment[k8s.deployment.name=my-program],k8s.pod[k8s.pod.name=my-program-2314,k8s.namespace.name=default]'
-```
-
-The minimum requirements of this entity detector are:
-
-- The ENV variable can specify multiple entities (resource attribute bundles).
-- The ENV variable can be easily appended to or leveraged by multiple participating systems, if needed.
-- Entities discovered via the ENV variable can participate in the Resource Coordinator generically, i.e. resolving conflicting definitions.
-
-The actual design for this ENV variable interaction would follow the approval of this OTEP.
-
-## Interactions with OpenTelemetry Collector
-
-The OpenTelemetry Collector can be updated to optionally interact with Entity on Resource. A new entity-focused resource detection process can be created which allows add/override behavior at the entity level, rather than at the individual attribute level.
-
-For example, the existing resource detector looks like this:
-
-```yaml
-processors:
-  resourcedetection/docker:
-    detectors: [env, docker]
-    timeout: 2s
-    override: false
-```
-
-The future entity-based detector would look almost exactly the same, but interact with the entity model of resource:
-
-```yaml
-processors:
-  entityresourcedetection:
-    # Order determines override behavior
-    detectors: [env, docker]
-    # False means only append if entity doesn't already exist.
-    override: false
-```
-
-The list of detectors is given in priority order (the first detector wins in the event of a tie, outside of the override configuration). The processor may need to be updated to allow the override flag to apply to each individual detector.
-
-# Datamodel Changes
-
-Given our desired design and algorithms for detecting, merging and manipulating Entities, we need the ability to denote how Entity and Resource relate. These changes must not break existing usage of Resource; therefore:
-
-- The Entity model must be *layered on top of* the Resource model. A system does not need to interact with entities for correct behavior.
-- Existing key usage of Resource must remain when using Entities, specifically navigationality (see: [OpenTelemetry Resources: Principles and Characteristics](https://docs.google.com/document/d/1Xd1JP7eNhRpdz1RIBLeA1_4UYPRJaouloAYqldCeNSc/edit)).
-- Downstream components should be able to engage with the Entity model in Resource.
-
-The following changes are made:
-
-## Resource
-
-| Field | Type | Description | Changes |
-| ----- | ---- | ----------- | ------- |
-| schema_url | string | The Schema URL, if known. This is the identifier of the Schema that the resource data is recorded in. This field is deprecated and should no longer be used. | Will be deprecated |
-| dropped_attributes_count | integer | dropped_attributes_count is the number of dropped attributes. If the value is 0, then no attributes were dropped. | Unchanged |
-| attributes | repeated KeyValue | Set of attributes that describe the resource.<br/><br/>Attribute keys MUST be unique (it is not allowed to have more than one attribute with the same key). | Unchanged |
-| entities | repeated ResourceEntityRef | Set of entities that participate in this Resource. | Added |
-
-The data model would ensure that the attributes in Resource are produced from both the identifying and descriptive attributes of each Entity. This does not mean the protocol needs to transmit duplicate data; that design is TBD.
-
-## ResourceEntityRef
-
-The EntityRef data model would have the following changes from the original [entity OTEP](https://github.com/open-telemetry/oteps/blob/main/text/entities/0256-entities-data-model.md) to denote references within Resource:
-
-| Field | Type | Description | Changes |
-| ----- | ---- | ----------- | ------- |
-| schema_url | string | The Schema URL, if known. This is the identifier of the Schema that the entity data is recorded in. To learn more about Schema URL, see https://opentelemetry.io/docs/specs/otel/schemas/#schema-url | added |
-| type | string | Defines the type of the entity. MUST NOT change during the lifetime of the entity. For example: "service" or "host". This field is required and MUST NOT be empty for valid entities. | unchanged |
-| identifying_attributes_keys | repeated string | Attribute keys that identify the entity.<br/>MUST NOT change during the lifetime of the entity. The list must contain at least one key.<br/><br/>These keys MUST exist in Resource.attributes.<br/><br/>Follows OpenTelemetry common attribute definition. SHOULD follow OpenTelemetry semantic conventions for attributes. | now a reference |
-| descriptive_attributes_keys | repeated string | Descriptive (non-identifying) attribute keys of the entity.<br/>MAY change over the lifetime of the entity. MAY be empty. These attribute keys are not part of the entity's identity.<br/><br/>These keys MUST exist in Resource.attributes.<br/><br/>The referenced attribute values follow any value definition in the OpenTelemetry spec: a scalar value, a byte array, or an array or map of values. Arbitrarily deep nesting of values for arrays and maps is allowed.<br/><br/>SHOULD follow OpenTelemetry semantic conventions for attributes. | now a reference |
-
-# How this proposal solves the problems that motivated it
-
-Let's look at some motivating problems from the [Entities Proposal](https://docs.google.com/document/d/1VUdBRInLEhO_0ABAoiLEssB1CQO_IcD5zDnaMEha42w/edit#heading=h.atg5m85uw9w8):
-
-**Problem 1: Commingling of Entities**
-
-We embrace the need for commingling entities in Resource and allow downstream users to interact with the individual entities rather than erasing these details.
-
-**Problem 2: Lack of Precise Identity**
-
-Identity is now clearly delineated from description via the Entity portion of Resource. When Entity is used for Resource, only the identifying attributes are needed to create resource identity.
-
-**Problem 3: Lack of Mutable Attributes**
-
-This proposal offers two solutions to this going forward:
-
-- Descriptive attributes may be mutated without violating Resource identity.
-- Entities whose lifetimes do not match the SDK's may be attached to or removed from Resource.
-
-**Problem 4: Metric Cardinality Problem**
-
-Via the solution to Problem 2, we can leverage an identity synthesized from the identifying attributes on Entity. By directly modeling entity lifetimes, we guarantee that identity changes in Resource ONLY occur when the source of telemetry changes. This solves unintended metric cardinality problems (while leaving in place those that are inherent and must still be dealt with, e.g. when collecting metrics from phones or browser instances, where intrinsic cardinality is high).
-
-## Entity WG Rubric
-
-The Entities WG came up with a rubric to evaluate solutions based on shared
-beliefs and goals for the overall effort. Let's look at how each item is
-achieved:
-
-**Resource detectors (soon to be entity detectors) need to be composable / disjoint**
-Entity detection and the Resource Coordinator now fulfill this need.
-
-**New entities added by extension should not break existing code**
-Users will need to configure a new Entity Detector for new entities being modelled.
-
-**Navigational attributes need to exist and can be used to identify an entity but could be augmented with UUID or other aspects. - Having ONLY a UUID for entity identification is not good enough.**
-Resource will still be composed of the identifying and descriptive attributes of Entities, preserving the baseline navigational attributes users already expect from Resource.
-
-**Collector augmentation / enrichment (resource, e.g.) - Should be extensible and not hard-coded. We need a general algorithm not specific rulesets.**
-The Entity concept gives Resource a new "bundle" mechanism, which lets the Collector augment or enrich a group of attributes and better identify conflicts (or identity changes) caused therein.
-
-**Users are expected to provide / prioritize "detectors" and determine which entity is "producing" or most-important for a signal**
-The Resource Coordinator allows users to configure the priority of Entity Detectors.
-
-**For an SDK - ALL telemetry should be associated with the same set of entities (resource labels).**
-The Resource Coordinator is responsible for resolving entities into a cohesive Resource that meets the same demands as Resource today.
-
-# Open Questions
-
-The following remain open questions:
-
-## How to attach Entity "bundle" information in Resource?
-
-The protocol today requires a raw grab bag of Attributes on Resource. We cannot break this going forward. However, Entities represent a new mechanism of "bundling" attributes on Resource and interacting with these bundles. We do not want this to bloat the protocol, nor do we want it to cause oddities.
-
-Going forward, we have a set of options:
-
-- Duplicate attributes in an `Entity` section of Resource.
-- Reference attributes of Resource in Entity.
-- Only identify the Entity id and keep the attribute<->entity association out of band.
-- Extend Attribute on Resource so that we can track the entity type per Key-Value (across any attribute in OTLP).
-
-The third option prevents generic code from interacting with Resource and Entity without understanding the model of each. The first keeps all usage of entity simple at the expense of duplicating information, and the second is awkward to interact with from an OTLP usage perspective. The fourth violates our stability policy for OTLP.
-
-## How to deal with Resource/Entities whose lifecycle does not match the SDK?
-
-This proposal motivates a Resource Coordinator in the SDK whose job could include managing changes in entity lifetimes, but does not account for how these changes would be broadcast across TracerProvider, LogProvider, MeterProvider, etc. That would be addressed in a follow-on OTEP.
-
-## How to deal with Prometheus Compatibility for non-SDK telemetry?
-
-Today, Prometheus compatibility relies on two key attributes in Resource: `service.name` and `service.instance.id`. These are not guaranteed to exist in telemetry that is not generated by an OpenTelemetry SDK. While this question is not fully answered, we believe outlining identity in all resources within OpenTelemetry allows us to define a solution in the future while preserving compatibility with what works today.
-
-## Should entities have a domain?
-
-Is it worth having a `domain` in addition to a type for an entity? We could force each entity to exist in one domain and leverage domain generically in resource management. Entity Detectors would be responsible for an entire domain, selecting only ONE to apply to a resource. Domains could be layered, e.g. a Cloud-specific domain may layer on top of a Kubernetes domain, where a "GKE cluster entity" identifies *which* Kubernetes cluster a Kubernetes infrastructure entity is part of. This layering would be done naively, via an automatic join of participating entities, or via explicit relationships derived from GKE-specific hooks.
-
-It's unclear if this is needed initially, and we believe this could be layered in later.
-
-## Should resources have only one associated entity?
-
-Given the problems leading to the Entities working group, and the needs of existing Resource users today, we think it is infeasible and unscalable to limit a Resource to only one entity. This would place restrictions on modeling Entities that would require OpenTelemetry to be the sole source of entity definitions and hinder building an open and extensible ecosystem. Additionally, it would require careful definition of solutions for the following problems/rubrics:
-
-- New entities added by extension should not break existing code
-- Collector augmentation / enrichment (resource, e.g.) - Should be extensible and not hard-coded. We need a general algorithm not specific rulesets.
-
-## What identity should entities use (LID, UUID / GUID, or other)?
-
-One of the largest questions in the first entities OTEP was how to identify an entity. That design was an attempt to unify the need for navigational attributes with the notion that only identifying attributes of Entity would show up in Resource going forward. This restriction is no longer necessary in this proposal, and we should reconsider how to model identity for an Entity.
-
-This can be done in follow-up designs / OTEPs.
-
-## What happens if existing Resource translation in the Collector removes resource attributes an Entity relies on?
-
-While we expect the Collector to be the first component in an architecture to engage with Entities, existing processors that remove Resource attributes could lead to data model violations. We have a few options to deal with this issue:
-
-- Consider this a bug and warn users not to do it.
-- Specify that missing attribute keys are acceptable for descriptive attributes.
-- Specify that missing attribute keys denote that entities are unusable for that batch of telemetry, and treat the content as malformed.
- -# Trade-offs and mitigations - -The design proposed here attempts to balance non-breaking (backwards and forwards compatible) changes with the need to improve problematic issues in the Specification. Given the inability of most SDKs to implement the current Resource merge specification, breaking this should have little effect on actual users. Instead, the proposed merge specification should allow impelmentation to match current behavior and expectation, while evolving for users who engage with the new model. - -# Prior art and alternatives - -Previously, we have a few unaccepted oteps, e.g. ([OTEP 208](https://github.com/open-telemetry/oteps/pull/208)). Additionally, there are some alternatives that were considered in the Entities WG and rejected. - -Below is a brief discussion of some design decisions: - -- **Only associating one enttiy with a Resource.** This was rejected, as too high a friction point in evolving semantic conventions and allowing independent systems to coordinate identity + entities within the OpenTelemetry ecosystem. Eventually, this would force OpenTelemetry to model all possibly entities in the world and understand their interaction or otherwise prevent non-OpenTelemetry instrumentation from interacting with OpenTelemetry entities. -- **Embed fully Entity in Resource.** This was rejected because it makes it easy/trivial for Resource attributes and Entities to diverge. This would prevent the backwards/forwards compatibility goals and also require all participating OTLP users to leverage entities. Entity should be an opt-in / additional feature that may or may not be engaged with, depending on user need. -- **Re-using resource detection as-is** This was reject as not having a viable compatibility path forward. 
Creating a new set of components that can preserve existing behavior while allowing users to adopt the new functionality means that users have better control of when they see / change system behavior, and adoption is more obvious across the ecosystem. - -# Future Posibilities - -This proposal opens the door for addressing issues where an Entity's lifetime does not match an SDK's lifetime, in addition to providing a data model where mutable (descriptive) attributes can be changed over the lifetime of a resource without affecting its idnetity. We expect a follow-on OTEP which directly handles this issue. \ No newline at end of file From 64c6b1544df41ad0d777747a5e21fabed9d3e008 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Tue, 13 Aug 2024 18:12:45 -0400 Subject: [PATCH 06/51] Some lint fixes. --- text/entities/0264-resource-and-entities.md | 48 +++++++++++---------- 1 file changed, 25 insertions(+), 23 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 9ee408094..a06964554 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -8,7 +8,6 @@ current resource specification. - ## Motivation This proposal attempts to focus on the following problems within OpenTelemetry to unblock multiple working groups: @@ -19,8 +18,11 @@ This proposal attempts to focus on the following problems within OpenTelemetry t - Fix current Resource merge rules in the specification, which most implementations violate ([oteps#208](https://github.com/open-telemetry/oteps/pull/208), [spec#3382](https://github.com/open-telemetry/opentelemetry-specification/issues/3382), [spec#3710](https://github.com/open-telemetry/opentelemetry-specification/issues/3710)). 
- Allow semantic convention resource modeling to progress ([spec#605](https://github.com/open-telemetry/opentelemetry-specification/issues/605), [spec#559](https://github.com/open-telemetry/opentelemetry-specification/issues/559), etc). -# Approach - Resource Improvements -Start with outlining Entity detectors and Resource composition. This has a higher priority for fixing within OpenTelemetry, and needs to be unblocked sooner. Infer our way back to data model and Collector use cases. +## Design + +### Approach - Resource Improvements + +Let's focus on outlining Entity detectors and Resource composition. This has a higher priority for fixing within OpenTelemetry, and needs to be unblocked sooner. Then infer our way back to data model and Collector use cases. We define the following SDK components: @@ -32,7 +34,7 @@ We define the following SDK components: - Providing SDK-internal access to detected Resources for reporting via Log signal on configured LogProviders. - *(new) Managing Entity changes during SDK lifetime, specifically dealing with entities that have lifetimes shorter than the SDK* -## Resource Container +#### Resource Container The SDK Resource coordinator is responsible for running all configured Resource and Entity Detectors. There will be some (user-controlled, otel default) priority order to these. @@ -48,7 +50,7 @@ The SDK Resource coordinator is responsible for running all configured Resource - An OOTB "Env Variable Entity Detector" will be specified and provided vs. requiring SDK wide ENV variables for resource detection. - *Additionally, Resource Coordinator would be responsible for understanding Entity lifecycle events, for Entities whose lifetimes do not match or exceed the SDK's own lifetime (e.g. browser session).* -## Entity Detector +#### Entity Detector The Entity detector in the SDK is responsible for detecting possible entities that could identify the SDK. 
For Example, if the SDK is running in a kubernetes pod, it may provide an Entity for that pod. SDK Entity Detectors are only required to provide identifying attributes, but may provide descriptive attributes to ensure combined Resource contains similar attributes as today's SDK. @@ -61,7 +63,7 @@ trait EntityDetector Where `Result` is the equivalent of error channel in the language of choice (e.g. in Go this would be `entities, err := e.detectEntities()`). -## Entity Merging and Resource +#### Entity Merging and Resource The most important aspect of this design is how Entities will be merged to construct a Resource. We provide a simple algorithm for this behavior: @@ -75,7 +77,7 @@ The most important aspect of this design is how Entities will be merged to const Any implementation that achieves the same result as this algorithm is acceptable. -## Environment Variable Detector +#### Environment Variable Detector An Entity detector will be specified to allow Platform to inject entity identity information into workloads running on that platform. For Example, the OpenTelemetry Operator could inject information about Kubernetes Deployment + Container into the environment, which SDKs can elect to interact with (through configuration of the Environment Variable Entity Detector). @@ -94,7 +96,7 @@ The minimum requirements of this entity detector are: The actual design for this ENV variable interaction would follow the approval of this OTEP. -## Interactions with OpenTelemetry Collector +### Interactions with OpenTelemetry Collector The OpenTelemetry collector can be updated to optionally interact with Entity on Resource. A new entity-focused resource detection process can be created which allows add/override behavior at the entity level, rather than individual attribute level. @@ -121,7 +123,7 @@ processor: The list of detectors is given in priority order (first wins, in event of a tie, outside of override configuration). 
The processor may need to be updated to allow the override flag to apply to each individual detector. -# Datamodel Changes +## Datamodel Changes Given our desired design and algorithms for detecting, merging and manipulating Entities, we need the ability to denote how entity and resource relate. These changes must not break existing usage of Resource, therefore: @@ -131,7 +133,7 @@ Given our desired design and algorithms for detecting, merging and manipulating The following changes are made: -## Resource +### Resource | Field | Type | Description | Changes | | ----- | ---- | ----------- | ------- | @@ -142,7 +144,7 @@ The following changes are made: The DataModel would ensure that attributes in Resource are produced from both the identifying and descriptive attributes of Entity. This does not mean the protocol needs to transmit duplicate data, that design is TBD. -## ResourceEntityRef +### ResourceEntityRef The entityref data model, would have the following changes from the original [entity OTEP](https://github.com/open-telemetry/oteps/blob/main/text/entities/0256-entities-data-model.md) to denote references within Resource: @@ -153,7 +155,7 @@ The entityref data model, would have the following changes from the original [en | identifying_attributes_keys | repeated string | Attribute Keys that identify the entity.
MUST not change during the lifetime of the entity. The Id must contain at least one attribute.

These keys MUST exists in Resource.attributes.

Follows OpenTelemetry common attribute definition. SHOULD follow OpenTelemetry semantic conventions for attributes.| now a reference | | descriptive_attributes_keys | repeated string | Descriptive (non-identifying) attribute keys of the entity.
MAY change over the lifetime of the entity. MAY be empty. These attribute keys are not part of entity's identity.

These keys MUST exist in Resource.attributes.

Follows any value definition in the OpenTelemetry spec - it can be a scalar value, byte array, an array or map of values. Arbitrary deep nesting of values for arrays and maps is allowed.

SHOULD follow OpenTelemetry semantic conventions for attributes.| now a reference | -# How this proposal solves the problems that motivated it +## How this proposal solves the problems that motivated it Let's look at some motivating problems from the [Entities Proposal](https://docs.google.com/document/d/1VUdBRInLEhO_0ABAoiLEssB1CQO_IcD5zDnaMEha42w/edit#heading=h.atg5m85uw9w8): @@ -205,11 +207,11 @@ The Resource Manager allows users to configure priority of Entity Detectors. Resource Manager is responsible for resolving entities into a cohesive Resource that meets the same demands as Resource today. -# Open Questions +## Open Questions The following remain open questions: -## How to attach Entity "bundle" information in Resource? +### How to attach Entity "bundle" information in Resource? The protocol today requires a raw grab bag of Attributes on Resource. We cannot break this going forward. However, Entities represent a new mechanism of "bundling" attributes on Resource and interacting with these bundles. We do not want this to bloat the protocol, nor do we want it to cause oddities. @@ -222,34 +224,34 @@ Going forward, we have set of options: The third option prevents generic code from interacting with Resource and Entity without understanding the model of each. The first keeps all usage of entity simple at the expense of duplicating information and the middle is awkward to interact with from an OTLP usage perspective. The fourth is violates our stability policy for OTLP. -## How to deal with Resource/Entities whose lifecycle does not match the SDK? +### How to deal with Resource/Entities whose lifecycle does not match the SDK? This proposal motivates a Resource Coordinator in the SDK whose job could include managing changes in entity lifetimes, but does not account for how these changes would be broadcast across TracerProvider, LogProvider, MeterProvider, etc. That would be addressed in a follow on OTEP. 
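The following is a hypothetical sketch of the `ResourceEntityRef` semantics from the table above. The reference stores only attribute keys, and each value lives once in `Resource.attributes`. A consumer can therefore check the MUST rules and rebuild an entity view by lookup. The classes are illustrative Python, not the OTLP protobuf messages or any SDK type. Attribute names such as `k8s.pod.uid` are example values only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ResourceEntityRef:
    # Field names mirror the ResourceEntityRef table; the class is illustrative.
    type: str
    identifying_attributes_keys: List[str]
    descriptive_attributes_keys: List[str] = field(default_factory=list)
    schema_url: str = ""


@dataclass
class Resource:
    # Illustrative stand-in for the OTLP Resource, not the SDK class.
    attributes: Dict[str, Any]
    entity_refs: List[ResourceEntityRef] = field(default_factory=list)


def validate(resource: Resource) -> List[str]:
    """Return violations of the table's MUST rules; an empty list means valid."""
    errors: List[str] = []
    for ref in resource.entity_refs:
        if not ref.type:
            errors.append("entity type MUST not be empty")
        if not ref.identifying_attributes_keys:
            errors.append(f"{ref.type}: needs at least one identifying key")
        for key in ref.identifying_attributes_keys + ref.descriptive_attributes_keys:
            if key not in resource.attributes:
                errors.append(f"{ref.type}: {key!r} missing from Resource.attributes")
    return errors


def entity_view(resource: Resource, ref: ResourceEntityRef) -> Dict[str, Dict[str, Any]]:
    """Rebuild one entity's values by looking its keys up in the Resource."""
    return {
        "id": {k: resource.attributes[k] for k in ref.identifying_attributes_keys},
        "description": {k: resource.attributes[k] for k in ref.descriptive_attributes_keys},
    }


res = Resource(
    attributes={
        "k8s.pod.uid": "1234",
        "k8s.pod.name": "checkout-7d9f",
        "service.name": "checkout",  # plain attribute, not referenced by any entity
    },
    entity_refs=[ResourceEntityRef("k8s.pod", ["k8s.pod.uid"], ["k8s.pod.name"])],
)
print(validate(res))  # []
print(entity_view(res, res.entity_refs[0]))
```

Attributes that no reference names, such as those produced by a legacy Resource detector, remain ordinary Resource attributes. This matches the goal of not breaking existing Resource usage.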
-## How to deal with Prometheus Compatibility for non-SDK telemetry? +### How to deal with Prometheus Compatibility for non-SDK telemetry? Today, Prometheus compatibility relies on two key attributes in Resource: service.name and service.instance.id. These are not guaranteed to exist outside of OpenTelemetry SDK generation. While this question is not fully answered, we believe outlining identity in all resources within OpenTelemetry allows us to define a solution in the future while preserving compatibility with what works today. -## Should entities have a domain? +### Should entities have a domain? Is it worth having a `domain` in addition to type for entity? We could force each entity to exist in one domain and leverage domain generically in resource management. Entity Detectors would be responsible for an entire domain, selecting only ONE to apply a resource. Domains could be layered, e.g. a Cloud-specific domain may layer on top of a Kubernetes domain, where "GKE cluster entity" identifies *which* kubernetes cluster a kuberntes infra entity is part of. This layer would be done naively, via automatic join of participating entities or explicit relationships derived from GKE specific hooks. It's unclear if this is needed initially, and we believe this could be layered in later. -## Should resources have only one associated entity? +### Should resources have only one associated entity? Given the problems leading to the Entities working group, and the needs of existing Resource users today, we think it is infeasible and unscalable to limit resource to only one entity. This would place restrictions on modeling Entities that would require OpenTelemetry to be the sole source of entity definitions and hurt building an open and extensible ecosystem. Additionally it would need careful definition of solutions for the following problems/rubrics: - New entities added by extension should not break existing code - Collector augmentation / enrichment (resource, e.g.) 
- Should be extensible and not hard-coded. We need a general algorithm not specific rulesets. -## What identity should entities use (LID, UUID / GUID, or other)? +### What identity should entities use (LID, UUID / GUID, or other)? One of the largest questions in the first entities' OTEP was how to identify an entity. This was an attempt to unify the need for Navigational attributes with the notion that only identifying attributes of Entity would show up in Resource going forward. This restriction is no longer necessary in this proposal and we should reconsider how to model identity for an Entity. This can be done in follow up design / OTEPs. -## What happens if existing Resource translation in the collector remove resource attributes an Entity relies on? +### What happens if existing Resource translation in the collector remove resource attributes an Entity relies on? While we expect the collector to be the first component to start engaging with Entities in an architecture, this could lead to data model violations. We have a few options to deal with this issue: @@ -257,11 +259,11 @@ While we expect the collector to be the first component to start engaging with E - Specify that missing attribute keys are acceptable for descriptive attribtues. - Specify that missing attribute keys denote that entities are unusable for that batch of telemetry, and treat the content as malformed. -# Trade-offs and mitigations +## Trade-offs and mitigations The design proposed here attempts to balance non-breaking (backwards and forwards compatible) changes with the need to improve problematic issues in the Specification. Given the inability of most SDKs to implement the current Resource merge specification, breaking this should have little effect on actual users. Instead, the proposed merge specification should allow implementations to match current behavior and expectation, while evolving for users who engage with the new model. 
-# Prior art and alternatives +## Prior art and alternatives Previously, we have a few unaccepted oteps, e.g. ([OTEP 208](https://github.com/open-telemetry/oteps/pull/208)). Additionally, there are some alternatives that were considered in the Entities WG and rejected. @@ -271,6 +273,6 @@ Below is a brief discussion of some design decisions: - **Embed fully Entity in Resource.** This was rejected because it makes it easy/trivial for Resource attributes and Entities to diverge. This would prevent the backwards/forwards compatibility goals and also require all participating OTLP users to leverage entities. Entity should be an opt-in / additional feature that may or may not be engaged with, depending on user need. - **Re-using resource detectoin as-is** This was reject as not having a viable compatibility path forward. Creating a new set of components that can preserve existing behavior while allowing users to adopt the new functionality means that users have better control of when they see / change system behavior, and adoption is more obvious across the ecosystem. -# Future Posibilities +## Future Posibilities This proposal opens the door for addressing issues where an Entity's lifetime does not match an SDK's lifetime, in addition to providing a data model where mutable (descriptive) attributes can be changed over the lifetime of a resource without affecting its idnetity. We expect a follow-on OTEP which directly handles this issue. \ No newline at end of file From 2057fa44b342843ac34055b96f1b12e027fe4b6a Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Tue, 13 Aug 2024 18:14:25 -0400 Subject: [PATCH 07/51] More lint fixes. 
--- text/entities/0264-resource-and-entities.md | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index a06964554..bf1a974c3 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -184,28 +184,29 @@ The Entities WG came up with a rubric to evaluate solutions based on shared beliefs and goals for the overall effort. Let's look at how each item is achieved: -**Resource detectors (soon to be entity detectors) need to be composable / disjoint** +### Resource detectors (soon to be entity detectors) need to be composable / disjoint + Entity detection and Resource Manager now fulfill this need. +### New entities added by extension should not break existing code -**New entities added by extension should not break existing code** Users will need to configure a new Entity detector for new entities being modelled. +### Navigational attributes need to exist and can be used to identify an entity but could be augmented with UUID or other aspects. - Having ONLY a UUID for entity identification is not good enough. -**Navigational attributes need to exist and can be used to identify an entity but could be augmented with UUID or other aspects. - Having ONLY a UUID for entity identification is not good enough.** Resource will still be composed of identifying and descriptive attributes of Entity, allowing baseline navigational attributes users already expect from resource. +### Collector augmentation / enrichment (resource, e.g.) - Should be extensible and not hard-coded. We need a general algorithm not specific rulesets. -**Collector augmentation / enrichment (resource, e.g.) - Should be extensible and not hard-coded. 
We need a general algorithm not specific rulesets.** Entity concept provides a new "bundle" mechanism to resource for the Collector to augment enrich a group of attributes and better identify conflicts (or identity changes) caused therein. +### Users are expected to provide / prioritize "detectors" and determine which entity is "producing" or most-important for a signal -**Users are expected to provide / prioritize "detectors" and determine which entity is "producing" or most-important for a signal** The Resource Manager allows users to configure priority of Entity Detectors. -**For an SDK - ALL telemetry should be associated with the same set of entities (resource labels).** -Resource Manager is responsible for resolving entities into a cohesive Resource that meets the same demands as Resource today. +### For an SDK - ALL telemetry should be associated with the same set of entities (resource labels). +Resource Manager is responsible for resolving entities into a cohesive Resource that meets the same demands as Resource today. ## Open Questions @@ -275,4 +276,4 @@ Below is a brief discussion of some design decisions: ## Future Posibilities -This proposal opens the door for addressing issues where an Entity's lifetime does not match an SDK's lifetime, in addition to providing a data model where mutable (descriptive) attributes can be changed over the lifetime of a resource without affecting its idnetity. We expect a follow-on OTEP which directly handles this issue. \ No newline at end of file +This proposal opens the door for addressing issues where an Entity's lifetime does not match an SDK's lifetime, in addition to providing a data model where mutable (descriptive) attributes can be changed over the lifetime of a resource without affecting its idnetity. We expect a follow-on OTEP which directly handles this issue. 
From 7dd3c5dc3deab8f3f5ed8d7a7a8580f087de8d39 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Tue, 13 Aug 2024 18:18:05 -0400 Subject: [PATCH 08/51] Fix more lint. --- text/entities/0264-resource-and-entities.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index bf1a974c3..4fb859378 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -46,7 +46,7 @@ The SDK Resource coordinator is responsible for running all configured Resource - Resource detectors otherwise follow existing merge semantics. - The Specification merge rules will be updated to account for violations prevalent in ALL implementation of resource detection. - Specifically: This means the rules around merging Resource across schema-url will be dropped. Instead only conflicting attributes will be dropped. - - SchemaURL on Resource will need to be deprecated with entity-specific schema-url replacing it. Additionally, as no Resource semantic conventions have ever stabilized, SchemaURL usage on Resource cannot be in stable components of OpenTelemetry. Given prevalent violation of implementations around Resource merge specification, we suspect impact of this deprecation to be minimal. + - SchemaURL on Resource will need to be deprecated with entity-specific schema-url replacing it. Additionally, as no Resource semantic conventions have ever stabilized, SchemaURL usage on Resource cannot be in stable components of OpenTelemetry. Given prevalent violation of implementations around Resource merge specification, we suspect impact of this deprecation to be minimal. - An OOTB "Env Variable Entity Detector" will be specified and provided vs. requiring SDK wide ENV variables for resource detection. 
- *Additionally, Resource Coordinator would be responsible for understanding Entity lifecycle events, for Entities whose lifetimes do not match or exceed the SDK's own lifetime (e.g. browser session).* @@ -150,7 +150,7 @@ The entityref data model, would have the following changes from the original [en | Field | Type | Description | Changes | | ----- | ---- | ----------- | ------- | -| schema_url | string | The Schema URL, if known. This is the identifier of the Schema that the entity data is recorded in. To learn more about Schema URL see https://opentelemetry.io/docs/specs/otel/schemas/#schema-url | added | +| schema_url | string | The Schema URL, if known. This is the identifier of the Schema that the entity data is recorded in. To learn more about Schema URL ([see docs](https://opentelemetry.io/docs/specs/otel/schemas/#schema-url)) | added | | type | string | Defines the type of the entity. MUST not change during the lifetime of the entity. For example: "service" or "host". This field is required and MUST not be empty for valid entities. | unchanged | | identifying_attributes_keys | repeated string | Attribute Keys that identify the entity.
MUST not change during the lifetime of the entity. The Id must contain at least one attribute.

These keys MUST exists in Resource.attributes.

Follows OpenTelemetry common attribute definition. SHOULD follow OpenTelemetry semantic conventions for attributes.| now a reference | | descriptive_attributes_keys | repeated string | Descriptive (non-identifying) attribute keys of the entity.
MAY change over the lifetime of the entity. MAY be empty. These attribute keys are not part of entity's identity.

These keys MUST exist in Resource.attributes.

Follows any value definition in the OpenTelemetry spec - it can be a scalar value, byte array, an array or map of values. Arbitrary deep nesting of values for arrays and maps is allowed.

SHOULD follow OpenTelemetry semantic conventions for attributes.| now a reference | @@ -159,22 +159,22 @@ The entityref data model, would have the following changes from the original [en Let's look at some motivating problems from the [Entities Proposal](https://docs.google.com/document/d/1VUdBRInLEhO_0ABAoiLEssB1CQO_IcD5zDnaMEha42w/edit#heading=h.atg5m85uw9w8): -**Problem 1: Commingling of Entities** +### Problem 1: Commingling of Entities We embrace the need for commingling entities in Resource and allow downstream users to interact with the individual entities rather than erasing these details. -**Problem 2: Lack of Precise Identity** +### Problem 2: Lack of Precise Identity Identity is now clearly delineated from description via the Entity portion of Resource. When Entity is used for Resource, only identifying attributes need to be interacted with to create resource identity. -**Problem 3: Lack of Mutable Attributes** +### Problem 3: Lack of Mutable Attributes This proposal offers two solutions going forward to this: - Descriptive attributes may be mutated without violating Resource identity - Entities whose lifetimes do not match SDK may be attached/removed from Resource. -**Problem 4: Metric Cardinality Problem** +### Problem 4: Metric Cardinality Problem Via solution to (2) we can leverage an identity synthesized from identifying attributes on Entity. By directly modeling entity lifetimes, we guarantee that identity changes in Resource ONLY occur when source of telemetry changes. This solves unintended metric cardinality problems (while leaving those that are necessary to deal with, e.g. collecting metrics from phones or browser instances where intrinsic cardinality is high). @@ -192,11 +192,11 @@ Entity detection and Resource Manager now fulfill this need. Users will need to configure a new Entity detector for new entities being modelled. 
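Problems 2 to 4 above rest on deriving Resource identity from identifying attributes alone. Below is a minimal sketch of that idea, assuming JSON-serializable attribute values. It hashes each entity's type and identifying key/value pairs and ignores descriptive attributes entirely. `resource_identity` and the tuple-shaped refs are hypothetical illustrations, not an API defined by this proposal.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple

# Hypothetical minimal ref shape: (entity type, identifying keys, descriptive keys).
EntityRef = Tuple[str, List[str], List[str]]


def resource_identity(attributes: Dict[str, Any], refs: List[EntityRef]) -> str:
    """Hash entity types plus identifying key/value pairs; descriptive keys are ignored."""
    parts = []
    for entity_type, identifying_keys, _descriptive_keys in refs:
        ident = {key: attributes[key] for key in identifying_keys}
        parts.append([entity_type, ident])
    # Sort so the result does not depend on detector order.
    parts.sort(key=lambda part: json.dumps(part, sort_keys=True))
    canonical = json.dumps(parts, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


refs = [
    ("host", ["host.id"], ["host.name"]),
    ("service", ["service.name", "service.instance.id"], ["service.version"]),
]
before = {
    "host.id": "h-1",
    "host.name": "web-a",
    "service.name": "checkout",
    "service.instance.id": "i-1",
    "service.version": "1.0",
}
upgraded = {**before, "service.version": "1.1", "host.name": "web-a.internal"}
restarted = {**before, "service.instance.id": "i-2"}

print(resource_identity(before, refs) == resource_identity(upgraded, refs))   # True
print(resource_identity(before, refs) == resource_identity(restarted, refs))  # False
```

Here, the `service.version` and `host.name` changes leave the identity untouched, while a new `service.instance.id` produces a different one. A real implementation would also need a canonical encoding for values that are not JSON-serializable, such as byte arrays.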
-### Navigational attributes need to exist and can be used to identify an entity but could be augmented with UUID or other aspects. - Having ONLY a UUID for entity identification is not good enough. +### Navigational attributes need to exist and can be used to identify an entity but could be augmented with UUID or other aspects. - Having ONLY a UUID for entity identification is not good enough Resource will still be composed of identifying and descriptive attributes of Entity, allowing baseline navigational attributes users already expect from resource. -### Collector augmentation / enrichment (resource, e.g.) - Should be extensible and not hard-coded. We need a general algorithm not specific rulesets. +### Collector augmentation / enrichment (resource, e.g.) - Should be extensible and not hard-coded. We need a general algorithm not specific rulesets Entity concept provides a new "bundle" mechanism to resource for the Collector to augment enrich a group of attributes and better identify conflicts (or identity changes) caused therein. @@ -204,7 +204,7 @@ Entity concept provides a new "bundle" mechanism to resource for the Collector t The Resource Manager allows users to configure priority of Entity Detectors. -### For an SDK - ALL telemetry should be associated with the same set of entities (resource labels). +### For an SDK - ALL telemetry should be associated with the same set of entities (resource labels) Resource Manager is responsible for resolving entities into a cohesive Resource that meets the same demands as Resource today. From 76df4d0b4af4630ca6bb87c20b96521fc90b40c5 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Wed, 14 Aug 2024 08:28:30 -0400 Subject: [PATCH 09/51] Fix spellcheck issue. 
--- text/entities/0264-resource-and-entities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 4fb859378..78ba353d3 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -34,7 +34,7 @@ We define the following SDK components: - Providing SDK-internal access to detected Resources for reporting via Log signal on configured LogProviders. - *(new) Managing Entity changes during SDK lifetime, specifically dealing with entities that have lifetimes shorter than the SDK* -#### Resource Container +#### Resource Coordinator The SDK Resource coordinator is responsible for running all configured Resource and Entity Detectors. There will be some (user-controlled, otel default) priority order to these. From 38c237fe55aee49b84b07549a185b3c56038e277 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Wed, 14 Aug 2024 09:41:25 -0400 Subject: [PATCH 10/51] Outline prometheus path forward. --- text/entities/0264-resource-and-entities.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 78ba353d3..c1c4c58d8 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -231,7 +231,12 @@ This proposal motivates a Resource Coordinator in the SDK whose job could includ ### How to deal with Prometheus Compatibility for non-SDK telemetry? -Today, Prometheus compatibility relies on two key attributes in Resource: service.name and service.instance.id. These are not guaranteed to exist outside of OpenTelemetry SDK generation. While this question is not fully answered, we believe outlining identity in all resources within OpenTelemetry allows us to define a solution in the future while preserving compatibility with what works today. 
+Today, Prometheus compatibility relies on two key attributes in Resource: `service.name` and `service.instance.id`. These are not guaranteed to exist outside of OpenTelemetry SDK generation. While this question is not fully answered, we believe outlining identity in all resources within OpenTelemetry allows us to define a solution in the future while preserving compatibility with what works today. + +A quick proposal of what this might look like: + +- `target_info` metric generation is updated to exclude any keys which are contained in `descriptive_attributes_keys` of an entity. +- For each entity which has non-empty descriptive_attributes_keys, generate an info metric: `_entity_info` (naming TBD), which has all identifying and descriptive keys. This should play nicely with the planned improvements to [info-typed metrics](https://github.com/prometheus/proposals/blob/main/proposals/2024-04-10-native-support-for-info-metrics-metadata.md#goals). ### Should entities have a domain? From 78e3b4142e2d0bb43cf893711764da98a6d64168 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Wed, 14 Aug 2024 09:45:11 -0400 Subject: [PATCH 11/51] Fix spelling. --- text/entities/0264-resource-and-entities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index c1c4c58d8..30a1e6a11 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -275,7 +275,7 @@ Previously, we have a few unaccepted oteps, e.g. ([OTEP 208](https://github.com/ Below is a brief discussion of some design decisions: -- **Only associating one enttiy with a Resource.** This was rejected, as too high a friction point in evolving semantic conventions and allowing independent systems to coordinate identity + entities within the OpenTelemetry ecosystem. 
Eventually, this would force OpenTelemetry to model all possibly entities in the world and understand their interaction or otherwise prevent non-OpenTelemetry instrumentation from interacting with OpenTelemetry entities. +- **Only associating one entity with a Resource.** This was rejected, as too high a friction point in evolving semantic conventions and allowing independent systems to coordinate identity + entities within the OpenTelemetry ecosystem. Eventually, this would force OpenTelemetry to model all possibly entities in the world and understand their interaction or otherwise prevent non-OpenTelemetry instrumentation from interacting with OpenTelemetry entities. - **Embed fully Entity in Resource.** This was rejected because it makes it easy/trivial for Resource attributes and Entities to diverge. This would prevent the backwards/forwards compatibility goals and also require all participating OTLP users to leverage entities. Entity should be an opt-in / additional feature that may or may not be engaged with, depending on user need. - **Re-using resource detectoin as-is** This was reject as not having a viable compatibility path forward. Creating a new set of components that can preserve existing behavior while allowing users to adopt the new functionality means that users have better control of when they see / change system behavior, and adoption is more obvious across the ecosystem. From 0ddcf23155ce7590b9237f124f346052663894bd Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Wed, 14 Aug 2024 12:52:57 -0400 Subject: [PATCH 12/51] Update capitalization of resource. 
--- text/entities/0264-resource-and-entities.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 30a1e6a11..f42283ea0 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -36,9 +36,9 @@ We define the following SDK components: #### Resource Coordinator -The SDK Resource coordinator is responsible for running all configured Resource and Entity Detectors. There will be some (user-controlled, otel default) priority order to these. +The SDK Resource Coordinator is responsible for running all configured Resource and Entity Detectors. There will be some (user-controlled, otel default) priority order to these. -- The resource coordinator will detect conflicts in Entity of the same type being discovered and choose one to use. +- The Resource Coordinator will detect conflicts in Entity of the same type being discovered and choose one to use. - When using Entity Detectors and Resource detectors together, the following merge rules will be used: - Entity merging will occur first resulting in an "Entity Merged" Resource. - Entities of different types will be merged into the resulting Resource. From 090b81730311c2522568b05179d1153efcbd4a07 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Wed, 14 Aug 2024 13:04:26 -0400 Subject: [PATCH 13/51] Add link to prometheus compatibility doc. 
--- text/entities/0264-resource-and-entities.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index f42283ea0..622d2d482 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -92,7 +92,7 @@ The minimum requirements of this entity detector are: - ENV variable can specify multiple entities (resource attribute bundles) - ENV variable can be easily appended or leverages by multiple participating systems, if needed. -- Entities discovered via ENV variable can participate in Resource Manager generically, i.e. resolving conflicting definitions. +- Entities discovered via ENV variable can participate in Resource Coordinator generically, i.e. resolving conflicting definitions. The actual design for this ENV variable interaction would follow the approval of this OTEP. @@ -231,7 +231,7 @@ This proposal motivates a Resource Coordinator in the SDK whose job could includ ### How to deal with Prometheus Compatibility for non-SDK telemetry? -Today, Prometheus compatibility relies on two key attributes in Resource: `service.name` and `service.instance.id`. These are not guaranteed to exist outside of OpenTelemetry SDK generation. While this question is not fully answered, we believe outlining identity in all resources within OpenTelemetry allows us to define a solution in the future while preserving compatibility with what works today. +Today, [Prometheus compatibility](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md) relies on two key attributes in Resource: `service.name` and `service.instance.id`. These are not guaranteed to exist outside of OpenTelemetry SDK generation. 
While this question is not fully answered, we believe outlining identity in all resources within OpenTelemetry allows us to define a solution in the future while preserving compatibility with what works today. A quick proposal of what this might look like: From 3eb9339d21e301e5d0d21f5e5b97e622be46f40e Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Wed, 14 Aug 2024 13:06:18 -0400 Subject: [PATCH 14/51] Update text/entities/0264-resource-and-entities.md Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com> --- text/entities/0264-resource-and-entities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 622d2d482..3c05037dc 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -127,7 +127,7 @@ The list of detectors is given in priority order (first wins, in event of a tie, Given our desired design and algorithms for detecting, merging and manipulating Entities, we need the ability to denote how entity and resource relate. These changes must not break existing usage of Resource, therefore: -- The Entity model must be *layered on top of* the Resource model. A system does not need to ineract with entities for correct behavior. +- The Entity model must be *layered on top of* the Resource model. A system does not need to interact with entities for correct behavior. - Existing key usage of Resource must remain when using Entities, specifically navigationality (see: [OpenTelemetry Resources: Principles and Characteristics](https://docs.google.com/document/d/1Xd1JP7eNhRpdz1RIBLeA1_4UYPRJaouloAYqldCeNSc/edit)) - Downstream components should be able to engage with the Entity model in Resource. From 1fec84c383ef512323c1a1cc847b7d774dd0ce60 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Tue, 20 Aug 2024 08:38:33 -0400 Subject: [PATCH 15/51] Updates to address some comments. 
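The Prometheus `target_info` split proposed above can be sketched as follows. This is a minimal illustration under stated assumptions, not specified behavior.

- `EntityRef` and its `identifying_attributes_keys` field are assumed names; only `descriptive_attributes_keys` appears in the text.
- Because the info metric name is still TBD, the per-entity labels are returned keyed by entity type rather than by metric name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EntityRef:
    # Hypothetical shape of an entity reference attached to a Resource.
    type: str
    identifying_attributes_keys: List[str]
    descriptive_attributes_keys: List[str] = field(default_factory=list)


def prometheus_info_labels(
    resource: Dict[str, str], entities: List[EntityRef]
) -> Tuple[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Return (target_info labels, per-entity info labels keyed by entity type)."""
    descriptive = {k for e in entities for k in e.descriptive_attributes_keys}
    # target_info keeps every resource attribute except entity descriptive attributes.
    target_info = {k: v for k, v in resource.items() if k not in descriptive}
    per_entity: Dict[str, Dict[str, str]] = {}
    for e in entities:
        if not e.descriptive_attributes_keys:
            continue  # only entities with descriptive attributes get their own info metric
        keys = [*e.identifying_attributes_keys, *e.descriptive_attributes_keys]
        per_entity[e.type] = {k: resource[k] for k in keys if k in resource}
    return target_info, per_entity
```

Consider a `service` entity with no descriptive keys and a `host` entity whose `host.name` is descriptive. In that case `host.name` leaves `target_info` and appears only in the `host` labels, while `service.name` and `service.instance.id` stay on `target_info`.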
--- text/entities/0264-resource-and-entities.md | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 3c05037dc..2b8d9f891 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -45,8 +45,8 @@ The SDK Resource Coordinator is responsible for running all configured Resource - Entities of the same type will have one rejected and one accepted, based on priority. - Resource detectors otherwise follow existing merge semantics. - The Specification merge rules will be updated to account for violations prevalent in ALL implementation of resource detection. - - Specifically: This means the rules around merging Resource across schema-url will be dropped. Instead only conflicting attributes will be dropped. - - SchemaURL on Resource will need to be deprecated with entity-specific schema-url replacing it. Additionally, as no Resource semantic conventions have ever stabilized, SchemaURL usage on Resource cannot be in stable components of OpenTelemetry. Given prevalent violation of implementations around Resource merge specification, we suspect impact of this deprecation to be minimal. + - Specifically: This means the [rules around merging Resource across schema-url will be dropped](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md#merge). Instead only conflicting attributes will be dropped. + - SchemaURL on Resource will need to be deprecated with entity-specific schema-url replacing it. SDKs will no longer fill out SchemaURL on Resource. Additionally, as no (non-service) Resource semantic conventions have ever stabilized, SchemaURL usage on Resource cannot be in stable components of OpenTelemetry. Given prevalent concerns of implementations around Resource merge specification, we suspect impact of this deprecation to be minimal. 
- An OOTB "Env Variable Entity Detector" will be specified and provided vs. requiring SDK wide ENV variables for resource detection. - *Additionally, Resource Coordinator would be responsible for understanding Entity lifecycle events, for Entities whose lifetimes do not match or exceed the SDK's own lifetime (e.g. browser session).* @@ -90,9 +90,10 @@ set OTEL_DETECTED_ENTITIES=k8s.deployment[k8s.deployment.name=my-program],k8s.po The minimum requirements of this entity detector are: -- ENV variable can specify multiple entities (resource attribute bundles) -- ENV variable can be easily appended or leverages by multiple participating systems, if needed. -- Entities discovered via ENV variable can participate in Resource Coordinator generically, i.e. resolving conflicting definitions. +- ENV variable(s) can specify multiple entities (resource attribute bundles) +- ENV variable(s) can be easily appended or leverages by multiple participating systems, if needed. +- Entities discovered via ENV variable(s) can participate in Resource Coordinator generically, i.e. resolving conflicting definitions. +- ENV variable(s) have a priority that can be influenced by platform entity providers (e.g. prepending vs. appending) The actual design for this ENV variable interaction would follow the approval of this OTEP. @@ -110,7 +111,7 @@ processors: override: false ``` -The future entity-based detector would look almost exactly the same, but interact with the entity model of resource: +The future entity-based detector would look exactly the same, but interact with the entity model of resource: ```yaml processor: @@ -123,6 +124,10 @@ processor: The list of detectors is given in priority order (first wins, in event of a tie, outside of override configuration). The processor may need to be updated to allow the override flag to apply to each individual detector. +The rules for attributes would follow entity merging rules, as defined for the SDK resource manager. 
+ +Note: While this proposals shows a new processor replcing the `resourcedetection` processor, the details of whether to modify-in-place the existing `resourcedetection` processor or create a new one would be determined as a follow up to this design. + ## Datamodel Changes Given our desired design and algorithms for detecting, merging and manipulating Entities, we need the ability to denote how entity and resource relate. These changes must not break existing usage of Resource, therefore: From 5cbc0d5d97fa1487f5449276ba6a4682309cc5d4 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Thu, 22 Aug 2024 10:29:52 -0400 Subject: [PATCH 16/51] Add use case to check rendering on github. --- text/entities/0264-resource-and-entities.md | 39 +++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 2b8d9f891..4a40f8a2a 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -287,3 +287,42 @@ Below is a brief discussion of some design decisions: ## Future Posibilities This proposal opens the door for addressing issues where an Entity's lifetime does not match an SDK's lifetime, in addition to providing a data model where mutable (descriptive) attributes can be changed over the lifetime of a resource without affecting its idnetity. We expect a follow-on OTEP which directly handles this issue. + +## Use Cases + +Below are a set of use cases to help motivate this design. + +### SDK and Collector - Entity coordination + +Let's consider the interaction of resource, entity in the presence of an SDK +and a Collector: + +```mermaid +flowchart LR + A["`**SDK**`"] -->|OTLP| B["`**Collector**`"] + A -.- D((Resource Coordinator)) + B -.- C((Resource Processor)) + C -. Detects .-> E{{"`k8s.pod + *schema: 1.26.0* + `"}} + C -. Detects .-> F{{k8s.deployment}} + D -. Detects .-> G{{"`k8s.pod + *schema: 1.25.0* + `"}} + D -. 
Detects .-> H{{service}} +``` + +Here, an SDK is communicating with a Collector. The SDK and the collector +are both participating in resource detection through the use of entities, +however the installed versions of software are leverage different standard +versions between the collector and the SDK. + +Ideally, we'd like a solution where: + +- The user can ensure only attributes related to previously undiscovered, + but relevant, entities can be added in Resource (specifically, `k8s.deployment`). +- The user can address issues where schema version `1.26.0` and `1.25.0` may + have different attributes for the same entity. +- We have default rules and merging that requires the least amount of + configuration or customization for users to acheive their desired + attributes in resource. From d4f2965c240a1a7ec18d78547ef0568bba02f97b Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Thu, 22 Aug 2024 10:39:37 -0400 Subject: [PATCH 17/51] Update text/entities/0264-resource-and-entities.md Co-authored-by: jack-berg <34418638+jack-berg@users.noreply.github.com> --- text/entities/0264-resource-and-entities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 4a40f8a2a..0a61715d0 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -68,7 +68,7 @@ Where `Result` is the equivalent of error channel in the language of choice (e.g The most important aspect of this design is how Entities will be merged to construct a Resource. 
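The environment variable detector's format is explicitly deferred until after this OTEP is approved, so the parser below is purely hypothetical. It assumes the shape suggested by the `OTEL_DETECTED_ENTITIES` example: a comma-separated list of `type[key=value,...]` entries.

The sketch does not handle values that contain `,` or `]`. It keeps entries in their original order, so a platform could, for example, prepend entries to raise their priority.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical grammar (assumed, not specified by the OTEP):
#   entities := entity ("," entity)*
#   entity   := type "[" key "=" value ("," key "=" value)* "]"
_ENTITY = re.compile(r"\s*([^\[\],=\s]+)\[([^\]]*)\]\s*(?:,|$)")


def parse_detected_entities(value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Parse an OTEL_DETECTED_ENTITIES-style string into ordered (type, attributes) pairs."""
    value = value.strip()
    entities: List[Tuple[str, Dict[str, str]]] = []
    pos = 0
    while pos < len(value):
        match = _ENTITY.match(value, pos)
        if match is None:
            raise ValueError(f"malformed entity list at offset {pos}: {value[pos:]!r}")
        entity_type, body = match.groups()
        attributes: Dict[str, str] = {}
        for pair in body.split(","):
            key, sep, val = pair.partition("=")
            if not sep or not key.strip():
                raise ValueError(f"malformed attribute {pair!r} in {entity_type!r}")
            attributes[key.strip()] = val.strip()
        entities.append((entity_type, attributes))
        pos = match.end()  # continue after the trailing comma (or at end of string)
    return entities
```

A detector built on this sketch would call `parse_detected_entities(os.environ.get("OTEL_DETECTED_ENTITIES", ""))`. It would then hand the resulting entities to the Resource Coordinator like any other detector output.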
We provide a simple algorithm for this behavior: - Construct a set of detected entities, E -- All entity detectors are sorted by priority +- All entity detectors are sorted by priority (highest first) - For each entity detector - For each entity detected - If the entity exists in E, ignore it From 1faf62f52bbd90a34008afdc675a867ffe546dad Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Thu, 22 Aug 2024 10:44:00 -0400 Subject: [PATCH 18/51] Fix from review. --- text/entities/0264-resource-and-entities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 0a61715d0..47aee57cf 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -47,7 +47,7 @@ The SDK Resource Coordinator is responsible for running all configured Resource - The Specification merge rules will be updated to account for violations prevalent in ALL implementation of resource detection. - Specifically: This means the [rules around merging Resource across schema-url will be dropped](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md#merge). Instead only conflicting attributes will be dropped. - SchemaURL on Resource will need to be deprecated with entity-specific schema-url replacing it. SDKs will no longer fill out SchemaURL on Resource. Additionally, as no (non-service) Resource semantic conventions have ever stabilized, SchemaURL usage on Resource cannot be in stable components of OpenTelemetry. Given prevalent concerns of implementations around Resource merge specification, we suspect impact of this deprecation to be minimal. - - An OOTB "Env Variable Entity Detector" will be specified and provided vs. requiring SDK wide ENV variables for resource detection. + - An OOTB ["Env Variable Entity Detector"](#environment-variable-detector) will be specified and provided vs. 
requiring SDK wide ENV variables for resource detection. - *Additionally, Resource Coordinator would be responsible for understanding Entity lifecycle events, for Entities whose lifetimes do not match or exceed the SDK's own lifetime (e.g. browser session).* #### Entity Detector From 7fb267f6b36ad436f477265cd151eec0f0e464c5 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Thu, 22 Aug 2024 10:46:10 -0400 Subject: [PATCH 19/51] Add link to previous proposal --- text/entities/0264-resource-and-entities.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 47aee57cf..5850c74db 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -4,6 +4,8 @@ This is a proposal to address Resource and Entity data model interactions, including a path forward to address immediate friction and issues in the current resource specification. +It is an expansion on the [previous entity proposal](0256-entities-data-model.md). + From b6c7ef9e27f2f9a8dd473a3bebc26f46755b7486 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Mon, 26 Aug 2024 15:48:56 -0400 Subject: [PATCH 20/51] Add more examples. --- text/entities/0264-resource-and-entities.md | 120 ++++++++++++++++++-- 1 file changed, 112 insertions(+), 8 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 5850c74db..a8148a7cc 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -294,24 +294,128 @@ This proposal opens the door for addressing issues where an Entity's lifetime do Below are a set of use cases to help motivate this design. 
-### SDK and Collector - Entity coordination +### SDK - Multiple Detectors of the same Entity type + +TODO - Describe + +### SDK and Collector - Simple coordination Let's consider the interaction of resource, entity in the presence of an SDK and a Collector: ```mermaid flowchart LR - A["`**SDK**`"] -->|OTLP| B["`**Collector**`"] - A -.- D((Resource Coordinator)) - B -.- C((Resource Processor)) - C -. Detects .-> E{{"`k8s.pod + SDK["`**SDK**`"] -->|OTLP| COLLECTOR["`**Collector**`"] + COLLECTOR -->|OTLP| BACKEND["`**Backend**`"] + SDK -.- RC((Resource Coordinator)) + COLLECTOR -.- RP((Resource Processor)) + RP -. Detects .-> EC2{{aws.ec2}} + RP -. Detects .-> HOST{{host}} + RC -. Detects .-> PROCESS{{process}} + RC -. Detects .-> SERVICE{{service}} +``` + +Here, an SDK is running on Amazon EC2. it is configured with resource detection +that find a `process` and `service` entity. The SDK is sending data to an +OpenTelemetry Collector that has a resource processor configured to detect +the `ec2` and `host` entities. + +The resulting OTLP from the collector would contain a resource with all +of the entities (`process`, `service`, `ec2`, and `host`). This is because +the entities are all disjoint. + +*Note: this matches today's behavior of existing resource detection and OpenTelmetry collector where all attributes wind up on resource.* + +### SDK and Collector - Entity coordination with descriptive attributes + +Let's consider the interaction of resource, entity where both the SDK and the Collector detect an entity: + +```mermaid +flowchart LR + SDK["`**SDK**`"] -->|OTLP| COLLECTOR["`**Collector**`"] + COLLECTOR -->|OTLP| BACKEND["`**Backend**`"] + SDK -.- RC((Resource Coordinator)) + COLLECTOR -.- RP((Resource Processor)) + RP -. Detects .-> HOST2{{host}} + RC -. Detects .-> HOST{{host}} + RC -. Detects .-> SERVICE{{service}} +``` + +Here, and SDK is running on a machine (physical or virtual). The SDK is +configured to detect the host it is running on. 
The collector is also running +on a machine (physical or virtual). Both the SDK and the Collector detect +a `host` entity (with the same identity). + +The behavior would be as follows: + +- By default, the collector would append any missing descriptive attributes + from its `host` entity to the `host` entity and resource. +- If the collector's processor is configured to `override: true`, then the + host entity from the SDK would be dropped in favor of the collector's `host` + entity. All identifying+descriptive attributes from the original entity + + resource would be removed and those detected in the collector would replace it. + +This allows the collector to enrich or enhance resource attributes without altering the *identity* of the source. + +### SDK and Collector - Entity coordination with conflicts + +Let's consider the interaction of resource, entity where there is an identity conflict between the SDK and the Collector: + +```mermaid +flowchart LR + SDK["`**SDK**`"] -->|OTLP| COLLECTOR["`**Collector**`"] + COLLECTOR -->|OTLP| BACKEND["`**Backend**`"] + SDK -.- RC((Resource Coordinator)) + COLLECTOR -.- RP((Resource Processor)) + RP -. Detects .-> HOST2{{host 2}} + RC -. Detects .-> HOST{{host 1}} + RC -. Detects .-> SERVICE{{service}} +``` + +Here, and SDK is running on a machine (physical or virtual). The SDK is +configured to detect the host it is running on. The collector is also running +on a machine (physical or virtual). Both the SDK and the Collector detect +a `host` entity. However, the `host` entity has a *different identity* between +the SDK and Collector. + +The behavior would be as follows: + +- The default would *drop* the entity detected by the collector, as the + entity identity does not match. This would mean, e.g. descriptive host + attributes from the collector are **not** added to the Resource in OTLP. 
+- If the collector's processor is configured to `override: true`, then the + host entity from the SDK would be dropped in favor of the collector's `host` + entity. All identifying+descriptive attributes from the original entity + + resource would be removed and those detected in the collector would replace it. + +The default behavior is useful when the SDK and Collector are run on different +machines. Unlike today's resource detection, this could prevent `host` +descriptive attributes that were not detected by the SDK from being added to the +resource. + +The `override` behavior could also ensure that attributes which should be +detected and reported together are replaced together. Today, it's possible the +collector may detect and override some, but not all attributes from the SDK. + +### SDK and Collector - Entity coordination across versions + +Let's look at SDK + collector coordination where semantic version differences +can occur between components within the system. + +```mermaid +flowchart LR + SDK["`**SDK**`"] -->|OTLP| COLLECTOR["`**Collector**`"] + COLLECTOR -->|OTLP| BACKEND["`**Backend**`"] + SDK -.- RC((Resource Coordinator)) + COLLECTOR -.- RP((Resource Processor)) + RP -. Detects .-> POD{{"`k8s.pod *schema: 1.26.0* `"}} - C -. Detects .-> F{{k8s.deployment}} - D -. Detects .-> G{{"`k8s.pod + RP -. Detects .-> DEPLOYMENT{{k8s.deployment}} + RC -. Detects .-> POD2{{"`k8s.pod *schema: 1.25.0* `"}} - D -. Detects .-> H{{service}} + RC -. Detects .-> SERVICE{{service}} ``` Here, an SDK is communicating with a Collector. The SDK and the collector From d40f1e017bc22b3b93540bf73e16f3e73b9aa9a0 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Mon, 26 Aug 2024 15:58:47 -0400 Subject: [PATCH 21/51] Add last use case. 
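The collector behaviors described in the use cases can be summarized in a short sketch:

- On a matching identity, the collector enriches the entity.
- On an identity conflict, it drops its own entity.
- With `override: true`, it replaces the SDK's entity wholesale.

The sketch assumes at most one entity per type and models only entity-owned attributes. `Entity`, `reconcile`, and `to_resource_attributes` are hypothetical names, not Collector APIs.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Entity:
    type: str
    identifying: Dict[str, str]
    descriptive: Dict[str, str] = field(default_factory=dict)


def _copy(e: Entity) -> Entity:
    return Entity(e.type, dict(e.identifying), dict(e.descriptive))


def reconcile(
    sdk: Dict[str, Entity], collector: Dict[str, Entity], override: bool = False
) -> Dict[str, Entity]:
    """Combine SDK-reported entities with collector-detected ones (both keyed by type)."""
    result = {etype: _copy(e) for etype, e in sdk.items()}
    for etype, detected in collector.items():
        existing = result.get(etype)
        if existing is None or override:
            # Disjoint type, or override: use the collector's entity, replacing
            # all identifying and descriptive attributes of the SDK's entity.
            result[etype] = _copy(detected)
        elif existing.identifying == detected.identifying:
            # Same identity: only fill in descriptive attributes the SDK lacks.
            for key, value in detected.descriptive.items():
                existing.descriptive.setdefault(key, value)
        # Different identity without override: keep the SDK entity, drop the collector's.
    return result


def to_resource_attributes(entities: Dict[str, Entity]) -> Dict[str, str]:
    """Flatten entities back into plain resource attributes."""
    attrs: Dict[str, str] = {}
    for e in entities.values():
        attrs.update(e.identifying)
        attrs.update(e.descriptive)
    return attrs
```

Unlike plain per-attribute overrides, the `override` path swaps an entity's attributes as a unit. This mirrors the goal that attributes detected together are replaced together.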
--- text/entities/0264-resource-and-entities.md | 30 ++++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index a8148a7cc..28258b59b 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -296,7 +296,35 @@ Below are a set of use cases to help motivate this design. ### SDK - Multiple Detectors of the same Entity type -TODO - Describe +Let's consider the interaction of the SDK in the presence of multiple registered +entity detectors: + +```mermaid +flowchart LR + SDK["`**SDK**`"] -->|OTLP| BACKEND["`**Backend**`"] + SDK -.- RC((Resource Coordinator)) + RC -.- OTEL_DETECTOR((OpenTelemetry Default Resource Detection)) + RC -.- GCP_DETECTOR((Google Cloud Specific Resource Detection)) + GCP_DETECTOR -. Detects .-> GCE{{gcp.gce}} + GCP_DETECTOR -. Detects .-> GCPHOST{{host (gcp)}} + OTEL_DETECTOR -. Detects .-> HOST{{host (generic)}} + OTEL_DETECTOR -. Detects .-> PROCESS{{process}} + OTEL_DETECTOR -. Detects .-> SERVICE{{service}} +``` + +Here, there is a services running on the Google Compute Engine. The user +has configured a Google Cloud specific set of entity detectors. Both the +built in OpenTelemetry detection and the configured Google Cloud detection +discover a `host` entity. + +The following outcome would occur: + +- The resulting resource would have all of the following entities: `host`, `process`, `service` and `gcp.gce` +- The user-configured resource detector would take priority over built in: the `host` defined from the Google Cloud detection would "win" and be included in resource. + - This means `host.id` e.g. could be the id discovered for GCE VMs. Similarly for other cloud provider detection, like Amazon EC2 where VMs are given a unique ID by the Cloud Provider, rather than a generic machinne ID, e.g. + - This matches existing behavior/expectations today for AWS, GCP, etc. 
on what `host.id` would mean. +- Users would be able to configure which host wins, by swapping the priority order of "default" vs. cloud-specific detection. + ### SDK and Collector - Simple coordination From f679af21d880947c715f1e701e90bfe930805d07 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Mon, 26 Aug 2024 16:01:43 -0400 Subject: [PATCH 22/51] Fix mermaid diagram syntax. --- text/entities/0264-resource-and-entities.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 28258b59b..afb45c157 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -306,8 +306,8 @@ flowchart LR RC -.- OTEL_DETECTOR((OpenTelemetry Default Resource Detection)) RC -.- GCP_DETECTOR((Google Cloud Specific Resource Detection)) GCP_DETECTOR -. Detects .-> GCE{{gcp.gce}} - GCP_DETECTOR -. Detects .-> GCPHOST{{host (gcp)}} - OTEL_DETECTOR -. Detects .-> HOST{{host (generic)}} + GCP_DETECTOR -. Detects .-> GCPHOST{{"host (gcp)"}} + OTEL_DETECTOR -. Detects .-> HOST{{"host (generic)"}} OTEL_DETECTOR -. Detects .-> PROCESS{{process}} OTEL_DETECTOR -. Detects .-> SERVICE{{service}} ``` From 22ac1fab9011084f80583723b0d33818edc6a72b Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Tue, 27 Aug 2024 16:35:28 -0400 Subject: [PATCH 23/51] Update merge algorithm based on feedback. 
--- text/entities/0264-resource-and-entities.md | 28 ++++++++++++--------- 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index afb45c157..81598edfa 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -42,9 +42,7 @@ The SDK Resource Coordinator is responsible for running all configured Resource - The Resource Coordinator will detect conflicts in Entity of the same type being discovered and choose one to use. - When using Entity Detectors and Resource detectors together, the following merge rules will be used: - - Entity merging will occur first resulting in an "Entity Merged" Resource. - - Entities of different types will be merged into the resulting Resource. - - Entities of the same type will have one rejected and one accepted, based on priority. + - Entity merging will occur first resulting in an "Entity Merged" Resource (See [algorithm here](#entity-merging-and-resource)). - Resource detectors otherwise follow existing merge semantics. - The Specification merge rules will be updated to account for violations prevalent in ALL implementation of resource detection. - Specifically: This means the [rules around merging Resource across schema-url will be dropped](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md#merge). Instead only conflicting attributes will be dropped. @@ -67,15 +65,21 @@ Where `Result` is the equivalent of error channel in the language of choice (e.g #### Entity Merging and Resource -The most important aspect of this design is how Entities will be merged to construct a Resource. 
We provide a simple algorithm for this behavior: - -- Construct a set of detected entities, E -- All entity detectors are sorted by priority (highest first) -- For each entity detector - - For each entity detected - - If the entity exists in E, ignore it - - Otherwise, add the entity to E -- Construct a Resource from the set E. +The most important aspect of this design is how Entities will be merged to construct a Resource. + +We provide a simple algorithm for this behavior: + +- Construct a set of detected entities, `E` + - All entity detectors are sorted by priority (highest first) + - For each entity detector `D`, detect entities + - For each entity detected, `d'` + - If an entity `e'` exists in `E` with same entity type as `d'`, do one of the following: + - If the entity identiy and schema_url are the same, merge the descriptive attributes of `d'` into `e'`. + - If the entity identity is the same, but schema_url is different: drop the new entity `d'` + *Note: We could offer configuration in this case* + - If the entity identity is different: drop the new entity `d'`. + - Otherwise, add the entity `d'` to set `E` +- Construct a Resource from the set `E`. Any implementation that achieves the same result as this algorithm is acceptable. From 1555b425fc008a16393e36cf437c22ce6a8cfdd8 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Wed, 28 Aug 2024 13:54:18 -0400 Subject: [PATCH 24/51] Fix descriptive attribute merge description. 
--- text/entities/0264-resource-and-entities.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 81598edfa..9a93e6feb 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -74,7 +74,10 @@ We provide a simple algorithm for this behavior: - For each entity detector `D`, detect entities - For each entity detected, `d'` - If an entity `e'` exists in `E` with same entity type as `d'`, do one of the following: - - If the entity identiy and schema_url are the same, merge the descriptive attributes of `d'` into `e'`. + - If the entity identiy and schema_url are the same, merge the descriptive attributes of `d'` into `e'`: + - For each descriptive attribute `da'` in `d'` + - If `da'.key` does not exist in `e'`, then add `da'` to `ei` + - otherwise, ignore. - If the entity identity is the same, but schema_url is different: drop the new entity `d'` *Note: We could offer configuration in this case* - If the entity identity is different: drop the new entity `d'`. From b567974e4af253b480000ff6fa0e4936e43fec3f Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Wed, 28 Aug 2024 14:40:17 -0400 Subject: [PATCH 25/51] Add a survey of existing resource attributes detected and how they are bundled. --- text/entities/0264-resource-and-entities.md | 182 ++++++++++++++++++++ 1 file changed, 182 insertions(+) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 9a93e6feb..e1affc22b 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -467,3 +467,185 @@ Ideally, we'd like a solution where: - We have default rules and merging that requires the least amount of configuration or customization for users to acheive their desired attributes in resource. 
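Below is a minimal Python sketch of the entity merging algorithm, including the descriptive-attribute merge. `Entity`, `merge_entities`, `to_resource`, and the `(priority, detect)` detector shape are illustrative assumptions rather than a proposed SDK API.

Higher priority values win. Python's stable sort keeps registration order among detectors with equal priority.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Entity:
    type: str
    identifying: Dict[str, str]
    descriptive: Dict[str, str] = field(default_factory=dict)
    schema_url: str = ""


# Hypothetical detector shape: (priority, function returning detected entities).
Detector = Tuple[int, Callable[[], List[Entity]]]


def merge_entities(detectors: Sequence[Detector]) -> Dict[str, Entity]:
    """Build the set E of entities, keyed by entity type."""
    merged: Dict[str, Entity] = {}
    for _, detect in sorted(detectors, key=lambda d: d[0], reverse=True):
        for d in detect():
            e = merged.get(d.type)
            if e is None:
                # No entity of this type yet: accept d.
                merged[d.type] = Entity(d.type, dict(d.identifying), dict(d.descriptive), d.schema_url)
            elif e.identifying == d.identifying and e.schema_url == d.schema_url:
                # Same identity and schema_url: add only descriptive keys e lacks.
                for key, value in d.descriptive.items():
                    if key not in e.identifying and key not in e.descriptive:
                        e.descriptive[key] = value
            # Otherwise identity or schema_url differ: drop d.
    return merged


def to_resource(merged: Dict[str, Entity]) -> Dict[str, str]:
    """Construct Resource attributes from the merged entity set."""
    attrs: Dict[str, str] = {}
    for e in merged.values():
        attrs.update(e.identifying)
        attrs.update(e.descriptive)
    return attrs
```

Consider the Google Cloud use case, with the cloud-specific detector registered at a higher priority than the default one. The cloud-specific `host` wins, and the generic `host` is dropped entirely, including its descriptive attributes, because the identities differ.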
+ +## Collection of Resource detectors and attributes used + +- Collector + - [system](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/internal/system/metadata.yaml) + - host.arch + - host.name + - host.id + - host.ip + - host.mac + - host.cpu.vendor.id + - host.cpu.family + - host.cpu.model.id + - host.cpu.model.name + - host.cpu.stepping + - host.cpu.cache.l2.size + - os.description + - os.type + - [docker](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/internal/docker/metadata.yaml) + - host.name + - os.type + - [heroku](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/internal/heroku/metadata.yaml) + - cloud.provider + - heroku.app.id + - heroku.dyno.id + - heroku.release.commit + - heroku.release.creation_timestamp + - service.instance.id + - service.name + - service.version + - [gcp](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/internal/gcp/metadata.yaml) + - gke + - cloud.provider + - cloud.platform + - cloud.account.id + - cloud.region + - cloud.availability_zone + - k8s.cluster.name + - host.id + - host.name + - gce + - cloud.provider + - cloud.platform + - cloud.account.id + - cloud.region + - cloud.availability_zone + - host.id + - host.name + - host.type + - (optional) gcp.gce.instance.hostname + - (optional) gcp.gce.instance.name + - AWS + - [ec2](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/internal/aws/ec2/metadata.yaml) + - cloud.provider + - cloud.platform + - cloud.account.id + - cloud.region + - cloud.availability_zone + - host.id + - host.image.id + - host.name + - host.type + - 
[ecs](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/internal/aws/ecs/metadata.yaml) + - cloud.provider + - cloud.platform + - cloud.account.id + - cloud.region + - cloud.availability_zone + - aws.ecs.cluster.arn + - aws.ecs.task.arn + - aws.ecs.task.family + - aws.ecs.task.id + - aws.ecs.task.revision + - aws.ecs.launchtype (V4 only) + - aws.log.group.names (V4 only) + - aws.log.group.arns (V4 only) + - aws.log.stream.names (V4 only) + - aws.log.stream.arns (V4 only) + - [elastic_beanstalk](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/internal/aws/elasticbeanstalk/metadata.yaml) + - cloud.provider + - cloud.platform + - deployment.environment + - service.instance.id + - service.version + - [eks] + - cloud.provider + - cloud.platform + - k8s.cluster.name + - [lambda] + - cloud.provider + - cloud.platform + - cloud.region + - faas.name + - faas.version + - faas.instance + - faas.max_memory + - aws.log.group.names + - aws.log.stream.names + - [Azure] + - cloud.provider + - cloud.platform + - cloud.region + - cloud.account.id + - host.id + - host.name + - azure.vm.name + - azure.vm.size + - azure.vm.scaleset.name + - azure.resourcegroup.name + - Azure [aks] + - cloud.provider + - cloud.platform + - k8s.cluster.name + - Consul + - cloud.region + - host.id + - host.name + - *exploded consul metadata* + - k8s Node + - k8s.node.uid + - Openshift + - cloud.provider + - cloud.platform + - cloud.region + - k8s.cluster.name +- Java Resource Detection + - SDK-Default + - service.name + - telemetry.sdk.version + - telemetry.sdk.language + - telemetry.sdk.name + - [process](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/691de74a4b0539c1329222aefb962c232028032b/instrumentation/resources/library/src/main/java/io/opentelemetry/instrumentation/resources/ProcessResource.java#L60) + - process.pid + - process.command_line + - 
process.command_args + - process.executable.path + - [host](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/resources/library/src/main/java/io/opentelemetry/instrumentation/resources/HostResource.java#L31) + - host.name + - host.arch + - [container](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/resources/library/src/main/java/io/opentelemetry/instrumentation/resources/ContainerResource.java) + - container.id + - [os](https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/resources/library/src/main/java/io/opentelemetry/instrumentation/resources/OsResource.java) + - os.type + - [AWS](https://github.com/open-telemetry/opentelemetry-java-contrib/tree/main/aws-resources) + - EC2 + - host.id + - cloud.availability_zone + - host.type + - host.image.id + - cloud.account.id + - cloud.region + - host.name + - ECS + - cloud.provider + - cloud.platform + - aws.log.group.names + - aws.log.stream.names + - EKS + - cloud.provider + - cloud.platform + - k8s.cluster.name + - container.id + - Lambda + - cloud.platform + - cloud.region + - faas.name + - faas.version + - [GCP](https://github.com/open-telemetry/opentelemetry-java-contrib/tree/main/gcp-resources) + - cloud.provider + - cloud.platform + - cloud.account.id + - cloud.availability_zone + - cloud.region + - host.id + - host.name + - host.type + - k8s.pod.name + - k8s.namespace.name + - k8s.container.name + - k8s.cluster.name + - faas.name + - faas.instance From af0165d9cc1abb6af3640ff4b73af83ed613fcb1 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Wed, 28 Aug 2024 15:02:27 -0400 Subject: [PATCH 26/51] Add some analysis of bundled entities from detection. 
--- text/entities/0264-resource-and-entities.md | 42 +++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index e1affc22b..77d083e33 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -649,3 +649,45 @@ Ideally, we'd like a solution where: - k8s.cluster.name - faas.name - faas.instance +- [OTEL operator](https://github.com/open-telemetry/opentelemetry-operator/blob/a1e8f927909b81eb368c0483940e0b90d7fdb057/pkg/instrumentation/sdk_test.go#L752) injected ENV variables + - service.instance.id + - service.name + - service.version + - k8s.namespace.name + - k8s.pod.name + - k8s.node.name + - k8s.container.name + +Some initial thoughts on implications: + +AWS, Azure, GCP, Heroku, etc. all provide the following "bundles" of resource: + +- `cloud.*` +- `faas.*`, when relevant +- `host.*`, when relevant +- `k8s.cluster.*`, when relevant +- `service.*` when relevant +- `container.*` for a subset of k8s providers + +"system" detection provides the following: + +- `host.*` +- `os.*` +- `process.*` for SDKs +- `container.*` for Docker images + +SDK specific detection provides the following: + +- `sdk.*` +- `service.*` + +The OTEL operator for k8s provides the following via ENV variables: + +- `k8s.namespace.*` +- `k8s.node.*` +- `k8s.pod.*` +- `k8s.container.*` +- `service.*` + + + From 26de765f898eb90c9f227d222f83220532011951 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Wed, 28 Aug 2024 15:24:46 -0400 Subject: [PATCH 27/51] Add go survey of resource attributes. 
--- text/entities/0264-resource-and-entities.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 77d083e33..5ddde1de3 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -649,6 +649,23 @@ Ideally, we'd like a solution where: - k8s.cluster.name - faas.name - faas.instance + - Go + - [container](https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/resource/container.go) + - container.id + - [host](https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/resource/host_id.go) + - host.id + - [os](https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/resource/os.go) + - os.name + - [process](https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/resource/process.go) + - process.pid + - process.executable.name + - process.executable.path + - process.command_line + - process.command_args + - process.owner + - [builtin](https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/resource/builtin.go) + - service.instance.id + - service.name - [OTEL operator](https://github.com/open-telemetry/opentelemetry-operator/blob/a1e8f927909b81eb368c0483940e0b90d7fdb057/pkg/instrumentation/sdk_test.go#L752) injected ENV variables - service.instance.id - service.name From dc95ee9491300293c6115a0c396fb5c77fbeac56 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Wed, 28 Aug 2024 15:45:39 -0400 Subject: [PATCH 28/51] Add litmus test for when to include an entity on a resource. 
--- text/entities/0264-resource-and-entities.md | 38 +++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 5ddde1de3..90c7e1efe 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -675,6 +675,9 @@ Ideally, we'd like a solution where: - k8s.node.name - k8s.container.name + +### Implications + Some initial thoughts on implications: AWS, Azure, GCP, Heroku, etc. all provide the following "bundles" of resource: @@ -707,4 +710,39 @@ The OTEL operator for k8s provides the following via ENV variables: - `service.*` +### What could this mean for chosing entities that belong on resource? + +Let's look at an example of a container running in kubernetes, specifically EKS. + +If the OTEL operator, the SDK and the collector are all used, the following +attributes will wind up on resource: + +- `service.*` - from SDK and otel operator +- `sdk.*` - from SDK +- `process.*` - from SDK +- `host.*` - Note: from system detector on collector +- `container.*` - from EKS detector on SDK +- `k8s.namespace.*` - from otel operator +- `k8s.node.*` - from otel operator +- `k8s.pod.*` - from otel operator +- `k8s.container.*` - from otel operator +- `k8s.cluster.*` - from EKS detector on SDK or collector +- `cloud.*` - from EKS detector on SDK or collector + +A simple litmus test derived from this for when to include an entity on +Resource would be: "Any entity relevant to the produced telemetry should be +included". + +However, this can be refined. Resources today provide a [few key features](https://docs.google.com/document/d/1Xd1JP7eNhRpdz1RIBLeA1_4UYPRJaouloAYqldCeNSc/edit): + +- They provide identity - Uniquely identifying the origin of the data. +- They provide "navigationality" - allowing users to find the source of the data within their o11y and infrastructure tools. 
+- They allow aggregation / slicing of data on interesting domains. + +A litmus test for what entities to include on resource should be as follows: + +- Is the entity the source/origin of the data? +- Does the entity help navigate to the source of the data? (e.g. `k8s.cluster.*` helping find a `k8s.container.*`) +- Do we want to easily slice/aggregate on an axis provided by the entity? (e.g. quickly filtering all CPU container usage metrics across a cluster to find overloaded nodes). +If the answer to any question is yes, then include the entity on resource. From c1c1eebfc02197e9c6156e5dca826069622b7a09 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Thu, 29 Aug 2024 16:19:32 -0400 Subject: [PATCH 29/51] Add one open question from WG. --- text/entities/0264-resource-and-entities.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 90c7e1efe..b386c5fdf 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -279,6 +279,15 @@ While we expect the collector to be the first component to start engaging with E - Specify that missing attribute keys are acceptable for descriptive attribtues. - Specify that missing attribute keys denote that entities are unusable for that batch of telemetry, and treat the content as malformed. +### What about advanced entity interaction in the Collector? + +One problem that motivated this design is the issue of "local resource detection" vs. "remote signal collection" in the OpenTelemetry collector. That is, I have a process running on a machine writing to an OpenTelemetry +collector running on a different machine. The current `resourcedetectionprocessor` in the collector appends attributes to resource based on discovering *where the collector is running*.
However, +as the collector could determine that telemetry has come from a different machine, it could also avoid adding resource attributes that are not relevant to incoming data. + +Today, `resourcedetectionprocessor` is naive, as is the algorithm proposed in this OTEP. We believe that a more sophisticated solution could be created where the collector would know not to join entities onto a +resource based on more advanced knowledge of the communication protocol used to obtain the data (e.g. using the ip address of the sender on an OTLP server). + ## Trade-offs and mitigations The design proposed here attempts to balance non-breaking (backwards and forwards compatible) changes with the need to improve problematic issues in the Specification. Given the inability of most SDKs to implement the current Resource merge specification, breaking this should have little effect on actual users. Instead, the proposed merge specification should allow implementations to match current behavior and expectation, while evolving for users who engage with the new model. From 41c4d6e208b90c3b3489986434db94932e3f6967 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Thu, 29 Aug 2024 16:20:49 -0400 Subject: [PATCH 30/51] Fix typo. --- text/entities/0264-resource-and-entities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index b386c5fdf..707713cf9 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -474,7 +474,7 @@ Ideally, we'd like a solution where: - The user can address issues where schema version `1.26.0` and `1.25.0` may have different attributes for the same entity. - We have default rules and merging that requires the least amount of - configuration or customization for users to acheive their desired + configuration or customization for users to achieve their desired attributes in resource. 
## Collection of Resource detectors and attributes used From 0bc047bbcbad6b796486bf5a37001c97450370d9 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Thu, 29 Aug 2024 16:21:36 -0400 Subject: [PATCH 31/51] Fix link issue. --- text/entities/0264-resource-and-entities.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 707713cf9..af6a0f93e 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -560,11 +560,11 @@ Ideally, we'd like a solution where: - deployment.environment - service.instance.id - service.version - - [eks] + - eks - cloud.provider - cloud.platform - k8s.cluster.name - - [lambda] + - lambda - cloud.provider - cloud.platform - cloud.region @@ -574,7 +574,7 @@ Ideally, we'd like a solution where: - faas.max_memory - aws.log.group.names - aws.log.stream.names - - [Azure] + - Azure - cloud.provider - cloud.platform - cloud.region @@ -585,7 +585,7 @@ Ideally, we'd like a solution where: - azure.vm.size - azure.vm.scaleset.name - azure.resourcegroup.name - - Azure [aks] + - Azure aks - cloud.provider - cloud.platform - k8s.cluster.name From ec469b4555d9570fd40e5f3ff775e17e41f7437f Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Thu, 29 Aug 2024 16:24:22 -0400 Subject: [PATCH 32/51] Fix lint issues. --- text/entities/0264-resource-and-entities.md | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index af6a0f93e..fe29ab1e5 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -65,7 +65,7 @@ Where `Result` is the equivalent of error channel in the language of choice (e.g #### Entity Merging and Resource -The most important aspect of this design is how Entities will be merged to construct a Resource. 
+The most important aspect of this design is how Entities will be merged to construct a Resource. We provide a simple algorithm for this behavior: @@ -341,7 +341,6 @@ The following outcome would occur: - This matches existing behavior/expectations today for AWS, GCP, etc. on what `host.id` would mean. - Users would be able to configure which host wins, by swapping the priority order of "default" vs. cloud-specific detection. - ### SDK and Collector - Simple coordination Let's consider the interaction of resource, entity in the presence of an SDK @@ -527,7 +526,7 @@ Ideally, we'd like a solution where: - host.type - (optional) gcp.gce.instance.hostname - (optional) gcp.gce.instance.name - - AWS + - AWS - [ec2](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/internal/aws/ec2/metadata.yaml) - cloud.provider - cloud.platform @@ -674,7 +673,7 @@ Ideally, we'd like a solution where: - process.owner - [builtin](https://github.com/open-telemetry/opentelemetry-go/blob/main/sdk/resource/builtin.go) - service.instance.id - - service.name + - service.name - [OTEL operator](https://github.com/open-telemetry/opentelemetry-operator/blob/a1e8f927909b81eb368c0483940e0b90d7fdb057/pkg/instrumentation/sdk_test.go#L752) injected ENV variables - service.instance.id - service.name @@ -684,7 +683,6 @@ Ideally, we'd like a solution where: - k8s.node.name - k8s.container.name - ### Implications Some initial thoughts on implications: @@ -718,7 +716,6 @@ The OTEL operator for k8s provides the following via ENV variables: - `k8s.container.*` - `service.*` - ### What could this mean for chosing entities that belong on resource? Let's look at an example of a container running in kubernetes, specifically EKS. From 7a900c93c6bb16a7a51dda7fd6e9d1dfd08f32b2 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Fri, 30 Aug 2024 10:04:05 -0400 Subject: [PATCH 33/51] Generate toc. 
--- text/entities/0264-resource-and-entities.md | 45 +++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index fe29ab1e5..067b94ac1 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -8,6 +8,51 @@ It is an expansion on the [previous entity proposal](0256-entities-data-model.md +- [Motivation](#motivation) +- [Design](#design) + * [Approach - Resource Improvements](#approach---resource-improvements) + + [Resource Coordinator](#resource-coordinator) + + [Entity Detector](#entity-detector) + + [Entity Merging and Resource](#entity-merging-and-resource) + + [Environment Variable Detector](#environment-variable-detector) + * [Interactions with OpenTelemetry Collector](#interactions-with-opentelemetry-collector) +- [Datamodel Changes](#datamodel-changes) + * [Resource](#resource) + * [ResourceEntityRef](#resourceentityref) +- [How this proposal solves the problems that motivated it](#how-this-proposal-solves-the-problems-that-motivated-it) + * [Problem 1: Commingling of Entities](#problem-1-commingling-of-entities) + * [Problem 2: Lack of Precise Identity](#problem-2-lack-of-precise-identity) + * [Problem 3: Lack of Mutable Attributes](#problem-3-lack-of-mutable-attributes) + * [Problem 4: Metric Cardinality Problem](#problem-4-metric-cardinality-problem) +- [Entity WG Rubric](#entity-wg-rubric) + * [Resource detectors (soon to be entity detectors) need to be composable / disjoint](#resource-detectors-soon-to-be-entity-detectors-need-to-be-composable--disjoint) + * [New entities added by extension should not break existing code](#new-entities-added-by-extension-should-not-break-existing-code) + * [Navigational attributes need to exist and can be used to identify an entity but could be augmented with UUID or other aspects. 
- Having ONLY a UUID for entity identification is not good enough](#navigational-attributes-need-to-exist-and-can-be-used-to-identify-an-entity-but-could-be-augmented-with-uuid-or-other-aspects---having-only-a-uuid-for-entity-identification-is-not-good-enough) + * [Collector augmentation / enrichment (resource, e.g.) - Should be extensible and not hard-coded. We need a general algorithm not specific rulesets](#collector-augmentation--enrichment-resource-eg---should-be-extensible-and-not-hard-coded-we-need-a-general-algorithm-not-specific-rulesets) + * [Users are expected to provide / prioritize "detectors" and determine which entity is "producing" or most-important for a signal](#users-are-expected-to-provide--prioritize-detectors-and-determine-which-entity-is-producing-or-most-important-for-a-signal) + * [For an SDK - ALL telemetry should be associated with the same set of entities (resource labels)](#for-an-sdk---all-telemetry-should-be-associated-with-the-same-set-of-entities-resource-labels) +- [Open Questions](#open-questions) + * [How to attach Entity "bundle" information in Resource?](#how-to-attach-entity-bundle-information-in-resource) + * [How to deal with Resource/Entities whose lifecycle does not match the SDK?](#how-to-deal-with-resourceentities-whose-lifecycle-does-not-match-the-sdk) + * [How to deal with Prometheus Compatibility for non-SDK telemetry?](#how-to-deal-with-prometheus-compatibility-for-non-sdk-telemetry) + * [Should entities have a domain?](#should-entities-have-a-domain) + * [Should resources have only one associated entity?](#should-resources-have-only-one-associated-entity) + * [What identity should entities use (LID, UUID / GUID, or other)?](#what-identity-should-entities-use-lid-uuid--guid-or-other) + * [What happens if existing Resource translation in the collector remove resource attributes an Entity relies on?](#what-happens-if-existing-resource-translation-in-the-collector-remove-resource-attributes-an-entity-relies-on) + * 
[What about advanced entity interaction in the Collector?](#what-about-advanced-entity-interaction-in-the-collector) +- [Trade-offs and mitigations](#trade-offs-and-mitigations) +- [Prior art and alternatives](#prior-art-and-alternatives) +- [Future Posibilities](#future-posibilities) +- [Use Cases](#use-cases) + * [SDK - Multiple Detectors of the same Entity type](#sdk---multiple-detectors-of-the-same-entity-type) + * [SDK and Collector - Simple coordination](#sdk-and-collector---simple-coordination) + * [SDK and Collector - Entity coordination with descriptive attributes](#sdk-and-collector---entity-coordination-with-descriptive-attributes) + * [SDK and Collector - Entity coordination with conflicts](#sdk-and-collector---entity-coordination-with-conflicts) + * [SDK and Collector - Entity coordination across versions](#sdk-and-collector---entity-coordination-across-versions) +- [Collection of Resource detectors and attributes used](#collection-of-resource-detectors-and-attributes-used) + * [Implications](#implications) + * [What could this mean for chosing entities that belong on resource?](#what-could-this-mean-for-chosing-entities-that-belong-on-resource) + ## Motivation From 43454e2c32209b6611fb6ded19a491a64436b354 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Tue, 10 Sep 2024 07:46:33 -0400 Subject: [PATCH 34/51] Fix algorithm to account for existing schema url. --- text/entities/0264-resource-and-entities.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 067b94ac1..48085f04a 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -91,7 +91,7 @@ The SDK Resource Coordinator is responsible for running all configured Resource - Resource detectors otherwise follow existing merge semantics. 
- The Specification merge rules will be updated to account for violations prevalent in ALL implementation of resource detection. - Specifically: This means the [rules around merging Resource across schema-url will be dropped](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md#merge). Instead only conflicting attributes will be dropped. - - SchemaURL on Resource will need to be deprecated with entity-specific schema-url replacing it. SDKs will no longer fill out SchemaURL on Resource. Additionally, as no (non-service) Resource semantic conventions have ever stabilized, SchemaURL usage on Resource cannot be in stable components of OpenTelemetry. Given prevalent concerns of implementations around Resource merge specification, we suspect impact of this deprecation to be minimal. + - SchemaURL on Resource will be deprecated with entity-specific schema-url replacing it. SDKs will only fill out SchemaURL on Resource when SchemaURL matches across all entities discovered. Additionally, only existing stable resource attributes can be used in Resource SchemaURL in stable OpenTelemetry components (Specifially `service.*` and `sdk.*` are the only stabilized resource convnetions). Given prevalent concerns of implementations around Resource merge specification, we suspect impact of this deprecation to be minimal, and existing usage was within the "experimental" phase of semantic conventions. - An OOTB ["Env Variable Entity Detector"](#environment-variable-detector) will be specified and provided vs. requiring SDK wide ENV variables for resource detection. - *Additionally, Resource Coordinator would be responsible for understanding Entity lifecycle events, for Entities whose lifetimes do not match or exceed the SDK's own lifetime (e.g. browser session).* @@ -128,6 +128,9 @@ We provide a simple algorithm for this behavior: - If the entity identity is different: drop the new entity `d'`. 
- Otherwise, add the entity `d'` to set `E` - Construct a Resource from the set `E`. + - If all entities within `E` have the same `schema_url`, set the resource's + `schema_url` to match. + - Otherwise, leave the Resource `schema_url` blank. Any implementation that achieves the same result as this algorithm is acceptable. From 775bb30e4d8e705f648d2d1252a65e3e218223e8 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Wed, 11 Sep 2024 09:50:46 -0400 Subject: [PATCH 35/51] Add clarification on entity detector API. --- text/entities/0264-resource-and-entities.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 48085f04a..a60ef6481 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -108,6 +108,8 @@ trait EntityDetector Where `Result` is the equivalent of error channel in the language of choice (e.g. in Go this would be `entities, err := e.detectEntities()`). +An Entity Detector MUST NOT provide two entities of the same entity type (e.g. two `host` or two `service` entities). + #### Entity Merging and Resource The most important aspect of this design is how Entities will be merged to construct a Resource. From 8d9a12ee251e35b3cc45d8f9bf66e8eed2415315 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Wed, 11 Sep 2024 09:54:41 -0400 Subject: [PATCH 36/51] Add definition of platform.
--- text/entities/0264-resource-and-entities.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index a60ef6481..f6a372321 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -138,7 +138,9 @@ Any implementation that achieves the same result as this algorithm is acceptable #### Environment Variable Detector -An Entity detector will be specified to allow Platform to inject entity identity information into workloads running on that platform. For Example, the OpenTelemetry Operator could inject information about Kubernetes Deployment + Container into the environment, which SDKs can elect to interact with (through configuration of the Environment Variable Entity Detector). +An Entity detector will be specified to allow Platform to inject entity identity information into workloads running on that platform. For Example, the OpenTelemetry Operator could inject information about Kubernetes Deployment + Container into the environment, which SDKs can elect to interact with (through configuration of the Environment Variable Entity Detector). Here, Platform means an environment that can run workloads that would provide identity of those workloads, e.g. Kubernetes, Spark, Cloud environments, etc. + +See [#3966](https://github.com/open-telemetry/opentelemetry-specification/issues/3966) for context on this issue. While details of ENV variables will be subject to change, it would look something like the following: From 061808c3eaa87a4fca5e1cd056e69d24ec74b9d5 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Wed, 11 Sep 2024 09:56:48 -0400 Subject: [PATCH 37/51] Fix typo and clarify. 
--- text/entities/0264-resource-and-entities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index f6a372321..f9ecdbafd 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -187,7 +187,7 @@ The list of detectors is given in priority order (first wins, in event of a tie, The rules for attributes would follow entity merging rules, as defined for the SDK resource manager. -Note: While this proposals shows a new processor replcing the `resourcedetection` processor, the details of whether to modify-in-place the existing `resourcedetection` processor or create a new one would be determined as a follow up to this design. +Note: While this proposals shows a new processor replacing the `resourcedetection` processor, the details of whether to modify-in-place the existing `resourcedetection` processor or create a new one would be determined as a follow up to this design. Ideally, we don't want users to need new configuration for resource in the otel collector. ## Datamodel Changes From a3052affd77e73b5d83a1f1f71db6a6e7158bae4 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Fri, 13 Sep 2024 13:25:56 -0400 Subject: [PATCH 38/51] Fixes from review. --- text/entities/0264-resource-and-entities.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index f9ecdbafd..6948ee7ec 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -97,7 +97,7 @@ The SDK Resource Coordinator is responsible for running all configured Resource #### Entity Detector -The Entity detector in the SDK is responsible for detecting possible entities that could identify the SDK. For Example, if the SDK is running in a kubernetes pod, it may provide an Entity for that pod. 
SDK Entity Detectors are only required to provide identifying attributes, but may provide descriptive attributes to ensure combined Resource contains similar attributes as today's SDK. +The Entity detector in the SDK is responsible for detecting possible entities that could identify the SDK (called "associated entities"). For Example, if the SDK is running in a kubernetes pod, it may provide an Entity for that pod. SDK Entity Detectors are only required to provide identifying attributes, but may provide descriptive attributes to ensure combined Resource contains similar attributes as today's SDK. An Entity Detector would have an API similar to: @@ -264,7 +264,12 @@ Resource will still be composed of identifying and descriptive attributes of Ent ### Collector augmentation / enrichment (resource, e.g.) - Should be extensible and not hard-coded. We need a general algorithm not specific rulesets -Entity concept provides a new "bundle" mechanism to resource for the Collector to augment enrich a group of attributes and better identify conflicts (or identity changes) caused therein. +The concept of "Entity" is a new definition for Resource. Where previously, resource was a collection of attributes and users would interact with each +individually, now there is a "bundle" of attributes called an Entity. Entities have an identity and descriptions, and the collector is able to +identify conflicts against the set of attributes that make up an Entity. + +The merge rules defined here give precedent for the collector to generically interact with "type", "identifying attributes" and "descriptive attributes" +rather than hard-coded rules that have to understand the nuance of when `host.id` influences `host.name`, e.g. 
### Users are expected to provide / prioritize "detectors" and determine which entity is "producing" or most-important for a signal From 5bc612056551b7d2387491dc39f2a60a2dce37b8 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Fri, 13 Sep 2024 13:30:14 -0400 Subject: [PATCH 39/51] Rename Resource Coordinator to Resource Provider. --- text/entities/0264-resource-and-entities.md | 28 ++++++++++----------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 6948ee7ec..8347ef1a5 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -11,7 +11,7 @@ It is an expansion on the [previous entity proposal](0256-entities-data-model.md - [Motivation](#motivation) - [Design](#design) * [Approach - Resource Improvements](#approach---resource-improvements) - + [Resource Coordinator](#resource-coordinator) + + [Resource Provider](#resource-coordinator) + [Entity Detector](#entity-detector) + [Entity Merging and Resource](#entity-merging-and-resource) + [Environment Variable Detector](#environment-variable-detector) @@ -75,17 +75,17 @@ We define the following SDK components: - **Resource Detectors (legacy)**: We preserve existing resource detectors. They have the same behavior and interfaces as today. - **Entity Detectors (new)**: Detecting an entity that is relevant to the current instance of the SDK. For example, this would detect a service entity for the current SDK, or its process. Every entity must have some relation to the current SDK. -- **Resource Coordinator (new)**: A component responsible for taking Resource and Entity detectors and doing the following: +- **Resource Provider (new)**: A component responsible for taking Resource and Entity detectors and doing the following: - Constructing a Resource for the SDK from detectors. - Dealing with conflicts between detectors. 
- Providing SDK-internal access to detected Resources for reporting via Log signal on configured LogProviders. - *(new) Managing Entity changes during SDK lifetime, specifically dealing with entities that have lifetimes shorter than the SDK* -#### Resource Coordinator +#### Resource Provider -The SDK Resource Coordinator is responsible for running all configured Resource and Entity Detectors. There will be some (user-controlled, otel default) priority order to these. +The SDK Resource Provider is responsible for running all configured Resource and Entity Detectors. There will be some (user-controlled, otel default) priority order to these. -- The Resource Coordinator will detect conflicts in Entity of the same type being discovered and choose one to use. +- The Resource Provider will detect conflicts in Entity of the same type being discovered and choose one to use. - When using Entity Detectors and Resource detectors together, the following merge rules will be used: - Entity merging will occur first resulting in an "Entity Merged" Resource (See [algorithm here](#entity-merging-and-resource)). - Resource detectors otherwise follow existing merge semantics. @@ -93,7 +93,7 @@ The SDK Resource Coordinator is responsible for running all configured Resource - Specifically: This means the [rules around merging Resource across schema-url will be dropped](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md#merge). Instead only conflicting attributes will be dropped. - SchemaURL on Resource will be deprecated with entity-specific schema-url replacing it. SDKs will only fill out SchemaURL on Resource when SchemaURL matches across all entities discovered. Additionally, only existing stable resource attributes can be used in Resource SchemaURL in stable OpenTelemetry components (Specifially `service.*` and `sdk.*` are the only stabilized resource convnetions). 
Given prevalent concerns of implementations around Resource merge specification, we suspect impact of this deprecation to be minimal, and existing usage was within the "experimental" phase of semantic conventions. - An OOTB ["Env Variable Entity Detector"](#environment-variable-detector) will be specified and provided vs. requiring SDK wide ENV variables for resource detection. -- *Additionally, Resource Coordinator would be responsible for understanding Entity lifecycle events, for Entities whose lifetimes do not match or exceed the SDK's own lifetime (e.g. browser session).* +- *Additionally, Resource Provider would be responsible for understanding Entity lifecycle events, for Entities whose lifetimes do not match or exceed the SDK's own lifetime (e.g. browser session).* #### Entity Detector @@ -153,7 +153,7 @@ The minimum requirements of this entity detector are: - ENV variable(s) can specify multiple entities (resource attribute bundles) - ENV variable(s) can be easily appended or leverages by multiple participating systems, if needed. -- Entities discovered via ENV variable(s) can participate in Resource Coordinator generically, i.e. resolving conflicting definitions. +- Entities discovered via ENV variable(s) can participate in Resource Provider generically, i.e. resolving conflicting definitions. - ENV variable(s) have a priority that can be influenced by platform entity providers (e.g. prepending vs. appending) The actual design for this ENV variable interaction would follow the approval of this OTEP. @@ -185,7 +185,7 @@ processor: The list of detectors is given in priority order (first wins, in event of a tie, outside of override configuration). The processor may need to be updated to allow the override flag to apply to each individual detector. -The rules for attributes would follow entity merging rules, as defined for the SDK resource manager. +The rules for attributes would follow entity merging rules, as defined for the SDK resource provider.
Note: While this proposals shows a new processor replacing the `resourcedetection` processor, the details of whether to modify-in-place the existing `resourcedetection` processor or create a new one would be determined as a follow up to this design. Ideally, we don't want users to need new configuration for resource in the otel collector. @@ -298,7 +298,7 @@ The third option prevents generic code from interacting with Resource and Entity ### How to deal with Resource/Entities whose lifecycle does not match the SDK? -This proposal motivates a Resource Coordinator in the SDK whose job could include managing changes in entity lifetimes, but does not account for how these changes would be broadcast across TracerProvider, LogProvider, MeterProvider, etc. That would be addressed in a follow on OTEP. +This proposal motivates a Resource Provider in the SDK whose job could include managing changes in entity lifetimes, but does not account for how these changes would be broadcast across TracerProvider, LogProvider, MeterProvider, etc. That would be addressed in a follow on OTEP. ### How to deal with Prometheus Compatibility for non-SDK telemetry? @@ -375,7 +375,7 @@ entity detectors: ```mermaid flowchart LR SDK["`**SDK**`"] -->|OTLP| BACKEND["`**Backend**`"] - SDK -.- RC((Resource Coordinator)) + SDK -.- RC((Resource Provider)) RC -.- OTEL_DETECTOR((OpenTelemetry Default Resource Detection)) RC -.- GCP_DETECTOR((Google Cloud Specific Resource Detection)) GCP_DETECTOR -. Detects .-> GCE{{gcp.gce}} @@ -407,7 +407,7 @@ and a Collector: flowchart LR SDK["`**SDK**`"] -->|OTLP| COLLECTOR["`**Collector**`"] COLLECTOR -->|OTLP| BACKEND["`**Backend**`"] - SDK -.- RC((Resource Coordinator)) + SDK -.- RC((Resource Provider)) COLLECTOR -.- RP((Resource Processor)) RP -. Detects .-> EC2{{aws.ec2}} RP -. 
Detects .-> HOST{{host}} @@ -434,7 +434,7 @@ Let's consider the interaction of resource, entity where both the SDK and the Co flowchart LR SDK["`**SDK**`"] -->|OTLP| COLLECTOR["`**Collector**`"] COLLECTOR -->|OTLP| BACKEND["`**Backend**`"] - SDK -.- RC((Resource Coordinator)) + SDK -.- RC((Resource Provider)) COLLECTOR -.- RP((Resource Processor)) RP -. Detects .-> HOST2{{host}} RC -. Detects .-> HOST{{host}} @@ -465,7 +465,7 @@ Let's consider the interaction of resource, entity where there is an identity co flowchart LR SDK["`**SDK**`"] -->|OTLP| COLLECTOR["`**Collector**`"] COLLECTOR -->|OTLP| BACKEND["`**Backend**`"] - SDK -.- RC((Resource Coordinator)) + SDK -.- RC((Resource Provider)) COLLECTOR -.- RP((Resource Processor)) RP -. Detects .-> HOST2{{host 2}} RC -. Detects .-> HOST{{host 1}} @@ -506,7 +506,7 @@ can occur between components within the system. flowchart LR SDK["`**SDK**`"] -->|OTLP| COLLECTOR["`**Collector**`"] COLLECTOR -->|OTLP| BACKEND["`**Backend**`"] - SDK -.- RC((Resource Coordinator)) + SDK -.- RC((Resource Provider)) COLLECTOR -.- RP((Resource Processor)) RP -. Detects .-> POD{{"`k8s.pod *schema: 1.26.0* From f3395fbaac20cdb1f1078f770c17171591002fd3 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Mon, 16 Sep 2024 09:47:43 -0400 Subject: [PATCH 40/51] Fixes from review. --- text/entities/0264-resource-and-entities.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 8347ef1a5..9285db95c 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -338,8 +338,8 @@ While we expect the collector to be the first component to start engaging with E ### What about advanced entity interaction in the Collector? -On problem that motivated this design is the issue of "local resource detection" vs. "remote signal collection" in the OpenTelemetry collector. 
That is, I have a process running on a machine writing to an OpenTelemetry -collector running on a different machine. The current `resourcedetectoinprocessor` in the collector appends attributes to resource based on discovering *where the collector is running*. However, +One problem that motivated this design is the issue of "local resource detection" vs. "remote signal collection" in the OpenTelemetry collector. That is, I have a process running on a machine writing to an OpenTelemetry +collector running on a different machine. The current `resourcedetectionprocessor` in the collector appends attributes to resource based on discovering *where the collector is running*. However, as the collector could determine that telemetry has come from a different machine, it could also avoid adding resource attributes that are not relevant to incoming data. Today, `resourcedetectionprocessor` is naive, as is the algorithm proposed in this OTEP. We believe that a more sophisticated solution could be created where the collector would know not to join entities onto a @@ -357,11 +357,11 @@ Below is a brief discussion of some design decisions: - **Only associating one entity with a Resource.** This was rejected, as too high a friction point in evolving semantic conventions and allowing independent systems to coordinate identity + entities within the OpenTelemetry ecosystem. Eventually, this would force OpenTelemetry to model all possibly entities in the world and understand their interaction or otherwise prevent non-OpenTelemetry instrumentation from interacting with OpenTelemetry entities. - **Embed fully Entity in Resource.** This was rejected because it makes it easy/trivial for Resource attributes and Entities to diverge. This would prevent the backwards/forwards compatibility goals and also require all participating OTLP users to leverage entities. Entity should be an opt-in / additional feature that may or may not be engaged with, depending on user need. 
-- **Re-using resource detectoin as-is** This was reject as not having a viable compatibility path forward. Creating a new set of components that can preserve existing behavior while allowing users to adopt the new functionality means that users have better control of when they see / change system behavior, and adoption is more obvious across the ecosystem. +- **Re-using resource detection as-is** This was rejected as not having a viable compatibility path forward. Creating a new set of components that can preserve existing behavior while allowing users to adopt the new functionality means that users have better control of when they see / change system behavior, and adoption is more obvious across the ecosystem. ## Future Posibilities -This proposal opens the door for addressing issues where an Entity's lifetime does not match an SDK's lifetime, in addition to providing a data model where mutable (descriptive) attributes can be changed over the lifetime of a resource without affecting its idnetity. We expect a follow-on OTEP which directly handles this issue. +This proposal opens the door for addressing issues where an Entity's lifetime does not match an SDK's lifetime, in addition to providing a data model where mutable (descriptive) attributes can be changed over the lifetime of a resource without affecting its identity. We expect a follow-on OTEP which directly handles this issue. ## Use Cases @@ -385,7 +385,7 @@ flowchart LR OTEL_DETECTOR -. Detects .-> SERVICE{{service}} ``` -Here, there is a services running on the Google Compute Engine. The user +Here, there is a service running on Google Compute Engine. The user has configured a Google Cloud specific set of entity detectors. Both the built in OpenTelemetry detection and the configured Google Cloud detection discover a `host` entity. 
@@ -394,7 +394,7 @@ The following outcome would occur: - The resulting resource would have all of the following entities: `host`, `process`, `service` and `gcp.gce` - The user-configured resource detector would take priority over built in: the `host` defined from the Google Cloud detection would "win" and be included in resource. - - This means `host.id` e.g. could be the id discovered for GCE VMs. Similarly for other cloud provider detection, like Amazon EC2 where VMs are given a unique ID by the Cloud Provider, rather than a generic machinne ID, e.g. + - This means `host.id` e.g. could be the id discovered for GCE VMs. Similarly for other cloud provider detection, like Amazon EC2 where VMs are given a unique ID by the Cloud Provider, rather than a generic machine ID, e.g. - This matches existing behavior/expectations today for AWS, GCP, etc. on what `host.id` would mean. - Users would be able to configure which host wins, by swapping the priority order of "default" vs. cloud-specific detection. @@ -416,7 +416,7 @@ flowchart LR ``` Here, an SDK is running on Amazon EC2. it is configured with resource detection -that find a `process` and `service` entity. The SDK is sending data to an +that finds a `process` and `service` entity. The SDK is sending data to an OpenTelemetry Collector that has a resource processor configured to detect the `ec2` and `host` entities. @@ -441,7 +441,7 @@ flowchart LR RC -. Detects .-> SERVICE{{service}} ``` -Here, and SDK is running on a machine (physical or virtual). The SDK is +Here, an SDK is running on a machine (physical or virtual). The SDK is configured to detect the host it is running on. The collector is also running on a machine (physical or virtual). Both the SDK and the Collector detect a `host` entity (with the same identity). From 792124aed5029627e9c44247f493cd9b54edbdba Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Tue, 17 Sep 2024 07:46:33 -0400 Subject: [PATCH 41/51] Fix typo. 
--- text/entities/0264-resource-and-entities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 9285db95c..1cfba5f2b 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -130,7 +130,7 @@ We provide a simple algorithm for this behavior: - If the entity identity is different: drop the new entity `d'`. - Otherwise, add the entity `d'` to set `E` - Construct a Resource from the set `E`. - - If all entities within `E` have the same `schema_url`, se the resource's + - If all entities within `E` have the same `schema_url`, set the resource's `schema_url` to match. - Otherwise, leave the Resource `schema_url` blank. From 818d666ae35341967ea7d8617e2c82daf48f1968 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Tue, 17 Sep 2024 08:18:58 -0400 Subject: [PATCH 42/51] Add call out about offline usage. --- text/entities/0264-resource-and-entities.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 1cfba5f2b..e45f39744 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -349,6 +349,21 @@ resource based on more advanced knowledge of the communication protocol used to The design proposed here attempts to balance non-breaking (backwards and forwards compatible) changes with the need to improve problematic issues in the Specification. Given the inability of most SDKs to implement the current Resource merge specification, breaking this should have little effect on actual users. Instead, the proposed merge specification should allow implementations to match current behavior and expectation, while evolving for users who engage with the new model. +### Why don't we download schema url contents? 
+ +OpenTelemetry needs to work in environments that have no/limited access to the external internet. We entertained, and +dismissed merging solutions that *require* access to the contents of `schema_url` to work. While the core algorithm +*cannot require* this access, we *should* be able to provide improved processing and algorithms that may leverage this data. + +For example: + +- Within an SDK, we can registry entity schema information with `EntityDetector`. +- The OpenTelemetry Collector can allow registered `schema_url` via configuraton + or (optionally) download schema on demand. + +This design does not prevent these solutions, but provides the baseline/fallback +where `schema_url` is not accessible and entities must still be usable. + ## Prior art and alternatives Previously, we have a few unaccepted oteps, e.g. ([OTEP 208](https://github.com/open-telemetry/oteps/pull/208)). Additionally, there are some alternatives that were considered in the Entities WG and rejected. From 5793f366d4bf0eb4e41b4ac87e5f28f7c7eb1570 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Tue, 17 Sep 2024 08:24:35 -0400 Subject: [PATCH 43/51] Add resource identity section. --- text/entities/0264-resource-and-entities.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index e45f39744..4f6acf270 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -221,6 +221,19 @@ The entityref data model, would have the following changes from the original [en | identifying_attributes_keys | repeated string | Attribute Keys that identify the entity.
MUST not change during the lifetime of the entity. The Id must contain at least one attribute.

These keys MUST exists in Resource.attributes.

Follows OpenTelemetry common attribute definition. SHOULD follow OpenTelemetry semantic conventions for attributes.| now a reference | | descriptive_attributes_keys | repeated string | Descriptive (non-identifying) attribute keys of the entity.
MAY change over the lifetime of the entity. MAY be empty. These attribute keys are not part of entity's identity.

These keys MUST exist in Resource.attributes.

Follows any value definition in the OpenTelemetry spec - it can be a scalar value, byte array, an array or map of values. Arbitrary deep nesting of values for arrays and maps is allowed.

SHOULD follow OpenTelemetry semantic conventions for attributes.| now a reference | +### Resource Identity + +OpenTelemetry resource identity will be modified as follows: + +- When `entities` is empty on resource, then its identity is the collection + of all `attributes` (both key and values). +- When `entities` is non-empty on resource, then its identity is the collection + of all `attributes` where the key is found in `entities.identify_attribute_keys`. + +When grouping or mixing OTLP data, you can detect if two resources are the same +using its identity and merge descriptive attributes (if applicable) using the +entity merge algorithm (described above) which will be formalized in the data model. + ## How this proposal solves the problems that motivated it Let's look at some motivating problems from the [Entities Proposal](https://docs.google.com/document/d/1VUdBRInLEhO_0ABAoiLEssB1CQO_IcD5zDnaMEha42w/edit#heading=h.atg5m85uw9w8): From a55d61e2128ea4d06153781a5513a0ec02598080 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Tue, 17 Sep 2024 08:27:57 -0400 Subject: [PATCH 44/51] Regenerate TOC. 
--- text/entities/0264-resource-and-entities.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 4f6acf270..8fe9a3dad 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -11,7 +11,7 @@ It is an expansion on the [previous entity proposal](0256-entities-data-model.md - [Motivation](#motivation) - [Design](#design) * [Approach - Resource Improvements](#approach---resource-improvements) - + [Resource Provider](#resource-coordinator) + + [Resource Provider](#resource-provider) + [Entity Detector](#entity-detector) + [Entity Merging and Resource](#entity-merging-and-resource) + [Environment Variable Detector](#environment-variable-detector) @@ -19,6 +19,7 @@ It is an expansion on the [previous entity proposal](0256-entities-data-model.md - [Datamodel Changes](#datamodel-changes) * [Resource](#resource) * [ResourceEntityRef](#resourceentityref) + * [Resource Identity](#resource-identity) - [How this proposal solves the problems that motivated it](#how-this-proposal-solves-the-problems-that-motivated-it) * [Problem 1: Commingling of Entities](#problem-1-commingling-of-entities) * [Problem 2: Lack of Precise Identity](#problem-2-lack-of-precise-identity) @@ -41,6 +42,7 @@ It is an expansion on the [previous entity proposal](0256-entities-data-model.md * [What happens if existing Resource translation in the collector remove resource attributes an Entity relies on?](#what-happens-if-existing-resource-translation-in-the-collector-remove-resource-attributes-an-entity-relies-on) * [What about advanced entity interaction in the Collector?](#what-about-advanced-entity-interaction-in-the-collector) - [Trade-offs and mitigations](#trade-offs-and-mitigations) + * [Why don't we download schema url contents?](#why-dont-we-download-schema-url-contents) - [Prior art and alternatives](#prior-art-and-alternatives) - 
[Future Posibilities](#future-posibilities) - [Use Cases](#use-cases) From 408862a3a63067616b799a0990cf9e29934f3bc4 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Tue, 17 Sep 2024 10:06:16 -0400 Subject: [PATCH 45/51] Fix spelling error. --- text/entities/0264-resource-and-entities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 8fe9a3dad..a6a1cea5c 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -373,7 +373,7 @@ dismissed merging solutions that *require* access to the contents of `schema_url For example: - Within an SDK, we can registry entity schema information with `EntityDetector`. -- The OpenTelemetry Collector can allow registered `schema_url` via configuraton +- The OpenTelemetry Collector can allow registered `schema_url` via configuration or (optionally) download schema on demand. This design does not prevent these solutions, but provides the baseline/fallback From 4010a3330bc56f9ad5e26fd69802bf10818182e5 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Tue, 17 Sep 2024 14:50:46 -0400 Subject: [PATCH 46/51] Add some more options for Prometheus compatibility. --- text/entities/0264-resource-and-entities.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index a6a1cea5c..f0f4209b7 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -319,11 +319,22 @@ This proposal motivates a Resource Provider in the SDK whose job could include m Today, [Prometheus compatibility](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/compatibility/prometheus_and_openmetrics.md) relies on two key attributes in Resource: `service.name` and `service.instance.id`. 
These are not guaranteed to exist outside of OpenTelemetry SDK generation. While this question is not fully answered, we believe outlining identity in all resources within OpenTelemetry allows us to define a solution in the future while preserving compatibility with what works today. +Here's a list of requirements for the solution: + +- Existing prometheus/OpenTelemetry users should be able to migrate from where they are today. +- Prometheus should have a set of identifying attributes for their up and coming `info()` function and info metric features. +- (desired) OpenTelemetry should be able to create unique `job`/`instance` labels between all metrics sent to prometheus for any "info" metric join. + A quick proposal of what this might look like: - `target_info` metric generation is updated to exclude any keys which are contained in `descriptive_attributes_keys` of an entity. - For each entity which has non-empty descriptive_attributes_keys, generate an info metric: `_entity_info` (naming TBD), which has all identifying and descriptive keys. This should play nicely with the planned improvements to [info-typed metrics](https://github.com/prometheus/proposals/blob/main/proposals/2024-04-10-native-support-for-info-metrics-metadata.md#goals). +Another option (that would break compatibility): + +- `job`/`instance` are synthesized using a consistent hashing algorithm and identifying attributes of entity on resource. +- Each entity type is written as an info metric called `_entity-info`. + ### Should entities have a domain? Is it worth having a `domain` in addition to type for entity? We could force each entity to exist in one domain and leverage domain generically in resource management. Entity Detectors would be responsible for an entire domain, selecting only ONE to apply a resource. Domains could be layered, e.g. 
a Cloud-specific domain may layer on top of a Kubernetes domain, where "GKE cluster entity" identifies *which* kubernetes cluster a kuberntes infra entity is part of. This layer would be done naively, via automatic join of participating entities or explicit relationships derived from GKE specific hooks. From 0ee5dc8998c207d8d115a80120a83d97f03568f6 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Tue, 17 Sep 2024 14:51:42 -0400 Subject: [PATCH 47/51] Fix spelling issue. --- text/entities/0264-resource-and-entities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index f0f4209b7..9ec9eb1b4 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -814,7 +814,7 @@ The OTEL operator for k8s provides the following via ENV variables: - `k8s.container.*` - `service.*` -### What could this mean for chosing entities that belong on resource? +### What could this mean for choosing entities that belong on resource? Let's look at an example of a container running in kubernetes, specifically EKS. From 7c9e5297ffad102983e1f36196ae630776f9be38 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Thu, 19 Sep 2024 09:02:01 -0400 Subject: [PATCH 48/51] Update prometheus proposals based on discussions. --- text/entities/0264-resource-and-entities.md | 35 ++++++++++++++------- 1 file changed, 23 insertions(+), 12 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index 9ec9eb1b4..b6a63c354 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -322,18 +322,29 @@ Today, [Prometheus compatibility](https://github.com/open-telemetry/opentelemetr Here's a list of requirements for the solution: - Existing prometheus/OpenTelemetry users should be able to migrate from where they are today. 
-- Prometheus should have a set of identifying attributes for their up and coming `info()` function and info metric features. -- (desired) OpenTelemetry should be able to create unique `job`/`instance` labels between all metrics sent to prometheus for any "info" metric join. - -A quick proposal of what this might look like: - -- `target_info` metric generation is updated to exclude any keys which are contained in `descriptive_attributes_keys` of an entity. -- For each entity which has non-empty descriptive_attributes_keys, generate an info metric: `_entity_info` (naming TBD), which has all identifying and descriptive keys. This should play nicely with the planned improvements to [info-typed metrics](https://github.com/prometheus/proposals/blob/main/proposals/2024-04-10-native-support-for-info-metrics-metadata.md#goals). - -Another option (that would break compatibility): - -- `job`/`instance` are synthesized using a consistent hashing algorithm and identifying attributes of entity on resource. -- Each entity type is written as an info metric called `_entity-info`. +- Any solution MUST work with the [info-typed metrics](https://github.com/prometheus/proposals/blob/main/proposals/2024-04-10-native-support-for-info-metrics-metadata.md#goals) being added in prometheus. + - Resource descriptive attributes should leverage `info()` + - Resource identifying attributes need more thought/design. + - Note: Current `info()` design will only work with `target_info` metric, and `job/instance` labels for joins. This labels MUST be generated by the OTLP endpoint in prometheus. +- (desired) Users should be able to correlate metric timeseries to other signals via Resource attributes showing up as labels in prometheus. +- (desired) Conversion from `OTLP -> prometheus` can be reversed such that `OTLP -> Prometheus -> OTLP` is non-lossy. 
+ +Here's a few (non-exhaustive) options for what this could look like: + +- Option #1 - Stay as close to today as possible + - `target_info` continues to exist as it is, with all resource attributes. + - OpenTelemetry syntehsizes unique `job`/`instance` labels from identifying attributes in Resource in lieu of `service` entity. + - Prometheus OTLP ingestion continues to support promoting resource attributes to metric labels. +- Option #2 - Promote all identifying attributes + - By default all identifying labels on Resource are promoted to resource attributes. + - All descriptive labels are placed on `target_info`. + - (likely) `job`/`instance` will need to be synthesized for resources lacking a `service` entity. +- Option #3 - Encode entities into prometheus as info metrics + - Create `{entity_type}_entity_info` metrics. + - Synthesize `job`/`instance` labels for joins between all `*_info` metrics. + - Expand the scope of info-typed metrics work in Prometheus to work with this encoding. + +These designs will be explored and evaluated in light of the requirements. ### Should entities have a domain? From 36234ff8ec377cf54aafb21325cbdb62e55dc920 Mon Sep 17 00:00:00 2001 From: Josh Suereth Date: Thu, 19 Sep 2024 12:00:11 -0400 Subject: [PATCH 49/51] Update proposal with result of discussions. --- text/entities/0264-resource-and-entities.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md index b6a63c354..9d6aac1cc 100644 --- a/text/entities/0264-resource-and-entities.md +++ b/text/entities/0264-resource-and-entities.md @@ -323,17 +323,16 @@ Here's a list of requirements for the solution: - Existing prometheus/OpenTelemetry users should be able to migrate from where they are today.
- Any solution MUST work with the [info-typed metrics](https://github.com/prometheus/proposals/blob/main/proposals/2024-04-10-native-support-for-info-metrics-metadata.md#goals) being added in prometheus. - - Resource descriptive attributes should leverage `info()` - - Resource identifying attributes need more thought/design. + - Resource descriptive attributes should leverage `info()` or metadata. + - Resource identifying attributes need more thought/design from OpenTelemetry semconv + entities WG. - Note: Current `info()` design will only work with `target_info` metric, and `job/instance` labels for joins. This labels MUST be generated by the OTLP endpoint in prometheus. - (desired) Users should be able to correlate metric timeseries to other signals via Resource attributes showing up as labels in prometheus. - (desired) Conversion from `OTLP -> prometheus` can be reversed such that `OTLP -> Prometheus -> OTLP` is non-lossy. Here's a few (non-exhaustive) options for what this could look like: -- Option #1 - Stay as close to today as possible +- Option #1 - Stay with what we have today - `target_info` continues to exist as it is, with all resource attributes. - - OpenTelemetry syntehsizes unique `job`/`instance` labels from identifying attributes in Resource in lieu of `service` entity. - Prometheus OTLP ingestion continues to support promoting resource attributes to metric labels. - Option #2 - Promote all identifying attributes - By default all identifying labels on Resource are promoted to resource attributes. @@ -343,8 +342,9 @@ Here's a few (non-exhaustive) options for what this could look like: - Create `{entity_type}_entity_info` metrics. - Synthesize `job`/`instance` labels for joins between all `*_info` metrics. - Expand the scope of info-typed metrics work in Prometheus to work with this encoding. 
+- Option #4 - Find solutions leveraging the [metadata design](https://docs.google.com/document/d/1epBslSSwRO2do4armx40fruStJy_PS6thROnPeDifz8/edit#heading=h.5sybau7waq2q)

-These designs will be explored and evaluated in light of the requirements.
+These designs will be explored and evaluated in light of the requirements. For now, prometheus compatibility will continue with Option #1 as we work together towards building a better future for resource in prometheus.

### Should entities have a domain?

From 67f93896c7610b6f45eec26bff9986878a44404f Mon Sep 17 00:00:00 2001
From: Josh Suereth
Date: Thu, 19 Sep 2024 14:41:23 -0400
Subject: [PATCH 50/51] Update text/entities/0264-resource-and-entities.md

Co-authored-by: Arve Knudsen
---
 text/entities/0264-resource-and-entities.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md
index 9d6aac1cc..cd59e2667 100644
--- a/text/entities/0264-resource-and-entities.md
+++ b/text/entities/0264-resource-and-entities.md
@@ -325,7 +325,7 @@ Here's a list of requirements for the solution:
- Any solution MUST work with the [info-typed metrics](https://github.com/prometheus/proposals/blob/main/proposals/2024-04-10-native-support-for-info-metrics-metadata.md#goals) being added in prometheus.
  - Resource descriptive attributes should leverage `info()` or metadata.
  - Resource identifying attributes need more thought/design from OpenTelemetry semconv + entities WG.
-  - Note: Current `info()` design will only work with `target_info` metric, and `job/instance` labels for joins. This labels MUST be generated by the OTLP endpoint in prometheus.
+  - Note: Current `info()` design will only work with `target_info` metric by default (other info metrics can be specified per `info` call), and `job/instance` labels for joins. These labels MUST be generated by the OTLP endpoint in prometheus.
- (desired) Users should be able to correlate metric timeseries to other signals via Resource attributes showing up as labels in prometheus.
- (desired) Conversion from `OTLP -> prometheus` can be reversed such that `OTLP -> Prometheus -> OTLP` is non-lossy.

From 48b951781e6aa69022131eb8e809e35334b11972 Mon Sep 17 00:00:00 2001
From: Josh Suereth
Date: Fri, 20 Sep 2024 13:50:32 -0400
Subject: [PATCH 51/51] Update text/entities/0264-resource-and-entities.md

Co-authored-by: David Ashpole
---
 text/entities/0264-resource-and-entities.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/entities/0264-resource-and-entities.md b/text/entities/0264-resource-and-entities.md
index cd59e2667..6ba498d5e 100644
--- a/text/entities/0264-resource-and-entities.md
+++ b/text/entities/0264-resource-and-entities.md
@@ -230,7 +230,7 @@ OpenTelemetry resource identity will be modified as follows:
- When `entities` is empty on resource, then its identity is the collection
  of all `attributes` (both key and values).
- When `entities` is non-empty on resource, then its identity is the collection
-  of all `attributes` where the key is found in `entities.identify_attribute_keys`.
+  of all `attributes` where the key is not found in `entities.descriptive_attributes_keys`.

When grouping or mixing OTLP data, you can detect if two resources are the same
using its identity and merge descriptive attributes (if applicable) using the