diff --git a/.chloggen/service-instance-id.yaml b/.chloggen/service-instance-id.yaml new file mode 100644 index 0000000000..34acc9cf03 --- /dev/null +++ b/.chloggen/service-instance-id.yaml @@ -0,0 +1,4 @@ +change_type: 'enhancement' +component: resource +note: Define a common algorithm for `service.instance.id`. +issues: [312] diff --git a/docs/resource/README.md b/docs/resource/README.md index c67b8f3517..5dd529d264 100644 --- a/docs/resource/README.md +++ b/docs/resource/README.md @@ -99,10 +99,35 @@ as specified in the [Resource SDK specification](https://github.com/open-telemet | Attribute | Type | Description | Examples | Requirement Level | |---|---|---|---|---| -| `service.instance.id` | string | The string ID of the service instance. [1] | `my-k8s-pod-deployment-1`; `627cc493-f310-47de-96bd-71410b7dec09` | Recommended | +| `service.instance.id` | string | The string ID of the service instance. [1] | `627cc493-f310-47de-96bd-71410b7dec09` | Recommended | | `service.namespace` | string | A namespace for `service.name`. [2] | `Shop` | Recommended | -**[1]:** MUST be unique for each instance of the same `service.namespace,service.name` pair (in other words `service.namespace,service.name,service.instance.id` triplet MUST be globally unique). The ID helps to distinguish instances of the same service that exist at the same time (e.g. instances of a horizontally scaled service). It is preferable for the ID to be persistent and stay the same for the lifetime of the service instance, however it is acceptable that the ID is ephemeral and changes during important lifetime events for the service (e.g. service restarts). If the service has no inherent unique ID that can be used as the value of this attribute it is recommended to generate a random Version 1 or Version 4 RFC 4122 UUID (services aiming for reproducible UUIDs may also use Version 5, see RFC 4122 for more recommendations). 
+**[1]:** MUST be unique for each instance of the same `service.namespace,service.name` pair (in other words, the +`service.namespace,service.name,service.instance.id` triplet MUST be globally unique). The ID helps to +distinguish instances of the same service that exist at the same time (e.g. instances of a horizontally scaled +service). + +Implementations, such as SDKs, are recommended to generate a random Version 1 or Version 4 [RFC +4122](https://www.ietf.org/rfc/rfc4122.txt) UUID, but are free to use an inherent unique ID as the source of +this value if stability is desirable. In that case, the ID SHOULD be used as the source of a UUID Version 5 and +SHOULD use the following UUID as the namespace: `4d63009a-8d0f-11ee-aad7-4c796ed8e320`. + +UUIDs are typically recommended, as only an opaque value for the purposes of identifying a service instance is +needed. Similar to what can be seen in the man page for the +[`/etc/machine-id`](https://www.freedesktop.org/software/systemd/man/machine-id.html) file, the underlying +data, such as pod name and namespace, should be treated as confidential; it is the user's choice whether to +expose it via another resource attribute. + +For applications running behind an application server (like unicorn), we do not recommend using one identifier +for all processes participating in the application. Instead, it's recommended that each division (e.g. a worker +process in unicorn) have its own `service.instance.id`. + +It's not recommended for a Collector to set `service.instance.id` if it can't unambiguously determine the +service instance that is generating that telemetry. For instance, creating a UUID based on `pod.name` will +likely be wrong, as the Collector might not know from which container within that pod the telemetry originated. +However, Collectors can set the `service.instance.id` if they can unambiguously determine the service instance +for that telemetry.
This is typically the case for scraping receivers, as they know the target address and +port. **[2]:** A string value having a meaning that helps to distinguish a group of services, for example the team name that owns a group of services. `service.name` is expected to be unique within the same namespace. If `service.namespace` is not specified in the Resource then `service.name` is expected to be unique for all services that have no explicit namespace defined (so the empty/unspecified namespace is simply one more valid namespace). Zero-length namespace string is assumed equal to unspecified namespace. diff --git a/model/resource/service_experimental.yaml b/model/resource/service_experimental.yaml index 43c869ee35..99ed64f024 100644 --- a/model/resource/service_experimental.yaml +++ b/model/resource/service_experimental.yaml @@ -22,16 +22,31 @@ groups: type: string brief: > The string ID of the service instance. - note: > - MUST be unique for each instance of the same `service.namespace,service.name` pair - (in other words `service.namespace,service.name,service.instance.id` triplet MUST be globally unique). - The ID helps to distinguish instances of the same service that exist at the same time - (e.g. instances of a horizontally scaled service). It is preferable for the ID to be persistent - and stay the same for the lifetime of the service instance, however it is acceptable that - the ID is ephemeral and changes during important lifetime events for the service - (e.g. service restarts). - If the service has no inherent unique ID that can be used as the value of this attribute - it is recommended to generate a random Version 1 or Version 4 RFC 4122 UUID - (services aiming for reproducible UUIDs may also use Version 5, see RFC 4122 - for more recommendations). 
- examples: ["my-k8s-pod-deployment-1", "627cc493-f310-47de-96bd-71410b7dec09"] + note: | + MUST be unique for each instance of the same `service.namespace,service.name` pair (in other words, the + `service.namespace,service.name,service.instance.id` triplet MUST be globally unique). The ID helps to + distinguish instances of the same service that exist at the same time (e.g. instances of a horizontally scaled + service). + + Implementations, such as SDKs, are recommended to generate a random Version 1 or Version 4 [RFC + 4122](https://www.ietf.org/rfc/rfc4122.txt) UUID, but are free to use an inherent unique ID as the source of + this value if stability is desirable. In that case, the ID SHOULD be used as the source of a UUID Version 5 and + SHOULD use the following UUID as the namespace: `4d63009a-8d0f-11ee-aad7-4c796ed8e320`. + + UUIDs are typically recommended, as only an opaque value for the purposes of identifying a service instance is + needed. Similar to what can be seen in the man page for the + [`/etc/machine-id`](https://www.freedesktop.org/software/systemd/man/machine-id.html) file, the underlying + data, such as pod name and namespace, should be treated as confidential; it is the user's choice whether to + expose it via another resource attribute. + + For applications running behind an application server (like unicorn), we do not recommend using one identifier + for all processes participating in the application. Instead, it's recommended that each division (e.g. a worker + process in unicorn) have its own `service.instance.id`. + + It's not recommended for a Collector to set `service.instance.id` if it can't unambiguously determine the + service instance that is generating that telemetry. For instance, creating a UUID based on `pod.name` will + likely be wrong, as the Collector might not know from which container within that pod the telemetry originated.
+ However, Collectors can set the `service.instance.id` if they can unambiguously determine the service instance + for that telemetry. This is typically the case for scraping receivers, as they know the target address and + port. + examples: ["627cc493-f310-47de-96bd-71410b7dec09"]
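
For reviewers, the algorithm this change describes can be sketched roughly as follows. This is a minimal illustration, not SDK code: the function name `service_instance_id` and the example inherent ID string are hypothetical, and the note does not prescribe how an inherent ID is assembled into a string. Only the namespace UUID is taken from the text above.

```python
import uuid

# Namespace UUID quoted in the `service.instance.id` note.
SERVICE_INSTANCE_NAMESPACE = uuid.UUID("4d63009a-8d0f-11ee-aad7-4c796ed8e320")


def service_instance_id(inherent_id=None):
    """Return a value for `service.instance.id` (hypothetical helper).

    With an inherent unique ID, derive a stable Version 5 UUID so the raw
    ID itself is not exposed. Without one, fall back to a random
    Version 4 UUID.
    """
    if inherent_id:
        return str(uuid.uuid5(SERVICE_INSTANCE_NAMESPACE, inherent_id))
    return str(uuid.uuid4())


if __name__ == "__main__":
    # Assumption: the caller has already built a string that is unique per
    # service instance; the format used here is purely illustrative.
    print(service_instance_id("example-pod/example-container"))
    print(service_instance_id())
```

The Version 5 path yields the same value for the same input across restarts, which is the stability case mentioned in the note. The Version 4 path yields a new value on every call.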