Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fallback for host.id in Cloud #576

Closed
wants to merge 2 commits into from

Conversation

christos68k
Copy link
Member

@christos68k christos68k commented Dec 1, 2023

This PR changes the semantics behind host.id calculation for Cloud.

Changes

In Cloud, it is possible for instance_id to not be retrievable from inside the instance (e.g. security concerns / restricted instance metadata). In this case, the host.id can be set from machine-id (same steps as non-containerized systems).

Without a fallback, the must wording has strong implications for clients and users in cases where the instance_id can not be automatically retrieved:

  1. Do not automatically populate host.id
  2. Urge the user to either provide instance_id or change the configuration to make it retrievable

A (breaking) alternative to the current approach is to always populate host.id from machine-id, and use a different field for instance_id (e.g. service.instance.id)

Merge requirement checklist

@christos68k christos68k requested review from a team December 1, 2023 18:29
christos68k added a commit to christos68k/semantic-conventions that referenced this pull request Dec 1, 2023
@AlexanderWert
Copy link
Member

AlexanderWert commented Dec 2, 2023

@christos68k please do the change in the corresponding yaml file:

The tables in markdown files are generated from the model yaml files. So once you did the change in the yaml file you need to run: make table-generation. This will automatically update the tables in the markdown files.

@AlexanderWert
Copy link
Member

@open-telemetry/semconv-system-approvers @open-telemetry/semconv-container-approvers PTAL

@@ -14,7 +14,7 @@
| `host.cpu.model.name` | string | Model designation of the processor. | `11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz` |
| `host.cpu.stepping` | int | Stepping or core revisions. | `1` |
| `host.cpu.vendor.id` | string | Processor manufacturer identifier. A maximum 12-character string. [1] | `GenuineIntel` |
| `host.id` | string | Unique host ID. For Cloud, this must be the instance_id assigned by the cloud provider. For non-containerized systems, this should be the `machine-id`. See the table below for the sources to use to determine the `machine-id` based on operating system. | `fdbf79e8af94cb7f9e8df36789187052` |
| `host.id` | string | Unique host ID. For Cloud, this must be the instance_id assigned by the cloud provider. If the instance_id can not be retrieved, this should be the `machine-id`. For non-containerized systems, this should be the `machine-id`. See the table below for the sources to use to determine the `machine-id` based on operating system. | `fdbf79e8af94cb7f9e8df36789187052` |
Copy link
Member

@mx-psi mx-psi Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overloading the meaning of host.id makes it unusable for some use cases. Right now, if cloud.platform is set to aws_ec2 and host.id is present, I know that host.id is the EC2 instance id. After this change, it would be impossible for me to know what host.id refers to. Knowing it is the EC2 instance id is very important for correlation across signals because it allows us to tell that the resource refers to the same machine.

Maybe the solution is to follow the steps of GCP which introduced gcp.gce.instance.name and have a similar one for each cloud provider, but until that happens I don't think we should accept this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[...] very important for correlation across signals because it allows us to tell that the resource refers to the same machine.

Exactly! And correlation is not possible at all if host.id cannot be retrieved and stays empty.
I think there should be an unambiguous algorithm (similar to #312) to derive the host.id in a way that providing a value is ALWAYS possible in order to enable correlation of signals. Things like the AWS EC2 instance id can be part of that, but assumptions like the following are implicit and therefore sort of unreliable:

Right now, if cloud.platform is set to aws_ec2 and host.id is present, I know that host.id is the EC2 instance id.

Wouldn't it be more appropriate to always have a host.id and set something like aws.ec2.instance_id when it is available?

Copy link
Contributor

@pyohannes pyohannes Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see main use cases for host.id as similar to service.instance.id, quoted from here:

The ID helps to distinguish instances of the same service that exist at the same time.

service.instance.id can be a K8s pod name, but that's not guaranteed. If I want that, I look at k8s.pod.name.

I'd advocate for a similar understanding of host.id, thus I'm in principle fine with the change proposed here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be more appropriate to always have a host.id and set something like aws.ec2.instance_id when it is available?

I am fine with that, but that convention does not exist today. I would like to add these conventions first and only then change the semantics of host.id. I am happy to help listing what conventions that would entail based on existing implementations of detectors on different SDKs and the resource detection processor on the Collector

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to help listing what conventions that would entail based on existing implementations of detectors on different SDKs and the resource detection processor on the Collector

That'd be great! Thanks @mx-psi !

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will file an issue for this, but here are the uses of host.id that I was able to find that are not machine-id:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christos68k @AlexanderWert PTAL at #600. I want to start with this and see what approvers and maintainers of semconv think

christos68k added a commit to christos68k/semantic-conventions that referenced this pull request Dec 4, 2023
christos68k added a commit to christos68k/semantic-conventions that referenced this pull request Dec 4, 2023
christos68k added a commit to christos68k/semantic-conventions that referenced this pull request Dec 4, 2023
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 25, 2024
@christos68k
Copy link
Member Author

Adding a comment to keep this PR from being closed.

@github-actions github-actions bot removed the Stale label Jan 30, 2024
@@ -14,7 +14,7 @@
| `host.cpu.model.name` | string | Model designation of the processor. | `11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz` |
| `host.cpu.stepping` | int | Stepping or core revisions. | `1` |
| `host.cpu.vendor.id` | string | Processor manufacturer identifier. A maximum 12-character string. [1] | `GenuineIntel` |
| `host.id` | string | Unique host ID. For Cloud, this must be the instance_id assigned by the cloud provider. For non-containerized systems, this should be the `machine-id`. See the table below for the sources to use to determine the `machine-id` based on operating system. | `fdbf79e8af94cb7f9e8df36789187052` |
| `host.id` | string | Unique host ID. For Cloud, this must be the instance_id assigned by the cloud provider. If the instance_id can not be retrieved, this should be the `machine-id`. For non-containerized systems, this should be the `machine-id`. See the table below for the sources to use to determine the `machine-id` based on operating system. | `fdbf79e8af94cb7f9e8df36789187052` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

If the instance_id can not be retrieved, this should be the `machine-id`. For non-containerized systems, this should be the `machine-id`.

Should be rewritten for better clarity.

If the instance_id can not be retrieved, or for non-containerized systems, this should be the `machine-id`.

christos68k added a commit to christos68k/semantic-conventions that referenced this pull request Feb 12, 2024
christos68k added a commit to christos68k/semantic-conventions that referenced this pull request Feb 12, 2024
Copy link
Member

@mx-psi mx-psi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #739 for the general issue I think needs to be solved before accepting this (see also the unresolved thread above)

Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

Copy link

github-actions bot commented Mar 8, 2024

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Mar 8, 2024
@christos68k
Copy link
Member Author

christos68k commented Mar 26, 2024

Could someone with rights (@mx-psi @AlexanderWert maybe?) to do so reopen this pull request?

@ChrsMark ChrsMark reopened this Mar 26, 2024
@github-actions github-actions bot removed the Stale label Mar 27, 2024
@ChrsMark
Copy link
Member

@christos68k I have reopened this one. You will need to update the changelog entry to follow the new format: https://github.com/open-telemetry/semantic-conventions/blob/main/CONTRIBUTING.md#adding-a-changelog-entry

christos68k added a commit to christos68k/semantic-conventions that referenced this pull request Apr 3, 2024
If the instance_id can not be retrieved, use machine-id value.
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

Copy link

github-actions bot commented May 8, 2024

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label May 8, 2024
@jsuereth jsuereth removed the Stale label May 13, 2024
@jsuereth
Copy link
Contributor

Given activity happening in Entity SIG - I think it's best to "wait" on major changes to resource related attributes. My own opinion on this Pr is that:

  1. Yes, using machine id makes sense as the default for host.id
  2. Having cloud-specific ids land in host.id seems problematic from a consistency/implementation standpoint. We need to define mechanisms where we would understand how a cloud-provider Resource detector would interact with a Host-specific resource detector. This change would force the ability for us to understand merge conflcts and overrides between these two disjoint components.

As such - (while I think machine id should be the default for host.id) I think we need to hold off on this PR until more progress has been made in Entity SiG - particularly the mapping between Resource <-> Entity and conflict resolution.

Copy link

github-actions bot commented Jun 1, 2024

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jun 1, 2024
Copy link

github-actions bot commented Jun 9, 2024

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

6 participants