Clarify host.name attribute and host registry definition #1647

Open
rogercoll opened this issue Dec 4, 2024 · 2 comments

Comments

@rogercoll
Contributor

Area(s)

area:host

Is your change request related to a problem? Please describe.

The current semantic conventions' documentation for the “host” registry is defined as:

A host is defined as a computing instance. For example, physical servers, virtual machines, switches or disk array.

Reference: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/attributes-registry/host.md

Based on this definition, a Linux/K8s container (cgroup) is not explicitly considered a host. However, this raises an important question: isn’t a cgroup an isolated instance with its own dedicated computing resources? What truly distinguishes a virtual machine from a container?

If containers are not classified as hosts but as services running within a host, it brings up another critical question: what should the value of the host.name attribute be when retrieved by SDKs operating inside a container? Should this attribute even be populated in such cases?

Currently, the Go SDK uses the os.Hostname function, called from its internal package, to retrieve and populate the host.name resource attribute. As a result, a Go SDK running in a container reports the container’s hostname as host.name, not the hostname of the virtual machine it is running on. If a container is not considered a host, should SDKs running in containers report this value?
A workaround that satisfies the host semantic convention’s description would be to send all container signals to a collector that overrides the host.name value with the hostname of the virtual machine the container is running on. The following diagram shows an architecture that overrides every container's host.name value with the k8s.node.name value, which normally corresponds to the virtual machine's hostname:

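[Image: architecture diagram in which a collector overrides each container's host.name with the k8s.node.name value]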
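
The override step in this workaround can be sketched as a small function over resource attributes. This is only an illustration: the function name overrideHostName, the plain map representation, and the sample values are assumptions, and it presumes that k8s.node.name has already been attached to the resource by some enrichment step.

```go
package main

import "fmt"

// overrideHostName is a hypothetical helper. It returns a copy of attrs in
// which host.name is replaced by k8s.node.name when that attribute is present
// and non-empty. Otherwise the attributes are returned unchanged.
func overrideHostName(attrs map[string]string) map[string]string {
	out := make(map[string]string, len(attrs))
	for k, v := range attrs {
		out[k] = v
	}
	if node, ok := attrs["k8s.node.name"]; ok && node != "" {
		out["host.name"] = node
	}
	return out
}

func main() {
	// Sample values are made up for illustration.
	enriched := map[string]string{
		"service.name":  "checkout",
		"host.name":     "checkout-7d9f-abcde", // container (Pod) hostname reported by the SDK
		"k8s.node.name": "node-1",              // assumed to be added by an enrichment step
	}
	fmt.Println(overrideHostName(enriched)["host.name"]) // prints "node-1"
}
```

Note that the original container hostname is lost after the override unless it is preserved under another attribute.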

This workaround is one possible user interpretation of the host.name value that simplifies service correlation (all service.name instances sharing the same host.name are identified as running on the same machine), but it comes with notable downsides:

  • It requires an additional processor (either on-site or on the backend) to override the host.name value. It also requires the ability to obtain the host name from the upper virtualization layer. The latter is feasible when a service such as the OpenTelemetry Collector, with suitable resource detectors, is available. But what happens with standalone agents that run in a container and send OTLP data directly to another node’s OTLP endpoint, or even to a remote endpoint?

  • The hostname within a container/cgroup carries useful information: “Containers within the Pod see the system hostname as being the same as the configured name for the Pod.”
    The container’s hostname is used for container communication, in both K8s and standalone Docker deployments.

Describe the solution you'd like

Proposal: Include container in the host definition + new host.hostname attribute

Modify the host semantic conventions registry definition to explicitly include containers. This adjustment makes the scope of the definition explicit, reducing ambiguity for containerized environments.

The current description of host.name is highly permissive:

“Name of the host. On Unix systems, it may contain what the hostname command returns, or the fully qualified hostname, or another name specified by the user.” - Reference

This flexibility leads to a lack of determinism in the value of host.name. In some cases, it reflects the value returned by the gethostname system call, while in others, it may represent a user-defined custom value. However, in distributed systems, the actual service networking hostname plays a critical role in enabling reliable entity and signal correlation.

To address this issue and align with established conventions, such as those in Elastic ECS, a new attribute could be introduced to explicitly reference the hostname used for networking communications. This proposal differentiates between the network hostname and the general host name as follows:

  • host.hostname: The hostname of the host as used for networking communications. Typically, this is the value returned by the hostname command on the host machine.

  • host.name: The name of the host. This attribute may contain what the hostname command returns on Unix systems, the fully qualified domain name (FQDN), or a name specified by the user. The recommended value is the lowercase FQDN of the host.
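
The proposed split between the two attributes could be sketched as follows. The function name proposedHostAttributes, its parameters, and the precedence used for host.name (user-specified name first, then lowercase FQDN, then the plain hostname) are assumptions made for illustration; the proposal itself only states that the lowercase FQDN is the recommended value.

```go
package main

import (
	"fmt"
	"strings"
)

// proposedHostAttributes is a hypothetical helper illustrating the proposal.
//   hostname: the value the hostname command returns on the host.
//   fqdn:     the fully qualified domain name, or "" if unknown.
//   userName: a user-specified name, or "" if none was configured.
func proposedHostAttributes(hostname, fqdn, userName string) map[string]string {
	// host.hostname always carries the networking hostname.
	attrs := map[string]string{"host.hostname": hostname}

	// host.name stays permissive. The precedence below is an assumption.
	switch {
	case userName != "":
		attrs["host.name"] = userName
	case fqdn != "":
		attrs["host.name"] = strings.ToLower(fqdn)
	default:
		attrs["host.name"] = hostname
	}
	return attrs
}

func main() {
	attrs := proposedHostAttributes("web-1", "Web-1.Example.COM", "")
	fmt.Println(attrs["host.hostname"], attrs["host.name"]) // prints "web-1 web-1.example.com"
}
```

With this shape, correlation logic can rely on host.hostname being deterministic while host.name remains free to carry a friendlier or user-chosen value.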

Describe alternatives you've considered

Proposal: Define the value of host.name dependent on the environment

To ensure consistency across different deployment scenarios, we could define the value of host.name in a way that depends on the environment in which the service is running. Suggested guidelines for various scenarios are as follows:

  • Virtual Machine Deployments: In virtual machine environments, the value of host.name for any running service should be equal to the virtual machine's fully qualified domain name (FQDN).
    • Cloud instances: TBD (instance-id? FQDN hostname?)
  • For Docker deployments, a container’s host.name should correspond to the FQDN of the host machine it is running on, not the hostname of the service's container. Services (SDKs) should have a resource detector in place to retrieve the “host” hostname, similar to the OpenTelemetry Collector docker resource detector: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/internal/docker/documentation.md
  • In Kubernetes environments, the approach should mirror that for Docker: either don’t provide the host.name value if SDKs don’t have a “host” resource detector (which will need an outer enricher) or be equal to the “host” FQDN hostname (or even the k8s.node.name value).
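
The guidelines above can be sketched as a single decision function. Everything here is illustrative: the Environment and HostInfo types and the hostNameFor function are hypothetical, cloud instances are left out because their rule is still TBD, and omitting host.name when a virtual machine's FQDN is unknown is an assumption rather than part of the proposal.

```go
package main

import "fmt"

// Environment enumerates the deployment scenarios listed above.
type Environment int

const (
	VirtualMachine Environment = iota
	Docker
	Kubernetes
)

// HostInfo holds what a hypothetical "host" resource detector discovered.
type HostInfo struct {
	MachineFQDN string // FQDN of the VM or host machine, "" if unknown
	K8sNodeName string // value of k8s.node.name, "" if unknown
}

// hostNameFor returns the host.name value for the environment and whether
// the attribute should be set at all. The container's own hostname is never
// used, which matches the guidelines above.
func hostNameFor(env Environment, info HostInfo) (string, bool) {
	switch env {
	case VirtualMachine, Docker:
		if info.MachineFQDN != "" {
			return info.MachineFQDN, true
		}
	case Kubernetes:
		if info.MachineFQDN != "" {
			return info.MachineFQDN, true
		}
		if info.K8sNodeName != "" {
			return info.K8sNodeName, true
		}
	}
	// No "host" detector result: leave host.name to an outer enricher.
	return "", false
}

func main() {
	name, ok := hostNameFor(Kubernetes, HostInfo{K8sNodeName: "node-1"})
	fmt.Println(name, ok) // prints "node-1 true"
}
```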

This second proposal complements the first by further emphasizing the need to distinguish between the actual networking hostname of the computing instance and the value of host.name as defined by semantic conventions.

Additional context

No response

@ChrsMark
Member

ChrsMark commented Dec 5, 2024

In Kubernetes environments, the approach should mirror that for Docker: either don’t provide the host.name value if SDKs don’t have a “host” resource detector (which will need an outer enricher) or be equal to the “host” FQDN hostname (or even the k8s.node.name value).

I had shared some related confusion at #761 (comment)

I think it would help in general if we could discuss some specific examples here to understand the various cases and validate any decision against them.

@rogercoll
Contributor Author

Reference for the earlier removal of host.hostname: open-telemetry/opentelemetry-specification#787
