Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add agent semantic conventions #950

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

tigrannajaryan
Copy link
Member

@tigrannajaryan tigrannajaryan commented Apr 23, 2024

This is a draft that I am opening to start discussion about agent semantic conventions. We will need input from semconv SIG and from Collector SIG.

Resolves #396
Contributes to open-telemetry/opamp-spec#131
Alternate to #575 which did not get traction.

Changes

We need a way to record more information about agents than is currently possible using existing semantic conventions. Otel Collector in particular today uses service.name,service.instance.id,service.version attributes to report its own telemetry. These are useful but not sufficient, particularly we are missing the information about which distribution of Otel Collector it is.

The proposed agent.type,agent.version,agent.id conventions are also aligned with ECS: https://www.elastic.co/guide/en/ecs/current/ecs-agent.html

With introduction of this conventions the following attributes change in Otel Collector's own telemetry output:

service.name -> agent.type
service.version -> agent.version
service.instance.id -> agent.id

agent.distro is added as one more property, the equivalent of which did not exist in the past.

Usage

Case 1. Own Telemetry.

The agents will use these attributes in the Resource of the telemetry they about themselves.

Case 2. OpAMP.

These will be used as identifying attributes of the agent in OpAMP.

Merge requirement checklist

I will add changelog/schema after we agree to move forward with this proposal.

@tigrannajaryan
Copy link
Member Author

@open-telemetry/collector-approvers @open-telemetry/collector-contrib-approvers @open-telemetry/opamp-spec-approvers @open-telemetry/opamp-go-approvers Please review.

Resolves open-telemetry#396
Contributes to open-telemetry/opamp-spec#131

We need a way to record more information about agents than is currently
possible using existing semantic conventions. Otel Collector in particular
today uses service.name,service.instance.id,service.version attributes
to report its own telemetry. These are useful but not sufficient, particularly
we are missing the information about which distribution of Otel Collector
it is.

agent.type/agent.version/agent.id conventions are also aligned with ECS:
https://www.elastic.co/guide/en/ecs/current/ecs-agent.html

With introduction of this conventions the following attributes change in
Otel Collector's own telemetry output:

service.name -> agent.type
service.version -> agent.version
service.instance.id -> agent.id

agent.distro will be added as one more property, the equivalent of which
did not exist in the past.
@tigrannajaryan tigrannajaryan force-pushed the feature/tigran/agentconv branch from c79fca9 to 084df5d Compare April 23, 2024 17:45
docs/attributes-registry/agent.md Show resolved Hide resolved
<!-- semconv registry.agent(omit_requirement_level) -->
| Attribute | Type | Description | Examples | Stability |
|---|---|---|---|---|
| `agent.distro` | string | Agent distribution. [1] | `github.com/signalfx/splunk-otel-collector`; `github.com/aws-observability/aws-otel-collector` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to at least discuss what the value for this would be for core and contrib

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use the github repo values, e.g. github.com/open-telemetry/opentelemetry-collector-contrib for contrib. Do you think there are other, better options?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with those, in terms of implementation, would we add the distro value as a field to component.BuildInfo or something similar?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we restrict the number of examples ( columns)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with those, in terms of implementation, would we add the distro value as a field to component.BuildInfo or something similar?

Yes, I think so. That's where we put the short commands like "otelcol" today so we can add the corresponding distro string as another field of BuildInfo struct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we restrict the number of examples ( columns)?

@samiura I am not sure I understand. I have 2 examples for agent.distro currently. Please clarify what you would like to restrict.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not resolving to wait on @samiura's comment, but consider my comment resolved here :)

docs/attributes-registry/agent.md Show resolved Hide resolved
| Attribute | Type | Description | Examples | Stability |
|---|---|---|---|---|
| `agent.distro` | string | Agent distribution. [1] | `github.com/signalfx/splunk-otel-collector`; `github.com/aws-observability/aws-otel-collector` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `agent.id` | string | Unique identifier of agent instance. [2] | `627cc493-f310-47de-96bd-71410b7dec09` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference with service.instance.id if any? Is this persistent across restarts?

Copy link
Member

@AlexanderWert AlexanderWert Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference with service.instance.id if any?

In the case when telemetry is being collected for the OTel Collector itself there's maybe no difference.
However, the idea of agent.* fields in ECS is also to have the possibility attach that meta-information to telemetry data for a different, observed service.

For example, let's say the OTel collector collects metrics (or so) from an external service (let's say NGINX).
Then the telemetry data could contain:

  • service.name: NGINX
  • service.instance.id: 123
  • agent.type: io.opentelemetry.collector
  • agent.id: xyz

So with that we can differentiate the observer service from the observed service.

@tigrannajaryan not sure if this PR had also this kind of scenario in mind?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference with service.instance.id if any?

It is the agent's instance id. Conceptually the same thing.

Is this persistent across restarts?

Preferably it should, but I don't want to mandate that in semconv, since some agents may not have persistence capability. We can add recommendations.

So with that we can differentiate the observer service from the observed service.

It was not a goal for me, but appears useful. When recording both of these attribute sets you essentially allow answering the question "who collected this telemetry?". My one concern would be that it bloats all telemetry coming from the Collector, so I am not sure if we should do it.

@atoulme
Copy link
Contributor

atoulme commented Apr 24, 2024

Can you clarify if this should also encompass additional attributes, such as additional metadata related to the collector build or the OS the collector runs on, or is that out of scope and defined elsewhere?

@tigrannajaryan
Copy link
Member Author

Can you clarify if this should also encompass additional attributes, such as additional metadata related to the collector build or the OS the collector runs on, or is that out of scope and defined elsewhere?

@atoulme For things like OS I think we should use existing conventions, e.g. os.type, etc. That's what OpAMP recommends.

If there are other additional attributes that you would want to record for Collector we should either add them as more agent.* attributes if they are for agents only or suggest them as general conventions if they are applicable to other service types.

@tigrannajaryan
Copy link
Member Author

@open-telemetry/opamp-spec-approvers @open-telemetry/opamp-go-approvers please take a look and comment.

prefix: agent
type: resource
brief: >
An agent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the risk of being verbose, could we give a sentence or two to describe how we define "agent" in this context? The docs seem to define agent as a deployment pattern, but I think these conventions are agnostic of a deployment model and use a different definition for the term.

I do see that this doesn't seem to appear anywhere in the readme output, so maybe there's no point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evan-bradley I am not sure how much do we want to narrow down the definition of the agent to be honest. We could say "data collection agent" like OpAMP does, if that's useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add agent resource type
6 participants