RFC - Pipeline Component Telemetry #11406

Open

djaglowski wants to merge 13 commits into open-telemetry:main from djaglowski:component-telemetry-rfc

+212 −0

Member

djaglowski commented Oct 9, 2024 •

edited by mx-psi


This PR adds an RFC for normalized telemetry across all pipeline components. See #11343

edit by @mx-psi:

Announced on #otel-collector-dev on 2024-10-23: https://cloud-native.slack.com/archives/C07CCCMRXBK/p1729705290741179
Announced on the Collector SIG meeting from 2024-10-30

djaglowski mentioned this pull request

Auto-instrumented pipeline components #11343

Closed

codecov bot commented Oct 9, 2024 •

edited


Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.56%. Comparing base (68f0264) to head (7925012).
Report is 100 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #11406      +/-   ##
==========================================
- Coverage   92.15%   91.56%   -0.60%     
==========================================
  Files         432      441       +9     
  Lines       20291    23856    +3565     
==========================================
+ Hits        18700    21844    +3144     
- Misses       1228     1640     +412     
- Partials      363      372       +9



          RFC - Auto-instrumentation of pipeline components

5df52e1

djaglowski force-pushed the component-telemetry-rfc branch from 99e3086 to 5df52e1 Compare

October 10, 2024 13:05


          Merge branch 'main' into component-telemetry-rfc

djaglowski marked this pull request as ready for review

October 10, 2024 13:36

djaglowski requested a review from a team as a code owner

October 10, 2024 13:36

djaglowski requested a review from songy23

October 10, 2024 13:36

djaglowski added Skip Changelog Skip Contrib Tests labels

codeboten reviewed


Contributor

codeboten left a comment

Thanks for opening this as an RFC @djaglowski!

docs/rfcs/component-universal-telemetry.md (6 review threads, all outdated and resolved)

djaglowski and others added 2 commits

October 10, 2024 11:45


          Update docs/rfcs/component-universal-telemetry.md

d0f1637

Co-authored-by: Alex Boten


          Feedback

bea0e2f

jmacd mentioned this pull request

WIP: pipeline monitoring otep open-telemetry/oteps#259

Closed

bogdandrutu reviewed

docs/rfcs/component-universal-telemetry.md (8 review threads, all outdated and resolved)


          Feedback

35c82a4

bogdandrutu reviewed

docs/rfcs/component-universal-telemetry.md (1 review thread, outdated and resolved)

bogdandrutu reviewed

docs/rfcs/component-universal-telemetry.md (1 review thread, outdated and resolved)


          Broaden scope and convert to evolving consensus

djaglowski changed the title ~~RFC - Auto-instrumentation of pipeline components~~ RFC - Pipeline Component Telemetry

Member Author

djaglowski commented Oct 16, 2024

Based on some offline feedback, I've broadened the scope of the RFC, while simultaneously clarifying that it is intended to evolve as we identify additional standards.


          Update names to consumed and produced

b1fd90c

jaronoff97 reviewed


Contributor

jaronoff97 left a comment

a few questions, I really like this proposal overall :)

docs/rfcs/component-universal-telemetry.md (4 review threads, all resolved, 3 of them outdated)

djaglowski added 2 commits

October 23, 2024 14:08


          Change proposed metric names to use '.' instead of '_'

90d19ab


          Separate metrics by component kind

497587c

dmathieu reviewed

docs/rfcs/component-universal-telemetry.md (4 review threads, all outdated and resolved)


          Add profiles as attribute value

Co-authored-by: Damien Mathieu

jaronoff97 reviewed

docs/rfcs/component-universal-telemetry.md (1 review thread, resolved)

mx-psi approved these changes


wildum reviewed

docs/rfcs/component-universal-telemetry.md (1 review thread, outdated and resolved)

wildum reviewed

docs/rfcs/component-universal-telemetry.md (1 review thread, resolved)

wildum approved these changes



          Update docs/rfcs/component-universal-telemetry.md

1b26ae2

Co-authored-by: William Dumont

jaronoff97 approved these changes


evan-bradley approved these changes

docs/rfcs/component-universal-telemetry.md (2 review threads, both outdated and resolved)

jpkrohling reviewed


docs/rfcs/component-universal-telemetry.md

+              - `otel.output.signal`: `logs`, `metrics` `traces`, `profiles`
+              Note: The `otel.signal`, `otel.output.signal`, or `otel.pipeline.id` attributes may be omitted if the corresponding component instances
+              are unified by the component implementation. For example, the `otlp` receiver is a singleton, so its telemetry is not specific to a signal.

Member

jpkrohling Nov 4, 2024

I think this needs normative language. Setting otel.signal for the OTLP receiver's bootstrap operations is certainly misleading to people operating the collector who are unaware of the inner workings of this specific component (i.e., that it's a singleton).

So:

attributes MUST be omitted if the corresponding component instances

Member Author

djaglowski Nov 4, 2024

This is a tricky topic and I'm not sure we can be so strict. It certainly makes sense to me that e.g. logs generated while initializing the singleton should not be attributed to one signal or pipeline. However, we can still attribute metrics to a particular signal (e.g. if the otlp receiver emits 10 logs and 20 metrics, do you want a count of "30 items" or "10 logs" and "20 metrics"?). Maybe this is a good argument for splitting the proposed metrics by signal type, e.g. produced_metrics, produced_logs, etc. This would allow those metrics to share the same set of attributes with other signals produced by the instance.
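
To make the "30 items" versus "10 logs and 20 metrics" question concrete, here is a minimal sketch of a counter keyed by component and signal. Everything in it (`itemKey`, `producedCounter`, and its methods) is hypothetical and uses only the Go standard library; it is not the collector's metrics API, and the string keys merely stand in for the `otel.component.id` and `otel.signal` attributes.

```go
package main

import "fmt"

// itemKey identifies one counter series. Both fields are hypothetical
// stand-ins for the otel.component.id and otel.signal attributes.
type itemKey struct {
	componentID string
	signal      string // "logs", "metrics", "traces", or "profiles"
}

// producedCounter is a toy in-memory counter, not a real metrics SDK.
type producedCounter struct {
	counts map[itemKey]int64
}

func newProducedCounter() *producedCounter {
	return &producedCounter{counts: make(map[itemKey]int64)}
}

// Add records n produced items for one component and one signal.
func (c *producedCounter) Add(componentID, signal string, n int64) {
	c.counts[itemKey{componentID, signal}] += n
}

// BySignal returns the count for a single signal of a component.
func (c *producedCounter) BySignal(componentID, signal string) int64 {
	return c.counts[itemKey{componentID, signal}]
}

// Total sums every signal of a component, which is what a single
// signal-agnostic "items" count would report.
func (c *producedCounter) Total(componentID string) int64 {
	var total int64
	for k, v := range c.counts {
		if k.componentID == componentID {
			total += v
		}
	}
	return total
}

func main() {
	c := newProducedCounter()
	c.Add("otlp", "logs", 10)
	c.Add("otlp", "metrics", 20)

	// Prints: 30 10 20
	fmt.Println(c.Total("otlp"), c.BySignal("otlp", "logs"), c.BySignal("otlp", "metrics"))
}
```

Keeping the signal in the key means the per-signal view and the aggregate view can both be derived, whereas a counter without it can only ever report the total.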

docs/rfcs/component-universal-telemetry.md


		All signals should use the following attributes:

		### Receivers

Member

jpkrohling Nov 4, 2024

I know this states "pipeline component telemetry", and that extensions aren't technically part of a pipeline, but it feels wrong to leave them out: otel.component.kind and .id could definitely apply to them as well.

Member Author

djaglowski Nov 4, 2024

The scope of this effort has been increased a lot already. Can we leave extensions for another proposal? Personally I don't feel I have enough expertise with extensions to author such details.

docs/rfcs/component-universal-telemetry.md

+ 1. A count of "items" (spans, data points, or log records). These are low cost but broadly useful, so they should be enabled by default.
+ 2. A measure of size, based on [ProtoMarshaler.Sizer()](https://github.com/open-telemetry/opentelemetry-collector/blob/9907ba50df0d5853c34d2962cf21da42e15a560d/pdata/ptrace/pb.go#L11).
+                These are high cost to compute, so by default they should be disabled (and not calculated).

Member

jpkrohling Nov 4, 2024

How costly? I remember talking to someone about this in the past, and they mentioned that it's not that expensive, given that it just delegates to what already exists in protobuf:

opentelemetry-collector/pdata/ptrace/pb.go

Line 22 in 9907ba5

return pb.Size()

It would be nice to have benchmarks so that data backs this claim (or the opposite one). I would definitely find a histogram of item/batch sizes very useful, and making it optional means that people might only find out about it at the point when they would have benefited from already having historical data.

Member Author

djaglowski Nov 4, 2024

Suggested change

      
              These are high cost to compute, so by default they should be disabled (and not calculated).
          
              These may be high cost to compute, so by default they should be disabled (and not calculated). This default setting may change in the future if it is demonstrated that the cost is generally acceptable.

How's this wording?
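
As an illustration of "disabled (and not calculated)", the sketch below always records the item count but only invokes the size function when size measurement is switched on. The `batch`, `sizer`, `recorder`, and `measurement` types are hypothetical stand-ins written against the standard library only; in the collector the size would come from the pdata sizer discussed above, which this sketch does not call.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for a pdata payload.
type batch struct {
	items       int
	encodedSize int
}

// sizer mirrors the idea of a Sizer: it returns the encoded size of a batch.
type sizer func(batch) int

// measurement is what one consume call would contribute to telemetry.
type measurement struct {
	items     int
	sizeBytes int
	sized     bool // false when size measurement is disabled
}

// recorder always counts items and computes size only when enabled.
type recorder struct {
	sizeEnabled bool
	sizeOf      sizer
}

func (r recorder) measure(b batch) measurement {
	m := measurement{items: b.items}
	if r.sizeEnabled {
		// The potentially expensive call happens only on this branch.
		m.sizeBytes = r.sizeOf(b)
		m.sized = true
	}
	return m
}

func main() {
	calls := 0
	sizeOf := func(b batch) int { calls++; return b.encodedSize }
	b := batch{items: 100, encodedSize: 4096}

	off := recorder{sizeEnabled: false, sizeOf: sizeOf}.measure(b)
	on := recorder{sizeEnabled: true, sizeOf: sizeOf}.measure(b)

	// Prints: 100 false 4096 1
	fmt.Println(off.items, off.sized, on.sizeBytes, calls)
}
```

Whether the guarded call is actually expensive is exactly what the requested benchmarks would have to show; the sketch only demonstrates that a disabled measurement can cost nothing beyond a branch.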

docs/rfcs/component-universal-telemetry.md

+              Metrics provide most of the observability we need but there are some gaps which logs can fill. Although metrics would describe the overall
+              item counts, it is helpful in some cases to record more granular events. e.g. If a produced batch of 10,000 spans results in an error, but
+batches of 100 spans succeed, this may be a matter of batch size that can be detected by analyzing logs, while the corresponding metric

Member

jpkrohling Nov 4, 2024

This is clearly a tracing case for me :-) The rule of thumb to me is: is the information related to a particular transaction? Then it should go into a span.

Member Author

djaglowski Nov 4, 2024

Makes sense if we agree that we should capture a span for the consume function.

docs/rfcs/component-universal-telemetry.md


		### Auto-Instrumented Spans

		It is not clear that any spans can be captured automatically with the proposed mechanism. We have the ability to insert instrumentation both

Member

jpkrohling Nov 4, 2024

Context is passed down, isn't it? We can definitely instrument the ingress part of the component, and ask components to add span links if they are messing with the context. This way, the trace for a pipeline with a batch processor would end at the batching processor, but a new trace with a span link would be created pointing to the originating batch request.

Member Author

djaglowski Nov 4, 2024

I see, so instead of closing the span when data comes out, we close it when the consume func returns?

I think the duration of the span would be meaningful only for some synchronous processors, and could be meaningful for synchronous connectors (e.g. if they create and link spans to represent the work associated with the incoming data). But what about asynchronous components? Do we accept that the span is just measuring a quick handoff to the internal state of the component? Is this going to be misleading to users?

Member

jpkrohling commented Nov 4, 2024

Some of my comments might have been discussed before, in which case, feel free to ignore me and just mark the items as resolved.

djaglowski and others added 2 commits

November 4, 2024 11:15


          Update docs/rfcs/component-universal-telemetry.md

bba26c9

Co-authored-by: Evan Bradley


          Change 'otel.output.signal' to 'otel.signal.output'


Reviewers

dmathieu left review comments

jpkrohling left review comments

codeboten left review comments

bogdandrutu left review comments

andrzej-stencel left review comments

jade-guiton-dd left review comments

mx-psi approved these changes

jaronoff97 approved these changes

evan-bradley approved these changes

wildum approved these changes

songy23: awaiting requested review (code owner automatically assigned from open-telemetry/collector-approvers)

Labels

Skip Changelog Skip Contrib Tests