-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OTLP Trace Data Format specification #59
Add OTLP Trace Data Format specification #59
Conversation
f913f99
to
db39943
Compare
This requires a dedicated PR against the specification repo. |
Or a dedicated OTEP, with a title like "Change SetAttribute to record all values instead of just the last one." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious about one of the stated requirements of OTLP, namely the augmenting and partial inspection. How are those achieved with proto format?
Ideally it must also support very fast pass-through mode (when no modifications to the data are needed), fast “augmenting” or “tagging” of data and partial inspection of data (e.g. check for presence of specific tag). These requirements help to create fast Agents and Collectors.
77f4b41
to
190ef87
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... at the same time changes the semantic of attributes ...
This requires a dedicated PR against the specification repo.
I think this part of my PR description is poorly worded. The attribute semantic is defined by the API/Specification. The SDK Exporter is then tasked with translating the semantics to a correct implementation when implementing a particular protocol. OTLP and this PR allows to have multiple attributes of the same value but nothing requires that the semantics are different from what is defined by OpenTelemetry API (the implementations are free to enforce uniqueness of the attributes).
I am not proposing a semantic change of the API in this PR and agree such a change definitely requires a dedicated PR.
There is no in-principle improvement in this area compared to OpenCensus. However because encoding and decoding are significantly faster one can argue that we made progress in this area. I had more ambitious goals when I wrote the requirement and it was the reason I explored Capnproto and other encodings but ended up not going in that direction due to reasons outlined in #35. I think this goal still holds and we can come up with a better protocol. What we have here is a good starting point for the first release though. |
If you still think those additional goals are achievable in the future, we may want to list them in the otep as still-to-be-met. |
@yurishkuro I added a note. |
I changed the status to approved, so that this gets merged in approved status (when it gets merged). If I understand correctly this is the right flow for RFCs. |
Actually, in the current flow you can remove the "Status" field entirely in the OTEP and PR merged=accepted. |
This change applies the refinement approach that is already performed on Traces Protobuf definitions as part of open-telemetry/oteps#59 and which proved to yield significant performance improvements. Notable changes are: - Replace google.protobuf.Timestamp by int64 time in unix epoch nanoseconds. - Eliminate unnecessary messages, move the fields into the containing message. - Replace oneof by a set of fields. Simple benchmark in Go demonstrates the following improvement compared to OpenCensus Metrics encoding and decoding: ``` BenchmarkEncode/OpenCensus/Metrics-8 2 645252504 ns/op BenchmarkEncode/OTLP/Metrics-8 4 288457433 ns/op BenchmarkDecode/OpenCensus/Metrics-8 1 1154650804 ns/op BenchmarkDecode/OTLP/Metrics-8 3 475370617 ns/op ``` Encoding is about 2.2 times faster, decoding is about 2.4 times faster. Benchmarks encode and decode 500 batches of 2 metrics: one int64 Gauge with 5 time series and one Histogram of doubles with 1 time series and single bucket. Each time series for both metrics contains 5 data points. Both metrics have 2 labels. Benchmark source code is available at: https://github.com/tigrannajaryan/exp-otelproto/blob/master/encodings/encoding_test.go
7b17ac3
to
ad0381e
Compare
This change applies the refinement approach that is already performed on Traces Protobuf definitions as part of open-telemetry/oteps#59 and which proved to yield significant performance improvements. I replaced google.protobuf.Timestamp by int64 time in unix epoch nanoseconds. Simple benchmark in Go demonstrates the following improvement of encoding and decoding compared to the current state: ``` ===== Encoded sizes Encoding Uncompressed Improved Compressed Improved Baseline/MetricOne 20000 bytes [1.000], gziped 1506 bytes [1.000] Proposed/MetricOne 18250 bytes [1.096], gziped 1433 bytes [1.051] Encoding Uncompressed Improved Compressed Improved Baseline/MetricSeries 51797 bytes [1.000], gziped 6455 bytes [1.000] Proposed/MetricSeries 43047 bytes [1.203], gziped 6093 bytes [1.059] goos: darwin goarch: amd64 pkg: github.com/tigrannajaryan/exp-otelproto/encodings BenchmarkEncode/Baseline/MetricOne-8 30 186998840 ns/op BenchmarkEncode/Proposed/MetricOne-8 36 166668705 ns/op BenchmarkEncode/Baseline/MetricSeries-8 8 632391842 ns/op BenchmarkEncode/Proposed/MetricSeries-8 10 537384515 ns/op BenchmarkDecode/Baseline/MetricOne-8 16 348156010 ns/op 171896049 B/op 4974000 allocs/op BenchmarkDecode/Proposed/MetricOne-8 19 314727259 ns/op 155096036 B/op 4624000 allocs/op BenchmarkDecode/Baseline/MetricSeries-8 5 1013035422 ns/op 440696048 B/op 11874000 allocs/op BenchmarkDecode/Proposed/MetricSeries-8 6 846887981 ns/op 356696040 B/op 10124000 allocs/op ``` It is 10-15% faster and is 10-20% smaller on the wire and in memory. Benchmarks encode and decode 500 batches of 2 metrics: one int64 Gauge with 5 time series and one Histogram of doubles with 1 time series and single bucket. Each time series for both metrics contains either 1 data point (MetricOne) or 5 data points (MetricSeries). Both metrics have 2 labels. Benchmark source code is available at: https://github.com/tigrannajaryan/exp-otelproto/blob/master/encodings/encoding_test.go
#33) * Change timestamp data type to unixnano in Metrics Protobuf definitions This change applies the refinement approach that is already performed on Traces Protobuf definitions as part of open-telemetry/oteps#59 and which proved to yield significant performance improvements. I replaced google.protobuf.Timestamp by int64 time in unix epoch nanoseconds. Simple benchmark in Go demonstrates the following improvement of encoding and decoding compared to the current state: ``` ===== Encoded sizes Encoding Uncompressed Improved Compressed Improved Baseline/MetricOne 20000 bytes [1.000], gziped 1506 bytes [1.000] Proposed/MetricOne 18250 bytes [1.096], gziped 1433 bytes [1.051] Encoding Uncompressed Improved Compressed Improved Baseline/MetricSeries 51797 bytes [1.000], gziped 6455 bytes [1.000] Proposed/MetricSeries 43047 bytes [1.203], gziped 6093 bytes [1.059] goos: darwin goarch: amd64 pkg: github.com/tigrannajaryan/exp-otelproto/encodings BenchmarkEncode/Baseline/MetricOne-8 30 186998840 ns/op BenchmarkEncode/Proposed/MetricOne-8 36 166668705 ns/op BenchmarkEncode/Baseline/MetricSeries-8 8 632391842 ns/op BenchmarkEncode/Proposed/MetricSeries-8 10 537384515 ns/op BenchmarkDecode/Baseline/MetricOne-8 16 348156010 ns/op 171896049 B/op 4974000 allocs/op BenchmarkDecode/Proposed/MetricOne-8 19 314727259 ns/op 155096036 B/op 4624000 allocs/op BenchmarkDecode/Baseline/MetricSeries-8 5 1013035422 ns/op 440696048 B/op 11874000 allocs/op BenchmarkDecode/Proposed/MetricSeries-8 6 846887981 ns/op 356696040 B/op 10124000 allocs/op ``` It is 10-15% faster and is 10-20% smaller on the wire and in memory. Benchmarks encode and decode 500 batches of 2 metrics: one int64 Gauge with 5 time series and one Histogram of doubles with 1 time series and single bucket. Each time series for both metrics contains either 1 data point (MetricOne) or 5 data points (MetricSeries). Both metrics have 2 labels. Benchmark source code is available at: https://github.com/tigrannajaryan/exp-otelproto/blob/master/encodings/encoding_test.go * Change timestamp from int64 to sfixed64 * Change timestamp interval from closed to open
I think this discussion from the other repository is relevant: open-telemetry/opentelemetry-proto#11 (comment) |
@tmc adding support for nested attribute values is easy. As soon as API/Spec decides to have such semantics the |
@open-telemetry/specs-approvers please review. I believe there are no more outstanding issues and this is ready to be merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am approving as I think it's ok to merge, but raised a couple questions.
@open-telemetry/specs-approvers do I need more approvals to get this merged? There are no outstanding issues or comments in this PR anymore. I believe it is ready to be merged. |
@bogdandrutu thank you. |
@tigrannajaryan It seems you are currently working on applying the OTEP bit by bit to opentelemetry-proto. Is there an overview somewhere which parts are applied and which are missing? |
@Oberon00 yes, I am doing it now. No, there is no overview, I don't have a formal list of changes anywhere, however the target is well defined, if there is an interest on how much progress is made you can diff OTEP and proto repo content. |
This brings the improved comments from open-telemetry/oteps#59
This brings the improved comments from open-telemetry/oteps#59
* Add OTLP Trace Data Format specification This is a continuation of OTLP RFC proposal open-telemetry#35 This change defines the data format used by Span and Resource messages. The data format is a result of research of prior art (primarily OpenCensus and Jaeger), as well as experimentation and benchmarking done as part of OTLP RFC proposal. Go benchmark source code is available at https://github.com/tigrannajaryan/exp-otelproto (use `make benchmark-encoding` target). Benchmarking shows that depending on the payload composition this data format is about 4x-5x faster in encoding and 2x-3x faster in decoding equivalent data compared to OpenCensus data format (all benchmarks in Go). Notable differences from OpenCensus: - Attribute key/value pairs are represented as a list rather than as a map. This results in significant performance gains and at the same time changes the semantic of attributes because now it is possible to have multiple attributes with the same key. This is also in-line with Jaeger's tags representation. - Removed unnecessary wrappers such as google.protobuf.Timestamp which resulted in significant performance improvements for certain payload compositions (e.g. lots of TimedEvents). - Resource labels use the same data type as Span attributes which now allows to have labels with other data types (OpenCensus only allowed strings). * Address review comments * Add protobuf type qualifier to pre-formatted blocks * Add a note about future goals for the protocol * Address review comments * Make sure all field comments start with field name * Change status to approved * Clarify start and end time expectations. * Address Sergey's comments * Remove string length requirement * Add StatusCode enum
* Add OTLP Trace Data Format specification This is a continuation of OTLP RFC proposal open-telemetry#35 This change defines the data format used by Span and Resource messages. The data format is a result of research of prior art (primarily OpenCensus and Jaeger), as well as experimentation and benchmarking done as part of OTLP RFC proposal. Go benchmark source code is available at https://github.com/tigrannajaryan/exp-otelproto (use `make benchmark-encoding` target). Benchmarking shows that depending on the payload composition this data format is about 4x-5x faster in encoding and 2x-3x faster in decoding equivalent data compared to OpenCensus data format (all benchmarks in Go). Notable differences from OpenCensus: - Attribute key/value pairs are represented as a list rather than as a map. This results in significant performance gains and at the same time changes the semantic of attributes because now it is possible to have multiple attributes with the same key. This is also in-line with Jaeger's tags representation. - Removed unnecessary wrappers such as google.protobuf.Timestamp which resulted in significant performance improvements for certain payload compositions (e.g. lots of TimedEvents). - Resource labels use the same data type as Span attributes which now allows to have labels with other data types (OpenCensus only allowed strings). * Address review comments * Add protobuf type qualifier to pre-formatted blocks * Add a note about future goals for the protocol * Address review comments * Make sure all field comments start with field name * Change status to approved * Clarify start and end time expectations. * Address Sergey's comments * Remove string length requirement * Add StatusCode enum
* Add OTLP Trace Data Format specification This is a continuation of OTLP RFC proposal open-telemetry#35 This change defines the data format used by Span and Resource messages. The data format is a result of research of prior art (primarily OpenCensus and Jaeger), as well as experimentation and benchmarking done as part of OTLP RFC proposal. Go benchmark source code is available at https://github.com/tigrannajaryan/exp-otelproto (use `make benchmark-encoding` target). Benchmarking shows that depending on the payload composition this data format is about 4x-5x faster in encoding and 2x-3x faster in decoding equivalent data compared to OpenCensus data format (all benchmarks in Go). Notable differences from OpenCensus: - Attribute key/value pairs are represented as a list rather than as a map. This results in significant performance gains and at the same time changes the semantic of attributes because now it is possible to have multiple attributes with the same key. This is also in-line with Jaeger's tags representation. - Removed unnecessary wrappers such as google.protobuf.Timestamp which resulted in significant performance improvements for certain payload compositions (e.g. lots of TimedEvents). - Resource labels use the same data type as Span attributes which now allows to have labels with other data types (OpenCensus only allowed strings). * Address review comments * Add protobuf type qualifier to pre-formatted blocks * Add a note about future goals for the protocol * Address review comments * Make sure all field comments start with field name * Change status to approved * Clarify start and end time expectations. * Address Sergey's comments * Remove string length requirement * Add StatusCode enum
This is a continuation of OTLP RFC proposal #35
This change defines the data format used by Span and Resource messages.
The data format is a result of research of prior art (primarily OpenCensus and Jaeger),
as well as experimentation and benchmarking done as part of OTLP RFC proposal.
Go benchmark source code is available at https://github.com/tigrannajaryan/exp-otelproto
(use
make benchmark-encoding
target). Benchmarking shows that depending on the payloadcomposition this data format is about 4x-5x faster in encoding and 2x-3x faster in
decoding equivalent data compared to OpenCensus data format (all benchmarks in Go).
Notable differences from OpenCensus:
Attribute key/value pairs are represented as a list rather than as a map.
This results in significant performance gains and at the same time
changesmakes possible to have multiple attributesthe semantic of attributes because now it is
with the same key. This is also in-line with Jaeger's tags representation.
Removed unnecessary wrappers such as google.protobuf.Timestamp which resulted in
significant performance improvements for certain payload compositions (e.g. lots of
TimedEvents).
Resource labels use the same data type as Span attributes which now allows
to have labels with other data types (OpenCensus only allowed strings).