Add OTLP Trace Data Format specification #59

tigrannajaryan · 2019-10-11T14:50:30Z

This is a continuation of OTLP RFC proposal #35

This change defines the data format used by Span and Resource messages.

The data format is a result of research of prior art (primarily OpenCensus and Jaeger),
as well as experimentation and benchmarking done as part of OTLP RFC proposal.

Go benchmark source code is available at https://github.com/tigrannajaryan/exp-otelproto
(use make benchmark-encoding target). Benchmarking shows that depending on the payload
composition this data format is about 4x-5x faster in encoding and 2x-3x faster in
decoding equivalent data compared to OpenCensus data format (all benchmarks in Go).

Notable differences from OpenCensus:

Attribute key/value pairs are represented as a list rather than as a map.
This results in significant performance gains and at the same time changes
the semantic of attributes because now it is makes possible to have multiple attributes
with the same key. This is also in-line with Jaeger's tags representation.
Removed unnecessary wrappers such as google.protobuf.Timestamp which resulted in
significant performance improvements for certain payload compositions (e.g. lots of
TimedEvents).
Resource labels use the same data type as Span attributes which now allows
to have labels with other data types (OpenCensus only allowed strings).

z1c0 · 2019-10-14T12:00:10Z

... at the same time changes the semantic of attributes ...

This requires a dedicated PR against the specification repo.

Oberon00 · 2019-10-14T12:39:59Z

Or a dedicated OTEP, with a title like "Change SetAttribute to record all values instead of just the last one."

yurishkuro

I am curious about one of the stated requirements of OTLP, namely the augmenting and partial inspection. How are those achieved with proto format?

Ideally it must also support very fast pass-through mode (when no modifications to the data are needed), fast “augmenting” or “tagging” of data and partial inspection of data (e.g. check for presence of specific tag). These requirements help to create fast Agents and Collectors.

text/0000-otlp-trace-data-format.md

text/0059-otlp-trace-data-format.md

tigrannajaryan

... at the same time changes the semantic of attributes ...

This requires a dedicated PR against the specification repo.

@z1c0 @Oberon00

I think this part of my PR description is poorly worded. The attribute semantic is defined by the API/Specification. The SDK Exporter is then tasked with translating the semantics to a correct implementation when implementing a particular protocol. OTLP and this PR allows to have multiple attributes of the same value but nothing requires that the semantics are different from what is defined by OpenTelemetry API (the implementations are free to enforce uniqueness of the attributes).

I am not proposing a semantic change of the API in this PR and agree such a change definitely requires a dedicated PR.

tigrannajaryan · 2019-10-22T00:02:56Z

I am curious about one of the stated requirements of OTLP, namely the augmenting and partial inspection. How are those achieved with proto format?

@yurishkuro

There is no in-principle improvement in this area compared to OpenCensus. However because encoding and decoding are significantly faster one can argue that we made progress in this area.

I had more ambitious goals when I wrote the requirement and it was the reason I explored Capnproto and other encodings but ended up not going in that direction due to reasons outlined in #35.

I think this goal still holds and we can come up with a better protocol. What we have here is a good starting point for the first release though.

yurishkuro · 2019-10-22T00:44:33Z

If you still think those additional goals are achievable in the future, we may want to list them in the otep as still-to-be-met.

tigrannajaryan · 2019-10-22T19:39:03Z

If you still think those additional goals are achievable in the future, we may want to list them in the otep as still-to-be-met.

@yurishkuro I added a note.

tigrannajaryan · 2019-10-23T03:12:08Z

I changed the status to approved, so that this gets merged in approved status (when it gets merged). If I understand correctly this is the right flow for RFCs.

Oberon00 · 2019-10-23T11:29:12Z

Actually, in the current flow you can remove the "Status" field entirely in the OTEP and PR merged=accepted.

This change applies the refinement approach that is already performed on Traces Protobuf definitions as part of open-telemetry/oteps#59 and which proved to yield significant performance improvements. Notable changes are: - Replace google.protobuf.Timestamp by int64 time in unix epoch nanoseconds. - Eliminate unnecessary messages, move the fields into the containing message. - Replace oneof by a set of fields. Simple benchmark in Go demonstrates the following improvement compared to OpenCensus Metrics encoding and decoding: ``` BenchmarkEncode/OpenCensus/Metrics-8 2 645252504 ns/op BenchmarkEncode/OTLP/Metrics-8 4 288457433 ns/op BenchmarkDecode/OpenCensus/Metrics-8 1 1154650804 ns/op BenchmarkDecode/OTLP/Metrics-8 3 475370617 ns/op ``` Encoding is about 2.2 times faster, decoding is about 2.4 times faster. Benchmarks encode and decode 500 batches of 2 metrics: one int64 Gauge with 5 time series and one Histogram of doubles with 1 time series and single bucket. Each time series for both metrics contains 5 data points. Both metrics have 2 labels. Benchmark source code is available at: https://github.com/tigrannajaryan/exp-otelproto/blob/master/encodings/encoding_test.go

text/0059-otlp-trace-data-format.md

This change applies the refinement approach that is already performed on Traces Protobuf definitions as part of open-telemetry/oteps#59 and which proved to yield significant performance improvements. I replaced google.protobuf.Timestamp by int64 time in unix epoch nanoseconds. Simple benchmark in Go demonstrates the following improvement of encoding and decoding compared to the current state: ``` ===== Encoded sizes Encoding Uncompressed Improved Compressed Improved Baseline/MetricOne 20000 bytes [1.000], gziped 1506 bytes [1.000] Proposed/MetricOne 18250 bytes [1.096], gziped 1433 bytes [1.051] Encoding Uncompressed Improved Compressed Improved Baseline/MetricSeries 51797 bytes [1.000], gziped 6455 bytes [1.000] Proposed/MetricSeries 43047 bytes [1.203], gziped 6093 bytes [1.059] goos: darwin goarch: amd64 pkg: github.com/tigrannajaryan/exp-otelproto/encodings BenchmarkEncode/Baseline/MetricOne-8 30 186998840 ns/op BenchmarkEncode/Proposed/MetricOne-8 36 166668705 ns/op BenchmarkEncode/Baseline/MetricSeries-8 8 632391842 ns/op BenchmarkEncode/Proposed/MetricSeries-8 10 537384515 ns/op BenchmarkDecode/Baseline/MetricOne-8 16 348156010 ns/op 171896049 B/op 4974000 allocs/op BenchmarkDecode/Proposed/MetricOne-8 19 314727259 ns/op 155096036 B/op 4624000 allocs/op BenchmarkDecode/Baseline/MetricSeries-8 5 1013035422 ns/op 440696048 B/op 11874000 allocs/op BenchmarkDecode/Proposed/MetricSeries-8 6 846887981 ns/op 356696040 B/op 10124000 allocs/op ``` It is 10-15% faster and is 10-20% smaller on the wire and in memory. Benchmarks encode and decode 500 batches of 2 metrics: one int64 Gauge with 5 time series and one Histogram of doubles with 1 time series and single bucket. Each time series for both metrics contains either 1 data point (MetricOne) or 5 data points (MetricSeries). Both metrics have 2 labels. Benchmark source code is available at: https://github.com/tigrannajaryan/exp-otelproto/blob/master/encodings/encoding_test.go

#33) * Change timestamp data type to unixnano in Metrics Protobuf definitions This change applies the refinement approach that is already performed on Traces Protobuf definitions as part of open-telemetry/oteps#59 and which proved to yield significant performance improvements. I replaced google.protobuf.Timestamp by int64 time in unix epoch nanoseconds. Simple benchmark in Go demonstrates the following improvement of encoding and decoding compared to the current state: ``` ===== Encoded sizes Encoding Uncompressed Improved Compressed Improved Baseline/MetricOne 20000 bytes [1.000], gziped 1506 bytes [1.000] Proposed/MetricOne 18250 bytes [1.096], gziped 1433 bytes [1.051] Encoding Uncompressed Improved Compressed Improved Baseline/MetricSeries 51797 bytes [1.000], gziped 6455 bytes [1.000] Proposed/MetricSeries 43047 bytes [1.203], gziped 6093 bytes [1.059] goos: darwin goarch: amd64 pkg: github.com/tigrannajaryan/exp-otelproto/encodings BenchmarkEncode/Baseline/MetricOne-8 30 186998840 ns/op BenchmarkEncode/Proposed/MetricOne-8 36 166668705 ns/op BenchmarkEncode/Baseline/MetricSeries-8 8 632391842 ns/op BenchmarkEncode/Proposed/MetricSeries-8 10 537384515 ns/op BenchmarkDecode/Baseline/MetricOne-8 16 348156010 ns/op 171896049 B/op 4974000 allocs/op BenchmarkDecode/Proposed/MetricOne-8 19 314727259 ns/op 155096036 B/op 4624000 allocs/op BenchmarkDecode/Baseline/MetricSeries-8 5 1013035422 ns/op 440696048 B/op 11874000 allocs/op BenchmarkDecode/Proposed/MetricSeries-8 6 846887981 ns/op 356696040 B/op 10124000 allocs/op ``` It is 10-15% faster and is 10-20% smaller on the wire and in memory. Benchmarks encode and decode 500 batches of 2 metrics: one int64 Gauge with 5 time series and one Histogram of doubles with 1 time series and single bucket. Each time series for both metrics contains either 1 data point (MetricOne) or 5 data points (MetricSeries). Both metrics have 2 labels. Benchmark source code is available at: https://github.com/tigrannajaryan/exp-otelproto/blob/master/encodings/encoding_test.go * Change timestamp from int64 to sfixed64 * Change timestamp interval from closed to open

tmc · 2019-11-01T01:41:15Z

I think this discussion from the other repository is relevant: open-telemetry/opentelemetry-proto#11 (comment)

tigrannajaryan · 2019-11-01T02:18:26Z

I think this discussion from the other repository is relevant: open-telemetry/opentelemetry-proto#11 (comment)

@tmc adding support for nested attribute values is easy. As soon as API/Spec decides to have such semantics the AttributeKeyValue in this proposal can be extended in backward compatible manner.

tigrannajaryan · 2019-11-04T16:50:48Z

@open-telemetry/specs-approvers please review. I believe there are no more outstanding issues and this is ready to be merged.

yurishkuro

I am approving as I think it's ok to merge, but raised a couple questions.

text/0059-otlp-trace-data-format.md

tigrannajaryan · 2019-11-06T14:51:52Z

@open-telemetry/specs-approvers do I need more approvals to get this merged? There are no outstanding issues or comments in this PR anymore. I believe it is ready to be merged.

tigrannajaryan · 2019-11-07T16:32:45Z

@bogdandrutu thank you.

Oberon00 · 2019-11-08T15:09:16Z

@tigrannajaryan It seems you are currently working on applying the OTEP bit by bit to opentelemetry-proto. Is there an overview somewhere which parts are applied and which are missing?

tigrannajaryan · 2019-11-08T15:13:07Z

@Oberon00 yes, I am doing it now. No, there is no overview, I don't have a formal list of changes anywhere, however the target is well defined, if there is an interest on how much progress is made you can diff OTEP and proto repo content.

This brings the improved comments from open-telemetry/oteps#59

* Add OTLP Trace Data Format specification This is a continuation of OTLP RFC proposal open-telemetry#35 This change defines the data format used by Span and Resource messages. The data format is a result of research of prior art (primarily OpenCensus and Jaeger), as well as experimentation and benchmarking done as part of OTLP RFC proposal. Go benchmark source code is available at https://github.com/tigrannajaryan/exp-otelproto (use `make benchmark-encoding` target). Benchmarking shows that depending on the payload composition this data format is about 4x-5x faster in encoding and 2x-3x faster in decoding equivalent data compared to OpenCensus data format (all benchmarks in Go). Notable differences from OpenCensus: - Attribute key/value pairs are represented as a list rather than as a map. This results in significant performance gains and at the same time changes the semantic of attributes because now it is possible to have multiple attributes with the same key. This is also in-line with Jaeger's tags representation. - Removed unnecessary wrappers such as google.protobuf.Timestamp which resulted in significant performance improvements for certain payload compositions (e.g. lots of TimedEvents). - Resource labels use the same data type as Span attributes which now allows to have labels with other data types (OpenCensus only allowed strings). * Address review comments * Add protobuf type qualifier to pre-formatted blocks * Add a note about future goals for the protocol * Address review comments * Make sure all field comments start with field name * Change status to approved * Clarify start and end time expectations. * Address Sergey's comments * Remove string length requirement * Add StatusCode enum

* Add OTLP Trace Data Format specification This is a continuation of OTLP RFC proposal open-telemetry/oteps#35 This change defines the data format used by Span and Resource messages. The data format is a result of research of prior art (primarily OpenCensus and Jaeger), as well as experimentation and benchmarking done as part of OTLP RFC proposal. Go benchmark source code is available at https://github.com/tigrannajaryan/exp-otelproto (use `make benchmark-encoding` target). Benchmarking shows that depending on the payload composition this data format is about 4x-5x faster in encoding and 2x-3x faster in decoding equivalent data compared to OpenCensus data format (all benchmarks in Go). Notable differences from OpenCensus: - Attribute key/value pairs are represented as a list rather than as a map. This results in significant performance gains and at the same time changes the semantic of attributes because now it is possible to have multiple attributes with the same key. This is also in-line with Jaeger's tags representation. - Removed unnecessary wrappers such as google.protobuf.Timestamp which resulted in significant performance improvements for certain payload compositions (e.g. lots of TimedEvents). - Resource labels use the same data type as Span attributes which now allows to have labels with other data types (OpenCensus only allowed strings). * Address review comments * Add protobuf type qualifier to pre-formatted blocks * Add a note about future goals for the protocol * Address review comments * Make sure all field comments start with field name * Change status to approved * Clarify start and end time expectations. * Address Sergey's comments * Remove string length requirement * Add StatusCode enum

tigrannajaryan requested review from AloisReitbauer, bogdandrutu, c24t, carlosalberto, iredelmeier, reyang, SergeyKanzhelev, songy23, tedsuo and yurishkuro as code owners October 11, 2019 14:50

tigrannajaryan force-pushed the feature/tigran/dataschema branch from f913f99 to db39943 Compare October 11, 2019 14:53

yurishkuro reviewed Oct 15, 2019

View reviewed changes

tigrannajaryan force-pushed the feature/tigran/dataschema branch from 77f4b41 to 190ef87 Compare October 18, 2019 18:14

SergeyKanzhelev reviewed Oct 18, 2019

View reviewed changes

text/0059-otlp-trace-data-format.md Outdated Show resolved Hide resolved

Oberon00 reviewed Oct 19, 2019

View reviewed changes

text/0059-otlp-trace-data-format.md Show resolved Hide resolved

Oberon00 reviewed Oct 21, 2019

View reviewed changes

text/0059-otlp-trace-data-format.md Show resolved Hide resolved

Oberon00 reviewed Oct 21, 2019

View reviewed changes

text/0059-otlp-trace-data-format.md Outdated Show resolved Hide resolved

text/0059-otlp-trace-data-format.md Outdated Show resolved Hide resolved

text/0059-otlp-trace-data-format.md Outdated Show resolved Hide resolved

text/0059-otlp-trace-data-format.md Show resolved Hide resolved

tigrannajaryan commented Oct 21, 2019

View reviewed changes

tigrannajaryan mentioned this pull request Oct 24, 2019

Refine Metrics Protobuf definitions open-telemetry/opentelemetry-proto#29

Closed

SergeyKanzhelev reviewed Oct 26, 2019

View reviewed changes

text/0059-otlp-trace-data-format.md Outdated Show resolved Hide resolved

SergeyKanzhelev reviewed Oct 26, 2019

View reviewed changes

text/0059-otlp-trace-data-format.md Show resolved Hide resolved

SergeyKanzhelev reviewed Oct 26, 2019

View reviewed changes

text/0059-otlp-trace-data-format.md Outdated Show resolved Hide resolved

Tigran Najaryan added 2 commits October 29, 2019 16:47

Address Sergey's comments

f04fa74

Remove string length requirement

ad0381e

tigrannajaryan force-pushed the feature/tigran/dataschema branch from 7b17ac3 to ad0381e Compare October 29, 2019 20:47

tigrannajaryan mentioned this pull request Oct 30, 2019

Change timestamp data type to unixnano in Metrics Protobuf definitions open-telemetry/opentelemetry-proto#33

Merged

SergeyKanzhelev mentioned this pull request Oct 31, 2019

Review Data Model (Protos) open-telemetry/opentelemetry-proto#11

Closed

tigrannajaryan mentioned this pull request Oct 31, 2019

Dictionary of attribute names open-telemetry/opentelemetry-proto#12

Closed

tigrannajaryan mentioned this pull request Nov 4, 2019

Representation of multi-valued attributes open-telemetry/opentelemetry-specification#341

Closed

yurishkuro approved these changes Nov 4, 2019

View reviewed changes

Add StatusCode enum

1f0ee7b

bogdandrutu approved these changes Nov 7, 2019

View reviewed changes

bogdandrutu merged commit 75ee7ba into open-telemetry:master Nov 7, 2019

tigrannajaryan deleted the feature/tigran/dataschema branch November 12, 2019 20:32

tigrannajaryan mentioned this pull request Nov 14, 2019

Proposal: convert Tracestate to an unstructured raw string field in Span open-telemetry/opentelemetry-proto#55

Closed

Oberon00 mentioned this pull request Nov 15, 2019

Clarify what information about child count Span data should carry open-telemetry/opentelemetry-specification#355

Open

tigrannajaryan pushed a commit to tigrannajaryan/opentelemetry-proto that referenced this pull request Nov 22, 2019

Update SpanKind comments

a43d1f9

This brings the improved comments from open-telemetry/oteps#59

tigrannajaryan mentioned this pull request Nov 22, 2019

Update SpanKind comments open-telemetry/opentelemetry-proto#64

Merged

bogdandrutu pushed a commit to open-telemetry/opentelemetry-proto that referenced this pull request Nov 22, 2019

Update SpanKind comments (#64)

06be550

This brings the improved comments from open-telemetry/oteps#59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add OTLP Trace Data Format specification #59

Add OTLP Trace Data Format specification #59

tigrannajaryan commented Oct 11, 2019 •

edited

Loading

z1c0 commented Oct 14, 2019 •

edited

Loading

Oberon00 commented Oct 14, 2019 •

edited

Loading

yurishkuro left a comment

tigrannajaryan left a comment

tigrannajaryan commented Oct 22, 2019

yurishkuro commented Oct 22, 2019

tigrannajaryan commented Oct 22, 2019

tigrannajaryan commented Oct 23, 2019

Oberon00 commented Oct 23, 2019

tmc commented Nov 1, 2019

tigrannajaryan commented Nov 1, 2019

tigrannajaryan commented Nov 4, 2019

yurishkuro left a comment

tigrannajaryan commented Nov 6, 2019

tigrannajaryan commented Nov 7, 2019

Oberon00 commented Nov 8, 2019 •

edited

Loading

tigrannajaryan commented Nov 8, 2019

Add OTLP Trace Data Format specification #59

Add OTLP Trace Data Format specification #59

Conversation

tigrannajaryan commented Oct 11, 2019 • edited Loading

z1c0 commented Oct 14, 2019 • edited Loading

Oberon00 commented Oct 14, 2019 • edited Loading

yurishkuro left a comment

Choose a reason for hiding this comment

tigrannajaryan left a comment

Choose a reason for hiding this comment

tigrannajaryan commented Oct 22, 2019

yurishkuro commented Oct 22, 2019

tigrannajaryan commented Oct 22, 2019

tigrannajaryan commented Oct 23, 2019

Oberon00 commented Oct 23, 2019

tmc commented Nov 1, 2019

tigrannajaryan commented Nov 1, 2019

tigrannajaryan commented Nov 4, 2019

yurishkuro left a comment

Choose a reason for hiding this comment

tigrannajaryan commented Nov 6, 2019

tigrannajaryan commented Nov 7, 2019

Oberon00 commented Nov 8, 2019 • edited Loading

tigrannajaryan commented Nov 8, 2019

tigrannajaryan commented Oct 11, 2019 •

edited

Loading

z1c0 commented Oct 14, 2019 •

edited

Loading

Oberon00 commented Oct 14, 2019 •

edited

Loading

Oberon00 commented Nov 8, 2019 •

edited

Loading