Skip to content
This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

Trace Payload Collection #219

Closed
wants to merge 6 commits into from

Conversation

ronyis
Copy link

@ronyis ronyis commented Sep 14, 2022

In this OTEP, we propose a way for extending trace functionality to support payload collection.
We plan that this change will also enable collecting payload by instrumentations in the future.

This is an important capability in our vision, which could help many OpenTelemetry users to troubleshoot and debug applications much more effectively. We at Epsagon are also committed to pushing this through specifications and development.

We are glad to receive any feedback and to collaborate on this topic!

@ronyis ronyis requested a review from a team September 14, 2022 14:07
@ronyis ronyis requested a review from a team September 14, 2022 14:29
(which is a general representation of a JSON object), with some embedded metadata -

```protobuf
message Value {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few important differences - supporting NullValue and the 'nested' metadata fields (and not supporting bytes)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is NullValue different from the "empty" value that AnyValue supports? What 'nested' metadata fields are you referring to? Do you mean original_encoding and others?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • That's right about "empty" value (missed it)
  • You are correct about the metadata fields


```python
# Added method to `Span` class
def add_payload_attribute(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an alternative would be to have a special SDK-provided wrapper type passed to regular set_attribute, e.g.

p := otel.Payload(...)
span.set_attribute(key, p)

In this form you have better extensibility because new fields could be added to Payload type in the future, which you cannot do with function arguments without overloading.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a great idea to explore

(which is a general representation of a JSON object), with some embedded metadata -

```protobuf
message Value {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is NullValue different from the "empty" value that AnyValue supports? What 'nested' metadata fields are you referring to? Do you mean original_encoding and others?

For example, this will be required by a simple processor that filters specific keys in a map.
Also, it requires backends (and potentially processors) to unnecessarily attempt deserialization of every string attribute.

### Supporting nested map values in Span attributes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this alternate is not sufficiently explored. IMO, instead of adding a new set of attributes with a new type that mostly overlaps with AnyValue, this alternate can be more viable.

I would rather advocate that open-telemetry/opentelemetry-specification#376 is accepted and come up with semantic convention to record the payload and associated metadata as a regular Span attribute.

This OTEP proposes and approach that requires much bigger changes, but I do not feel that the arguments in favour of it are strong enough.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that there is room to explore this further.

Some ideas (which I can also add to the OTEP):

  • We can consider having a 'core' type (like AnyValue) which could be extended here
  • Even if Support map values and nested values for attributes opentelemetry-specification#376 is approved, I think that this proposal introduces essential advantages -
    • Adding the metadata as semantic conventions instead is quite cumbersome. We need 4 extra tags to include proposed metadata (original_encoding, encoded_size, original_length and dropped_keys). The last two should support nested values so that some keys will get duplicated. We may have more in the future.
    • Specifying attributes as payloads is important. For example, to disable all payload collection in Span SDK implementation, or in processors and exporters. Doing that by semantic conventions is less robust. And in general, naming tags like 'http.payload.request.body' isn't ideal.
  • On the other hand, there are some drawbacks to supporting nested maps in general attributes (as discussed in the other issue). So this proposal could be another alternative to that.

To sum up, I think that uniting both attribute types is possible, but we have to make sure they will be generic enough.
Doing that could over-complicate the most common use of 'simple' attributes and require even bigger changes than adding payload attributes.

Comment on lines +91 to +94
// Optional - the original bytes encoding type of this value (e.g. json/yaml/avro/csv)
string original_encoding = 2;
// Optional - the size of the value as bytes encoded (including dropped data)
int64 encoded_size = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of these fields given that the payload is deserialized and is recorded in a structured form in the Value message?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original size could be useful to troubleshoot and to generate aggregations (the deserialized might not include all the data, and it requires encoding to calculate the original size).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also for example, sometimes we will not collect the entire payload (and use the dropped_keys field or original_length) - in these cases it's useful to know the original size and still be able to generate aggregations over it

int64 original_length = 7;

// Set only for MapValue, in case some of the keys were dropped
repeated string dropped_keys = 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclear why this field is here instead of in the MapValue message.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.
The original reason for putting it here is because dropped_keys should be set together with original_length for shortened maps. (original_length is here because it's also used for string type).

@osherv
Copy link
Member

osherv commented Sep 14, 2022

Related Issues:
@pavolloffay
#1062
#1284
#1317

@MovieStoreGuy
open-telemetry/opentelemetry-specification#176

Copy link

@galbash galbash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is a great capability to add to OTel. From our experience in Epsagon (and looking at other vendors in the space) this kind of data gave huge value to our customers. I do agree with @tigrannajaryan that the next step should be probably discussing and choosing the best approach to deliver this value. I also agree with @tigrannajaryan that the viable options from where I stand are either what described in this OTEP or the smaller change described in open-telemetry/opentelemetry-specification#376, and not the other alternatives discussed here.

Comment on lines +7 to +8
This OTEP proposes to add support for collecting payload data in spans, by adding a non-breaking
functionality to trace API, and to OTLP. As we show in this proposal, adding such data using
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is crucial and good that you called it out. Opt-in is definitely the way to go here, and non-breaking is important for that

Comment on lines +91 to +94
// Optional - the original bytes encoding type of this value (e.g. json/yaml/avro/csv)
string original_encoding = 2;
// Optional - the size of the value as bytes encoded (including dropped data)
int64 encoded_size = 3;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also for example, sometimes we will not collect the entire payload (and use the dropped_keys field or original_length) - in these cases it's useful to know the original size and still be able to generate aggregations over it

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Oct 17, 2022
Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment)
   or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:
- A standard way of flattening maps and nested objects when converting from OTLP
  to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.

## Alternatives

There are other possible ways to encode payload data, using current Span attributes.
Copy link

@MSNev MSNev Nov 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An additional alternative would be to define this as a "payload" events and use the Log event as the transport. A Log event can have the current span associated with the log record. We are also expanding the definition of an event to include the generic event.data which will include the "payload" of the event as defined by the schema (semantic conventions) of the event.domain / event.name combination.

The introduction of the event.data is currently included in this PR #2926

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think it could be a good alternative.
A possible advantage is collecting payloads asynchronously from the span, which is more efficient in many cases (for example, a span representing an HTTP request could be closed before the response is read).
On the other hand, this approach will make it harder to do searches based on span tags + payload tags in the backend.

@ahayworth
Copy link

I am concerned that this OTEP significantly expands the scope of the OpenTelemetry project, and would introduce significant complexity in all areas of the ecosystem. And, as mentioned, there are likely ways that already existing portions could be pressed into service for this kind of functionality (even if it isn't a perfect match today).

For me specifically, I would be concerned that this would introduce an additional burden onto SDK maintainers - we already must be extraordinarily careful about the performance implications of the SDKs we produce, but should we accept this OTEP we now must retain that same diligence and care in the face of possibly very large payload values. That is not straightforward and simple, and would likely require redesigns of portions of the SDKs to be even more efficient than they already are. Efficiency is good, of course, but I don't know if this in particular should be the forcing function for such rewrites. 😄

Perhaps it is better to first have broader conversations about what OpenTelemetry is and where the boundaries of its scope lie?

@reyang
Copy link
Member

reyang commented Nov 10, 2022

I am concerned that this OTEP significantly expands the scope of the OpenTelemetry project, and would introduce significant complexity in all areas of the ecosystem.

+1 👍

@reyang
Copy link
Member

reyang commented Nov 10, 2022

I would suggest:

  1. clarify what type of the payload is intended to be in-scope, and what's not (e.g. would we anticipate the eventual drifting to some extreme situation such as "here is 1 gigabyte of video payload"?).
  2. explore how existing backends (e.g. Jaeger) would support it, and see if there is high interest from the community.

@ronyis
Copy link
Author

ronyis commented Nov 10, 2022

  1. clarify what type of the payload is intended to be in-scope, and what's not (e.g. would we anticipate the eventual drifting to some extreme situation such as "here is 1 gigabyte of video payload"?).

I think that the scope should be "reasonable" sizes, that are performant enough to collect. Probably a few KBs per span. Definitely not "1 gigabyte of video payload" IMO :).
And in the end, users could increase preconfigured limits as much as they like.

  1. explore how existing backends (e.g. Jaeger) would support it, and see if there is high interest from the community.
    I agree, but not so sure what's the right way to do that.

We can also provide a processor on the OTel Collector that will convert payload attributes into regular attributes as a workaround.
Also, based on my experience, storing JSON payloads in Elasticsearch/OpenSearch is quite straightforward.

@yurishkuro
Copy link
Member

I just took a second pass at the OTEP, and I see that @tigrannajaryan 's and my comments are largely not addressed. The proposal significantly expands the interface surface of OTEL, both in the API and in OTLP, and the justifications are not convincing, the authors admit that the same things could be achieved with the existing mechanisms and data structures. So why not explore that, maybe propose some semantic conventions.

Even more importantly, capturing payloads is a big data privacy red flag. The OTEP does not discuss this at all. From the implementation perspective, where to stash the data is a much easier problem to solve than how to capture the data in a privacy-respecting way. And while it is possible to decouple these two problems, the actual value of this OTEP would only materialize if it provides a framework for instrumentation to capture payloads, not just an extension of API/SDK on where to record it. And a framework for instrumentation would absolutely have to deal with privacy questions.

@ronyis
Copy link
Author

ronyis commented Nov 13, 2022

Thanks @yurishkuro.
I'm working on compiling the many comments and ideas into the OTEP (also I was offline for a while), representing different alternatives with pros and cons.

About the privacy issue -
I focused on APIs in this OTEP, because I think it's the foremost problem that blocks users from collecting payloads with their own instrumentations (or for companies to develop such 3rd party libs). Generally, I think that a configuration based on allow-lists and/or deny-lists for collecting payload attributes should be sufficient. I'm not aware of other ways to deal with this problem and looking forward to hearing about it. I will reference that in the OTEP as well.

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Nov 28, 2022
Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment)
   or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:
- A standard way of flattening maps and nested objects when converting from OTLP
  to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Nov 28, 2022
Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment)
   or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:
- A standard way of flattening maps and nested objects when converting from OTLP
  to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Nov 30, 2022
Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment)
   or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:
- A standard way of flattening maps and nested objects when converting from OTLP
  to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Nov 30, 2022
Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment)
   or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:
- A standard way of flattening maps and nested objects when converting from OTLP
  to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Nov 30, 2022
Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment)
   or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:
- A standard way of flattening maps and nested objects when converting from OTLP
  to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Nov 30, 2022
Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment)
   or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:
- A standard way of flattening maps and nested objects when converting from OTLP
  to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Nov 30, 2022
Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment)
   or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:
- A standard way of flattening maps and nested objects when converting from OTLP
  to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Dec 14, 2022
Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment)
   or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:
- A standard way of flattening maps and nested objects when converting from OTLP
  to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.
@tedsuo tedsuo added the triaged label Jan 30, 2023
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Feb 9, 2023
Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment)
   or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:
- A standard way of flattening maps and nested objects when converting from OTLP
  to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Feb 9, 2023
Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment)
   or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:
- A standard way of flattening maps and nested objects when converting from OTLP
  to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Feb 14, 2023
Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment)
   or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:
- A standard way of flattening maps and nested objects when converting from OTLP
  to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.
@nirga
Copy link

nirga commented Jun 4, 2023

Hey @ronyis are you still working on this? If not, wanted to take and complete this as we need it as well :)

@ronyis
Copy link
Author

ronyis commented Jun 6, 2023

Hey @ronyis are you still working on this? If not, wanted to take and complete this as we need it as well :)

Not actively working on it, feel free to jump in

@nirga nirga mentioned this pull request Jun 19, 2023
@ronyis
Copy link
Author

ronyis commented Jun 21, 2023

Closing this as the work is continued in open-telemetry/opentelemetry-specification#234

@ronyis ronyis closed this Jun 21, 2023
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Jan 10, 2024
Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment)
   or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:
- A standard way of flattening maps and nested objects when converting from OTLP
  to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this pull request Jan 10, 2024
Resolves open-telemetry#376

Use cases where this is necessary or useful:

1. Specify more than one resource in the telemetry: open-telemetry#579
2. Data coming from external source, e.g. AWS Metadata: open-telemetry#596 (comment)
   or open-telemetry#376 (comment)
3. Capturing HTTP headers: open-telemetry#376 (comment)
4. Structured stack traces: open-telemetry#2841
5. Payloads as attributes: open-telemetry/oteps#219 (comment)

This is a draft PR to see what the change looks like.

If this PR is merged it will be nice to follow it up with:
- A standard way of flattening maps and nested objects when converting from OTLP
  to formats that don't support maps/nested objects.
- Recommendations for semantic conventions to use/not use complex objects.
@johncrim
Copy link

johncrim commented Jul 2, 2024

@ronyis - I don't see this work at all in open-telemetry/opentelemetry-specification#234 - I think you mean open-telemetry/opentelemetry-specification#234 .

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.