bytes / Byte (uint8_t) arrays support #780
Comments
I also think that byte array support could be useful. One can use Base64 or hex encoded strings instead, but it does come with a performance cost. |
I would like to have conditional provisions for this type implemented in the OpenTelemetry C++ SDK: disabled by default, but still there, so that we have the ability to experiment with it and enable it later on. Practical reason: if we allow bytes support on the API surface, then we can later determine whether a concrete exporter supports bytes or not. In those scenarios where the exporter cannot handle byte arrays, the exporter itself may convert the byte array to a base64-encoded string. But if exporter supports byte arrays, e.g.
Then it can forward the binary blob using the most compact representation of it. This may become relevant for scenarios where an app or service needs to emit a small memory dump, an IoT memory dump, or maybe its own representation of an encrypted or serialized data structure. Example code showing how the value attribute could be defined in C++, where the need for a binary data type is common:
using AttributeValue =
    nostd::variant<bool,
                   int32_t,
                   int64_t,
                   uint32_t,
                   uint64_t,
                   double,
                   nostd::string_view,
#ifdef HAVE_SPAN_BYTE
                   // TODO: 8-bit byte arrays / binary blobs are not part of OT spec yet!
                   // Ref: https://github.com/open-telemetry/opentelemetry-specification/issues/780
                   nostd::span<const uint8_t>,
#endif
                   nostd::span<const bool>,
                   nostd::span<const int32_t>,
                   nostd::span<const int64_t>,
                   nostd::span<const uint32_t>,
                   nostd::span<const uint64_t>,
                   nostd::span<const double>,
                   nostd::span<const nostd::string_view>>;
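The exporter-side fallback described above (forward raw bytes when the exporter supports them, otherwise convert to a base64-encoded string) could look roughly like the following. This is only a sketch under stated assumptions: `Bytes`, `ExportedValue`, `Base64Encode` and `PrepareForExport` are hypothetical names invented for illustration and are not part of the OpenTelemetry C++ SDK, and `std::variant` / `std::vector` stand in for the `nostd` types.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for illustration; not the SDK's AttributeValue.
using Bytes         = std::vector<std::uint8_t>;
using ExportedValue = std::variant<std::string, Bytes>;

// Standard Base64 (RFC 4648 alphabet) with '=' padding.
std::string Base64Encode(const Bytes &data)
{
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  out.reserve(((data.size() + 2) / 3) * 4);
  std::size_t i = 0;
  // Full 3-byte groups become 4 output characters each.
  for (; i + 2 < data.size(); i += 3)
  {
    std::uint32_t n = (std::uint32_t(data[i]) << 16) |
                      (std::uint32_t(data[i + 1]) << 8) | std::uint32_t(data[i + 2]);
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += kAlphabet[n & 63];
  }
  // One or two trailing bytes are padded with '='.
  std::size_t rem = data.size() - i;
  if (rem == 1)
  {
    std::uint32_t n = std::uint32_t(data[i]) << 16;
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += "==";
  }
  else if (rem == 2)
  {
    std::uint32_t n = (std::uint32_t(data[i]) << 16) | (std::uint32_t(data[i + 1]) << 8);
    out += kAlphabet[(n >> 18) & 63];
    out += kAlphabet[(n >> 12) & 63];
    out += kAlphabet[(n >> 6) & 63];
    out += '=';
  }
  return out;
}

// Keeps the blob as raw bytes if the exporter can carry them;
// otherwise falls back to a base64 string (the type information is lost).
ExportedValue PrepareForExport(const Bytes &blob, bool exporter_supports_bytes)
{
  if (exporter_supports_bytes)
  {
    return blob;
  }
  return Base64Encode(blob);
}
```

With this shape, the decision is made per exporter rather than on the API surface: a call such as `PrepareForExport(blob, false)` yields the string alternative, while `PrepareForExport(blob, true)` passes the bytes through unchanged.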
It is ironic that we have |
@Oberon00 @bogdandrutu @jsuereth @pyohannes One concern here - since we got this tagged What I am proposing right now:
What do you guys think? |
I agree this can be useful for attributes and also for the Log Record Body. Prior art exists too: binary is a supported data type in Jaeger for Span, Log and Process: https://github.com/jaegertracing/jaeger-idl/blob/34396033ff11c60fced342ab2858ace278fedaa8/proto/api_v2/model.proto#L57. This has come up a few times in the past and recently here. |
Byte arrays (binary data) are also useful for log files where we don't want to interpret, or don't know, the encoding. See this feature request: open-telemetry/opentelemetry-collector-contrib#3267 |
Contributes to: open-telemetry/opentelemetry-specification#780 open-telemetry/opentelemetry-collector-contrib#3267
Issue open-telemetry/opentelemetry-specification#780 nicely describes why the data type is needed. There are several use cases for binary data, both for trace and log attributes and for log record Body.
This is a backward compatible addition. After this change is merged no senders will initially exist that emit binary data. Nevertheless, if such data is received by the Collector it will correctly pass such data intact through the pipeline when receiving/sending OTLP (no Collector code changes are needed for this). We do not yet have binary data type in the OpenTelemetry API, so no existing sources can emit it yet.
The receivers that do not understand the binary data type should also continue functioning normally. Collector's current implementation treats any unknown data type as NULL (and this would apply to binary data type until we teach the Collector to understand binary specifically). I checked the Collector source code and this should not result in crashes or overly surprising behavior (NULL is either ignored or treated as an "unknown" branch in the code which does not care about it).
We will add full support for binary data to the Collector, particularly to support translating it correctly to other formats (e.g. Jaeger, which supports binary type natively).
Note: the addition of this data type to the protocol is not an obligation to expose the data type in the Attributes API. OTLP has always been a superset of what is possible to express via the API. The addition of the data type in the Attributes API should be discussed separately in the specification repo.
@tigrannajaryan - thank you for your work! We already have a rough-in implementation in the C++ SDK. This feature also potentially enables us to reuse some of the existing APIs, or at least the existing type system, for a feature like uploading reasonably small blobs, expressing these as byte buffers. |
OTLP now supports bytes as a data type for Any value. Byte arrays are an important case for log data. Contributes to open-telemetry#780 TODO: consider adding bytes to Trace API too.
Contributes to open-telemetry#780 OTLP already supports bytes as a data type for Any value, see: https://github.com/open-telemetry/opentelemetry-proto/blob/de4fc37940d39370194fb774e634ca408dacd865/opentelemetry/proto/common/v1/common.proto#L37 Byte arrays are an important case for unparsed, unstructured log data, so we are formally adding them as a supported data type to the Log Data Model. TODO: consider adding bytes to Trace API too.
@tigrannajaryan - My understanding is that we should be able to close this issue, since the protocol work for it has been completed? |
@tigrannajaryan can we add the bytes type to the list of primitive types supported in the spec here? |
We have 2 possibilities:
|
It would also be useful to determine how to convert this to "string": should we use base64? |
Per https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/common/README.md#standard-attribute, we are not planning on extending the standard attribute types. Our suggestion would be to use the forthcoming Events API/Data Model as a way to emit this information. |
Current spec / protocol allows the following types:
and array of above primitive types.
Proposal is to add support for '8-bit byte array' event properties for Traces and Logs:
Practical reasons
Existing / legacy telemetry flows could already allow support for byte arrays.
An 8-bit byte array or bin type is available in the MsgPack protocol.
The same type is known in protobuf as bytes.
Most IoT protocols operate on 8-bit byte arrays or octet streams, allowing fields to be either uint8_t or an array / sequence of uint8_t, or the entire payload to be an 8-bit / octet stream.
A UUID or GUID type is best represented on the SDK API surface and on the wire as 16 bytes rather than a 36+-character string, resulting in at least 2x better performance for native (C/C++) code, as well as fewer bytes on the wire. Although it may be possible to represent it as an array of int32_t, it would be a bit unnatural to do so.
Although it may be possible to convert a byte array to a base64-encoded string representation, this results in extra encoding/decoding in the SDK and elevated memory and CPU pressure, and the type information (that it is a binary blob, not a string) is inherently lost.
Is it possible to add an optional provision to the protocol, so that Platform SDKs that may require bin, octet stream, or 8-bit byte array may implement it on the API surface?
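To make the UUID size argument above concrete, here is a minimal sketch that parses the canonical 36-character text form into the 16 raw bytes that a byte-array attribute could carry. `UuidBytes`, `HexValue` and `ParseUuid` are hypothetical helper names introduced only for this illustration; the sketch assumes the bytes are stored in the same order as they appear in the text, which does not match the mixed-endian in-memory layout some GUID implementations use.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical type for illustration: the 16 raw bytes of a UUID.
using UuidBytes = std::array<std::uint8_t, 16>;

// Returns the numeric value of one hex digit, or -1 if c is not a hex digit.
int HexValue(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Parses the canonical 8-4-4-4-12 text form (36 characters) into 16 bytes.
// Returns std::nullopt on any malformed input.
std::optional<UuidBytes> ParseUuid(const std::string &text)
{
  if (text.size() != 36)
  {
    return std::nullopt;
  }
  UuidBytes out{};
  std::size_t byte_index = 0;
  for (std::size_t i = 0; i < text.size();)
  {
    // Dashes are only allowed at these four fixed positions.
    if (i == 8 || i == 13 || i == 18 || i == 23)
    {
      if (text[i] != '-') return std::nullopt;
      ++i;
      continue;
    }
    int hi = HexValue(text[i]);
    int lo = HexValue(text[i + 1]);
    if (hi < 0 || lo < 0) return std::nullopt;
    out[byte_index++] = static_cast<std::uint8_t>((hi << 4) | lo);
    i += 2;
  }
  return out;
}
```

The parsed value occupies `sizeof(UuidBytes)` (16) bytes, against 36 characters for the text form, and it keeps its identity as binary data instead of being indistinguishable from an arbitrary string.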
Use-cases
Examples of exporters that would benefit from having byte array in the protocol spec:
Not having this type natively supported by OpenTelemetry would impede the adoption of the OT API / SDK by customers currently relying on octet / binary streams in their data flows. A reference implementation can be provided based on the OpenTelemetry C++ SDK to illustrate the use-cases, as well as the perf implications of not having native support for byte arrays in API/SDK/proto/specification.