Unpack and flatten array and key-value structures when feasible #235
Comments
With regard to the concern over CPU consumption: probably can just look for matching start and end chars of |
We implemented a |
Just a note @lizthegrey and @TylerHelmuth this is actually data we already see as key-value pairs in the proto. Instead of walking that structure and turning them into fields, we literally just serialize the whole graph into a single string attribute. #90 is when we receive attribute data that's a JSON-encoded object and we just forward it as a string attribute instead of flattening those "nested attributes". |
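To make the #90 case concrete, here is a minimal sketch of a cheap pre-check before attempting to decode a string attribute as a JSON object. The thread does not say which start and end characters were meant, so the braces are an assumption, and `tryDecodeObject` is a hypothetical helper, not something in this library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// tryDecodeObject is a hypothetical helper (not part of this library).
// It only pays the cost of JSON parsing when the string starts with '{'
// and ends with '}' (assumed delimiters), and reports whether the string
// decoded into a JSON object.
func tryDecodeObject(s string) (map[string]any, bool) {
	t := strings.TrimSpace(s)
	if len(t) < 2 || t[0] != '{' || t[len(t)-1] != '}' {
		return nil, false // cheap rejection, no parsing attempted
	}
	var obj map[string]any
	if err := json.Unmarshal([]byte(t), &obj); err != nil {
		return nil, false // looked like an object but was not valid JSON
	}
	return obj, true
}

func main() {
	if obj, ok := tryDecodeObject(`{"span_id":"abc","message":"hi"}`); ok {
		fmt.Println(obj["span_id"], obj["message"])
	}
	_, ok := tryDecodeObject("just a plain log line")
	fmt.Println(ok)
}
```

Strings that fail either check would simply be forwarded unchanged as string attributes, which keeps the extra CPU cost limited to values that at least look like objects.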
## Which problem is this PR solving?

When translating OTLP kv lists we currently just encode them as a JSON string, which isn't particularly useful. This PR extends the extraction behaviour to traverse kv lists and generate multiple event fields. The max depth is set to 5 levels.

For example, this kv attribute:

```
key: "data"
value: {
  "key1": "val1",
  "key2": true,
}
```

Previously would be stored like this:

```
data: "{\"key1\": \"val1\",\"key2\":true}"
```

After the change, the same kv list would be stored like this:

```
data.key1: "val1"
data.key2: true
```

This change exposes the `AddAttributesToMap` func for consumers to use directly. This will be useful until metrics translation can be moved into this library (e.g., from shepherd).

- Closes #235

## Short description of the changes

- Update logic to extract OTLP data types to traverse kv lists instead of always encoding as a JSON string, with a max depth of 5
- Expose `AddAttributesToMap` as a public function so consumers can use it directly
- Update log translation to set the `body` field with a JSON string of the attribute to preserve backwards compatibility
- Remove unused and broken truncation event fields
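The traversal described in the PR can be sketched roughly as follows. This is not the library's `AddAttributesToMap`; `flatten` and `maxDepth` are hypothetical names, plain `map[string]any` values stand in for OTLP kv lists, and the way the 5-level limit is counted (with a JSON-string fallback once it is reached) is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxDepth mirrors the 5-level limit mentioned in the PR description.
// How the real code counts levels is an assumption here.
const maxDepth = 5

// flatten is a hypothetical helper. It writes value into out under key,
// expanding nested maps into dotted field names. Once maxDepth is reached,
// any remaining nested map is stored as a JSON string instead.
func flatten(out map[string]any, key string, value any, depth int) {
	nested, ok := value.(map[string]any)
	if !ok {
		out[key] = value // scalar: store as-is
		return
	}
	if depth >= maxDepth {
		b, err := json.Marshal(nested)
		if err != nil {
			out[key] = fmt.Sprint(nested) // last-resort string form
			return
		}
		out[key] = string(b)
		return
	}
	for k, v := range nested {
		flatten(out, key+"."+k, v, depth+1)
	}
}

func main() {
	out := map[string]any{}
	flatten(out, "data", map[string]any{"key1": "val1", "key2": true}, 0)
	fmt.Println(out) // prints map[data.key1:val1 data.key2:true]
}
```

Bounding the recursion keeps the number of generated fields and the work per attribute predictable, at the cost of still producing a string blob for very deeply nested input.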
This is related to #90 but not the same issue.
Consider the following JSON representation of an incoming OTLP log:
This should be turned into an event with a dozen or so fields:
It is perhaps an open question of what we de-dupe, or if we need to call each of those nested fields `body.span_id` or `body.message` instead of `span_id` and `message`, but you get the idea. Seems useful, right? Haha, you fool! You fell for the trap! We do no such thing. Instead, it comes out looking more like this:

That's because when we see the `body` object, we serialize its contents as a JSON string. And so, as a result, sending us structured logs with some nice fields in the body (i.e., like a normal person would do) creates an unusable mess and a sucky user experience.

This is also true for trace data -- any field that's a key-value pair or array gets serialized.
Changing this is...technically a breaking change. The best kind. But in practice, people really don't rely on these structures too much. As it turns out, if we create garbage, people don't typically query garbage. There are a dozen or so Derived Columns in all of production Honeycomb that operate on a string like this, and most appear to be using regex to pluck out useful values...which looks like a workaround for this poor behavior. That's...really not a whole lot of dependency on this behavior, and from what I can glean, it really seems like people are working around this stuff rather than enjoying having to pluck out useful data from a giant string blob. Additionally, for logs in particular, the impact is so low that I don't think a single customer would be impacted.