Unpack and flatten array and key-value structures when feasible #235

cartermp · 2024-01-22T16:24:12Z

This is related to #90 but not the same issue.

Consider the following JSON representation of an incoming OTLP log:

 {
    "body": {
      "clientIp": "111.222.888",
      "loglevel": "INFO",
      "message": "This is a test log message for abc application",
      "service": "helloworld",
      "span_id": "9c1544cc4f7ff369",
      "testtag": "fluentbit",
      "timestamp": "2024-01-08 18:37:08.150",
      "trace_id": "7ada6c95a1bd243fa9013cab515173a9"
    },
    "instrumentation.name": "",
    "instrumentation.version": "",
    "observed_timestamp": 0,
    "severity_text": "",
    "span_id": "",
    "trace_id": ""
  }

This should be turned into an event with a dozen or so fields:

 {
    "clientIp": "111.222.888",
    "loglevel": "INFO",
    "message": "This is a test log message for abc application",
    "service": "helloworld",
    "span_id": "9c1544cc4f7ff369",
    "testtag": "fluentbit",
    "timestamp": "2024-01-08 18:37:08.150",
    "trace_id": "7ada6c95a1bd243fa9013cab515173a9"
    "instrumentation.name": "",
    "instrumentation.version": "",
    "observed_timestamp": 0,
    "severity_text": "",
  }

It is perhaps an open question of what we de-dupe, or if we need to call each of those nested fields body.span_id or body.message instead of span_id and message, but you get the idea. Seems useful, right? Haha, you fool! You fell for the trap! We do no such thing. Instead, it comes out looking more like this:

 {
    "body":"{\"clientIp\":\"111.222.888\",\"loglevel\":\"INFO\",\"message\":\"This is a test log message for abc application\",\"service\":\"helloworld\",\"span_id\":\"9c1544cc4f7ff369\",\"testtag\":\"fluentbit\",\"timestamp\":\"2024-01-0818:37:08.150\",\"trace_id\":\"7ada6c95a1bd243fa9013cab515173a9\"}",
    "instrumentation.name": "",
    "instrumentation.version": "",
    "observed_timestamp": 0,
    "severity_text": "",
    "span_id": "",
    "trace_id": ""
  }

That's because when we see the body object, we serialize its contents as a JSON string. And so, as a result, sending us structured logs with some nice fields in the body (i.e., like a normal person would do) creates an unusable mess and a sucky user experience.

This is also true for trace data -- any field that's a key-value pair or array gets serialized.

Changing this is...technically a breaking change. The best kind. But in practice, people really don't rely on these structures too much. As it turns out, if we create garbage, people don't typically query garbage. There are a dozen or so Derived Columns in all of production Honeycomb that operate on a string like this, and most appear to be using regex to pluck out useful values...which looks like a workaround for this poor behavior. That's...really not a whole lot of dependency on this behavior, and from what I can glean, it really seems like people are working around this stuff rather than enjoying having to pluck out useful data from a giant string blob. Additionally, for logs in particular, the impact is so low that I don't think a single customer would be impacted.

The text was updated successfully, but these errors were encountered:

lizthegrey · 2024-01-22T16:30:17Z

With regard to the concern over CPU consumption: probably can just look for matching start and end chars of [] {}; if not found, no need to do the expensive JSON parse.

TylerHelmuth · 2024-01-22T16:43:12Z

We implemented a flatten function in OTTL very recently, some of the discussion in there may be helpful: open-telemetry/opentelemetry-collector-contrib#29283

cartermp · 2024-01-22T18:07:00Z

Just a note @lizthegrey and @TylerHelmuth this is actually data we already see as key-value pairs in the proto. Instead of walking that structure and turning them into fields, we literally just serialize the whole graph into a single string attribute.

#90 is when we receive attribute data that's a JSON-encoded object and we just forward it as a string attribute instead of flattening those "nested attributes".

## Which problem is this PR solving? When translating OTLP kv lists we currently just encode as a JSON string which isn't particularly useful. This PR extends the extraction behaviour to traverse kv lists to generate multiple event fields. The max depth is set to 5 levels. For example, this kv attribute: ``` key: "data" value: { "key1": "val1", "key2": true, } ``` Previously would be stored like this: ``` data: "{\"key1\": \"val1\","key2":true}" ``` After the change, the same kv list would be stored like this: ``` data.key1: "val1" data.key2: true ``` This change exposes the `AddAttributesToMap` func for consumers to use directly. This will be useful until metrics translation can be moved into this library (eg from shepherd). - Closes #235 ## Short description of the changes - Update logic to extract OTLP data types to traverse kv lists instead of always encoding as JSON string, with a max depth of 5 - Expose `AddAttributesToMap` as a public function so consumers can use directly - Update log translation to set the `body` field with a JSON string of the attribute to preserve backwards compatibility - Removes unused & broken truncation event fields

cartermp added the type: enhancement New feature or request label Jan 22, 2024

cartermp mentioned this issue Feb 9, 2024

Consider unpacking OTLP arrays #239

Open

MikeGoldsmith self-assigned this Feb 16, 2024

MikeGoldsmith mentioned this issue Feb 16, 2024

feat: Allow flattening of OTLP key-value lists #240

Merged

MikeGoldsmith closed this as completed in #240 Feb 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unpack and flatten array and key-value structures when feasible #235

Unpack and flatten array and key-value structures when feasible #235

cartermp commented Jan 22, 2024

lizthegrey commented Jan 22, 2024

TylerHelmuth commented Jan 22, 2024

cartermp commented Jan 22, 2024 •

edited

Loading

Unpack and flatten array and key-value structures when feasible #235

Unpack and flatten array and key-value structures when feasible #235

Comments

cartermp commented Jan 22, 2024

lizthegrey commented Jan 22, 2024

TylerHelmuth commented Jan 22, 2024

cartermp commented Jan 22, 2024 • edited Loading

cartermp commented Jan 22, 2024 •

edited

Loading