Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unpack and flatten array and key-value structures when feasible #235

Closed
cartermp opened this issue Jan 22, 2024 · 3 comments · Fixed by #240
Closed

Unpack and flatten array and key-value structures when feasible #235

cartermp opened this issue Jan 22, 2024 · 3 comments · Fixed by #240
Assignees
Labels
type: enhancement New feature or request

Comments

@cartermp
Copy link
Member

This is related to #90 but not the same issue.

Consider the following JSON representation of an incoming OTLP log:

 {
    "body": {
      "clientIp": "111.222.888",
      "loglevel": "INFO",
      "message": "This is a test log message for abc application",
      "service": "helloworld",
      "span_id": "9c1544cc4f7ff369",
      "testtag": "fluentbit",
      "timestamp": "2024-01-08 18:37:08.150",
      "trace_id": "7ada6c95a1bd243fa9013cab515173a9"
    },
    "instrumentation.name": "",
    "instrumentation.version": "",
    "observed_timestamp": 0,
    "severity_text": "",
    "span_id": "",
    "trace_id": ""
  }

This should be turned into an event with a dozen or so fields:

 {
    "clientIp": "111.222.888",
    "loglevel": "INFO",
    "message": "This is a test log message for abc application",
    "service": "helloworld",
    "span_id": "9c1544cc4f7ff369",
    "testtag": "fluentbit",
    "timestamp": "2024-01-08 18:37:08.150",
    "trace_id": "7ada6c95a1bd243fa9013cab515173a9"
    "instrumentation.name": "",
    "instrumentation.version": "",
    "observed_timestamp": 0,
    "severity_text": "",
  }

It is perhaps an open question of what we de-dupe, or if we need to call each of those nested fields body.span_id or body.message instead of span_id and message, but you get the idea. Seems useful, right? Haha, you fool! You fell for the trap! We do no such thing. Instead, it comes out looking more like this:

 {
    "body":"{\"clientIp\":\"111.222.888\",\"loglevel\":\"INFO\",\"message\":\"This is a test log message for abc application\",\"service\":\"helloworld\",\"span_id\":\"9c1544cc4f7ff369\",\"testtag\":\"fluentbit\",\"timestamp\":\"2024-01-0818:37:08.150\",\"trace_id\":\"7ada6c95a1bd243fa9013cab515173a9\"}",
    "instrumentation.name": "",
    "instrumentation.version": "",
    "observed_timestamp": 0,
    "severity_text": "",
    "span_id": "",
    "trace_id": ""
  }

That's because when we see the body object, we serialize its contents as a JSON string. And so, as a result, sending us structured logs with some nice fields in the body (i.e., like a normal person would do) creates an unusable mess and a sucky user experience.

This is also true for trace data -- any field that's a key-value pair or array gets serialized.

Changing this is...technically a breaking change. The best kind. But in practice, people really don't rely on these structures too much. As it turns out, if we create garbage, people don't typically query garbage. There are a dozen or so Derived Columns in all of production Honeycomb that operate on a string like this, and most appear to be using regex to pluck out useful values...which looks like a workaround for this poor behavior. That's...really not a whole lot of dependency on this behavior, and from what I can glean, it really seems like people are working around this stuff rather than enjoying having to pluck out useful data from a giant string blob. Additionally, for logs in particular, the impact is so low that I don't think a single customer would be impacted.

@cartermp cartermp added the type: enhancement New feature or request label Jan 22, 2024
@lizthegrey
Copy link
Member

With regard to the concern over CPU consumption: probably can just look for matching start and end chars of [] {}; if not found, no need to do the expensive JSON parse.

@TylerHelmuth
Copy link
Contributor

We implemented a flatten function in OTTL very recently, some of the discussion in there may be helpful: open-telemetry/opentelemetry-collector-contrib#29283

@cartermp
Copy link
Member Author

cartermp commented Jan 22, 2024

Just a note @lizthegrey and @TylerHelmuth this is actually data we already see as key-value pairs in the proto. Instead of walking that structure and turning them into fields, we literally just serialize the whole graph into a single string attribute.

#90 is when we receive attribute data that's a JSON-encoded object and we just forward it as a string attribute instead of flattening those "nested attributes".

@MikeGoldsmith MikeGoldsmith self-assigned this Feb 16, 2024
MikeGoldsmith added a commit that referenced this issue Feb 22, 2024
## Which problem is this PR solving?
When translating OTLP kv lists we currently just encode as a JSON string
which isn't particularly useful. This PR extends the extraction
behaviour to traverse kv lists to generate multiple event fields. The
max depth is set to 5 levels.

For example, this kv attribute:
```
key: "data"
value: {
  "key1": "val1",
  "key2": true,
}
```

Previously would be stored like this:
```
data: "{\"key1\": \"val1\","key2":true}"
```

After the change, the same kv list would be stored like this:
```
data.key1: "val1"
data.key2: true
```

This change exposes the `AddAttributesToMap` func for consumers to use
directly. This will be useful until metrics translation can be moved
into this library (eg from shepherd).

- Closes #235 

## Short description of the changes
- Update logic to extract OTLP data types to traverse kv lists instead
of always encoding as JSON string, with a max depth of 5
- Expose `AddAttributesToMap` as a public function so consumers can use
directly
- Update log translation to set the `body` field with a JSON string of
the attribute to preserve backwards compatibility
- Removes unused & broken truncation event fields
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants