[pkg/ottl] Support json flatten operations #29283

danelson · 2023-11-15T01:52:18Z

Component(s)

pkg/ottl
processor/transform

Is your feature request related to a problem? Please describe.

Original discussion in https://cloud-native.slack.com/archives/C01N6P7KR6W/p1699999733202679

Some backends do not support JSON slices/maps. It would be nice if there were functions to flatten the data. Consider the following:

{
  "name": "test",
  "address": {
    "street": "first",
    "house": 1234
  },
  "occupants": ["user 1", "user 2"]
}

Possible outputs

Option 1: Flatten to stringified JSON

{
    "name": "test",
    "address": "{\"street\":\"first\", \"house\": 1234}",
    "occupants": "[\"user 1\", \"user 2\"]"
}

Option 2: Flatten to modified attribute names

{
    "name": "test",
    "address.street": "first",
    "address.house": 1234,
    "occupants.0": "user 1",
    "occupants.1": "user 2"
}
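
A minimal Python sketch of option 2, for illustration only (it is not the OTTL implementation, which is written in Go). It assumes map keys are joined with "." and slice elements are keyed by their index, matching the example output above; `flatten` is a hypothetical helper name.

```python
from typing import Any


def flatten(value: dict[str, Any], sep: str = ".") -> dict[str, Any]:
    """Flatten nested maps and slices into one level of keys (option 2).

    Map keys are joined with `sep`; slice elements are keyed by their index.
    Empty maps/slices produce no keys in this sketch.
    """
    out: dict[str, Any] = {}

    def walk(prefix: str, node: Any) -> None:
        if isinstance(node, dict):
            children = [(str(k), v) for k, v in node.items()]
        elif isinstance(node, list):
            children = [(str(i), v) for i, v in enumerate(node)]
        else:
            out[prefix] = node  # primitive value: emit as a leaf
            return
        for key, child in children:
            walk(f"{prefix}{sep}{key}" if prefix else key, child)

    walk("", value)
    return out


doc = {
    "name": "test",
    "address": {"street": "first", "house": 1234},
    "occupants": ["user 1", "user 2"],
}
print(flatten(doc))
# {'name': 'test', 'address.street': 'first', 'address.house': 1234, 'occupants.0': 'user 1', 'occupants.1': 'user 2'}
```

The same walk produces `items.0.key`-style keys for slices of maps, since each level of nesting just appends another path segment.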

Initial questions:

Can/should option 2 above support multiple key separator styles (dot, underscore, path, etc.)?
What if the backend supports slices but not maps? Can we apply this to only one or the other?
Should there be a limit on depth?
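
On the first question, one way to keep the key style configurable is to separate traversal from joining. This Python sketch (hypothetical helpers `leaf_paths` and `flatten_with`) collects each leaf's path as a tuple and lets the caller choose the separator.

```python
from typing import Any, Iterator


def leaf_paths(node: Any, path: tuple[str, ...] = ()) -> Iterator[tuple[tuple[str, ...], Any]]:
    """Yield (path, value) for every non-container value in a nested structure."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaf_paths(child, path + (str(key),))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from leaf_paths(child, path + (str(index),))
    else:
        yield path, node


def flatten_with(node: dict[str, Any], sep: str) -> dict[str, Any]:
    """Join each leaf path with the chosen separator (".", "_", "/", ...)."""
    return {sep.join(path): value for path, value in leaf_paths(node)}


doc = {"address": {"street": "first", "house": 1234}}
print(flatten_with(doc, "_"))  # {'address_street': 'first', 'address_house': 1234}
```

A caveat the sketch does not resolve: an original key that already contains the separator can collide with a flattened one (for example `{"a.b": 1, "a": {"b": 2}}` with "."), and here the later key silently wins.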

Describe the solution you'd like

A function such as FlattenJSON, FlattenMap, or FlattenSlice.


github-actions · 2023-11-15T01:52:36Z

Pinging code owners:

pkg/ottl: @TylerHelmuth @kentquirk @bogdandrutu @evan-bradley

See Adding Labels via Comments if you do not have permissions to add labels yourself.

TylerHelmuth · 2023-11-15T02:00:12Z

I definitely prefer format 2. I hadn't thought about flattening slices like that. Is that normal? I expected slices to be left alone, but your suggestion makes sense.

bryan-aguilar · 2023-11-15T04:59:54Z

Is there any standard here when it comes to flattening? Follow-up question: would we want to support both types of flattening?

ImDevinC · 2023-11-15T06:36:49Z

I think the second solution is probably better; that's how CloudWatch would parse these logs. The first one only exists because I wrote a simple processor for this myself, and that was the easier solution for me at the time.

danelson · 2023-11-15T15:19:51Z

I also prefer option 2.

> I hadn't thought about flattening slices like that. Is that normal?

I am not sure it is desirable. A long array would cause an explosion of attributes, which may cause other issues.

That being said, some backends only support arrays of primitive types, so:

Works

{
  "items": ["value1", "value2"]
}

Does not work

{
  "items":
  [
    {
      "key": "value1"
    },
    {
      "key": "value2"
    }
  ]
}

It would be nice to have some options to handle this.
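
One possible option for this case is to flatten a slice only when it contains a map or another slice, leaving primitive-only arrays intact. A Python sketch of that rule, assuming "primitive" means string, number, boolean, or null; the helper names are hypothetical.

```python
from typing import Any

# Assumption: the "primitive" JSON types a backend accepts inside arrays.
PRIMITIVES = (str, int, float, bool, type(None))


def is_primitive_slice(value: Any) -> bool:
    return isinstance(value, list) and all(isinstance(v, PRIMITIVES) for v in value)


def flatten_nonprimitive_slices(node: dict[str, Any]) -> dict[str, Any]:
    """Flatten maps always, but flatten a slice only if it holds maps or slices."""
    out: dict[str, Any] = {}

    def walk(prefix: str, value: Any) -> None:
        if isinstance(value, dict):
            children = [(str(k), v) for k, v in value.items()]
        elif isinstance(value, list) and not is_primitive_slice(value):
            children = [(str(i), v) for i, v in enumerate(value)]
        else:
            out[prefix] = value  # primitives and primitive-only slices are kept as-is
            return
        for key, child in children:
            walk(f"{prefix}.{key}" if prefix else key, child)

    walk("", node)
    return out


print(flatten_nonprimitive_slices({"items": ["value1", "value2"]}))
# {'items': ['value1', 'value2']}
print(flatten_nonprimitive_slices({"items": [{"key": "value1"}, {"key": "value2"}]}))
# {'items.0.key': 'value1', 'items.1.key': 'value2'}
```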

bryan-aguilar · 2023-11-15T16:43:08Z

> A long array would cause an explosion of attributes which may cause other issues.

I think this is a good point but I think this is an inherent risk when flattening data like this.

With option 2 wouldn't this

{
  "items":
  [
    {
      "key": "value1"
    },
    {
      "key": "value2"
    }
  ]
}

become

{
  "items.0.key": "value1",
  "items.1.key": "value2"
}

?

TylerHelmuth · 2023-11-15T19:12:50Z

@bryan-aguilar I feel like option 1 is more of a stringify solution than flattening. I'd rather see that accomplished via set(attributes["address"], String(attributes["address"])).

I agree with your interpretation of how the slice would be flattened.
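
For comparison, option 1 applied to every top-level map or slice (rather than one named attribute) could look like this Python sketch. It assumes compact JSON is an acceptable string encoding and does not claim to match the exact output of the `String` converter mentioned above; `stringify_nested` is a hypothetical name.

```python
import json
from typing import Any


def stringify_nested(attrs: dict[str, Any]) -> dict[str, Any]:
    """Replace every map or slice value with its compact JSON encoding (option 1)."""
    return {
        key: json.dumps(value, separators=(",", ":")) if isinstance(value, (dict, list)) else value
        for key, value in attrs.items()
    }


doc = {"name": "test", "address": {"street": "first", "house": 1234}, "occupants": ["user 1", "user 2"]}
print(stringify_nested(doc)["address"])  # {"street":"first","house":1234}
```

Unlike option 2, this never adds attributes, so the attribute count stays the same; the trade-off is that the backend sees an opaque string instead of queryable fields.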

danelson · 2023-11-15T19:52:37Z

> I think this is a good point but I think this is an inherent risk when flattening data like this.

If the goal is to prevent backends from rejecting data, then this might be trading one problem for another. For instance, my backend doesn't support JSON maps or more than 255 attributes. If I had (pseudo-JSON):

{
  "key": "value",
  "slice": [ <300 items in here> ]
}

then my backend would drop "slice" but allow the rest of the payload. But if we turned this into

{
  "key": "value",
  "slice.0": "<item0>",
  "slice.1": "<item1>",
  ...
  "slice.299": "<item299>"
}

Now my backend would drop the entire payload.

Could we support some type of depth parameter for nested levels and some kind of limit parameter for slice length to help guard against this?

I think you can use limit, but I don't know if you could be selective about what it keeps with that approach.
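
The trade-off is easy to see with a quick count. This Python sketch flattens the 300-item example and compares the result with a 255-attribute limit. `max_slice_len` is a hypothetical parameter (not an existing OTTL parameter) that keeps only the first n items of each slice, in the spirit of the limit option suggested above.

```python
from typing import Any, Optional

BACKEND_ATTRIBUTE_LIMIT = 255  # the limit described in the comment above


def flatten(node: dict[str, Any], max_slice_len: Optional[int] = None) -> dict[str, Any]:
    """Dotted-key flatten; slices longer than max_slice_len keep only their first items."""
    out: dict[str, Any] = {}

    def walk(prefix: str, value: Any) -> None:
        if isinstance(value, dict):
            children = [(str(k), v) for k, v in value.items()]
        elif isinstance(value, list):
            kept = value if max_slice_len is None else value[:max_slice_len]
            children = [(str(i), v) for i, v in enumerate(kept)]
        else:
            out[prefix] = value
            return
        for key, child in children:
            walk(f"{prefix}.{key}" if prefix else key, child)

    walk("", node)
    return out


payload = {"key": "value", "slice": [f"item{i}" for i in range(300)]}
full = flatten(payload)
capped = flatten(payload, max_slice_len=100)
print(len(full), len(full) <= BACKEND_ATTRIBUTE_LIMIT)      # 301 False
print(len(capped), len(capped) <= BACKEND_ATTRIBUTE_LIMIT)  # 101 True
```

Capping slices keeps this particular payload under the limit, but a wide map can still exceed it, so a per-slice cap alone does not guarantee the attribute count stays below 255.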

bryan-aguilar · 2023-11-15T20:44:08Z

@danelson this is the same issue with objects, though, right? You could have an object with 350 nested fields that would flatten to 350 separate fields.

{
  "name": "test",
  "bigobj": {
    "field1": 1,
    "field2": 2,
    ....,
    "field350":  350
  }
}

ImDevinC · 2023-11-15T23:11:07Z

@danelson wouldn't the existing truncate_all be another solution for removing those excess attributes and not dropping everything? So you could flattenJSON and then truncate_all to remove excess that would be dropped?

danelson · 2023-11-15T23:52:07Z

> this is the same issue with objects though right?

Yes, you are correct. I guess I wasn't thinking about that since it hasn't been an issue for us.

> wouldn't the existing truncate_all be another solution for removing those excess attributes and not dropping everything? So you could flattenJSON and then truncate_all to remove excess that would be dropped?

I think you mean limit? I would like to give precedence to attributes that do not originate in maps/slices, and I don't think that is possible. Since I don't know the names of the incoming attributes, I would have to apply flatten and then limit, which will not work. My admittedly somewhat janky approach (janky because it does not deterministically ensure the attribute limit is not breached) would be to support a limit option in the flatten function so I can choose up to n items.
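
A precedence-aware limit could be sketched in Python with a hypothetical `limit_with_precedence` helper: attributes that were top-level non-container values in the original record are kept first, and the remaining budget is filled with keys produced by flattening. Which flattened keys survive still depends on their order, so this is not a targeted choice either.

```python
from typing import Any


def limit_with_precedence(original: dict[str, Any], flattened: dict[str, Any], limit: int) -> dict[str, Any]:
    """Keep at most `limit` attributes, preferring keys that were already
    top-level non-container values over keys produced by flattening."""
    native = [k for k, v in original.items() if not isinstance(v, (dict, list)) and k in flattened]
    derived = [k for k in flattened if k not in original]
    kept = (native + derived)[:limit]
    return {k: flattened[k] for k in kept}


original = {"name": "test", "env": "prod", "address": {"street": "first", "house": 1234}}
flattened = {"name": "test", "env": "prod", "address.street": "first", "address.house": 1234}
print(limit_with_precedence(original, flattened, 3))
# {'name': 'test', 'env': 'prod', 'address.street': 'first'}
```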

danelson · 2023-11-16T00:02:55Z

I realize my use case may not be standard. Just thought it would help the discussion.

I would be happy if there were support for the following, which I think solves a lot of generic use cases:

Flattening maps only
Flattening maps and slices
A depth option for maps so that deeply nested structures can be ignored
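
The three modes above could be expressed as two switches. In this Python sketch, `include_slices` and `max_depth` are hypothetical parameter names, and the depth semantics (containers nested deeper than `max_depth` are kept intact as values) are an assumption, not a specification.

```python
from typing import Any, Optional


def flatten(node: dict[str, Any], include_slices: bool = False, max_depth: Optional[int] = None) -> dict[str, Any]:
    """Flatten maps (and optionally slices) into dotted keys.

    Top-level values are at depth 1; a container deeper than max_depth is
    stored unchanged under its dotted key instead of being expanded.
    """
    out: dict[str, Any] = {}

    def walk(prefix: str, value: Any, depth: int) -> None:
        expandable = isinstance(value, dict) or (include_slices and isinstance(value, list))
        if not expandable or (max_depth is not None and depth > max_depth):
            out[prefix] = value
            return
        items = value.items() if isinstance(value, dict) else enumerate(value)
        for key, child in items:
            walk(f"{prefix}.{key}" if prefix else str(key), child, depth + 1)

    for key, value in node.items():
        walk(str(key), value, 1)
    return out


doc = {"name": "test", "address": {"street": "first", "geo": {"lat": 1.5}}, "occupants": ["user 1", "user 2"]}
print(flatten(doc, max_depth=1))
# {'name': 'test', 'address.street': 'first', 'address.geo': {'lat': 1.5}, 'occupants': ['user 1', 'user 2']}
```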

bryan-aguilar · 2023-11-16T00:09:29Z

@danelson I wonder if your use case should be considered a completely separate concern. Should the flatten function only worry about flattening? You should have the ability to size down your slices/maps before or after the flatten operation takes place.

Does the functionality to trim down the JSON object before or after flattening already exist?

TylerHelmuth · 2023-11-16T15:52:55Z

> A depth option for maps so that deeply nested structures can be ignored

It is best to have these types of checks in the Where clause instead of the function itself. To enable this use case we could have a Depth converter that takes a map and returns its depth.
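
A Depth converter like the one proposed here could compute nesting depth recursively. In this Python sketch the definition is an assumption: primitives have depth 0, a map or slice is one level deeper than its deepest child, and slices count toward depth just like maps.

```python
from typing import Any


def depth(value: Any) -> int:
    """Nesting depth: primitives are 0, a flat map or slice is 1."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


print(depth({"address": {"street": "first"}, "occupants": ["a"]}))  # 2
```

With that definition, a condition such as `Depth(...) < 10` would accept maps up to nine levels deep.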

danelson · 2023-11-16T16:12:02Z

> It is best to have these types of checks in the Where clause instead of the function itself.

This makes sense to me.

This may be veering off track now, but is there a way to write transforms that apply this to subsets of data?

If I want to flatten a specific map, I think we are saying that this proposal would allow for something like:

flatten(attributes["maybe_deep_map"]) where Depth(attributes["maybe_deep_map"]) < 10

But would it be able to flatten all maps that are less than some depth when I don't know their names?

flatten(attributes) where Depth(attributes) < 10

TylerHelmuth · 2023-11-16T16:25:04Z

> But would it be able to flatten all maps that are less than some depth when I don't know their names?

At the moment you would need to know the name.

ImDevinC · 2023-11-16T17:40:57Z

@TylerHelmuth just to clarify, we're saying that flatten(attributes) wouldn't work? That was the original problem I was trying to solve for, as dynamic attributes make it very difficult to know the name.

danelson · 2023-11-16T17:55:31Z

I think we are saying that flatten(attributes) would flatten everything. I was looking for more control which we think would be best provided via other checks and would be outside the scope of this issue.

evan-bradley · 2023-11-16T20:12:57Z

I think if we have a high degree of confidence that most users will want to limit map depth or list length when flattening a map of arbitrary keys, adding optional parameters would be an okay move here since it would simplify the overall UX for this functionality. If we can't do that, I think we could use #29289 to solve this case. Some combination of the below statements/conditions could be used to limit depth or remove keys that won't pass validation.

# NOTE: needs #29289 to work
for key, _ in attributes:
  flatten(attributes[key]) where IsMap(attributes[key]) and Depth(attributes[key]) < 10
  flatten(attributes[key]) where IsList(attributes[key]) and Len(attributes[key]) < 256
  delete_key(attributes, key) where IsMap(attributes[key]) and Depth(attributes[key]) >= 10
  delete_key(attributes, key) where IsList(attributes[key]) and Len(attributes[key]) >= 256
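
Outside OTTL, the net effect of those four statements can be sketched in Python: drop maps that are too deep and slices that are too long, then flatten what remains. The thresholds mirror the example (10 and 256); `depth`, `drop_oversized`, and `flatten` are hypothetical helpers, and the depth definition is an assumption.

```python
from typing import Any


def depth(value: Any) -> int:
    """Assumed definition: primitives are 0; containers are 1 + deepest child."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


def drop_oversized(attributes: dict[str, Any], max_depth: int = 10, max_len: int = 256) -> dict[str, Any]:
    """Mirror the delete_key conditions: remove too-deep maps and too-long slices."""
    kept: dict[str, Any] = {}
    for key, value in attributes.items():
        if isinstance(value, dict) and depth(value) >= max_depth:
            continue
        if isinstance(value, list) and len(value) >= max_len:
            continue
        kept[key] = value
    return kept


def flatten(node: Any, prefix: str = "") -> dict[str, Any]:
    """Dotted-key flatten of the surviving maps and slices."""
    if isinstance(node, dict):
        children = [(str(k), v) for k, v in node.items()]
    elif isinstance(node, list):
        children = [(str(i), v) for i, v in enumerate(node)]
    else:
        return {prefix: node}
    out: dict[str, Any] = {}
    for key, child in children:
        out.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    return out


attributes = {"name": "test", "address": {"street": "first"}, "slice": list(range(300))}
print(flatten(drop_oversized(attributes)))  # {'name': 'test', 'address.street': 'first'}
```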

puckpuck · 2023-12-14T04:20:58Z

There should be an ability to prefix all the keys in the resulting flattened map.

{
  "name": "test",
  "address": {
    "street": "first",
    "house": 1234
  },
  "occupants": ["user 1", "user 2"]
}

results in something like this:

{
    "app.name": "test",
    "app.address.street": "first",
    "app.address.house": 1234,
    "app.occupants.0": "user 1",
    "app.occupants.1": "user 2"
}

This keeps my keys neatly packaged in a namespace so as to not be confused with other attributes.
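
A prefix option could simply seed the key path before traversal. This Python sketch uses a hypothetical `prefix` argument and the same dotted-key rules as the earlier examples.

```python
from typing import Any


def flatten(node: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Dotted-key flatten; every resulting key is namespaced under `prefix` if given."""
    out: dict[str, Any] = {}

    def walk(path: str, value: Any) -> None:
        if isinstance(value, dict):
            children = [(str(k), v) for k, v in value.items()]
        elif isinstance(value, list):
            children = [(str(i), v) for i, v in enumerate(value)]
        else:
            out[path] = value
            return
        for key, child in children:
            walk(f"{path}.{key}" if path else key, child)

    walk(prefix, node)
    return out


doc = {"name": "test", "address": {"street": "first", "house": 1234}, "occupants": ["user 1", "user 2"]}
print(flatten(doc, prefix="app"))
# {'app.name': 'test', 'app.address.street': 'first', 'app.address.house': 1234, 'app.occupants.0': 'user 1', 'app.occupants.1': 'user 2'}
```

A single prefix applies to the whole target; giving different sub-maps different namespaces would mean flattening each sub-map in a separate call.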

TylerHelmuth · 2023-12-14T04:30:54Z

I can see that being an optional parameter. It gets more complicated if you want to namespace the nested maps. I think in that case the best solution is to call flatten multiple times with the different prefixes. I don't want to support namespacing the sub maps.

DaveWK · 2023-12-29T14:01:13Z

I'd just like to chime in that I am looking for this as well.

My particular use case is that I have JSON-encoded logs that are being scraped by OTEL using the journald receiver. I want to collapse all the log elements that are currently shipped under Body.XXX into root-level elements, and then also re-parse the Body.MESSAGE element into sub-elements (since it's a JSON-formatted log). Since there are arbitrary JSON elements within the MESSAGE log, the current static mapping solution is not maintainable.
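
The data transformation described here can be sketched in Python: copy the Body fields to the root, decode MESSAGE if it holds JSON, and flatten the result. The field names (`_PID`, `MESSAGE`) are illustrative, `hoist_journald_body` is hypothetical, and keeping `MESSAGE.` as the key prefix for decoded fields is an assumption; this does not describe how the journald receiver itself behaves.

```python
import json
from typing import Any


def flatten(node: Any, prefix: str, out: dict[str, Any]) -> None:
    """Write dotted keys for every leaf of `node` into `out`."""
    if isinstance(node, dict):
        for k, v in node.items():
            flatten(v, f"{prefix}.{k}" if prefix else str(k), out)
    elif isinstance(node, list):
        for i, v in enumerate(node):
            flatten(v, f"{prefix}.{i}" if prefix else str(i), out)
    else:
        out[prefix] = node


def hoist_journald_body(body: dict[str, Any]) -> dict[str, Any]:
    """Move Body fields to the root and expand a JSON-encoded MESSAGE."""
    fields = dict(body)
    message = fields.get("MESSAGE")
    if isinstance(message, str):
        try:
            fields["MESSAGE"] = json.loads(message)
        except json.JSONDecodeError:
            pass  # not JSON: keep the raw string
    out: dict[str, Any] = {}
    flatten(fields, "", out)
    return out


body = {"_PID": "42", "MESSAGE": '{"level": "info", "ctx": {"user": "a"}}'}
print(hoist_journald_body(body))
# {'_PID': '42', 'MESSAGE.level': 'info', 'MESSAGE.ctx.user': 'a'}
```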

TylerHelmuth · 2024-01-11T22:01:06Z

I've created #30455 to add this function. All interested parties please review.

**Description:** Adds a `flatten` function that allows flattening maps. I went with an editor instead of a converter, but I'm open to debate. Using an editor means that a user can do `flatten(body) where IsMap(body)` instead of `set(body, Flatten(body)) where IsMap(body)`. When using ParseJSON you have to do:

```
- merge_maps(cache, ParseJSON(body), "upsert")
- flatten(cache)
```

instead of `merge_maps(cache, Flatten(ParseJSON(body)), "upsert")`. Ultimately I went with an editor for the same reasons that `merge_maps` is an editor: chaining too many functions together is messy, and updating maps is very fast with pdata.

The function supports two optional parameters, `prefix` and `depth`. Use `prefix` to add a "namespace" to the values that are being flattened. Use `depth` to prevent trying to flatten maps that are too deep. See the function doc for examples.

**Link to tracking Issue:** Closes #29283

**Testing:** Added new unit and e2e tests. Please scrutinize.

**Documentation:** Added function doc.

Co-authored-by: Evan Bradley
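
The editor-versus-converter choice can be illustrated with a Python analogy (not the Go implementation): an editor mutates its target map in place, while a converter leaves its input alone and returns a new value that has to be written back, as with `set`. Only maps are flattened here, `prefix` and `depth` are omitted, and all function names are hypothetical.

```python
from typing import Any


def _flatten_into(node: dict[str, Any], out: dict[str, Any], prefix: str = "") -> None:
    """Copy every non-map value into `out` under a dotted key (maps only, for brevity)."""
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            _flatten_into(value, out, path)
        else:
            out[path] = value


def flatten_editor(target: dict[str, Any]) -> None:
    """Editor style: rewrite the target map in place and return nothing."""
    flat: dict[str, Any] = {}
    _flatten_into(target, flat)
    target.clear()
    target.update(flat)


def flatten_converter(value: dict[str, Any]) -> dict[str, Any]:
    """Converter style: leave the input untouched and return a new map."""
    flat: dict[str, Any] = {}
    _flatten_into(value, flat)
    return flat


body = {"a": {"b": 1}, "c": 2}
print(flatten_converter(body))  # {'a.b': 1, 'c': 2}
print(body)                     # {'a': {'b': 1}, 'c': 2}
flatten_editor(body)
print(body)                     # {'a.b': 1, 'c': 2}
```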


danelson added the enhancement (New feature or request) and needs triage (New item requiring triage) labels Nov 15, 2023

github-actions bot added the pkg/ottl label Nov 15, 2023

TylerHelmuth added the priority:p2 (Medium) label and removed the needs triage (New item requiring triage) label Nov 15, 2023

github-actions bot mentioned this issue Nov 21, 2023 in Weekly Report: 2023-11-14 - 2023-11-21 #29422 (closed)

TylerHelmuth self-assigned this Jan 10, 2024

TylerHelmuth mentioned this issue Jan 11, 2024 in [pkg/ottl] Add flatten function #30455 (merged)

TylerHelmuth closed this as completed in #30455 Jan 17, 2024

TylerHelmuth mentioned this issue Jan 22, 2024 in Unpack and flatten array and key-value structures when feasible honeycombio/husky#235 (closed)
