[Feature Request] Dynamic Auto-Map #3965

Closed
Tracked by #3967
adplotzk opened this issue Jan 16, 2024 · 3 comments
Labels
plugin - processor A plugin to manipulate data in the data prepper pipeline.

Comments

adplotzk commented Jan 16, 2024

  {
      "src_key_0.1": "val_0.1",
      "src_key_0.2": "val_0.2",
      "src_key_0.3": "val_0.3",
      "src_key_0.4": {
          "src_key_0.5": {
              "src_key_0.6": "val_0.4"
          }
      },
      "some_source": [
          {
              "nested_array": [
                  {
                      "src_key_1": "val_1",
                      "src_key_2": "val_2",
                      "src_key_3": "val_3"
                  },
                  {
                      "src_key_1": "val_5",
                      "src_key_2": "val_6",
                      "src_key_3": "val_7"
                  }
              ]
          }
      ]
  }

#TO

  {
      "somekey": [
          [
              "src_key_0.1",
              "val_0.1"
          ],
          [
              "src_key_0.2",
              "val_0.2"
          ],
          [
              "src_key_0.3",
              "val_0.3"
          ],
          [
              "src_key_0.4.src_key_0.5.src_key_0.6",
              "val_0.4"
          ],
          [
              "some_source[].nested_array[].src_key_1",
              "val_1, val_5"
          ],
          [
              "some_source[].nested_array[].src_key_2",
              "val_2, val_6"
          ],
          [
              "some_source[].nested_array[].src_key_3",
              "val_3, val_7"
          ]
      ]
  }

#MAPPING

  - set_default_map_to: "somekey"

dlvenable added the "plugin - processor" label (a plugin to manipulate data in the data prepper pipeline) and removed the "untriaged" label on Jan 16, 2024
Collaborator

oeyh commented Jan 19, 2024

This looks to me like a request to collapse (flatten) nested JSON objects and use a dot (or some other character) as the delimiter in keys.

If the source keys are known, this can be done with the existing copy_values processor by specifying each from_key and to_key:

  copy_values:
    entries:
      - from_key: "src_key_0.1"
        to_key: "somekey/src_key_0.1"
      - from_key: "src_key_0.4/src_key_0.5/src_key_0.6"
        to_key: "somekey/src_key_0.4.src_key_0.5.src_key_0.6"
      ...

but this can be tedious to set up if there are many fields in the JSON.
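
To make the tedium concrete, here is a rough Python sketch of what a per-key copy amounts to. It is not Data Prepper code: `get_path`, `set_path` and `copy_entries` are hypothetical helpers, and the sketch assumes "/" separates path segments, as in the config above. Every field needs its own entry.

```python
def get_path(event, path, sep="/"):
    """Read a nested value, treating `sep` as the path separator (assumption)."""
    node = event
    for part in path.split(sep):
        node = node[part]
    return node


def set_path(event, path, value, sep="/"):
    """Write a nested value, creating intermediate dicts as needed."""
    *parents, leaf = path.split(sep)
    node = event
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def copy_entries(event, entries):
    """Copy each from_key to its to_key, one explicit entry per field."""
    for entry in entries:
        set_path(event, entry["to_key"], get_path(event, entry["from_key"]))
    return event


event = {
    "src_key_0.1": "val_0.1",
    "src_key_0.4": {"src_key_0.5": {"src_key_0.6": "val_0.4"}},
}
entries = [
    {"from_key": "src_key_0.1", "to_key": "somekey/src_key_0.1"},
    {
        "from_key": "src_key_0.4/src_key_0.5/src_key_0.6",
        "to_key": "somekey/src_key_0.4.src_key_0.5.src_key_0.6",
    },
]
result = copy_entries(event, entries)
```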

To flatten all fields in the JSON and put them under a destination key, we could add a new processor, flatten_json, with a config like this:

flatten_json:
  source: "source-key"
  target: "target-key"
  delimiter: "."

source: flatten all fields under source; defaults to root if not specified
target: put flattened fields under target; defaults to root if not specified
delimiter: the delimiter to use in keys for flattened fields
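
The intended semantics of the three options can be sketched in Python as follows. This is only an illustration of the proposal, not the processor's implementation: the function names are made up, arrays are treated as plain leaf values, and whether the original nested fields are kept or removed is left open in the proposal (this sketch keeps them).

```python
def flatten(value, delimiter=".", prefix=""):
    """Flatten nested dicts into one level, joining keys with `delimiter`."""
    if not isinstance(value, dict):
        # Leaf value (lists are treated as leaves in this sketch).
        return {prefix: value}
    flat = {}
    for key, child in value.items():
        path = f"{prefix}{delimiter}{key}" if prefix else key
        flat.update(flatten(child, delimiter, path))
    return flat


def flatten_json(event, source=None, target=None, delimiter="."):
    """source/target of None stand for 'root', mirroring the defaults above."""
    subtree = event if source is None else event[source]
    flat = flatten(subtree, delimiter)
    result = dict(event)  # shallow copy; the original fields are kept
    if target is None:
        result.update(flat)
    else:
        result[target] = flat
    return result


event = {"source-key": {"a": {"b": 1}, "c": 2}}
flattened = flatten_json(event, source="source-key", target="target-key")
```

With this input, `flattened["target-key"]` holds the keys `a.b` and `c`, while `source-key` is left untouched.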

Collaborator

oeyh commented Jan 19, 2024

Discussed offline. The description has been updated. The transformation involves these steps:

  • Flatten the JSON object. We end up with a set of fields on the same level, with keys like "key1.key2.key3" or "key4[].key5" (when there are arrays). If multiple values correspond to the same key, put them in an array.
  • In the flattened JSON, if a value is an array, convert it to a string, e.g. ["value1", "value2"] --> "value1, value2".
  • Convert the flattened fields (key-value pairs) to an array, e.g. {"key1": "value1", "key2": "value2"} --> [["key1", "value1"], ["key2", "value2"]]. This can be done by adding a boolean option called convert_object_to_list to the map_to_list processor. If it is set to true, the processor further transforms the objects into lists.
    The configuration would be:
  processor:
    - map_to_list:
        source: "my-map"
        target: "my-list"
        convert_object_to_list: true
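
The three steps above can be sketched end to end in Python. This is an illustrative model of the transformation only, not the code from the linked pull requests; the function names are hypothetical, and a single value per key is assumed to stay as-is rather than being wrapped in an array.

```python
def flatten(value, prefix="", out=None):
    """Step 1: flatten; values that share a key are collected in a list."""
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(value, list):
        for item in value:
            flatten(item, f"{prefix}[]", out)  # arrays are marked with "[]"
    else:
        out.setdefault(prefix, []).append(value)
    return out


def join_values(flat):
    """Step 2: turn multi-value lists into 'a, b' strings; keep single values."""
    return {
        key: values[0] if len(values) == 1 else ", ".join(map(str, values))
        for key, values in flat.items()
    }


def to_pairs(flat):
    """Step 3: convert key-value pairs into [key, value] lists."""
    return [[key, value] for key, value in flat.items()]


event = {
    "src_key_0.1": "val_0.1",
    "src_key_0.4": {"src_key_0.5": {"src_key_0.6": "val_0.4"}},
    "some_source": [
        {
            "nested_array": [
                {"src_key_1": "val_1", "src_key_2": "val_2"},
                {"src_key_1": "val_5", "src_key_2": "val_6"},
            ]
        }
    ],
}
output = {"somekey": to_pairs(join_values(flatten(event)))}
```

For the shortened event above, `output["somekey"]` contains pairs such as `["some_source[].nested_array[].src_key_1", "val_1, val_5"]`, matching the shape requested in the issue description.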

Collaborator

oeyh commented Feb 28, 2024

Resolved via #4033, #4075 and #4128
