Support parsing JSON #831

Closed
dlvenable opened this issue Jan 6, 2022 · 2 comments · Fixed by #1696
Labels
enhancement New feature or request plugin - processor A plugin to manipulate data in the data prepper pipeline.

Comments

@dlvenable
Member

dlvenable commented Jan 6, 2022

Is your feature request related to a problem? Please describe.

Data Prepper events may have JSON values inside Event fields. Data Prepper should be able to parse these JSON strings and create fields directly in the Event from the JSON.

Describe the solution you'd like

Provide a JSON parsing processor - parse_json.

It should be able to parse a JSON string from a field and set the values in the Event object. This processor will automatically support nesting.

Example

Given the following configuration:

processor:
  parse_json:
    source: my_field

Given this input event:

"my_field" : "{\"key1\" : \"value1\", \"key2\" : \"value2\"}"

The input event is changed to:

"my_field" : "{\"key1\" : \"value1\", \"key2\" : \"value2\"}"
"key1" : "value1"
"key2" : "value2"

Example with Nesting

Given this input event:

"my_field" : "{\"key1\" : \"value1\", \"key2\" : { \"key2child\" : \"innerValue\" }}"

The input event is changed to:

"my_field" : "{\"key1\" : \"value1\", \"key2\" : { \"key2child\" : \"innerValue\" }}"
"key1" : "value1"
"key2" : {
  "key2child" : "innerValue"
}

Configurations

source - the field with JSON
target - the field to set the values in; by default this is the root object
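The behavior described above can be sketched in a few lines. This is only an illustrative Python model of the proposed semantics, not Data Prepper code: the event is assumed to be a plain dict, and the function name `parse_json` and its signature are hypothetical.

```python
import json


def parse_json(event, source, target=None):
    """Hypothetical model of the proposed processor.

    Parses the JSON string stored in event[source] and writes the parsed
    keys either into the root of the event (default) or under `target`.
    """
    parsed = json.loads(event[source])
    if not isinstance(parsed, dict):
        raise ValueError("this sketch only handles JSON objects")

    result = dict(event)  # shallow copy; the source field is kept as-is
    if target is None:
        result.update(parsed)      # default: set values on the root object
    else:
        result[target] = parsed    # otherwise: set values under the target field
    return result


event = {"my_field": '{"key1": "value1", "key2": {"key2child": "innerValue"}}'}
print(parse_json(event, "my_field"))
print(parse_json(event, "my_field", target="parsed"))
```

Nesting needs no special handling here because the JSON parser already returns nested objects as nested dicts.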

@dlvenable dlvenable added this to the v1.3 milestone Jan 6, 2022
@dlvenable dlvenable added plugin - processor A plugin to manipulate data in the data prepper pipeline. enhancement New feature or request labels Jan 25, 2022
@cmanning09 cmanning09 modified the milestones: v1.3, v2.0 Feb 3, 2022
@dlvenable dlvenable modified the milestones: v2.0, v1.4 Feb 4, 2022
@dlvenable dlvenable removed this from the v1.4 milestone Mar 21, 2022
@daixba
Contributor

daixba commented May 11, 2022

I would like to request support for the following two special use cases in the parse_json processor.

  • Case 1: Filter based on a JSON path

For example, here is my original JSON file:

{"Records":[  
  {"key1": "value1", ....},
  {"key1": "value2", ....},
  ...
]}

The expected result after processing is to output multiple records (lines) to destinations (e.g. multiple docs to an OpenSearch index). Similar to the tool jq, I can provide a JSON path like .Records to get the child fields only.
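A rough Python sketch of what Case 1 asks for, assuming only simple dotted paths such as `.Records` (it does not implement the full jq path syntax). The function name `split_records` is made up for illustration.

```python
import json


def split_records(document, path):
    """Return the elements of the array found at a simple dotted path.

    Only paths of the form '.a.b.c' are handled; this is an assumption of
    the sketch, not a statement about jq or Data Prepper.
    """
    node = json.loads(document)
    for key in (part for part in path.split(".") if part):
        node = node[key]
    if not isinstance(node, list):
        raise ValueError(f"path {path!r} does not select an array")
    return list(node)


raw = '{"Records": [{"key1": "value1"}, {"key1": "value2"}]}'
for record in split_records(raw, ".Records"):
    print(record)  # each element would become its own output record
```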

  • Case 2: Support ndjson

For example, the raw file as a whole is not valid JSON, but each line is a valid JSON document.

{"key1": "value1", ....}
{"key1": "value2", ....}
...

The expected result after processing is to output each record (line) to destinations (e.g. multiple docs to an OpenSearch index).
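For Case 2, a minimal Python sketch of ndjson handling: every non-empty line is parsed on its own and becomes one record. The helper name `parse_ndjson` is hypothetical, and skipping blank lines is an assumption of this sketch.

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON into a list of records."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # assumption: blank lines are ignored rather than rejected
            records.append(json.loads(line))
    return records


raw_ndjson = '{"key1": "value1"}\n{"key1": "value2"}\n'
for record in parse_ndjson(raw_ndjson):
    print(record)  # each line would become its own output record
```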

@finnroblin
Contributor

I can work on this, and would like to request feedback on another feature for the parse_json processor here.

It may be useful to support using a JSON pointer to select the part of the JSON string that will be parsed. A user could add a pointer option to their parse_json configuration containing a JSON pointer if they wish to process only the part of the JSON string that the pointer selects.

This setting would be optional, and if the pointer is not specified or invalid then the entire source will be processed. When source: my_field and pointer: /key2/key2child, the example Event:

"my_field" : "{\"key1\" : \"value1\", \"key2\" : { \"key2child\" : \"innerValue\" }}" 

Is processed into
"key2child": "innerValue".

If the inner key conflicts with another field on the Event, the absolute path of the inner key will be placed in the destination field (for this example, it's: key2/key2child).
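To make the proposal concrete, here is a small Python sketch of the pointer behavior described above: resolve a JSON pointer (RFC 6901 syntax), fall back to the whole source when the pointer is missing or does not resolve, and use the full pointer path as the key on a conflict. All function names are hypothetical, the event is modeled as a plain dict, and keeping the source field in the output is an assumption carried over from the original proposal.

```python
import json


def pointer_tokens(pointer):
    """Split a non-empty JSON pointer into unescaped reference tokens."""
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON pointer must start with '/'")
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def resolve_pointer(doc, pointer):
    """Walk the parsed document along the pointer; raises if it does not resolve."""
    node = doc
    for token in pointer_tokens(pointer):
        if isinstance(node, dict):
            node = node[token]        # KeyError if the key is missing
        elif isinstance(node, list) and token.isdigit():
            node = node[int(token)]   # IndexError if out of range
        else:
            raise KeyError(token)
    return node


def parse_json_with_pointer(event, source, pointer=None):
    parsed = json.loads(event[source])
    result = dict(event)
    if pointer:
        try:
            value = resolve_pointer(parsed, pointer)
        except (KeyError, IndexError, ValueError):
            pass  # invalid pointer: fall through and process the entire source
        else:
            leaf = pointer_tokens(pointer)[-1]
            # On a conflict, use the full path (e.g. "key2/key2child") as the key.
            key = pointer.lstrip("/") if leaf in result else leaf
            result[key] = value
            return result
    if isinstance(parsed, dict):
        result.update(parsed)
    return result


event = {"my_field": '{"key1": "value1", "key2": {"key2child": "innerValue"}}'}
print(parse_json_with_pointer(event, "my_field", "/key2/key2child"))
```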

Alternatively, the JSON pointer could be specified in the source (like source: message/key2/key2child). However, I think that decoupling this feature from the source option is a less confusing user experience. The JSON data is not related to what the source field is named since many sources have Event fields with message on them, and having to respecify the JSON pointer if the name of the source field changes is confusing. Elsewhere in Data Prepper the source field is used only to direct the processor to the field to process, so I suggest that parse_json follows this convention and has a separate optional configuration option pointer to parse based on a JSON pointer.
