Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE]Add PPL JSON extended functions support #667

Closed
YANG-DB opened this issue Sep 16, 2024 · 7 comments · Fixed by #780
Closed

[FEATURE]Add PPL JSON extended functions support #667

YANG-DB opened this issue Sep 16, 2024 · 7 comments · Fixed by #780
Assignees
Labels
0.6 enhancement New feature or request Lang:PPL Pipe Processing Language support

Comments

@YANG-DB
Copy link
Member

YANG-DB commented Sep 16, 2024

Is your feature request related to a problem?
We need to enhance OpenSearch PPL (Piped Processing Language) by adding support for JSON-based operations. This extension will provide users with powerful tools to manipulate and analyze JSON data within PPL queries.

Proposed New Functions

  1. json_object(): Creates a JSON object from existing key/value pairs.
   ... | eval person = json_object("name", "John", "age", 30)
  1. json(): Evaluates whether a value can be parsed as JSON. Returns the value if valid, null otherwise.
   ... | eval result = json('{"name":"John","age":30}')
  1. json_append(): Appends elements to the contents of a valid JSON object.
   ... | eval person = json_object("name", "John"), extended_person = json_append(person, "city", "New York")
  1. json_array(): Creates a JSON array using a list of values.
   ... | eval json_list = json_array("apple", "banana", "cherry")
  1. json_array_to_mv(): Maps the elements of a JSON array to a multivalued field.
   ... | eval json_list = json_array("apple", "banana", "cherry"), mv_field = json_array_to_mv(json_list)
  1. json_delete(): Removes one or more keys and their corresponding values from the specified JSON object.
   ... | eval person = json_object("name", "John", "age", 30, "city", "New York"), person_no_age = json_delete(person, "age")
  1. json_extend(): Extends the contents of a valid JSON object with the values of an array.
   ... | eval person = json_object("name", "John", "age", 30), extended_person = json_extend(person, json_array("city", "New York"))
  1. json_extract(): Returns either a JSON array or a native type value from a field and zero or more paths.
   ... | eval data = json('{"person": {"name": "John", "age": 30}}'), name = json_extract(data, "person.name")
  1. json_keys(): Returns the keys from the key-value pairs in a JSON object as a JSON array.
   ... | eval person = json_object("name", "John", "age", 30), keys = json_keys(person)
  1. json_set(): Inserts or overwrites values for a JSON node with provided values and returns an updated JSON object.
   ... | eval person = json_object("name", "John"), updated_person = json_set(person, "age", 30)
  1. json_valid(): Evaluates whether a JSON object uses valid JSON syntax and returns TRUE or FALSE.
   ... | eval is_valid = json_valid('{"name":"John","age":30}')
  1. all(): Iterates over JSON array values and returns true if every value matches the provided predicate.
   ... | eval json_list = json_array(1, 2, 3), all_match = all(json_list, val -> val != 0)
  1. any(): Iterates over JSON array values and returns true if any value matches the provided predicate.
   ... | eval json_list = json_array(1, 2, 3), any_match = any(json_list, val -> val == 0)
  1. filter(): Iterates over JSON array values and performs an operation on each value.
   ... | eval json_list = json_array(1, 2, 3, 4), filtered_list = filter(json_list, val -> val % 2 == 0)
  1. map(): Iterates over JSON array values and performs an operation on each value.
   ... | eval json_list = json_array(1, 2, 3), mapped_list = map(json_list, val -> val * 2)
  1. reduce(): Iterates over a JSON array in a field or a literal array and performs an accumulation operation.
   ... | eval json_list = json_array(1, 2, 3), sum = reduce(json_list, 0, (acc, val) -> acc + val)

Technical Considerations

  • Consider leveraging existing JSON libraries for efficient parsing and manipulation.
  • Ensure that these functions can handle large JSON objects without significant performance degradation.
@A-Gray-Cat
Copy link

@YANG-DB These features look awesome. Can we add support to access elements within an array using index too?

@YANG-DB
Copy link
Member Author

YANG-DB commented Sep 17, 2024

@A-Gray-Cat yes that is possible - can you please suggest the syntax and we can discuss it further

@A-Gray-Cat
Copy link

Thanks! E.g.

{
    "items": ["a", "b", "c"]
}

Users can access the first element of the array by using syntax like eval first_item = items[0], or it can be retrieved directly using the fields command, e.g. fields items[0].

@A-Gray-Cat
Copy link

A-Gray-Cat commented Sep 18, 2024

Hello @YANG-DB , can we also add support to filter a json object using either the key or the value, or both? E.g.

... | eval person = json_object("name", "John", "age", 30), all_match = filter(person, k,v -> k = "name")
... | eval person = json_object("name", "John", "age", 30), all_match = filter(person, k,v -> v = "John")
... | eval person = json_object("name", "John", "age", 30), all_match = filter(person, k,v -> k = "name" and v = "John")

@YANG-DB YANG-DB added the 0.6 label Oct 9, 2024
@LantaoJin LantaoJin moved this from Todo to Design in PPL Commands Oct 14, 2024
@YANG-DB YANG-DB moved this from Design to In Progress in PPL Commands Oct 17, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in PPL Commands Oct 25, 2024
@LantaoJin
Copy link
Member

Thanks! E.g.

{
    "items": ["a", "b", "c"]
}

Users can access the first element of the array by using syntax like eval first_item = items[0], or it can be retrieved directly using the fields command, e.g. fields items[0].

With #780 , you can access the first element by

| eval first_elem = json_extract('{"items": ["a", "b", "c"]}', '$.items[0]')

@LantaoJin
Copy link
Member

LantaoJin commented Nov 5, 2024

Hello @YANG-DB , can we also add support to filter a json object using either the key or the value, or both? E.g.

... | eval person = json_object("name", "John", "age", 30), all_match = filter(person, k,v -> k = "name") ... | eval person = json_object("name", "John", "age", 30), all_match = filter(person, k,v -> v = "John") ... | eval person = json_object("name", "John", "age", 30), all_match = filter(person, k,v -> k = "name" and v = "John")

@qianheng-aws #864 introduces array filter, seems map_filter is another lambda function we need either.

@qianheng-aws
Copy link
Contributor

qianheng-aws commented Nov 5, 2024

Hello @YANG-DB , can we also add support to filter a json object using either the key or the value, or both? E.g.
... | eval person = json_object("name", "John", "age", 30), all_match = filter(person, k,v -> k = "name") ... | eval person = json_object("name", "John", "age", 30), all_match = filter(person, k,v -> v = "John") ... | eval person = json_object("name", "John", "age", 30), all_match = filter(person, k,v -> k = "name" and v = "John")

@qianheng #864 introduces array filter, seems map_filter is another lambda function we need either.

@LantaoJin map_filter only accepts Map as its argument in Spark.

@A-Gray-Cat For your purpose of performing filter on a field of json(i.e StructType in Spark), you can do it directly, like:

... | eval person = json_object("name", "John", "age", 30), all_match = person.name = "John"

Or if you want to filter it directly, write query like:

... | eval person = json_object("name", "John", "age", 30) | where person.name = "John"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.6 enhancement New feature or request Lang:PPL Pipe Processing Language support
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants