Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce json path commons lib #11680

Merged
merged 5 commits into from
Apr 8, 2022
Merged

Introduce json path commons lib #11680

merged 5 commits into from
Apr 8, 2022

Conversation

cgardens
Copy link
Contributor

@cgardens cgardens commented Apr 2, 2022

What

With our heavy use of JSON, having a way query into JSON objects is valuable. The main case it has come up recently is being able to convert paths in JSONSchema into queries that we can use to query into configuration objects. For example if the json schema is something like:

"connectionSpecification": {
    "type": "object",
    "properties": {
      "username": {
        "type": "string"
      },
      "password": {
        "type": "string",
        "airbyte_secret": true
      }
    }
  }

We want to be able to say something like, give me all of the paths for a potential configuration object that matches this schema that is a secret. In this case it is just "$.password". We would then use that information to perform queries or replacements on the actual configuration object itself. e.g. { "username": "charles", "password": "hunter2" } => { "username": "charles", "password": "********" }. This gets more complicated as you have more complex JSON objects with nesting, arrays, etc. Right now we are pretty haphazard in solving this problem and have multiple implementations of how try to handle it. Introducing this will help consolidate it.

In order to do this we need some query language for json. The JSONPath specification does exactly that. More information about the specification can be found here. For those familiar with jq, JSONPath will be most recognizable as pretty much "that DSL that jq uses".

This PR introduces a java implementation of that specification (repository) called JsonPath. The goal of this PR is to introduce this library in such a way that JSONPath can be easily used by our developers without them having to be experts on that underlying library. Thus, I'm introducing common helper functions. I will be using these directly in subsequent PR for secrets improvements, but for now, I'm doing these as a separate PR for clarity.

Recommended reading order

  1. JsonPaths.java
  2. JsonPathsTest.java

* return a single value (see: {@link JsonPaths#getSingleValue(JsonNode, String)}). These should
* only be used if it is not possible for a query to return more than one value.
*/
public class JsonPaths {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see an argument for keeping this package private. I am not sure if this will mostly be used internally for the Jsons and JsonsSchemas helpers in the commons package versus actually being used in application code. Leaning a little towards starting package private and then opening up if it would be useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certainly seems easier to make open later than it would be to pull back into the private side

@cgardens cgardens temporarily deployed to more-secrets April 2, 2022 18:58 Inactive
@cgardens cgardens temporarily deployed to more-secrets April 2, 2022 18:59 Inactive

class JsonPathsTest {

private static final String JSON = """
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java does this now? I missed this. <3 java 15.

});
}

// todo (cgardens) - this behavior is a little unintuitive, but based on the docs, there's not an
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how intuitive this is to others. was a bit weird for me but inclined to leave it as be.

build.gradle Outdated
@@ -272,6 +272,7 @@ subprojects {
implementation 'com.fasterxml.jackson.core:jackson-annotations'
implementation 'com.fasterxml.jackson.dataformat:jackson-dataformat-yaml'
implementation 'com.fasterxml.jackson.datatype:jackson-datatype-jsr310'
implementation 'com.jayway.jsonpath:json-path:2.7.0'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we make this package private then we might actually just put it in the commons dependencies instead of the deps in the root of the project so that only commons has access to it.

*
* @param jsonPath - path to validate
*/
public static void assertIsSingleReturnQuery(final String jsonPath) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been through the entire review but don't we want to do that on the arrays?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ignore, I have seen where it is being use.

return getInternal(GET_PATHS_CONFIGURATION, json, jsonPath)
.stream()
.map(JsonNode::asText)
.collect(Collectors.toList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, we can use toList() in java 17

* @param jsonPath - path into the json object. must be in the format of JSONPath.
* @return all paths that are present that match the input query
*/
public static List<String> getPaths(final JsonNode json, final String jsonPath) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should we enforce it being a Set? The path should be uniq

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. good point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually upon further thought i think the list helps avoid bugs due to not determinitic ordering behavior. added to the java docs to explain it.

* @param jsonPath - path into the json object. must be in the format of JSONPath.
* @param replacement - a json node to replace the current value at the jsonPath
*/
public static JsonNode replaceAtJsonNodeLoud(final JsonNode json, final String jsonPath, final JsonNode replacement) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the Loud suffix here, shouldn't we throw an exception if the path doesn't exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benmoriceau maybe i am letting knowledge of how we are going to use this creep in. a common case is that we are going to get all the possible paths are are secrets from the jsonschema object. we are then going to see if each of those paths exist in the config object. if they do not exist then we simply want to do nothing (i.e. the oneOf case).

at that point we need a version of the method that fails loudly if the path doesn't exist and one that simply does nothing if the path doesn't exist. i think both are important and useful helpers. happy to adjust the naming convention. my intuition is the "loud" versions will be used less frequently than the none loud ones, which is why i chose to make that version more verbose.

try {
return replaceAtJsonNodeLoud(json, jsonPath, replacement);
} catch (final PathNotFoundException e) {
LOGGER.debug("Path not found", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it a DEBUG log? Shouldn't we set it at least as a warn? It seems like a miss-usage of the library.

Copy link
Contributor

@benmoriceau benmoriceau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly nit comments.

}
}
} catch (final PathNotFoundException e) {
LOGGER.debug("Path not found", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment than the previous warn.


@Test
void testGetValues() {
assertEquals(List.of(0, 1, 2), JsonPaths.getValues(JSON_NODE, LIST_ALL_QUERY).stream().map(JsonNode::asInt).collect(Collectors.toList()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: AssertJ can provide a slightly easier syntax, for example:

List<Integer> test = new ArrayList<>();
      Assertions.assertThat(test).map(e -> e.toString())
              .contains("test", "test");

/**
* JSONPath is specification for querying JSON objects. More information about the specification can
* be found here: https://goessner.net/articles/JsonPath/. For those familiar with jq, JSONPath will
* be most recognizable as "that DSL that jq uses".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a helpful TLDR ☝️

Copy link
Contributor

@supertopher supertopher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

nits don't feel blocking to me

* return a single value (see: {@link JsonPaths#getSingleValue(JsonNode, String)}). These should
* only be used if it is not possible for a query to return more than one value.
*/
public class JsonPaths {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certainly seems easier to make open later than it would be to pull back into the private side

@cgardens cgardens temporarily deployed to more-secrets April 8, 2022 16:51 Inactive
@cgardens cgardens temporarily deployed to more-secrets April 8, 2022 16:51 Inactive
@cgardens cgardens temporarily deployed to more-secrets April 8, 2022 17:51 Inactive
@cgardens cgardens temporarily deployed to more-secrets April 8, 2022 17:51 Inactive
@cgardens cgardens merged commit f512208 into master Apr 8, 2022
@cgardens cgardens deleted the cgardens/json-path branch April 8, 2022 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants