Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semantics of the list form of descendant selector #232

Closed
glyn opened this issue Aug 6, 2022 · 8 comments · Fixed by #233
Closed

Semantics of the list form of descendant selector #232

glyn opened this issue Aug 6, 2022 · 8 comments · Fixed by #233

Comments

@glyn
Copy link
Collaborator

glyn commented Aug 6, 2022

The semantics (as of 2022-08-06) for the descendant selector involving a list:

  • Do not state which descendants are selected.
  • Order selected items contrary to current implementations.
  • Lead to contradictions over the ordering of duplicate nodes.

Which descendants are selected

It is currently implicit that the list selector should select descendants based on the list semantics. This should be made explicit.

Ordering vs implementation consensus

An example from the JSONPath comparison project applies the path $..['c', 'd'] to the value:

[
 {"c":"cc1","d":"dd1","e":"ee1"},
 {"c": "cc2", "child": {"d": "dd2"}},
 {"c": "cc3"},
 {"d": "dd4"},
 {"child": {"c": "cc5"}}
]

The spec states:

The resultant nodelist of a descendant-selector applied to a node must be a sub-sequence of an array-sequenced preorder of the descendants of the node.

Because of the ordering of the array, in all array-sequenced preorders of the above value, "cc1", "dd1", and "ee1" appear before "cc2" and "dd2" which appear before "cc3" which appears before "dd4" which appears before "cc5". So the order of the result according to the spec is:

["cc1", "dd1", "cc2", "dd2", "cc3", "dd4", "cc5"]

However, the current consensus according to the comparison project is:

["cc1", "cc2", "cc3", "cc5", "dd1", "dd2", "dd4"]

where it seems the value is traversed once per item of the list (['c', 'd']).

Ordering of duplicate nodes

Consider the path $..[0, 1, 0] applied to the value:

[[1, 2]]

There is just one array-sequenced preorder of this value: [[1, 2], 1, 2] (noting the clarification below). But the current list semantics produces duplicates and can reorder the elements of an array, so the resultant nodelist would be:

[1, 2, 1]

which is not a sub-sequence of [[1, 2], 1, 2].

Clarification of array-sequenced preorder

We should make it clear that an array-sequenced preorder is of nodes rather than values.

For example, consider the path $..[0] applied to the value:

[1, [2], 1]

The array-sequenced preorder of this value is [1, [2], 2, 1] and the values selected by $..[0] are 1 and 2, but the nodelist [2, 1] is not a valid result even though it is a subsequence of the values in the array-sequenced preorder. The result is [1, 2] which is a subsequence of the nodes in the array-sequenced preorder. (In fact, using an array of values to describe an array-sequenced preorder is misleading because it ignores the nodes' locations within the argument value.)

@glyn glyn changed the title Ordering of the list form of descendant selector Semantics of the list form of descendant selector Aug 7, 2022
@glyn
Copy link
Collaborator Author

glyn commented Aug 7, 2022

I propose the following actions:

  1. Clarify which nodes are selected.
  2. Preserve the current ordering, regardless of the implementation consensus, since it easily implemented via a single traversal of the descendants.
  3. Allow the descendant list selector to reorder a subsequence of the nodes in an array-sequenced preorder, although it must honour the order of the elements in the selector's list.
  4. Clarify the definition of array-sequenced preorder.

Any objections?

@danielaparker
Copy link

danielaparker commented Aug 7, 2022

However, the current consensus according to the comparison project is:

["cc1", "cc2", "cc3", "cc5", "dd1", "dd2", "dd4"]

I don't think you can tell about the ordering from this result, the query is flagged with footnote 4, see (https://cburgmer.github.io/json-path-comparison/#union_with_keys_after_recursive_descent), which indicates that these results are being sorted before comparison.

A quick check shows that a number of implementations that fall in the consensus (including Goessner JavaScript) actually return

["cc1","dd1","cc2","dd2","cc3","dd4","cc5"]

@glyn
Copy link
Collaborator Author

glyn commented Aug 7, 2022

@danielaparker well spotted (and thanks for raising cburgmer/json-path-comparison#118). So that leaves the following proposed actions:

  1. Clarify which nodes are selected.
  2. Allow the descendant list selector to reorder a subsequence of the nodes in an array-sequenced preorder, although it must honour the order of the elements in the selector's list.
  3. Clarify the definition of array-sequenced preorder.

@danielaparker
Copy link

danielaparker commented Aug 7, 2022

  1. Allow the descendant list selector to reorder a subsequence of the nodes in an array-sequenced preorder, although it must honour the order of the elements in the selector's list.
  2. Clarify the definition of array-sequenced preorder.

That sounds complicated :-) One thing to keep in mind is that implementations that return

["cc1","dd1","cc2","dd2","cc3","dd4","cc5"]

do not do anything complicated, nor do they "reorder a subsequence of the nodes in an array-sequenced preorder". Rather, they simply apply the selector ['c', 'd'] to each node in the provided node list, and then (for the recursion operator) apply the same logic recursively on the children of each node. That description alone implies the above result.

@glyn
Copy link
Collaborator Author

glyn commented Aug 7, 2022

I agree it sounds complicated. Perhaps, along the lines you describe, it is best to describe the possible orderings of descendants and then say how particular descendant selectors are apply to those descendants.

@danielaparker
Copy link

danielaparker commented Aug 7, 2022

I agree it sounds complicated. Perhaps, along the lines you describe, it is best to describe the possible orderings of descendants and then say how particular descendant selectors are apply to those descendants.

It may be helpful to think of the recursion operator .. as an operation that maps the current result list into a new result list according to the following rules:

First, create a new empty result list. Then

  • Iterate over the items of the current result list
  • If the current item is an array or object, add it to the end of the new result list.
  • If the current item is an array, iterate over its elements and apply these rules recursively to each element.
  • If the current item is an object, iterate over its members and apply these rules recursively to each member value .

This becomes the new current result list.

So if the current result list contains the root value

[
 {"c":"cc1","d":"dd1","e":"ee1"},
 {"c": "cc2", "child": {"d": "dd2"}},
 {"c": "cc3"},
 {"d": "dd4"},
 {"child": {"c": "cc5"}}
]

the recursive operation produces

[
 [
  {"c":"cc1","d":"dd1","e":"ee1"},
  {"c": "cc2", "child": {"d": "dd2"}},
  {"c": "cc3"},
  {"d": "dd4"},
  {"child": {"c": "cc5"}}
 ],
 {"c":"cc1","d":"dd1","e":"ee1"},
 {"c": "cc2", "child": {"d": "dd2"}},
 {"d": "dd2"},
 {"c": "cc3"},
 {"d": "dd4"},
 {"child": {"c": "cc5"}},
{"c": "cc5"}
]

Then applying the selector ['c', 'd'] to each element in the new current result list produces

[
 "cc1",
 "dd1",
 "cc2",
 "dd2",
 "cc3",
 "dd4",
 "cc5"
]

@glyn
Copy link
Collaborator Author

glyn commented Aug 8, 2022

I'd prefer to define the orderings of the descendants in terms of a postcondition rather than an algorithm because that avoids implementation bias - a useful property for a spec. Sometimes the postcondition form really doesn't work and we have to resort to an algorithm, for example in the slice semantics. Let's see how the postcondition works out in a PR...

@glyn
Copy link
Collaborator Author

glyn commented Aug 8, 2022

Oh and I would say that describing the .. part of the descendant selector as a recursion operator is another example of implementation bias (since implementations need not use recursion). The spec talks about descendant selectors because this term describes what the selectors are rather than how they are implemented.

glyn added a commit that referenced this issue Aug 8, 2022
glyn added a commit that referenced this issue Aug 20, 2022
glyn added a commit that referenced this issue Aug 22, 2022
@glyn glyn closed this as completed in #233 Aug 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants