Add max_children limit to nested sort #33587
Pinging @elastic/es-search-aggs
Thanks @erayarslan, I left some comments regarding the scope of this new mode. It only makes sense when it is applied to select the first child document, not to select a value in a multi-valued field, since `doc_values` do not preserve the original order. Can you add a validation in the `SortBuilder` that checks that this mode is used in conjunction with a nested field? I also wonder which value should be picked inside the first child document if it contains multiple values. Should we have a second mode to pick the value? Or should we accept this mode only if the nested documents have a single value in the sort field?
Also, can you add a specific test in `FieldSortIT` that checks the behavior of this new mode?
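To illustrate the point about `doc_values` ordering, here is a minimal sketch. It does not use Lucene or Elasticsearch classes; `DocValuesOrderSketch` and `asSortedNumericValues` are hypothetical names that only model the fact that the values of a multi-valued numeric field are exposed in sorted order per document, so "first" no longer refers to the first value in the source.

```java
import java.util.Arrays;

public class DocValuesOrderSketch {
    // Hypothetical stand-in for how a multi-valued numeric field is exposed per
    // document: values come back in ascending order, not in source order.
    static long[] asSortedNumericValues(long[] sourceOrder) {
        long[] stored = sourceOrder.clone();
        Arrays.sort(stored);
        return stored;
    }

    public static void main(String[] args) {
        long[] source = {30, 10, 20};
        long[] stored = asSortedNumericValues(source);
        // The first value in the source and the first stored value differ.
        System.out.println("first in source: " + source[0]);
        System.out.println("first in doc values: " + stored[0]);
    }
}
```

Under this model, a "first" mode applied to the values of a single document would effectively behave like a minimum, which is why the comment restricts it to selecting child documents.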
Thanks @jimczi , I implemented what you describe. Currently |
Thanks @erayarslan, I wonder if we should decouple `MultiValueMode` from the selection of the children. With this PR, the `first` mode would select the first value inside the first child that matches. However, as explained before, `first` only makes sense for the selection of the children, not for the values of each child, since they are reordered internally. I wonder if we could add an option in the `NestedSortBuilder` instead and change the signature of `MultiValueMode#pick` to:
pick(SortedNumericDoubleValues values, double missingValue, DocIdSetIterator docItr, int startDoc, int endDoc, int maxChildren)
where `maxChildren` indicates the number of children that we should consider for each parent document. A value of `1` would select the first child document that matches.
The full sort would look like:
"sort" : [
  {
    "offer.price" : {
      "mode" : "avg",
      "order" : "asc",
      "nested": {
        "max_children": 1,
        "path": "offer",
        "filter": {
          "term" : { "offer.color" : "blue" }
        }
      }
    }
  }
]
What do you think?
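A rough sketch of how a `maxChildren` limit could bound an `avg` pick is shown below. It is a simplified model, not the Elasticsearch implementation: `MaxChildrenPickSketch` and `pickAvg` are hypothetical, each child is assumed to hold a single value in a plain `double[]`, and a `java.util.BitSet` stands in for the nested filter instead of a `DocIdSetIterator`. It also assumes the limit counts matching children, which the discussion does not spell out.

```java
import java.util.BitSet;

public class MaxChildrenPickSketch {
    /**
     * Simplified, hypothetical model of an "avg" pick over nested children.
     * values[doc] is the single sort value of child document doc; matching marks
     * the children accepted by the nested filter. The children of one parent
     * occupy the doc id range [startDoc, endDoc).
     */
    static double pickAvg(double[] values, double missingValue, BitSet matching,
                          int startDoc, int endDoc, int maxChildren) {
        double sum = 0;
        int count = 0;
        // Visit matching children in doc id order and stop after maxChildren of them.
        for (int doc = matching.nextSetBit(startDoc);
             doc >= 0 && doc < endDoc && count < maxChildren;
             doc = matching.nextSetBit(doc + 1)) {
            sum += values[doc];
            count++;
        }
        return count == 0 ? missingValue : sum / count;
    }

    public static void main(String[] args) {
        double[] prices = {40.0, 10.0, 25.0, 5.0};
        BitSet blue = new BitSet();
        blue.set(1);
        blue.set(2);
        blue.set(3);
        // With a limit of 1, only the first matching child (doc 1) contributes.
        System.out.println(pickAvg(prices, Double.MAX_VALUE, blue, 0, 4, 1)); // prints 10.0
        // Without an effective limit, all three matching children are averaged.
        System.out.println(pickAvg(prices, Double.MAX_VALUE, blue, 0, 4, Integer.MAX_VALUE));
    }
}
```

In this model the result of `max_children: 1` depends entirely on which child comes first in doc id order, which is why the ordering of nested documents at index time matters later in the thread.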
Thank you for the clarification, @jimczi, and sorry for the misunderstanding. I was focused completely on the So |
Yes, if we agree that it's a better solution ;).
Hi @jimczi , I implemented And I have some concerns.
Because of these, I used two loops in the new implementation.
Total: I think What do you think? |
Thanks @erayarslan. I like the `max_children` option better. I left more comments regarding the implementation.
Thanks for the review, @jimczi. I implemented the last change request.
Thanks for making the changes. I left more comments, but I think it's getting close. However, I'd expect to see an explicit test for this new feature. Can you add one in `MultiValueModeTests` and maybe one in `FieldSortIT`? The tricky part is to document this new feature correctly: we need to come up with a description that is easy to understand and reason about. I'd also like to get some feedback from others, because we changed the initial implementation to cope with the internals of nested document indexing. @jpountz, what do you think of the current approach? I wrongly assumed that we respect nested document order when indexing, but I forgot that we reverse this ordering to ensure that parents always come last. See #33587 (comment) for the context of the discussion and the possible solution that we discussed.
I left the tests for last, after everything is finalized. Thank you, and can you please take a look at the latest changes?
Sorry for the late reply, @erayarslan. We discussed this internally with @jpountz, and we think that we can preserve the order of the nested documents at index time in 6.x. Currently we reverse the order of all nested documents, but we only need to put parents after their children, so technically we could preserve the order of the children and only reorder the parent. If we implement this the |
Today we reverse the initial order of the nested documents when we index them in order to ensure that parent documents appear after their children. This means that a query will always match nested documents in the reverse order of their offsets in the source document. Reversing all documents is not needed, so this change ensures that parent documents appear after their children without modifying the initial order in each nested level. This allows children to be matched in the order of their appearance in the source document, which is a requirement to efficiently implement elastic#33587. Old indices created before this change will continue to reverse the order of nested documents to ensure backward compatibility.
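The difference between the two index-time orderings can be sketched as follows. This is only a toy model for a single nesting level, with hypothetical names (`NestedOrderSketch`, `reversedOrder`, `preservedOrder`) and plain strings in place of Lucene documents; it assumes the parsed list starts with the root document followed by its children in source order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NestedOrderSketch {
    // Previous behaviour as described above: reverse the whole list, which puts
    // the parent last but also reverses the children.
    static List<String> reversedOrder(String parent, List<String> children) {
        List<String> docs = new ArrayList<>();
        docs.add(parent);
        docs.addAll(children);
        Collections.reverse(docs);
        return docs;
    }

    // New behaviour as described above: keep the children in source order and
    // only move the parent after them.
    static List<String> preservedOrder(String parent, List<String> children) {
        List<String> docs = new ArrayList<>(children);
        docs.add(parent);
        return docs;
    }

    public static void main(String[] args) {
        List<String> offers = Arrays.asList("offer0", "offer1", "offer2");
        System.out.println(reversedOrder("root", offers));  // [offer2, offer1, offer0, root]
        System.out.println(preservedOrder("root", offers)); // [offer0, offer1, offer2, root]
    }
}
```

With the preserved order, visiting the first matching child in doc id order corresponds to the first matching child in the source document, which is what a child limit such as `max_children` relies on.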
@erayarslan I merged #34225, so indices created in 6.5.0 will preserve the original order of nested documents. Can you merge with master and update this PR to reflect the new logic? The idea is to throw an exception if |
I fixed the integration test, @jimczi.
@elasticmachine test this please
* master:
  Rename CCR stats implementation (elastic#34300)
  Add max_children limit to nested sort (elastic#33587)
  MINOR: Remove Dead Code from Netty4Transport (elastic#34134)
  Rename clsuterformation -> testclusters (elastic#34299)
  [Build] make sure there are no duplicate classes in third party audit (elastic#34213)
  BWC Build: Read CI properties to determine java version (elastic#34295)
  [DOCS] Fix typo and add [float]
  Allow User/Password realms to disable authc (elastic#34033)
  Enable security automaton caching (elastic#34028)
  Preserve thread context during authentication. (elastic#34290)
  [ML] Allow asynchronous job deletion (elastic#34058)
Add an option to `nested` sort to limit the number of children to visit when picking the sort value of the root document. Closes #33592
I merged this into master and 6.x. Thanks for all the iterations, @erayarslan!
* master: (63 commits)
  [Build] randomizedtesting: Allow property values to be closures (elastic#34319)
  Feature/hlrc ml docs cleanup (elastic#34316)
  Docs: DRY up CRUD docs (elastic#34203)
  Minor corrections in geo-queries.asciidoc (elastic#34314)
  [DOCS] Remove beta label from normalizers (elastic#34326)
  Adjust size of BigArrays in circuit breaker test
  Adapt bwc version after backport
  Follow stats structure (elastic#34301)
  Rename CCR stats implementation (elastic#34300)
  Add max_children limit to nested sort (elastic#33587)
  MINOR: Remove Dead Code from Netty4Transport (elastic#34134)
  Rename clsuterformation -> testclusters (elastic#34299)
  [Build] make sure there are no duplicate classes in third party audit (elastic#34213)
  BWC Build: Read CI properties to determine java version (elastic#34295)
  [DOCS] Fix typo and add [float]
  Allow User/Password realms to disable authc (elastic#34033)
  Enable security automaton caching (elastic#34028)
  Preserve thread context during authentication. (elastic#34290)
  [ML] Allow asynchronous job deletion (elastic#34058)
  HLRC: ML Adding get datafeed stats API (elastic#34271)
  ...
Thank you for all your help, @jimczi.
FEATURE BRANCH
Related to (#33592)