Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add max_children limit to nested sort #33587

Merged
merged 10 commits into from
Oct 5, 2018

Conversation

erayarslan
Copy link
Contributor

@erayarslan erayarslan commented Sep 10, 2018

FEATURE BRANCH

  • First SortMode added (including nested fields)

related with (#33592)

@erayarslan erayarslan changed the title first SortMode added for numeric and date fields First SortMode added for numeric and date fields Sep 11, 2018
@erayarslan erayarslan changed the title First SortMode added for numeric and date fields First SortMode added for Numeric, Date and Geo Fields Sep 11, 2018
@colings86 colings86 added >feature :Search/Search Search-related issues that do not fall into other categories team-discuss labels Sep 11, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@erayarslan erayarslan force-pushed the feature/first-sortmode branch from 1a7fb20 to 129b3c9 Compare September 18, 2018 11:09
@erayarslan erayarslan changed the title First SortMode added for Numeric, Date and Geo Fields First SortMode added Sep 18, 2018
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @erayarslan , I left some comments regarding the scope of this new mode. It makes sense only when it is applied to select the first child document, not to select a value in a multi valued field since doc_values do not preserve the original order. Can you add a validation in the SortBuilder that checks if this mode is used in conjunction with a nested field ? I also wonder what value should be picked inside the first child document in case it contains multiple values. Should we have a second mode to pick the value ? Or should we accept this mode only if the nested document have a single value in the sort field ?
Also can you add a specific test in FieldSortIT that checks the behavior of this new mode ?

@erayarslan erayarslan force-pushed the feature/first-sortmode branch from 129b3c9 to 2ea5bff Compare September 21, 2018 11:19
@erayarslan
Copy link
Contributor Author

Thanks @jimczi , I implemented what you describe.

Currently FIRST mode only for nested fields. It does not include multi valued field.
Can you please take a look?

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @erayarslan , I wonder if we should decorrelate the MultiValueMode from the selection of the children. With this pr the first mode would select the first value inside the first children that matches. However as explained before first only makes sense for the selection of the children, not the values on each child since they are reordered internally. I wonder if we could add an option in the NestedSortBuilder instead and change the signature of MultiValueMode#pick to:

pick(SortedNumericDoubleValues values, double missingValue, DocIdSetIterator docItr, int startDoc, int endDoc, int maxChildren)

where maxChildren indicates the number of child that we should consider on each parent document. A value of 1 would select the first child document that matches.
The full sort would look like:

"sort" : [
       {
          "offer.price" : {
             "mode" :  "avg",
             "order" : "asc",
             "nested": {
                 "max_children": 1,
                "path": "offer",
                "filter": {
                   "term" : { "offer.color" : "blue" }
                }
             }
          }
       }
    ]

What do you think ?

@erayarslan
Copy link
Contributor Author

erayarslan commented Sep 21, 2018

Thank you for clarification @jimczi

Btw sorry for misunderstand. I was focused completely on the FIRST Enum SortMode.
This also works, i think.

So FIRST enum will be deleted and every nested MultiValueMode#pick method will get maxChildren argument? Am i right?

@jimczi
Copy link
Contributor

jimczi commented Sep 21, 2018

So FIRST enum well be deleted and every nested MultiValueMode#pick method will get maxChildren argument? Am i right?

Yes if we agree that it's a better solution ;).

@erayarslan erayarslan force-pushed the feature/first-sortmode branch from 2ea5bff to ef0610c Compare September 24, 2018 10:23
@erayarslan
Copy link
Contributor Author

erayarslan commented Sep 24, 2018

Hi @jimczi ,

I implemented max_children solution. Can you please take a look?

And I have some concerns.

  • DocIdSetIterator iteration working reversed (i can't slice documents with max_children on iteration)
  • I don't know how many documents are in DocIdSetIterator. (startDoc, endDoc arguments or DocIdSetIterator#cost() method cannot estimate when filter used)

Because of these;

I used two loop in new implementation.

  • First loop for populating documents for slice documents with max_children value. (n times)
  • Second loop for SortMode calculations after sliced documents. (max_children times)

Total: n + max_children times

I think FIRST Mode solution better for performance but this method more generic.

What do you think?

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @erayarslan . I like the max_children option better. I left more comments regarding the implementation.

@erayarslan erayarslan force-pushed the feature/first-sortmode branch 2 times, most recently from 7276bd0 to 8bb867a Compare September 26, 2018 09:50
@erayarslan
Copy link
Contributor Author

Thanks for review @jimczi ,

I implemented last change request.
Please take a look.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes. I left more comment but I think it's getting close. However I'd expect to see explicit test for this new feature. Can you add one in MultiValueModeTests and maybe one in FieldSortIT ? The tricky part is to document this new feature correctly, we need to come up with a description that is easy to understand and reason about. I'd also like to get some feedbacks from others because we changed the initial implementation to cope with the internals of nested document indexing. @jpountz what do you think of the current approach ? I wrongly assumed that we respect nested document order when indexing but I forgot that we reverse this ordering to ensure that parents always come last. See #33587 (comment) for the context of the discussion and the possible solution that we discussed.

@erayarslan
Copy link
Contributor Author

I left tests to the last. After everything is finalized.
I think we agreed on this solution and it can solve our business problem too.

Thank you and can you please take a look to the last changes?

@erayarslan erayarslan force-pushed the feature/first-sortmode branch from 339e629 to cc7964b Compare October 1, 2018 06:31
@jimczi
Copy link
Contributor

jimczi commented Oct 1, 2018

Sorry for the late reply @erayarslan . We discussed internally with @jpountz and we think that we can preserve the order of the nested documents at index time in 6.x. Currently we reverse the order of all nested documents but we just need to put parents after their children so technically we could preserve the order of the children and only reorder the parent. If we implement this the max_children option could work on the initial ordering if we limit it to indices created after this change.
I'll try to implement something in the coming days and I'll ping you when it's ready to see how we can integrate it in the current pr.

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Oct 2, 2018
Today we reverse the initial order of the nested documents when we
index them in order to ensure that parents documents appear after
their children. This means that a query will always match nested documents
in the reverse order of their offsets in the source document.
Reversing all documents is not needed so this change ensures that parents
documents appear after their children without modifying the initial order
in each nested level. This allows to match children in the order of their
appearance in the source document which is a requirement to efficiently
implement elastic#33587. Old indices created before this change will continue
to reverse the order of nested documents to ensure backwark compatibility.
jimczi added a commit that referenced this pull request Oct 3, 2018
Today we reverse the initial order of the nested documents when we
index them in order to ensure that parents documents appear after
their children. This means that a query will always match nested documents
in the reverse order of their offsets in the source document.
Reversing all documents is not needed so this change ensures that parents
documents appear after their children without modifying the initial order
in each nested level. This allows to match children in the order of their
appearance in the source document which is a requirement to efficiently
implement #33587. Old indices created before this change will continue
to reverse the order of nested documents to ensure backwark compatibility.
jimczi added a commit that referenced this pull request Oct 3, 2018
Today we reverse the initial order of the nested documents when we
index them in order to ensure that parents documents appear after
their children. This means that a query will always match nested documents
in the reverse order of their offsets in the source document.
Reversing all documents is not needed so this change ensures that parents
documents appear after their children without modifying the initial order
in each nested level. This allows to match children in the order of their
appearance in the source document which is a requirement to efficiently
implement #33587. Old indices created before this change will continue
to reverse the order of nested documents to ensure backwark compatibility.
@jimczi
Copy link
Contributor

jimczi commented Oct 3, 2018

@erayarslan I merged #34225 so indices created in 6.5.0 will preserve the original order of nested documents. Can you merge with master and update this pr to reflect the new logic ? The idea is to throw an exception if max_children is used on indices created before 6.5.0. You should be able to test this in FieldSortBuilder#build where you can check the created version of the current shard with:

if (context.indexVersionCreated().before(Version.V_6_5_0)) {
....
}

@erayarslan erayarslan force-pushed the feature/first-sortmode branch from 8a3e973 to 4d9243b Compare October 4, 2018 15:32
@erayarslan
Copy link
Contributor Author

I fixed integration test @jimczi
Can you trigger ci?

@jimczi
Copy link
Contributor

jimczi commented Oct 4, 2018

@elasticmachine test this please

@jimczi jimczi merged commit daf8833 into elastic:master Oct 5, 2018
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Oct 5, 2018
* master:
  Rename CCR stats implementation (elastic#34300)
  Add max_children limit to nested sort (elastic#33587)
  MINOR: Remove Dead Code from Netty4Transport (elastic#34134)
  Rename clsuterformation -> testclusters (elastic#34299)
  [Build] make sure there are no duplicate classes in third party audit (elastic#34213)
  BWC Build: Read CI properties to determine java version (elastic#34295)
  [DOCS] Fix typo and add [float]
  Allow User/Password realms to disable authc (elastic#34033)
  Enable security automaton caching (elastic#34028)
  Preserve thread context during authentication. (elastic#34290)
  [ML] Allow asynchronous job deletion (elastic#34058)
jimczi pushed a commit that referenced this pull request Oct 5, 2018
Add an option to `nested` sort to limit the number of children to visit when picking the sort value
of the root document. 

Closes #33592
jimczi added a commit that referenced this pull request Oct 5, 2018
jimczi added a commit that referenced this pull request Oct 5, 2018
@jimczi
Copy link
Contributor

jimczi commented Oct 5, 2018

I merged in master and 6x, thanks for all the iterations @erayarslan !

jasontedor added a commit to martijnvg/elasticsearch that referenced this pull request Oct 5, 2018
* master: (63 commits)
  [Build] randomizedtesting: Allow property values to be closures (elastic#34319)
  Feature/hlrc ml docs cleanup (elastic#34316)
  Docs: DRY up CRUD docs (elastic#34203)
  Minor corrections in geo-queries.asciidoc (elastic#34314)
  [DOCS] Remove beta label from normalizers (elastic#34326)
  Adjust size of BigArrays in circuit breaker test
  Adapt bwc version after backport
  Follow stats structure (elastic#34301)
  Rename CCR stats implementation (elastic#34300)
  Add max_children limit to nested sort (elastic#33587)
  MINOR: Remove Dead Code from Netty4Transport (elastic#34134)
  Rename clsuterformation -> testclusters (elastic#34299)
  [Build] make sure there are no duplicate classes in third party audit (elastic#34213)
  BWC Build: Read CI properties to determine java version (elastic#34295)
  [DOCS] Fix typo and add [float]
  Allow User/Password realms to disable authc (elastic#34033)
  Enable security automaton caching (elastic#34028)
  Preserve thread context during authentication. (elastic#34290)
  [ML] Allow asynchronous job deletion (elastic#34058)
  HLRC: ML Adding get datafeed stats API (elastic#34271)
  ...
@erayarslan erayarslan deleted the feature/first-sortmode branch October 6, 2018 13:33
@erayarslan
Copy link
Contributor Author

Thank you for all your help. @jimczi

kcm pushed a commit that referenced this pull request Oct 30, 2018
Today we reverse the initial order of the nested documents when we
index them in order to ensure that parents documents appear after
their children. This means that a query will always match nested documents
in the reverse order of their offsets in the source document.
Reversing all documents is not needed so this change ensures that parents
documents appear after their children without modifying the initial order
in each nested level. This allows to match children in the order of their
appearance in the source document which is a requirement to efficiently
implement #33587. Old indices created before this change will continue
to reverse the order of nested documents to ensure backwark compatibility.
kcm pushed a commit that referenced this pull request Oct 30, 2018
Add an option to `nested` sort to limit the number of children to visit when picking the sort value
of the root document. 

Closes #33592
kcm pushed a commit that referenced this pull request Oct 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :Search/Search Search-related issues that do not fall into other categories v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants