Virtual Sort field for automatic tie-breaking #56828
Comments
Pinging @elastic/es-search (:Search/Search)
This is a great idea! Is the idea that a user needs to explicitly provide this sort field in their request: Or that when doing a search sort with
I wonder the same as Mayya: maybe we could have a good tiebreaker by default that wouldn't require exposing a virtual field? The index UUID and shard ID are the same for all documents of a shard, so Lucene's default tiebreaker (docID) would do the right thing within a shard. Maybe we would only have to change how hits are merged on the coordinating node, and we could provide consistent ordering with negligible overhead?
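To make the coordinating-node idea concrete, here is a minimal Java sketch of a k-way merge of per-shard results that breaks ties on the primary sort value by shard index and then by Lucene doc ID. The `CoordinatorMerge` class, the `Hit` record and the `merge` method are hypothetical illustrations, not Elasticsearch code. The sketch assumes each shard already returns its hits sorted by primary value and then doc ID, which is what Lucene's default tiebreaker provides within a single shard.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class CoordinatorMerge {
    // Hypothetical hit: primary sort value, the shard it came from, and its Lucene doc id.
    record Hit(long sortValue, int shardIndex, int docId) {}

    // Primary sort value first, then shard index, then Lucene doc id.
    static final Comparator<Hit> ORDER = Comparator
            .comparingLong(Hit::sortValue)
            .thenComparingInt(Hit::shardIndex)
            .thenComparingInt(Hit::docId);

    // Merges per-shard lists (each already sorted by ORDER) and keeps the top `size` hits.
    static List<Hit> merge(List<List<Hit>> perShard, int size) {
        // Queue entries are {list index, position in that list}.
        PriorityQueue<int[]> queue = new PriorityQueue<>(
                (a, b) -> ORDER.compare(perShard.get(a[0]).get(a[1]), perShard.get(b[0]).get(b[1])));
        for (int s = 0; s < perShard.size(); s++) {
            if (!perShard.get(s).isEmpty()) {
                queue.add(new int[] {s, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (!queue.isEmpty() && merged.size() < size) {
            int[] top = queue.poll();
            List<Hit> list = perShard.get(top[0]);
            merged.add(list.get(top[1]));
            // Advance within the same shard's list.
            if (top[1] + 1 < list.size()) {
                queue.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> shard0 = List.of(new Hit(5, 0, 3), new Hit(7, 0, 1));
        List<Hit> shard1 = List.of(new Hit(5, 1, 0), new Hit(5, 1, 2));
        // Order regardless of list order: (5, 0, 3), (5, 1, 0), (5, 1, 2), (7, 0, 1)
        System.out.println(merge(List.of(shard1, shard0), 10));
    }
}
```

Because the shard index and doc ID are fixed for the lifetime of the reader, the merged order is deterministic as long as the same shards are searched with the same shard indexes.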
I agree that it would be nice to add the tiebreaker automatically but it needs to be materialized in the
This change automatically generates a tiebreaker for sorted queries that are executed under a PIT (point in time reader). This allows paginating consistently over the matching documents without having to provide a sort criterion that is unique per document. The tiebreaker is automatically added as the last sort value of the search hits in the response. It is then used by `search_after` to ensure that pagination does not miss any documents and that each document appears only once. This commit also allows queries sorted by internal Lucene ID (`_doc`) to be optimized when they are executed under a PIT, the same way as scroll queries. Closes elastic#56828
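The following is an in-memory Java simulation (not the Elasticsearch API) of why a unique last sort value makes `search_after` pagination complete. The `SearchAfterPaging` class, the `Hit` record and the `page`/`paginate` methods are hypothetical; `tiebreaker` stands in for any value that is unique per document, such as the automatically added tiebreaker described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SearchAfterPaging {
    // Hypothetical hit: a possibly duplicated primary sort value plus a unique tiebreaker.
    record Hit(long sortValue, long tiebreaker) {}

    // Sort by the primary value, then by the tiebreaker.
    static final Comparator<Hit> ORDER = Comparator
            .comparingLong(Hit::sortValue)
            .thenComparingLong(Hit::tiebreaker);

    // Returns the hits strictly after `after` (or from the start when `after` is null).
    static List<Hit> page(List<Hit> all, Hit after, int size) {
        return all.stream()
                .filter(h -> after == null || ORDER.compare(h, after) > 0)
                .sorted(ORDER)
                .limit(size)
                .toList();
    }

    // Walks every page, feeding the last hit's sort values back as the next search_after.
    static List<Hit> paginate(List<Hit> all, int size) {
        List<Hit> seen = new ArrayList<>();
        Hit after = null;
        while (true) {
            List<Hit> page = page(all, after, size);
            if (page.isEmpty()) {
                return seen;
            }
            seen.addAll(page);
            after = page.get(page.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Five documents share the same primary sort value.
        List<Hit> docs = new ArrayList<>();
        for (long i = 0; i < 5; i++) {
            docs.add(new Hit(42, i));
        }
        docs.add(new Hit(7, 5));
        System.out.println(paginate(docs, 2).size()); // 6: every document exactly once
    }
}
```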
This change ensures that the shard index that is used to tiebreak documents with identical sort values remains consistent between two requests that target the same shards. The index is now always computed from the natural order of the shards in the search request. This change also adds the consistent shard index to the ShardSearchRequest. That allows the slice builder to use this information to build more balanced slice queries. Relates elastic#56828
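A minimal Java sketch of the idea: sort the target shards into a fixed order and use each shard's position as its index, so the assignment does not depend on the order in which the shards happened to be resolved. The `ShardId` record and `ConsistentShardIndex` class are hypothetical, and ordering by index name and then shard number is an assumption about what "natural order" means here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConsistentShardIndex {
    // Hypothetical shard identifier: index name plus shard number.
    record ShardId(String index, int shard) {}

    // Assumed natural order: index name first, then shard number.
    static final Comparator<ShardId> NATURAL = Comparator
            .comparing(ShardId::index)
            .thenComparingInt(ShardId::shard);

    // Assigns each shard its position in sorted order.
    static Map<ShardId, Integer> assignIndexes(Collection<ShardId> shards) {
        List<ShardId> sorted = new ArrayList<>(shards);
        sorted.sort(NATURAL);
        Map<ShardId, Integer> result = new HashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            result.put(sorted.get(i), i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<ShardId> first = List.of(new ShardId("logs", 1), new ShardId("events", 0), new ShardId("logs", 0));
        List<ShardId> second = List.of(new ShardId("logs", 0), new ShardId("logs", 1), new ShardId("events", 0));
        // Same shards in a different order produce the same indexes.
        System.out.println(assignIndexes(first).equals(assignIndexes(second))); // true
    }
}
```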
This commit introduces a new sort field called `_shard_doc` that can be used in conjunction with a PIT to consistently tiebreak identical sort values. The sort value is a numeric long composed of the ordinal of the shard (assigned by the coordinating node) and the internal Lucene document ID. These two values are consistent within a PIT, so this sort criterion can be used as the tiebreaker of any search request. Since this sort criterion is stable, we'd like to add it automatically to any sorted search request that uses a PIT, but we also need to expose it explicitly in order to be able to:
* Reverse the order of the tiebreaking, which is useful to search "before" `search_after`.
* Force the primary sort to use it in order to benefit from the `search_after` optimization when sorting by index order (to be released in Lucene 8.8).
I plan to add the documentation and the automatic configuration for PIT in a follow-up since this change is already big. Relates elastic#56828
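One way to pack the shard ordinal and the Lucene doc ID into a single long is sketched below in Java. The bit layout (shard ordinal in the high 32 bits, doc ID in the low 32 bits) is an assumption for illustration; the commit message only says the long is composed of these two values. `ShardDocEncoding` and its methods are hypothetical names.

```java
public class ShardDocEncoding {
    // Assumed layout: shard ordinal in the high 32 bits, Lucene doc id in the low 32 bits.
    // Both values are non-negative, so numeric order equals (shard, doc) order.
    static long encode(int shardIndex, int docId) {
        return ((long) shardIndex << 32) | (docId & 0xFFFFFFFFL);
    }

    // Recovers the shard ordinal from the high 32 bits.
    static int shardIndex(long value) {
        return (int) (value >>> 32);
    }

    // Recovers the Lucene doc id from the low 32 bits.
    static int docId(long value) {
        return (int) value;
    }

    public static void main(String[] args) {
        long a = encode(0, 1_000_000);
        long b = encode(1, 0);
        System.out.println(a < b); // true: every doc of shard 0 sorts before shard 1
        System.out.println(shardIndex(b) + " " + docId(a)); // 1 1000000
    }
}
```

With this layout, reversing the tiebreak (for searching "before" a `search_after` value) is simply a descending sort on the same long.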
This PR automatically adds the special `_shard_doc` sort tiebreaker to any search request that uses a PIT. Adding the tiebreaker ensures that any sorted query can be paginated consistently within a PIT. Closes elastic#56828
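A minimal Java sketch of the rewrite this PR describes: append `_shard_doc` to the sort of a PIT request unless it is already there. `AutoTiebreaker` and `SortSpec` are hypothetical. Two details are assumptions, not taken from the PR: the tiebreaker is appended in ascending order, and an empty sort is treated as relevance order (`_score` descending) that stays the primary sort.

```java
import java.util.ArrayList;
import java.util.List;

public class AutoTiebreaker {
    static final String TIEBREAKER = "_shard_doc";

    // Hypothetical sort clause: a field name and a direction.
    record SortSpec(String field, boolean ascending) {}

    // Returns the sort to execute. Requests without a PIT are left unchanged.
    static List<SortSpec> withTiebreaker(List<SortSpec> sort, boolean usesPit) {
        if (!usesPit) {
            return sort;
        }
        List<SortSpec> result = new ArrayList<>(sort);
        // Assumption: an empty sort means relevance order, kept as the primary sort.
        if (result.isEmpty()) {
            result.add(new SortSpec("_score", false));
        }
        // Do not add the tiebreaker twice if the user already sorts on it explicitly
        // (for example, to reverse the tiebreak direction).
        boolean present = result.stream().anyMatch(s -> s.field().equals(TIEBREAKER));
        if (!present) {
            result.add(new SortSpec(TIEBREAKER, true)); // assumed ascending
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(withTiebreaker(List.of(new SortSpec("timestamp", false)), true));
        System.out.println(withTiebreaker(List.of(new SortSpec(TIEBREAKER, false)), true));
    }
}
```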
This commit ensures that the automatic tiebreaker `_shard_doc` does not disable sort optimization. Relates elastic#56828
The pagination of search requests using `search_after` requires a tiebreaker that is unique per document. This is done automatically for sorted `_scroll` queries by tie-breaking documents on the index/shardId/docID tuple. This tuple is not accessible to normal search requests, so the other option is to copy the `_id` of the document into a doc value field and use it as a tiebreaker. This is difficult to implement for solutions that are not in charge of indexing.
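To show why the tiebreaker must be unique, here is a small in-memory Java simulation (hypothetical, not the Elasticsearch API) in which `search_after` carries only a non-unique primary sort value. Because the next page starts strictly after that value, any remaining documents that share it are skipped. `MissingTiebreaker` and `countSeen` are illustrative names.

```java
import java.util.Arrays;

public class MissingTiebreaker {
    // Pages through sorted values using only the primary value as search_after
    // and returns how many documents were actually visited.
    static int countSeen(long[] sortValues, int size) {
        long[] sorted = sortValues.clone();
        Arrays.sort(sorted);
        int seen = 0;
        Long after = null;
        while (true) {
            final Long cursor = after;
            // Next page: values strictly greater than the last value returned.
            long[] page = Arrays.stream(sorted)
                    .filter(v -> cursor == null || v > cursor)
                    .limit(size)
                    .toArray();
            if (page.length == 0) {
                return seen;
            }
            seen += page.length;
            after = page[page.length - 1];
        }
    }

    public static void main(String[] args) {
        // Four documents share the value 10; with page size 2, two of them are skipped.
        long[] values = {10, 10, 10, 10, 20};
        System.out.println(countSeen(values, 2)); // 3 of 5
    }
}
```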
With the introduction of the search context for requests, we'll be able to paginate over a set of sorted results using `search_after` with the guarantee of seeing the same documents during the walk. Since the internal document ID wouldn't change between requests, using the tuple that `_scroll` queries use becomes possible.
This issue proposes to expose a virtual sort field called `_tiebreak` (or any name that suits better). The field would be accessible as a sort criterion that can be used with a search context to ensure consistent ordering. The field would be composed of:
The order of the composition should be discussed, but the main goal is to allow consistent ordering using `search_after` without relying on manual operations at index time.