Virtual Sort field for automatic tie-breaking #56828

jimczi · 2020-05-15T15:11:48Z

Paginating search requests with search_after requires a tiebreaker that is unique per document. Sorted _scroll queries do this automatically by tie-breaking documents on the index/shardId/docID tuple. This tuple is not accessible to normal search requests, so the other option is to copy the _id of the document into a doc value field and use it as a tiebreaker.
This workaround is difficult to implement for solutions that do not control indexing.
With the introduction of the search context for requests, we'll be able to paginate over a set of sorted results using search_after with the guarantee of seeing the same documents during the walk. Since the internal document id wouldn't change between requests, using the tuple that _scroll queries use becomes possible.
This issue proposes to expose a virtual sort field called _tiebreak (or any name that suits better). The field would be accessible as a sort criterion that can be used with a search context to ensure consistent ordering. The field would be composed of:

The index UUID
The shard ID
The internal document ID

The order of the composition should be discussed but the main goal is to allow consistent ordering using search_after without relying on manual operations at index-time.
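
As a rough illustration of the proposal, the sketch below orders hits by a user sort value and breaks ties with the (index UUID, shard ID, internal document ID) tuple. The `Hit` class, its field names, and the sample values are hypothetical, and the component order shown is only one possibility since the composition order is still open.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical hit record; not an Elasticsearch type."""
    sort_value: int   # user-provided sort key, e.g. a timestamp
    index_uuid: str
    shard_id: int
    doc_id: int       # internal (Lucene) document ID


def tiebreak_key(hit: Hit):
    # Primary sort value first, then the proposed tuple as the tiebreaker.
    return (hit.sort_value, hit.index_uuid, hit.shard_id, hit.doc_id)


hits = [
    Hit(10, "uuid-b", 0, 3),
    Hit(10, "uuid-a", 1, 7),
    Hit(10, "uuid-a", 0, 9),
    Hit(5, "uuid-b", 2, 1),
]

# The three hits with sort_value == 10 get a deterministic relative order.
ordered = sorted(hits, key=tiebreak_key)
```

Because no two documents share the same tuple, the full key is unique per document, which is the property search_after needs.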

elasticmachine · 2020-05-15T15:11:50Z

Pinging @elastic/es-search (:Search/Search)

mayya-sharipova · 2020-05-15T20:49:44Z

This is a great idea!

Is the idea that a user needs to explicitly provide this sort field in their request: "sort": ["my_date", "_tiebreak"]?

Or that when doing a sorted search with search_context, Elasticsearch will automatically rewrite the sort to add this field as a tiebreaker?

jpountz · 2020-05-20T11:59:31Z

I wonder the same as Mayya: maybe we could have a good tiebreaker by default that wouldn't require exposing a virtual field? Index UUID and shard ID are the same on all documents of a shard, so Lucene's default tie-breaker (docID) would do the right thing. Maybe we would only have to change how hits are merged on the coordinating node, and we could provide consistent ordering with negligible overhead?
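
To make the merge idea concrete, here is a minimal sketch, assuming each shard returns its hits already sorted by (sort value, docID) and that the coordinating node assigns each shard a stable index. The function name `merge_shard_hits` and the data layout are illustrative only and are not taken from the Elasticsearch code base.

```python
import heapq


def merge_shard_hits(per_shard_hits):
    """Merge per-shard results into one globally ordered list.

    per_shard_hits[i] holds (sort_value, doc_id) pairs from shard i,
    already sorted by the shard itself (docID breaks ties locally).
    """
    def tagged(shard_index, hits):
        # Insert the shard index between the sort value and the docID so
        # that ties across shards resolve by shard index, then by docID.
        for sort_value, doc_id in hits:
            yield (sort_value, shard_index, doc_id)

    streams = [tagged(i, hits) for i, hits in enumerate(per_shard_hits)]
    return list(heapq.merge(*streams))


shard0 = [(1, 0), (2, 4), (2, 5)]
shard1 = [(2, 1), (3, 0)]
merged = merge_shard_hits([shard0, shard1])
```

Each shard only needs its local docID order; the cross-shard tie-break is applied once, at merge time.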

jimczi · 2020-05-20T12:06:56Z

I agree that it would be nice to add the tiebreaker automatically but it needs to be materialized in the sort values of the response. This is useful only for search_after queries so we rely on users to provide this value when they paginate.

This change generates a tiebreaker automatically for sorted queries that are executed under a PIT (point in time reader). This allows paginating consistently over the matching documents without requiring a sort criterion that is unique per document. The tiebreaker is automatically added as the last sort value of each search hit in the response. It is then used by `search_after` to ensure that pagination will not miss any documents and that each document will appear only once. This commit also allows queries sorted by internal Lucene id (`_doc`) to be optimized if they are executed under a PIT, in the same way as scroll queries. Closes elastic#56828

This change ensures that the shard index that is used to tiebreak documents with identical sort values remains consistent between two requests that target the same shards. The index is now always computed from the natural order of the shards in the search request. This change also adds the consistent shard index to the ShardSearchRequest. That allows the slice builder to use this information to build more balanced slice queries. Relates elastic#56828

This commit introduces a new sort field called `_shard_doc` that can be used in conjunction with a PIT to consistently tiebreak identical sort values. The sort value is a numeric long that is composed of the ordinal of the shard (assigned by the coordinating node) and the internal Lucene document ID. These two values are consistent within a PIT, so this sort criterion can be used as the tiebreaker of any search request. Since this sort criterion is stable, we'd like to add it automatically to any sorted search request that uses a PIT, but we also need to expose it explicitly in order to be able to:
* Reverse the order of the tiebreaking, useful to search "before" `search_after`.
* Force the primary sort to use it in order to benefit from the `search_after` optimization when sorting by index order (to be released in Lucene 8.8).

I plan to add the documentation and the automatic configuration for PIT in a follow-up since this change is already big. Relates elastic#56828
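
One way to picture a single long built from a shard ordinal and a document ID is simple bit packing. The sketch below assumes a layout with the shard ordinal in the upper 32 bits and the document ID in the lower 32 bits; that layout, the constants, and the function names are assumptions for illustration, since the text above only states that the value is composed of the two parts.

```python
DOC_ID_BITS = 32                      # assumed width of the doc ID part
DOC_ID_MASK = (1 << DOC_ID_BITS) - 1


def encode_shard_doc(shard_index: int, doc_id: int) -> int:
    """Pack (shard_index, doc_id) into one integer that fits a signed long."""
    if not 0 <= doc_id <= DOC_ID_MASK:
        raise ValueError("doc_id out of range")
    if not 0 <= shard_index < (1 << 31):
        raise ValueError("shard_index out of range")
    return (shard_index << DOC_ID_BITS) | doc_id


def decode_shard_doc(value: int):
    """Recover (shard_index, doc_id) from a packed value."""
    return value >> DOC_ID_BITS, value & DOC_ID_MASK


packed = encode_shard_doc(3, 42)
```

With this layout, comparing two packed values as plain numbers orders documents by shard ordinal first and by document ID second, so one numeric sort field is enough to break ties.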

This PR adds the special `_shard_doc` sort tiebreaker automatically to any search requests that use a PIT. Adding the tiebreaker ensures that any sorted query can be paginated consistently within a PIT. Closes elastic#56828
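
For readers who want to see what paginating under a PIT could look like, here is a hedged sketch that only builds request bodies as Python dictionaries. The sort field `my_date` is borrowed from the example earlier in this thread, the PIT id and sort values are placeholders, and the helper names are hypothetical. The `_shard_doc` entry is spelled out explicitly here even though this PR adds it automatically.

```python
import copy


def first_page(pit_id: str, size: int = 100) -> dict:
    """Build the body of the first search request under a PIT."""
    return {
        "size": size,
        "pit": {"id": pit_id, "keep_alive": "1m"},
        # User sort first, tiebreaker last.
        "sort": [{"my_date": "asc"}, {"_shard_doc": "asc"}],
    }


def next_page(previous_request: dict, last_hit: dict) -> dict:
    """Build the follow-up request from the last hit of the previous page."""
    request = copy.deepcopy(previous_request)
    # The sort values of the last hit (tiebreaker included) tell the next
    # request where to resume.
    request["search_after"] = last_hit["sort"]
    return request


request = first_page("placeholder-pit-id")
# Placeholder sort values, as they might appear on the last hit of a page.
following = next_page(request, {"sort": [1589555508000, 4294967301]})
```

The tiebreaker value has to come back in each hit's sort values, which is why it is materialized in the response rather than applied silently.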

This commit ensures that the automatic tiebreaker `_shard_doc` does not disable sort optimization. Relates elastic#56828

jimczi added >enhancement :Search/Search Search-related issues that do not fall into other categories labels May 15, 2020

elasticmachine added the Team:Search Meta label for search team label May 15, 2020

matriv self-assigned this May 26, 2020

dnhatn mentioned this issue Jun 4, 2020

Introduce search context - point in time view of indices #56480

Closed

jtibshirani mentioned this issue Aug 5, 2020

Consider storing _id through doc values. #60778

Closed

jimczi mentioned this issue Sep 2, 2020

[DOCS] Add PIT to search after docs #61593

Merged

dnhatn mentioned this issue Oct 26, 2020

Add index commit id to searcher #63963

Merged

jtibshirani mentioned this issue Nov 2, 2020

Remove support for fielddata loading on _id. #64511

Open

jimczi unassigned matriv Nov 10, 2020

rudolf mentioned this issue Nov 20, 2020

[Alerting] Add a tie breaker field to alerts elastic/kibana#62002

Closed

jimczi mentioned this issue Nov 24, 2020

Automatic tie-breaking for sorted queries within a PIT #65450

Closed

jimczi mentioned this issue Dec 1, 2020

Adds a consistent shard index to ShardSearchRequest #65706

Merged

jimczi mentioned this issue Dec 9, 2020

Sort field tiebreaker for PIT (point in time) readers #66093

Merged

stevejgordon mentioned this issue Dec 17, 2020

7.11.0 Meta Ticket elastic/elasticsearch-net#5198

Closed

lukeelmers mentioned this issue Jan 27, 2021

Add search_after support to SO find API using PIT elastic/kibana#86301

Closed

jimczi mentioned this issue Feb 10, 2021

Add automatic tiebreaker for search requests that use a PIT #68833

Merged

jimczi closed this as completed in #68833 Feb 17, 2021

jimczi added a commit to jimczi/elasticsearch that referenced this issue Feb 22, 2021

Handle _shard_doc field for sort optimization

8c7e179

This commit ensures that the automatic tiebreaker `_shard_doc` does not disable sort optimization. Relates elastic#56828

jimczi mentioned this issue Feb 22, 2021

Handle _shard_doc field for sort optimization #69321

Merged

stevejgordon mentioned this issue Feb 22, 2021

7.12.0 Meta Ticket elastic/elasticsearch-net#5337

Closed

34 tasks

jimczi added a commit that referenced this issue Feb 22, 2021

Handle _shard_doc field for sort optimization (#69321)

f27da75

This commit ensures that the automatic tiebreaker `_shard_doc` does not disable sort optimization. Relates #56828

jimczi added a commit that referenced this issue Feb 22, 2021

Handle _shard_doc field for sort optimization (#69321)

bd4a585

This commit ensures that the automatic tiebreaker `_shard_doc` does not disable sort optimization. Relates #56828

jimczi added a commit that referenced this issue Feb 22, 2021

Handle _shard_doc field for sort optimization (#69321)

395bc05

This commit ensures that the automatic tiebreaker `_shard_doc` does not disable sort optimization. Relates #56828
