Add ability to skip non-competitive documents on field sort [LUCENE-9280] #10320

asfimport · 2020-03-16T20:18:20Z

Today collectors, once they collect enough docs, can instruct scorers to update their iterators to skip non-competitive documents. This is applicable only for a case when we need top docs by _score.

It would be nice to also have an ability to skip non-competitive docs when we need top docs sorted by other fields different from _score.

Migrated from LUCENE-9280 by Mayya Sharipova (@mayya-sharipova), resolved Jul 07 2020
Linked issues:

Backport for the field sort optimization of LUCENE-9280 [LUCENE-9384] #10424

asfimport · 2020-03-16T20:21:13Z

Mayya Sharipova (@mayya-sharipova) (migrated from JIRA)

draft PR: apache/lucene-solr#1351

asfimport · 2020-03-22T18:48:49Z

Michael Sokolov (@msokolov) (migrated from JIRA)

This is interesting - could you provide a bit of motivation why this is desirable though? Since DocValues are stored in docid order, how will skipping work? I guess we know the min/max values for a block of docs, so could potentially skip over a block at a time?

asfimport · 2020-03-23T14:05:15Z

Mayya Sharipova (@mayya-sharipova) (migrated from JIRA)

> Since DocValues are stored in docid order, how will skipping work?

We have decided to do this similar how LongDistanceFeatureQuery works. The initial iterator for a collector is indeed a docValues, but then as we set the bottom value in the comparator, the iterator is updated from PointValues to include only docs that are lower than bottom value (in case of asc sort).

This is the most desirable for a case where docs were added sequentially with ever increasing values, e.g. a logging use case with an increasing date field. For example, if doc1 has a field1 value 1, doc2 – 2, doc3 – 3,.... doc1000000 – 1000000, and we needed to retrieve top 3 docs with smallest field1 values, currently we would iterate through all docs. With the proposed change, as soon as collect first 3 docs and set the bottom 3, the collector's iterator can be updated to remove all other docs as they are not competitive.

asfimport · 2020-03-31T07:46:15Z

Adrien Grand (@jpountz) (migrated from JIRA)

Woohoo. I can try to give a little more background about this feature. The question that usually comes first is the benefit of this approach compared to index sorting. Index sorting provides greater speedups than this change, but is limited to one field and one sort order. While it's possible to create multiple indexes that all have a different sort order, it's quite expensive in terms of resources, and creates new problems e.g. if you need all your indices to have a consistent view of the documents that exist in the index. Even if you only sort by a single field, this feature might be relevant. For instance with time series, it's common to sort data by descending timestamp, e.g. to live-tail events that match a filter, but you sometimes also need to sort in ascending order to find the first occurrence of a problem, which is often a useful piece of information for root cause analysis. With index sorting, sorting in reverse order of the index sort is a worst-case scenario.

The approach to store min/max values per block in doc values would work too. Something I like with this approach is that it doesn't require adding new APIs to doc values, and that it doesn't only work with numbers. For instance, this approach could also be used to speed up sorting by geo-distance too (using the same principle as LatLonPointDistanceFeatureQuery) without requiring the introduction of a new type of doc values that supports multiple dimensions.

asfimport · 2020-03-31T13:42:22Z

Mayya Sharipova (@mayya-sharipova) (migrated from JIRA)

@jpountz Thanks Adrien for the explanation about index sorting. It makes sense

I was just wondering about this sentence of yours: " it's common to sort data by descending timestamp, e.g. to live-tail events that match a filter, but you sometimes also need to sort in ascending order". I guess this PR is not going to help for this case either, as segments and docs within them are processed in order they are written (in desc order), and we will not reach bottom values until the very end. But I was wondering if as a follow-up work to address this case, we can imitate the work of @jimczi that during a query pre-sorts segments based on a query sort field, and process segments in that sort order.

asfimport · 2020-03-31T14:24:29Z

Adrien Grand (@jpountz) (migrated from JIRA)

I was just talking about it with Jim earlier today. I think this would make this change more complete indeed.

asfimport · 2020-04-20T14:16:28Z

David Smiley (@dsmiley) (migrated from JIRA)

Cool!

asfimport · 2020-06-23T20:05:10Z

ASF subversion and git services (migrated from JIRA)

Commit b0333ab in lucene-solr's branch refs/heads/master from Mayya Sharipova
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b0333ab

LUCENE-9280: Collectors to skip noncompetitive documents (#1351)

Similar how scorers can update their iterators to skip non-competitive
documents, collectors and comparators should also provide and update
iterators that allow them to skip non-competive documents.

asfimport · 2020-07-07T15:53:37Z

Michael McCandless (@mikemccand) (migrated from JIRA)

Can this be resolved? Will we backport to 8.x?

asfimport · 2020-07-07T19:38:57Z

Mayya Sharipova (@mayya-sharipova) (migrated from JIRA)

@mikemccand Thanks for checking it. I will resolve the issue. We have a separate issue for backporting to 8.x, as it requires introducing a new parameter for SortField.

asfimport · 2020-07-31T18:13:18Z

ASF subversion and git services (migrated from JIRA)

Commit 7b12849 in lucene-solr's branch refs/heads/branch_8x from Mayya Sharipova
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7b12849

#10424: Backport for field sort optimization (#1610)

Backport for: LUCENE-9280: Collectors to skip noncompetitive documents (#1351)

Similar how scorers can update their iterators to skip non-competitive
documents, collectors and comparators should also provide and update
iterators that allow them to skip non-competive documents.

To enable sort optimization for numeric sort fields,
the following needs to be done:

the field should be indexed with both doc_values and points, that
must have the same field name and same data
SortField#setCanUsePoints must be set
totalHitsThreshold should not be set to max value.

asfimport · 2020-09-08T18:16:46Z

ASF subversion and git services (migrated from JIRA)

Commit 9922067 in lucene-solr's branch refs/heads/master from Mayya Sharipova
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9922067

#10489 Skip docs with _doc sort and "after" (#1725)

Enhance DocComparator to provide an iterator over competitive
documents when searching with "after". This iterator can quickly position
on the desired "after" document skipping all documents and segments before
"after".
Redesign numeric comparators to provide skipping functionality
by default.

Relates to LUCENE-9280

asfimport · 2021-12-08T17:05:06Z

Adrien Grand (@jpountz) (migrated from JIRA)

Closing after the 9.0.0 release

asfimport closed this as completed Jul 7, 2020

gashutos mentioned this issue Mar 24, 2023

TimeSeries optimizations in OpenSearch opensearch-project/OpenSearch#3734

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ability to skip non-competitive documents on field sort [LUCENE-9280] #10320

Add ability to skip non-competitive documents on field sort [LUCENE-9280] #10320

asfimport commented Mar 16, 2020 •

edited

Loading

asfimport commented Mar 16, 2020

asfimport commented Mar 22, 2020

asfimport commented Mar 23, 2020

asfimport commented Mar 31, 2020

asfimport commented Mar 31, 2020

asfimport commented Mar 31, 2020

asfimport commented Apr 20, 2020

asfimport commented Jun 23, 2020

asfimport commented Jul 7, 2020

asfimport commented Jul 7, 2020

asfimport commented Jul 31, 2020 •

edited

Loading

asfimport commented Sep 8, 2020 •

edited

Loading

asfimport commented Dec 8, 2021

Add ability to skip non-competitive documents on field sort [LUCENE-9280] #10320

Add ability to skip non-competitive documents on field sort [LUCENE-9280] #10320

Comments

asfimport commented Mar 16, 2020 • edited Loading

asfimport commented Mar 16, 2020

asfimport commented Mar 22, 2020

asfimport commented Mar 23, 2020

asfimport commented Mar 31, 2020

asfimport commented Mar 31, 2020

asfimport commented Mar 31, 2020

asfimport commented Apr 20, 2020

asfimport commented Jun 23, 2020

asfimport commented Jul 7, 2020

asfimport commented Jul 7, 2020

asfimport commented Jul 31, 2020 • edited Loading

asfimport commented Sep 8, 2020 • edited Loading

asfimport commented Dec 8, 2021

asfimport commented Mar 16, 2020 •

edited

Loading

asfimport commented Jul 31, 2020 •

edited

Loading

asfimport commented Sep 8, 2020 •

edited

Loading