Skip docs with Docvalues in NumericLeafComparator #12405
```java
// Assumes this method lives in a LuceneTestCase subclass (such as TestSortOptimization),
// which provides newDirectory(), atLeast(), random() and newSearcher().
public void test() throws IOException {
  final Directory dir = newDirectory();
  IndexWriterConfig config =
      new IndexWriterConfig()
          // Make sure to use the default codec, otherwise some random points formats that have
          // large values for maxPointsPerLeaf might not enable skipping with only 10k docs
          .setCodec(TestUtil.getDefaultCodec());
  final IndexWriter writer = new IndexWriter(dir, config);
  final int numDocs = atLeast(10000);
  final int missValuesNumDocs = numDocs / 2;
  for (int i = 0; i < numDocs; ++i) {
    final Document doc = new Document();
    if (i > missValuesNumDocs) {
      // Only the second half of the docs has a value; the first half uses the missing value
      doc.add(new NumericDocValuesField("my_field", i));
      doc.add(new LongPoint("my_field", i));
    }
    writer.addDocument(doc);
  }
  final IndexReader reader = DirectoryReader.open(writer);
  writer.close();
  // single threaded so totalHits is deterministic
  IndexSearcher searcher =
      newSearcher(reader, random().nextBoolean(), random().nextBoolean(), false);
  final int numHits = 3;
  final int totalHitsThreshold = 3;
  { // check whether the optimization runs with NumericDocValues when the missing value is NOT competitive
    final SortField sortField = new SortField("my_field", SortField.Type.LONG, true);
    sortField.setMissingValue(0L); // missing value is not competitive
    final Sort sort = new Sort(sortField);
    CollectorManager<TopFieldCollector, TopFieldDocs> manager =
        TopFieldCollector.createSharedManager(sort, numHits, null, totalHitsThreshold);
    TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), manager);
    assertEquals(numHits, topDocs.scoreDocs.length);
    // all documents were collected => the optimization was not run
    assertEquals(numDocs, topDocs.totalHits.value);
  }
  reader.close();
  dir.close();
}
```

This test shows that we could not skip documents using the BKD tree, but could use …
It's not clear to me whether this change is still correct when there is another sort field after the one that gets optimized. It seems like it could skip hits that are still needed.
My intuition is that the way to fix it would be to allow the comparator to know whether it's the only one, and if so, it could make isMissingValueCompetitive return false when the bottom value is equal to the missing value.
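A minimal sketch of that rule, assuming long sort values and a hypothetical helper isMissingValueCompetitive(missingValue, bottom, reverse, singleSort); the name echoes the comment above, but the signature is made up for illustration and is not Lucene's. The reasoning: with a single sort field, ties are broken by doc ID and documents are collected in increasing doc ID order, so a document whose value equals the bottom can never displace it. With a second sort field, a tie on the first field may still be won on the second, which is the concern raised above, so equality has to stay competitive.

```java
final class MissingValueRule {

  /**
   * Hypothetical helper: can a document with the missing value still enter a full queue?
   *
   * @param reverse true for a descending sort
   * @param singleSort true if this is the only sort field, so ties with the bottom lose
   */
  static boolean isMissingValueCompetitive(
      long missingValue, long bottom, boolean reverse, boolean singleSort) {
    // cmp > 0 means the missing value sorts strictly before (is better than) the bottom
    int cmp = reverse ? Long.compare(missingValue, bottom) : Long.compare(bottom, missingValue);
    // With a single sort field a tie cannot win; with more fields it might, on a later field
    return singleSort ? cmp > 0 : cmp >= 0;
  }

  public static void main(String[] args) {
    // Descending sort, missing value 0, bottom already 0: a tie with the bottom
    System.out.println(isMissingValueCompetitive(0L, 0L, true, true)); // false
    System.out.println(isMissingValueCompetitive(0L, 0L, true, false)); // true
    // Ascending sort, bottom 5: a missing value of 3 would beat the bottom
    System.out.println(isMissingValueCompetitive(3L, 5L, false, true)); // true
  }
}
```

Returning false only in the single-field tie case keeps the secondary-sort scenario safe while still letting the test above skip documents once the bottom reaches the missing value.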
Thanks @jpountz. Oh, you are right; sorry for missing this.
The simplest way is to add a default boolean method like …
We probably need to change …
Thanks for adding the enum. In my view, we now need the following two changes: …
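The enum discussed here is the one later referred to as Pruning (this thread mentions Pruning.GREATER_THAN_OR_EQUAL_TO, and downstream commits mention Pruning.NONE). The sketch below mirrors three pruning modes in a local enum instead of importing Lucene, and shows one plausible way a caller could pick a mode per comparator. The helper pruningFor and its selection logic are assumptions for illustration; check Lucene's own collector code for the real behavior.

```java
final class PruningChoice {

  // Local mirror of the pruning modes; not imported from Lucene.
  enum Mode {
    NONE, // no skipping at all
    GREATER_THAN, // skip hits that sort strictly after the bottom
    GREATER_THAN_OR_EQUAL_TO // also skip hits that tie with the bottom
  }

  /**
   * Hypothetical helper: picks a mode for the comparator at sortIndex. Assumption for
   * illustration: only the primary sort field prunes, and ties can be pruned only when it is
   * the sole sort field.
   */
  static Mode pruningFor(int sortIndex, int numSortFields, boolean canSkip) {
    if (!canSkip || sortIndex != 0) {
      return Mode.NONE;
    }
    return numSortFields == 1 ? Mode.GREATER_THAN_OR_EQUAL_TO : Mode.GREATER_THAN;
  }

  public static void main(String[] args) {
    System.out.println(pruningFor(0, 1, true)); // GREATER_THAN_OR_EQUAL_TO
    System.out.println(pruningFor(0, 2, true)); // GREATER_THAN
    System.out.println(pruningFor(1, 2, true)); // NONE
  }
}
```

An enum reads better than a boolean here because "can skip" alone cannot express whether ties with the bottom are safe to drop.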
Thanks for the excellent advice.
Addressed in 4a72bf6 and 94d560c.
lucene/core/src/java/org/apache/lucene/search/comparators/DoubleComparator.java
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java
lucene/core/src/java/org/apache/lucene/search/comparators/IntComparator.java
I left a suggestion, but this looks good to me in general. Can you add a CHANGES entry?
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java
(force-pushed from 56a51f7 to 4b26a0e)
@LuXugang I pushed a commit with the changes I had in mind. What do you think?
lucene/core/src/test/org/apache/lucene/search/TestSortOptimization.java
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java
It is a nice simplification. Thanks.
There is still a bug in this PR: the test fails with the reproduce line below. I will fix it later.
…s.seed=6B2B316B7080952B -Dtests.locale=yav-Latn-CM -Dtests.timezone=Indian/Mahe -Dtests.asserts=true -Dtests.file.encoding=UTF-8
(force-pushed from b78decf to 9af4ba5)
Sorry for the force-push and for listing so many commits. This issue is addressed in ed75fe7 and 9af4ba5.
Sorry, since I had already approved the PR, I had not realized it was still waiting on me. It's a great change; let's see how to get it in.
I did my best to fix the conflicts. @LuXugang, are you able to check the changes?
Tests fail because the optimization kicks in in more cases than the test expects; it's not clear to me yet whether it's a bug.
Sure thing, @jpountz. I will work on this in the next few days.
You are correct, @jpountz: the optimization kicks in and it is not a bug. In that failed test, once the queue is full and there is only one comparator (which means Pruning.GREATER_THAN_OR_EQUAL_TO), the bottom value is Long.MAX_VALUE, so the missing value, which is also Long.MAX_VALUE, is non-competitive. The optimization then kicks in.
LGTM, thank you!
* Skip documents by doc values
* When the queue is full with only one comparator, we can better tune maxValueAsBytes/minValueAsBytes. For instance, if the sort is ascending and the bottom value is 5, we will use a range on [MIN_VALUE, 4].

---------

Co-authored-by: Adrien Grand
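The range tuning in the commit message (ascending sort, bottom value 5, range [MIN_VALUE, 4]) can be worked through with a small sketch. It assumes plain long values and a hypothetical competitiveRange helper returning inclusive {min, max} bounds; Lucene itself works on encoded byte[] bounds (maxValueAsBytes/minValueAsBytes), which this sketch does not reproduce. When ties with the bottom are not competitive (a single comparator), the bound moves one step past the bottom; otherwise it stays inclusive.

```java
import java.util.Arrays;

final class CompetitiveRange {

  /**
   * Hypothetical helper: inclusive [min, max] of values that can still beat the bottom of a
   * full queue, or null if no value can.
   *
   * @param reverse true for a descending sort
   * @param excludeTies true when ties with the bottom are not competitive (single comparator)
   */
  static long[] competitiveRange(long bottom, boolean reverse, boolean excludeTies) {
    if (!reverse) {
      // Ascending: smaller values win
      if (excludeTies) {
        if (bottom == Long.MIN_VALUE) {
          return null; // nothing sorts strictly before MIN_VALUE
        }
        return new long[] {Long.MIN_VALUE, bottom - 1};
      }
      return new long[] {Long.MIN_VALUE, bottom};
    }
    // Descending: larger values win
    if (excludeTies) {
      if (bottom == Long.MAX_VALUE) {
        return null; // nothing sorts strictly before MAX_VALUE in a descending sort
      }
      return new long[] {bottom + 1, Long.MAX_VALUE};
    }
    return new long[] {bottom, Long.MAX_VALUE};
  }

  public static void main(String[] args) {
    // The commit message's example: ascending sort, bottom 5, single comparator
    System.out.println(Arrays.toString(competitiveRange(5, false, true)));
    // prints [-9223372036854775808, 4]
    System.out.println(Arrays.toString(competitiveRange(5, false, false)));
    // prints [-9223372036854775808, 5]
  }
}
```

The same arithmetic covers the failing-test case described earlier: with a bottom of Long.MAX_VALUE and ties excluded, the ascending range ends at Long.MAX_VALUE - 1 and the descending range is empty, so a missing value of Long.MAX_VALUE is non-competitive in either direction.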
Introduced in apache/lucene#12405. We should account for the changes in our overrides and API. Now, to indicate that no skipping can occur, we use `Pruning.NONE`.
Updates to support the interface change to FieldComparatorSource to use the Pruning enum instead of the enableSkipping boolean.
Signed-off-by: Marc Handalian
Description
Could we implement in NumericLeafComparator the same logic as TermOrdValLeafComparator, using NumericDocValues to skip docs when we cannot get an iterator from the BKD tree?
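A rough, self-contained sketch of what the question proposes. It assumes a sparse representation (parallel docs/values arrays sorted by doc ID) as a hypothetical stand-in for NumericDocValues; competitiveDocs and its parameters are illustrative names, not Lucene APIs. When the missing value falls outside the competitive range, documents without a value cannot compete, so it is enough to walk only the documents that have a value, which is what a doc-values iterator enumerates. Otherwise every document has to be considered.

```java
import java.util.ArrayList;
import java.util.List;

final class DocValuesSkipping {

  /**
   * Hypothetical sketch: doc IDs a leaf comparator would still need to visit.
   *
   * @param docs sorted doc IDs that have a value (stand-in for a doc-values iterator)
   * @param values values[i] is the value of docs[i]
   * @param min inclusive lower bound of the competitive range
   * @param max inclusive upper bound of the competitive range
   */
  static List<Integer> competitiveDocs(
      int maxDoc, int[] docs, long[] values, long missingValue, long min, long max) {
    boolean missingCompetitive = missingValue >= min && missingValue <= max;
    List<Integer> out = new ArrayList<>();
    if (!missingCompetitive) {
      // Walk only documents that have a value; documents without one are skipped entirely
      for (int i = 0; i < docs.length; i++) {
        if (values[i] >= min && values[i] <= max) {
          out.add(docs[i]);
        }
      }
      return out;
    }
    // Missing value is competitive: every document must be considered
    int i = 0;
    for (int doc = 0; doc < maxDoc; doc++) {
      if (i < docs.length && docs[i] == doc) {
        long v = values[i++];
        if (v >= min && v <= max) {
          out.add(doc);
        }
      } else {
        out.add(doc); // no value: competitive through the missing value
      }
    }
    return out;
  }

  public static void main(String[] args) {
    int[] docs = {3, 4, 5};
    long[] values = {3, 4, 5};
    // Descending sort, bottom 3, ties excluded: competitive range [4, MAX_VALUE]
    System.out.println(competitiveDocs(6, docs, values, 0L, 4L, Long.MAX_VALUE)); // [4, 5]
    System.out.println(competitiveDocs(6, docs, values, 10L, 4L, Long.MAX_VALUE)); // [0, 1, 2, 4, 5]
  }
}
```

The first call mirrors the test at the top of this thread: with a missing value of 0 and a descending sort, the documents without a value never need to be visited.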