Skip non-competitive documents when sort by _doc with search after [LUCENE-9449] #10489

Closed
asfimport opened this issue Aug 7, 2020 · 11 comments

@asfimport

asfimport commented Aug 7, 2020

Enhance DocComparator to provide an iterator over competitive documents when searching with an "after" FieldDoc.

This iterator can quickly position on the desired "after" document and skip all documents, or even entire segments, that come before "after".

This is especially efficient when the "after" document has a high docID, since most of the index can then be skipped.
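
To make the idea concrete, here is a minimal, self-contained Java sketch of the skipping logic. It does not use Lucene and is not the DocComparator implementation: `Segment`, `firstCompetitiveLocalDoc`, and `searchAfter` are hypothetical names, and each segment is assumed to cover the contiguous global docIDs `[docBase, docBase + maxDoc)`. With an ascending `_doc` sort, only documents whose global docID is greater than the "after" docID are competitive, so a segment that ends at or before "after" is skipped outright, and the next segment is entered directly at the document following "after".

```java
import java.util.ArrayList;
import java.util.List;

public class DocSortSearchAfterSketch {

    /** Hypothetical stand-in for a segment: a contiguous range of global doc IDs. */
    static final class Segment {
        final int docBase; // global doc ID of this segment's first doc
        final int maxDoc;  // number of docs in this segment

        Segment(int docBase, int maxDoc) {
            this.docBase = docBase;
            this.maxDoc = maxDoc;
        }
    }

    /**
     * First segment-local doc ID that can still be competitive when sorting by _doc
     * ascending with search after the global doc ID afterDoc, or -1 if the whole
     * segment can be skipped.
     */
    static int firstCompetitiveLocalDoc(Segment seg, int afterDoc) {
        int firstLocal = afterDoc + 1 - seg.docBase; // first doc after "after", made segment-local
        if (firstLocal >= seg.maxDoc) {
            return -1; // every doc in this segment is at or before "after"
        }
        return Math.max(0, firstLocal);
    }

    /** Collects up to n global doc IDs after afterDoc, visiting only competitive docs. */
    static List<Integer> searchAfter(List<Segment> segments, int afterDoc, int n) {
        List<Integer> hits = new ArrayList<>();
        for (Segment seg : segments) {
            int start = firstCompetitiveLocalDoc(seg, afterDoc);
            if (start < 0) {
                continue; // skip the entire segment without visiting any of its docs
            }
            for (int doc = start; doc < seg.maxDoc && hits.size() < n; doc++) {
                hits.add(seg.docBase + doc);
            }
            if (hits.size() == n) {
                break; // in _doc order, later docs can never beat the ones already collected
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(new Segment(0, 100), new Segment(100, 50), new Segment(150, 30));
        System.out.println(searchAfter(segments, 120, 5));
    }
}
```

With segments covering [0, 100), [100, 150) and [150, 180), `searchAfter(segments, 120, 5)` skips the first segment entirely, enters the second one at local doc 21, and returns `[121, 122, 123, 124, 125]`.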

Related to #10320


Migrated from LUCENE-9449 by Mayya Sharipova (@mayya-sharipova), resolved Sep 10 2020
Pull requests: apache/lucene-solr#1725

@asfimport

asfimport commented Sep 8, 2020

ASF subversion and git services (migrated from JIRA)

Commit 9922067 in lucene-solr's branch refs/heads/master from Mayya Sharipova
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9922067

LUCENE-9449 Skip docs with _doc sort and "after" (#1725)

  • Enhance DocComparator to provide an iterator over competitive
    documents when searching with "after". This iterator can quickly position
    on the desired "after" document skipping all documents and segments before
    "after".

  • Redesign numeric comparators to provide skipping functionality
    by default.

Relates to #10320
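
The second bullet concerns numeric sort fields. As a rough illustration only (the actual comparators derive their competitive iterators from indexed point values, which this sketch does not model), the Java code below keeps the `n` smallest values and skips any segment whose minimum value cannot beat the bottom of an already full queue. `Segment`, `topNAscending`, and the per-segment `min` are hypothetical; ties with the bottom are treated as non-competitive because a later document loses a tie on docID.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class NumericSkippingSketch {

    /** Hypothetical segment: per-doc long values plus a precomputed minimum. */
    static final class Segment {
        final long[] values;
        final long min;

        Segment(long... values) {
            this.values = values;
            long m = Long.MAX_VALUE;
            for (long v : values) {
                m = Math.min(m, v);
            }
            this.min = m;
        }
    }

    /** Returns the n smallest values in ascending order; visited[0] counts docs examined. */
    static List<Long> topNAscending(List<Segment> segments, int n, int[] visited) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Max-heap: the head is the current "bottom", the worst value still in the top n.
        PriorityQueue<Long> queue = new PriorityQueue<>(Collections.reverseOrder());
        for (Segment seg : segments) {
            if (queue.size() == n && seg.min >= queue.peek()) {
                continue; // no doc in this segment can beat the bottom: skip it entirely
            }
            for (long v : seg.values) {
                visited[0]++;
                if (queue.size() < n) {
                    queue.add(v);
                } else if (v < queue.peek()) { // strict <: a later doc loses a tie on docID
                    queue.poll();
                    queue.add(v);
                }
            }
        }
        List<Long> result = new ArrayList<>(queue);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<Segment> segs = List.of(new Segment(5, 1, 9, 3), new Segment(20, 7, 30), new Segment(2, 8));
        int[] visited = {0};
        System.out.println(topNAscending(segs, 3, visited) + " visited=" + visited[0]);
    }
}
```

For segments {5, 1, 9, 3}, {20, 7, 30} and {2, 8} with n = 3, the second segment is skipped because its minimum (7) is not below the bottom (5) at that point, so only 6 of the 9 values are examined and the result is [1, 2, 3].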

@asfimport

Tomas Eduardo Fernandez Lobbe (@tflobbe) (migrated from JIRA)

I didn't look at this change in detail, but there are some failures in Jenkins that may be related to this commit:

Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/27938/
Java: 64bit/jdk-11.0.6 -XX:+UseCompressedOops -XX:+UseParallelGC

1 tests failed.
FAILED:  org.apache.lucene.search.TestIndexSortSortedNumericDocValuesRangeQuery.testSameHitsAsPointRangeQuery

Error Message:
java.lang.AssertionError

Stack Trace:
java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([1BAE2B7423173E7C:20AD3790C6015356]:0)
        at org.apache.lucene.search.ConjunctionDISI.doNext(ConjunctionDISI.java:195)
        at org.apache.lucene.search.ConjunctionDISI.nextDoc(ConjunctionDISI.java:240)
        at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:254)
        at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:211)
        at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
        at org.apache.lucene.search.AssertingBulkScorer.score(AssertingBulkScorer.java:71)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:741)
        at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:72)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:528)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:657)
        at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:638)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:559)
        at org.apache.lucene.search.TestIndexSortSortedNumericDocValuesRangeQuery.assertSameHits(TestIndexSortSortedNumericDocValuesRangeQuery.java:87)
        at org.apache.lucene.search.TestIndexSortSortedNumericDocValuesRangeQuery.testSameHitsAsPointRangeQuery(TestIndexSortSortedNumericDocValuesRangeQuery.java:76)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
        at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
        at java.base/java.lang.Thread.run(Thread.java:834)

@asfimport

Mayya Sharipova (@mayya-sharipova) (migrated from JIRA)

@tflobbe

Thanks for informing me about this, and sorry for the trouble.

I am working on fixing these test failures.

@asfimport

Mayya Sharipova (@mayya-sharipova) (migrated from JIRA)

The test failure will be fixed by this PR once it is merged: apache/lucene-solr#1833

@asfimport

Mayya Sharipova (@mayya-sharipova) (migrated from JIRA)

apache/lucene-solr#1833 has been merged, and the test no longer fails.

@asfimport

ASF subversion and git services (migrated from JIRA)

Commit 7542168 in lucene-solr's branch refs/heads/branch_8x from Mayya Sharipova
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7542168

LUCENE-9449 Skip docs with _doc sort and "after" (#1725) (#1856)

  • Enhance DocComparator to provide an iterator over competitive
    documents when searching with "after". This iterator can quickly position
    on the desired "after" document skipping all documents and segments before
    "after".

  • Redesign numeric comparators and move them to a separate package.

Backport for #LUCENE-9449

@asfimport

asfimport commented Jul 6, 2021

ASF subversion and git services (migrated from JIRA)

Commit 64d9f8c in lucene's branch refs/heads/main from Mayya Sharipova
https://gitbox.apache.org/repos/asf?p=lucene.git;h=64d9f8c

#11059 DocComparator don't skip docs of same docID (#204)

DocComparator should not skip docs with the same docID on multiple
sorts with search after.

Because of the optimization introduced in LUCENE-9449, currently when
searching with sort on [_doc, other fields] with search after,
DocComparator can efficiently skip all docs before and including
the provided [search after docID]. This is a desirable behaviour
in a single index search. But in a distributed search, where multiple
indices have docs with the same docID, and when searching on
[_doc, other fields], the sort optimization should NOT skip
documents with the same docIDs.

This PR fixes this.

Relates to LUCENE-9449
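
The difference between the two cases can be sketched in Java. This is a hypothetical illustration of the behaviour described in the commit message, not the actual DocComparator change: the sort is assumed to be `[_doc asc, value asc]`, each shard is modeled as an array where `values[doc]` is the secondary sort value, and `firstCompetitiveDoc`, `sortsAfter`, and `hitsAfter` are illustrative names. When `_doc` is the only sort field, the "after" docID itself can be skipped; when other fields follow, a document with the same docID on another shard may still sort after "after", so it must be visited.

```java
import java.util.ArrayList;
import java.util.List;

public class DocSortTieSketch {

    /**
     * First doc ID worth visiting for a search-after whose primary sort key is _doc ascending.
     * With _doc as the only sort field, the "after" doc was already returned and can be skipped.
     * With further sort fields, a doc with the same ID (e.g. on another shard) may still sort
     * after "after" on those fields, so it must not be skipped.
     */
    static int firstCompetitiveDoc(int afterDoc, int numSortFields) {
        return numSortFields == 1 ? afterDoc + 1 : afterDoc;
    }

    /** True if (doc, value) sorts strictly after (afterDoc, afterValue) on [_doc asc, value asc]. */
    static boolean sortsAfter(int doc, long value, int afterDoc, long afterValue) {
        if (doc != afterDoc) {
            return doc > afterDoc;
        }
        return value > afterValue; // tie on doc ID: fall through to the next sort field
    }

    /** Docs of one shard, scanned from startDoc, that sort after the "after" entry. */
    static List<Integer> hitsAfter(long[] values, int afterDoc, long afterValue, int startDoc) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = Math.max(0, startDoc); doc < values.length; doc++) {
            if (sortsAfter(doc, values[doc], afterDoc, afterValue)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int afterDoc = 2;     // last hit returned, taken from shard A
        long afterValue = 10; // its secondary sort value
        long[] shardB = {4, 8, 15, 16};

        // Skipping "before and including" the after doc, as if _doc were the only sort field
        System.out.println(hitsAfter(shardB, afterDoc, afterValue, afterDoc + 1));
        // Keeping the after doc ID because the sort is [_doc, value]
        System.out.println(hitsAfter(shardB, afterDoc, afterValue, firstCompetitiveDoc(afterDoc, 2)));
    }
}
```

With "after" = (doc 2, value 10) from shard A and shard B values `{4, 8, 15, 16}`, starting at doc 3 returns only `[3]` and silently drops shard B's doc 2 (value 15), while starting at doc 2 returns `[2, 3]`.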

@asfimport

asfimport commented Jul 7, 2021

ASF subversion and git services (migrated from JIRA)

Commit bdef1be in lucene-solr's branch refs/heads/branch_8x from Mayya Sharipova
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bdef1be

#11059 DocComparator don't skip docs of same docID (#2530)

DocComparator should not skip docs with the same docID on multiple
sorts with search after.

Because of the optimization introduced in LUCENE-9449, currently when
searching with sort on [_doc, other fields] with search after,
DocComparator can efficiently skip all docs before and including
the provided [search after docID]. This is a desirable behaviour
in a single index search. But in a distributed search, where multiple
indices have docs with the same docID, and when searching on
[_doc, other fields], the sort optimization should NOT skip
documents with the same docIDs.

This PR fixes this.

Backport for #204
Relates to LUCENE-9449
