Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-9555: Advance conjuction Iterator for two phase iteration #1943

Merged
merged 3 commits into from
Oct 6, 2020

Conversation

mayya-sharipova
Copy link
Contributor

@mayya-sharipova mayya-sharipova commented Oct 2, 2020

PR #1351 introduced a sort optimization where
documents can be skipped.
But there was a bug in case we were using two phase
approximation, as we would advance it without advancing
an overall conjunction iterator.

This patch fixed it.

Relates to #1351

Some collectors provide iterators that can efficiently skip
non-competitive docs. When using DefaultBulkScorer#score function
we create a conjunction of scorerIterator and collectorIterator.
As collectorIterator always starts from a docID = -1,
and for creation of conjunction iterator we need all of its
sub-iterators to be on the same doc, the creation of conjuction
iterator will fail if scorerIterator has already been advanced
to some other document.

This patch ensures that we create conjunction between scorerIterator
and collectorIterator only if scorerIterator has not been advanced yet.

Relates to apache#1725
Relates to apache#1937
@mayya-sharipova
Copy link
Contributor Author

This patch also addresses the failure of TestUnifiedHighlighterStrictPhrases.testBasics.
This test was using ReqExclBulkScorer that lead to scorerIterator being already on max document when we were trying to make a conjunction between scorerIterator and collectorIterator.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a pity to disable this great optimization when scoring ranges of doc IDs versus scoring the entire doc ID space at once. Maybe we can apply the leap-frog logic without using ConjunctionDISI in that method, or alternatively wrap the scorer DISI in a way that makes usage of ConjunctionDISI legal?

class RangeDISIWrapper {

  private final DocIdSetIterator in;
  private final int min, max;
  private int doc = -1;

  @Override
  public int advance(int target) {
    target = Math.max(min, target);
    if (target >= max) {
      return NO_MORE_DOCS;
    }
    return doc = in.advance(target);
  }

}

@mayya-sharipova mayya-sharipova changed the title LUCENE-9555 Ensure scorerIterator is fresh for opt LUCENE-9555 Advance conjuction Iterator for two phase iteration Oct 5, 2020
@mayya-sharipova
Copy link
Contributor Author

mayya-sharipova commented Oct 5, 2020

@jpountz Sorry for the noise, I have found the cause of this error, and the latest commit addresses it.
Basically this PR will just address the failing test of TestUnifiedHighlighterStrictPhrases.testBasics.

My next steps will the following:
Plan A:

  • use leap-frog logic without using ConjunctionDISI for this method (a separate PR for that)
  • Then reintroduce again checks in ConjunctionDISI that I reverted

Plan B:

  • continue to use ConjunctionDISI to combine scorerIterator and collectorIterator
  • from checks in ConjunctionDISI keep only the check of subiterators the same doc during constructor and remove all other checks, as sort optimization by introducing new DISI (from -1 doc) in the middle of iteration would break these checks.

I am interested in your opinion which plan is better?

@jpountz jpountz changed the title LUCENE-9555 Advance conjuction Iterator for two phase iteration LUCENE-9555: Advance conjuction Iterator for two phase iteration Oct 6, 2020
@mayya-sharipova mayya-sharipova merged commit 6ac94a6 into apache:master Oct 6, 2020
@mayya-sharipova mayya-sharipova deleted the filtered-iterator branch October 6, 2020 13:22
mayya-sharipova added a commit that referenced this pull request Oct 6, 2020
)

PR #1351 introduced a sort optimization where
documents can be skipped.
But there was a bug in case we were using two phase
approximation, as we would advance it without advancing
an overall conjunction iterator.

This patch fixed it.

Relates to #1351
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants