Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure sub-iterators of ConjunctionDISI are on the same document [LUCENE-9541] #10581

Closed
asfimport opened this issue Sep 23, 2020 · 9 comments
Closed

Comments

@asfimport
Copy link

Not completely sure if this is a bug.

BitSetConjuctionDISI advances based on its lead  – DocIdSetIterator iterator, and doesn't consider that its another component – BitSetIterator may have already advanced passed a certain doc. This may result in duplicate documents.

For example if BitSetConjuctionDISI  disi is composed of DocIdSetIterator a of docs  [0,1] and BitSetIterator b of docs [0,1].  Doing b.nextDoc() we are collecting doc0,  doing disi.nextDoc we again  collecting the same doc0.

It seems that other conjunction iterators don't have this behaviour, if we are advancing any of their component pass a certain document, the whole conjunction iterator will also be advanced pass this document. 

 

This behaviour was exposed in this PR.


Migrated from LUCENE-9541 by Mayya Sharipova (@mayya-sharipova), updated Oct 06 2020
Pull requests: apache/lucene-solr#1937

@asfimport
Copy link
Author

Adrien Grand (@jpountz) (migrated from JIRA)

In my opinion this is a bug in how ConjunctionDISI was used, not a bug in BitSetConjunctionDISI.

Conjunction iterators maintain the invariant that between two calls to nextDoc or advance, all sub iterators are on the same doc ID. If we advance one of the subs without making ConjunctionDISI aware of it, then we break this invariant. We found this with BitSetConjunctionDISI but this would also be a problem with ConjunctionDISI.

To take your example of a and b both matching [0,1], if you create a ConjunctionDISI over both iterators and a is picked as the lead, then advance b to 1 and finally call nextDoc() on the ConjunctionDISI, then it will first call nextDoc() on a, which returns 0, and then advance(0) on b which is illegal since it's illegal to call advance with a target that is less than or equal to the current doc ID.

@asfimport
Copy link
Author

Mayya Sharipova (@mayya-sharipova) (migrated from JIRA)

Caught with @jpountz offline, and we have decided to add checks that assert that all sub-iterators are on the same document all the time. 

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 5f34acf in lucene-solr's branch refs/heads/master from Mayya Sharipova
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5f34acf

LUCENE-9541 ConjunctionDISI sub-iterators check (#1937)

Ensure sub-iterators of a conjunction iterator are on the same doc.

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 5f34acf in lucene-solr's branch refs/heads/master from Mayya Sharipova
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5f34acf

LUCENE-9541 ConjunctionDISI sub-iterators check (#1937)

Ensure sub-iterators of a conjunction iterator are on the same doc.

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit e325f66 in lucene-solr's branch refs/heads/master from Mayya Sharipova
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e325f66

Revert "LUCENE-9541 ConjunctionDISI sub-iterators check (#1937)"

This reverts commit 5f34acf.

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 6b82884 in lucene-solr's branch refs/heads/master from Mayya Sharipova
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6b82884

LUCENE-9541 ConjunctionDISI sub-iterators check (#1937)

Ensure sub-iterators of a conjunction iterator are on the same doc.

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 6b82884 in lucene-solr's branch refs/heads/master from Mayya Sharipova
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6b82884

LUCENE-9541 ConjunctionDISI sub-iterators check (#1937)

Ensure sub-iterators of a conjunction iterator are on the same doc.

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 66c49a3 in lucene-solr's branch refs/heads/branch_8x from Mayya Sharipova
https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=66c49a3

LUCENE-9541 ConjunctionDISI sub-iterators check (#1937)

Ensure sub-iterators of a conjunction iterator are on the same doc.

@jpountz
Copy link
Contributor

jpountz commented Nov 15, 2022

This has been merged.

@jpountz jpountz closed this as completed Nov 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants