-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure sub-iterators of ConjunctionDISI are on the same document [LUCENE-9541] #10581
Comments
Adrien Grand (@jpountz) (migrated from JIRA) In my opinion this is a bug in how ConjunctionDISI was used, not a bug in BitSetConjunctionDISI. Conjunction iterators maintain the invariant that between two calls to nextDoc or advance, all sub iterators are on the same doc ID. If we advance one of the subs without making ConjunctionDISI aware of it, then we break this invariant. We found this with BitSetConjunctionDISI but this would also be a problem with ConjunctionDISI. To take your example of a and b both matching [0,1], if you create a ConjunctionDISI over both iterators and a is picked as the lead, then advance b to 1 and finally call nextDoc() on the ConjunctionDISI, then it will first call nextDoc() on a, which returns 0, and then advance(0) on b which is illegal since it's illegal to call advance with a target that is less than or equal to the current doc ID. |
Mayya Sharipova (@mayya-sharipova) (migrated from JIRA) Caught with @jpountz offline, and we have decided to add checks that assert that all sub-iterators are on the same document all the time. |
ASF subversion and git services (migrated from JIRA) Commit 5f34acf in lucene-solr's branch refs/heads/master from Mayya Sharipova LUCENE-9541 ConjunctionDISI sub-iterators check (#1937)
Ensure sub-iterators of a conjunction iterator are on the same doc. |
ASF subversion and git services (migrated from JIRA) Commit 5f34acf in lucene-solr's branch refs/heads/master from Mayya Sharipova LUCENE-9541 ConjunctionDISI sub-iterators check (#1937)
Ensure sub-iterators of a conjunction iterator are on the same doc. |
ASF subversion and git services (migrated from JIRA) Commit e325f66 in lucene-solr's branch refs/heads/master from Mayya Sharipova Revert "LUCENE-9541 ConjunctionDISI sub-iterators check (#1937)" This reverts commit 5f34acf. |
ASF subversion and git services (migrated from JIRA) Commit 6b82884 in lucene-solr's branch refs/heads/master from Mayya Sharipova LUCENE-9541 ConjunctionDISI sub-iterators check (#1937)
Ensure sub-iterators of a conjunction iterator are on the same doc. |
ASF subversion and git services (migrated from JIRA) Commit 6b82884 in lucene-solr's branch refs/heads/master from Mayya Sharipova LUCENE-9541 ConjunctionDISI sub-iterators check (#1937)
Ensure sub-iterators of a conjunction iterator are on the same doc. |
ASF subversion and git services (migrated from JIRA) Commit 66c49a3 in lucene-solr's branch refs/heads/branch_8x from Mayya Sharipova LUCENE-9541 ConjunctionDISI sub-iterators check (#1937) Ensure sub-iterators of a conjunction iterator are on the same doc. |
This has been merged. |
Not completely sure if this is a bug.
BitSetConjuctionDISI advances based on its lead – DocIdSetIterator iterator, and doesn't consider that its another component – BitSetIterator may have already advanced passed a certain doc. This may result in duplicate documents.
For example if BitSetConjuctionDISI disi is composed of DocIdSetIterator a of docs [0,1] and BitSetIterator b of docs [0,1]. Doing
b.nextDoc()
we are collecting doc0, doingdisi.nextDoc
we again collecting the same doc0.It seems that other conjunction iterators don't have this behaviour, if we are advancing any of their component pass a certain document, the whole conjunction iterator will also be advanced pass this document.
This behaviour was exposed in this PR.
Migrated from LUCENE-9541 by Mayya Sharipova (@mayya-sharipova), updated Oct 06 2020
Pull requests: apache/lucene-solr#1937
The text was updated successfully, but these errors were encountered: