Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential bug in IndexedDISI90#SPARSE->advanceExactWithinBlock #12465

Closed
wants to merge 1 commit into from

Conversation

gautamworah96
Copy link
Contributor

@gautamworah96 gautamworah96 commented Jul 26, 2023

While upgrading our Lucene to 9.7 from 9.6 we noticed that a few tests were failing.

On digging deeper into the tests we realized that they were failing on a particular assert. This one:

if (disi.nextExistDocInBlock > targetInBlock) {
          assert !disi.exists;
          return false;
        }

This changes the behavior of this function as compared to the behavior in Lucene 9.6 (and before). It could be possible that we intentionally decided to change the behavior of this function, but to my eye, it feels like that was not the case?

I attached a test that shows the assert failing. The test tries to call advanceExact which underneath it calls advanceExactWithinBlock on a document that is behind the current doc position. In the previous version of Lucene this would return a false and would exit. But with the upgrade, it fails mysteriously on an assert.

I created this PR because IIUC the assert checks for something that is completely unrelated to the current method call? Why would we want to assert on the existence of a previous document when calling for advanceExactWithinBlock on a new document.. My preference here would be to remove just the assert statement and to revert to the previous behavior (of returning false when calling for a document that is behind the current cursor position).

I could also be wrong though. Would appreciate a second look from some folks in the community.. Thanks!

Related PR that added the assert: #12324

@gf2121
Copy link
Contributor

gf2121 commented Jul 27, 2023

Thanks @gautamworah96 for reporting this !

I was adding the assert to guarantee the consistency between disi.doc and disi.exists. It could be surprising if IndexedDISI#advanceExactWithinBlock return false while disi.exists = true.

DocValuesIterator#advanceExact, which is usually the interface exposed to normal users, says that: "target must be greater than or equal to the current doc ID". But the test you add was calling advanceExact(i-1) after advanceExact(i). So this behavior is undefined under the API contract. I'm not very sure if we need to maintain the undefined behavior to keep consistent with before ?

BTW, IndexedDISI did not extend DocValuesIterator. So it will make sense to me to add similar java doc on IndexedDISI#advanceExact if it is causing confusion.

@gautamworah96
Copy link
Contributor Author

I was adding the assert to guarantee the consistency between disi.doc and disi.exists.

Hmm. To me it feels like we are changing the current behavior of the (admittedly undefined) API for very little benefit. I agree that we should not have IndexedDISI#advanceExactWithinBlock return false while disi.exists = true but maybe we could handle this in a different way?

For example, what if we assigned disi.exists to false here?

Changing it this way would fix the behavior (of the inconsistency between disi.advance returning false and disi.exists being true) and would also not break the behavior of the API?

What is happening with this assert is that clients who relied on earlier calls that requested backwards docids were earlier getting false as the answer and were working on that assumption. But with this assert, tests fail in a somewhat obstruse way..

I'm not very sure if we need to maintain the undefined behavior to keep consistent with before ?

IMO, it would be better to try to maintain the behavior of the API unless we see a strong compelling benefit (like much better code structure or user experience benefit) in changing it?

So it will make sense to me to add similar java doc on IndexedDISI#advanceExact if it is causing confusion.

I definitely want to add some javadoc here. It took me quite some time to understand what the API did.. :)

@gf2121
Copy link
Contributor

gf2121 commented Jul 28, 2023

For example, what if we assigned disi.exists to false here?

Sorry but I do not think we should do this change indeed :

  1. If a field exists in doc 1, 3. Calling advanceExact(1) after calling advanceExact(2) will return false. This is incorrect.

  2. An assert error should help caller side find out bugs that calling API in an unexpected way, which is better than handle it silently.

@gautamworah96 gautamworah96 marked this pull request as draft July 29, 2023 19:52
@gautamworah96
Copy link
Contributor Author

An assert error should help caller side find out bugs that calling API in an unexpected way, which is better than handle it silently.

True. Let me think about this a little bit more. I've changed the PR to be a draft for now.

@gautamworah96
Copy link
Contributor Author

Okay. I don't think there is much to do here. The check we have introduced here is indeed helpful.. even though the check right now is just at test level (because it is an assert) it has already helped us in catching cases where we were accessing DISIs in an invalid manner..

I will open a separate PR for the documentation fixes I had in mind.

Thanks @gf2121 !

@gautamworah96
Copy link
Contributor Author

Just to add some more context here, we were in fact doing something wrong with our IndexedDISI iterators.
At Amazon Search, we have some custom caching logic that shares IndexedDISI iterators for custom fields that all rely on the same underlying fields.

What ended up happening was that, in most cases, all fields would always advance to the same doc, but in certain rare cases, we ended up in situations where one field (that shared the iterator) was trying to start from scratch, ie, 0 when other fields had already advanced the iterators to a forward position.

Up until Lucene 9.7, we never hit an error because the code was lenient and always returned false if the iterator was trying to access a position that was backwards. However, with PR #12324 this logic was changed and that broke some of our tests. We have now found the bug and have fixed it! Yay!

@mikemccand
Copy link
Member

It's awesome that Lucene changed to be more picky (removing leniency) on catching invalid usage of the API! Such improvements are great at ferreting out bugs (instead of masking them) in the application layer (Amazon Search in this case) using Lucene!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants