Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix failing BaseKnnVectorQueryTestCase#testTimeout #13283

Merged
merged 2 commits into from
Apr 11, 2024

Conversation

kaivalnp
Copy link
Contributor

@kaivalnp kaivalnp commented Apr 8, 2024

Description

Fixes #13272

The failure happens because of this assumption of the HNSW graph having just 1 level, but we get more than 1 for some seed values -- so the counting timeout of 1 node is exhausted while trying to find the best entry point for the last level, and we consequently get no results, failing this assertion

The assertion wants to check that when an approximate search is performed (some docs are fed to the collector), we return partial results even after a timeout -- and we can check this explicitly in a separate test for TimeLimitingKnnCollectorManager

With that test in place, and since we cannot easily control the number of levels in the underlying graph, can we relax the test to check for "at most 1 result" instead of "exactly 1 result"?

@kaivalnp
Copy link
Contributor Author

kaivalnp commented Apr 8, 2024

Side note: the original command to reproduce from #13202 (comment):

gradlew :lucene:core:test --tests "org.apache.lucene.search.TestKnnByteVectorQuery.testTimeout" -Ptests.jvms=2 -Ptests.jvmargs= -Ptests.seed=7F16007D6F6A386B -Ptests.gui=true -Ptests.file.encoding=ISO-8859-1 -Ptests.vectorsize=256

..does not work until we remove the --tests parameter like:

gradlew :lucene:core:test -Ptests.jvms=2 -Ptests.jvmargs= -Ptests.seed=7F16007D6F6A386B -Ptests.gui=true -Ptests.file.encoding=ISO-8859-1 -Ptests.vectorsize=256

Moreover, it deterministically fails with a minimal command of (stripping extra params):

gradlew :lucene:core:test -Ptests.jvms=2 -Ptests.seed=7F16007D6F6A386B

..which does not reproduce without tests.jvms, I'm not entirely sure why..

I tried running the minimal command above using all 3 seed values from the linked comment, and the test fails deterministically before this PR and passes afterwards

Copy link
Contributor

@vigyasharma vigyasharma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find, I agree with the fix. The current test was fragile. CountingQueryTimeout can only count the no. of times queryTimeout.shouldExit() was invoked. The test fails if it gets called before a collect() call, which is happening here when we find the best entry point.

Since QueryTimeout doesn't have access to no. of results collected, I don't see of a reliable way of timing out only after x results are picked. We can only provide an upper bound on the assert, not a lower bound.

@vigyasharma
Copy link
Contributor

Let us also unmute the broken tests as part of this change?

@kaivalnp
Copy link
Contributor Author

kaivalnp commented Apr 9, 2024

Ah thanks! I fixed and unmuted one of those earlier, but missed the other -- addressed both now..

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good :)

assertEquals(1, searcher.count(query)); // Expect only 1 result
// Note: We get partial results when the HNSW graph has 1 layer, but no results for > 1 layer
// because the timeout is exhausted while finding the best entry node for the last level
assertTrue(searcher.count(query) <= 1); // Expect at most 1 result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@vigyasharma vigyasharma merged commit 13cf882 into apache:main Apr 11, 2024
3 checks passed
vigyasharma pushed a commit that referenced this pull request Apr 11, 2024
* Fix failing BaseKnnVectorQueryTestCase#testTimeout

* Also fix ParentBlockJoinKnnVectorQueryTestCase#testTimeout

---------

Co-authored-by: Kaival Parikh <[email protected]>
vigyasharma pushed a commit that referenced this pull request Apr 17, 2024
* Fix failing BaseKnnVectorQueryTestCase#testTimeout

* Also fix ParentBlockJoinKnnVectorQueryTestCase#testTimeout

---------

Co-authored-by: Kaival Parikh <[email protected]>
@vigyasharma
Copy link
Contributor

Backported to branch_9x

@kaivalnp kaivalnp deleted the GH-13272 branch April 18, 2024 07:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestKnnByteVectorQuery.testTimeout fails
3 participants