Add timeout support to AbstractKnnVectorQuery #13202
Conversation
Benchmarks

Used baseline with …
Is this because we have already exhausted our query timeout during rewrite, and so we throw a TimeExceededException?
Exactly! This issue is also visible from the above benchmarks, where the …
Interesting find on KNN search behavior, @kaivalnp! I added some thoughts after a first pass at the PR.

Apart from those, one divergent behavior is that we won't raise some form of TimeExceededException when we time out and terminate the search. I wonder if we should have a way to surface that this was a timeout, vs. a search that simply collected all the results it could within the visitLimit() node traversal.
```java
if (queryTimeout.shouldExit()) {
  // Quickly exit
  return TopDocsCollector.EMPTY_TOPDOCS;
}
```
Most KnnCollector implementations (like this one) tend to return whatever results we've collected so far, and set the relation as TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO
if we early terminated. Any reason why we don't want to keep the same behavior here?
I think we should return whatever docs we were able to collect within the time budget. We have already done the heavy work of approximate search for those hits; what's left is only to return those cached results from the DocAndScoreQuery.

It's not the behavior today, as TimeLimitingBulkScorer sees the query as timed out, and doesn't even run the first interval on DocAndScoreQuery. I need to think more about a good way around it. I'm thinking that if a scorer is doing the heavy work of iterating through an ImpactsEnum or PostingsEnum, then it makes sense to early terminate during scoring. But for queries with cached results like DocAndScoreQuery, maybe we shouldn't?
Returning partial results makes sense, but I passed the empty results here purely because they would be discarded later (so we could save on the computations used for creating the TopDocs).

> TimeLimitingBulkScorer sees the query as timed out, and doesn't even run the first interval

> But for queries with cached results like DocAndScoreQuery, maybe we shouldn't?

I don't think returning partial results would add value today; can we add a TODO to return them once the above behavior is fixed?
> Most KnnCollector implementations (like this one) tend to return whatever results we've collected so far, and set the relation as TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO if we early terminated

On a separate note, this seems to be a no-op as well -- since we return fresh results from exact search if approximate results are incomplete. I wonder if we can return empty results here as well to save on some computations?
> I don't think returning partial results would add value today; can we add a TODO to return them once the above behavior is fixed?

Yes, we can handle it in a different change. I'll start a discussion on a separate issue for this.
> On a separate note, this seems to be a no-op as well -- since we return fresh results from exact search if approximate results are incomplete.

I think we only fall back to exact search if there is a filter provided, i.e. filterWeight != null (code). Otherwise we return the approximate search results, and callers can use TotalHits.Relation to figure out if the search terminated early.
I still think topDocs() from the collector should return whatever it collected, and set the relation to GTE. With EMPTY_TOPDOCS, the query gets rewritten as a MatchNoDocsQuery, which misleads debugging and further masks the timeout occurrence. We can let the results get discarded by TimeLimitingBulkScorer for now, while we handle that in a separate change (since it is on par with existing behavior).

I'm open to more opinions on this.
> callers can use TotalHits.Relation to figure out if search terminated early.

This information is internal to AbstractKnnVectorQuery -- even if we pass some TotalHits.Relation, the rewritten DocAndScoreQuery treats all results as "complete" (there is no distinction for the relation here, and it does not pass one on to IndexSearcher#search).

Though I agree, we could pass those results right now and handle them properly in a follow-up issue. I've added a commit to do this.
```java
return new KnnCollector() {
  @Override
  public boolean earlyTerminated() {
    return queryTimeout.shouldExit() || collector.earlyTerminated();
```
Ah, so both searchLevel() and findBestEntryPoint() in HnswGraphSearcher check earlyTerminated() on their collector to abort the search, so configuring the timeout check allows us to preempt the approximate search.
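For readers following along, the overall shape of the wrapper discussed here, assembled from the snippets quoted in this thread (a sketch, not necessarily the exact merged code), would be:

```java
// Delegate everything to the original collector, but OR the query timeout
// into earlyTerminated(), and mark results as partial if the timeout fired.
KnnCollector timeLimitingCollector = new KnnCollector() {
  @Override
  public boolean earlyTerminated() {
    // Polled by HnswGraphSearcher#searchLevel and #findBestEntryPoint,
    // so a fired timeout preempts the approximate (graph) search
    return queryTimeout.shouldExit() || collector.earlyTerminated();
  }

  @Override
  public void incVisitedCount(int count) {
    collector.incVisitedCount(count);
  }

  @Override
  public long visitedCount() {
    return collector.visitedCount();
  }

  @Override
  public long visitLimit() {
    return collector.visitLimit();
  }

  @Override
  public int k() {
    return collector.k();
  }

  @Override
  public boolean collect(int docId, float similarity) {
    return collector.collect(docId, similarity);
  }

  @Override
  public float minCompetitiveSimilarity() {
    return collector.minCompetitiveSimilarity();
  }

  @Override
  public TopDocs topDocs() {
    TopDocs docs = collector.topDocs();
    // Mark results as partial if the timeout was hit mid-search
    TotalHits.Relation relation =
        queryTimeout.shouldExit()
            ? TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO
            : docs.totalHits.relation;
    return new TopDocs(new TotalHits(docs.totalHits.value, relation), docs.scoreDocs);
  }
};
```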
Could we add headers for each column with these benchmark results please? Thanks!
Separately, should we deprecate …
Thanks for the review @vigyasharma!
I think this exception will be raised here automatically, because the timeout has already occurred? Though we won't know / raise this explicitly because graph search timed out, the end goal of marking results as "incomplete" will still be satisfied? I'm also not sure how we'd do it explicitly; perhaps we'd need lower-level API changes to allow rewrites to time out more gracefully.

Oops, I missed adding the benchmark description and row headers earlier, updated now.
+1, I only see occurrences of it in comments. I'll try to create a PR for this soon.
Created #13220 to discuss this.
```java
// Mark results as partial if timeout is met
TotalHits.Relation relation =
    queryTimeout.shouldExit()
        ? TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO
        : docs.totalHits.relation;

return new TopDocs(new TotalHits(docs.totalHits.value, relation), docs.scoreDocs);
```
Can we simply return collector.topDocs(); and let collectors decide how to handle this? All implementations of AbstractKnnCollector already set the relation to GTE based on the earlyTerminated() check.
I don't think collector.topDocs() will set the relation to GTE in case of timeouts (it will only set this when the visitLimit() is crossed, because it uses its own internal earlyTerminated() function that does not have information about the QueryTimeout).
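For context, the collector-internal check referenced here is based purely on visit counts, roughly like this (a paraphrase of AbstractKnnCollector-style accounting with a hypothetical class name, not the exact source):

```java
// Paraphrase: the collector only knows about its visit budget -- a
// wall-clock QueryTimeout is invisible to it, hence the wrapper in this PR.
abstract class VisitBudgetCollector {
  private final long visitLimit;
  private long visitedCount;

  VisitBudgetCollector(long visitLimit) {
    this.visitLimit = visitLimit;
  }

  public void incVisitedCount(int count) {
    visitedCount += count;
  }

  public boolean earlyTerminated() {
    return visitedCount >= visitLimit; // visit-limit only, no QueryTimeout
  }
}
```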
```
@@ -102,14 +103,34 @@ public void testVectorEncodingMismatch() throws IOException {
  }
}

public void testTimeout() throws IOException {
```
We could also add a test for the partial result case. You could create a mock query timeout that returns false only the first time it's called, and then flips to returning true.
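A minimal sketch of such a mock (illustrative, not the test's actual code) could be:

```java
// Mock QueryTimeout: report "no timeout" on the first shouldExit() call so
// that some results get collected, then "timed out" on every call after.
QueryTimeout flippingTimeout = new QueryTimeout() {
  private boolean firstCallDone = false;

  @Override
  public boolean shouldExit() {
    if (!firstCallDone) {
      firstCallDone = true;
      return false; // let the first check through
    }
    return true; // flip to timed-out afterwards
  }
};
```

Searching with this timeout should produce partial results with the GREATER_THAN_OR_EQUAL_TO relation, which the test can assert on.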
Makes sense!

There was a consideration here that the number of levels in the HNSW graph should be == 1, because if the timeout is hit while finding the best entry point for the last level, we haven't collected any results yet. I think this should be fine as we're only indexing 3 vectors, and running these tests a few thousand times did not give an error. Added a note about this as well.

Also fixed another place where the timeout needs to be checked:

- Refactor tests to base classes
- Also timeout exact searches in Lucene99HnswVectorsReader
```java
while (vectorScorer.nextParent() != DocIdSetIterator.NO_MORE_DOCS) {
  // Mark results as partial if timeout is met
  if (queryTimeout != null && queryTimeout.shouldExit()) {
    relation = TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO;
    break;
  }
```
Also wanted some opinions here: we're checking the timeout in the exact search of DiversifyingChildren[Byte|Float]KnnVectorQuery once per parent (as opposed to once per document elsewhere). Should we update this to once per child as well?
Timeouts are inevitably approximate, so I guess it should be fine. Also seems like something that we can easily change in a follow-up PR if needed.
Sounds good, thanks!
Looking at the benchmarking, we are adding a 5% overhead to all vector operations when using float32. As vector operations get faster (consider hamming distance while exploring more vectors), I expect this timeout mechanism to have an even bigger impact on search. What was the dimension count & dataset for your performance testing, @kaivalnp? Do we really think that timing out a couple of seconds faster is worth the overhead?
It was the enwiki dataset (https://home.apache.org/~sokolov/enwiki-20120502-lines-1k-100d.vec), unit vectors, 100 dimensions, indexed the first 1M vectors, searched the next 10K vectors, with …

This overhead is only present when a timeout is set but not met; there is no regression when it is not set at all (in the benchmarked case without any filter). IMO this additional check can be useful to time out unexpectedly expensive queries in a production system to keep overall CPU within limits, but looking forward to more opinions here.
@kaivalnp Ah, ok, only 100 dimensions. This explains the overly large impact that checking for the timeout had, as there is very little work being done on distance calculations. I would expect the percentage of compute to drop as the amount of work done by vector comparisons increases.

I have a dumb question about timeouts. I am guessing the timeouts are already approximate; I understand the need for a stricter timeout, but I am not sure we should check for the timeout on every single vector.
I have used the default QueryTimeoutImpl here.

Perhaps this can be configured by the end user themselves, by making actual timeout checks only after every N calls, according to acceptable latency / accuracy tradeoffs. For example, I modified the

```java
return timeoutAt != null && nanoTime() - timeoutAt > 0;
```

to

```java
// count is an integer initialized to 0
return timeoutAt != null && ++count % 100 == 0 && nanoTime() - timeoutAt > 0;
```

to check the timeout every 100 vectors, and the latency overhead with a …
That being said, I'm okay if we do it in a separate change.
I think this PR's changes are okay to be merged. @kaivalnp: can you please resolve the version conflicts and add a changes entry, and I can merge it in.
Ahh nice catch! You mean something like:

```java
// counter is an integer initialized to 0
return (counter++ == 100 && queryTimeout.shouldExit()) || collector.earlyTerminated();
```

in the …? This makes sense to me; we can add this in a follow-up, and even make the interval growing, like this commit?
Yes. Let's do it in a follow-up change.
I see a couple of recent failures related to the test added here: …

Sample command to reproduce, as mentioned in (1):

```
./gradlew :lucene:core:test --tests "org.apache.lucene.search.TestKnnByteVectorQuery.testTimeout" -Ptests.jvms=2 -Ptests.jvmargs= -Ptests.seed=7F16007D6F6A386B -Ptests.gui=true -Ptests.file.encoding=ISO-8859-1 -Ptests.vectorsize=256
```

However, when I run it on my machine, I'm not getting the same error. Could someone double-check if this is deterministically reproducible? Additionally, should we mark the test as …
* Add timeout support to graph searches in AbstractKnnVectorQuery
* Also timeout exact searches
* Return partial KNN results
* Add tests for partial KNN results
  - Refactor tests to base classes
  - Also timeout exact searches in Lucene99HnswVectorsReader
* Add CHANGES.txt entry and fix some comments

Co-authored-by: Kaival Parikh <[email protected]>
FYI I opened #13285 to add the same timeout support to …
Backported to …
Description
#927 added timeout support to IndexSearcher#search, where we wrap a scorer in a TimeLimitingBulkScorer which periodically checks whether a query timed out while collecting docs in fixed chunks.

However, as #11677 points out, this does not apply to query rewrites -- an important use case being vector searches (AbstractKnnVectorQuery), where the bulk of the computation (HNSW graph searches) happens in #rewrite.

If we had set a timeout, and HNSW searches turned out too expensive, we would not return results even after the search completes (because the timeout is checked before attempting to score docs -- at which point we have all possible results, but they aren't returned).

In this PR, we're wrapping the KnnCollector in another one which additionally checks for QueryTimeout#shouldExit in the KnnCollector#earlyTerminated function. If this timeout is hit, we immediately return empty results.

Also extended this to exact searches, where we check for a timeout before scoring each document.
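For the exact-search side, the per-document check described above looks roughly like the following (a sketch with assumed local names, mirroring the per-parent snippet quoted earlier in this thread):

```java
// Sketch: brute-force scoring loop that bails out -- marking results as
// partial -- once the timeout fires, instead of scoring remaining documents.
TotalHits.Relation exactSearch(DocIdSetIterator acceptIterator, QueryTimeout queryTimeout)
    throws IOException {
  TotalHits.Relation relation = TotalHits.Relation.EQUAL_TO;
  while (acceptIterator.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
    if (queryTimeout != null && queryTimeout.shouldExit()) {
      // Stop scoring and report the hits collected so far as a lower bound
      relation = TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO;
      break;
    }
    // ... compute the similarity for acceptIterator.docID() and collect it ...
  }
  return relation;
}
```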