From c7fe6ef84a3dc32c4feabb27dcada28668871db5 Mon Sep 17 00:00:00 2001
From: Benjamin Trent
Date: Tue, 25 Jun 2024 09:06:17 -0400
Subject: [PATCH] Add some docs explaining filter performance and behavior for HNSW (#110108) (#110139)

---
 .../search-your-data/knn-search.asciidoc      | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/docs/reference/search/search-your-data/knn-search.asciidoc b/docs/reference/search/search-your-data/knn-search.asciidoc
index 76318de94bafd..8f8ef8ed33313 100644
--- a/docs/reference/search/search-your-data/knn-search.asciidoc
+++ b/docs/reference/search/search-your-data/knn-search.asciidoc
@@ -284,6 +284,24 @@ post-filtering approach, where the filter is applied **after** the approximate
 kNN search completes. Post-filtering has the downside that it sometimes
 returns fewer than k results, even when there are enough matching documents.
 
+[discrete]
+[[approximate-knn-search-and-filtering]]
+==== Approximate kNN search and filtering
+
+Unlike conventional query filtering, where more restrictive filters typically lead to faster queries,
+applying filters in an approximate kNN search with an HNSW index can decrease performance.
+This is because searching the HNSW graph requires additional exploration to obtain the `num_candidates`
+that meet the filter criteria.
+
+To avoid significant performance drawbacks, Lucene implements the following strategies per segment:
+
+* If the filtered document count is less than or equal to num_candidates, the search bypasses the HNSW graph and
+uses a brute force search on the filtered documents.
+
+* While exploring the HNSW graph, if the number of nodes explored exceeds the number of documents that satisfy the filter,
+the search will stop exploring the graph and switch to a brute force search over the filtered documents.
+
+
 [discrete]
 ==== Combine approximate kNN with other features
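
The two per-segment strategies described in the added section can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Lucene's implementation. The names `search_segment`, `brute_force`, and `visit_order` are hypothetical, and the HNSW traversal is replaced by a plain iteration over `visit_order`, a stand-in for the order in which a graph search might reach nodes. Only the decision logic (when to skip the graph, when to abandon it) mirrors the text.

```python
import heapq
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force(query: Vector, vectors: Dict[int, Vector],
                allowed: Set[int], k: int) -> List[int]:
    """Exact search: score every filtered document and keep the k closest."""
    return heapq.nsmallest(
        k, sorted(allowed), key=lambda doc: squared_l2(query, vectors[doc])
    )


def search_segment(query: Vector, vectors: Dict[int, Vector], allowed: Set[int],
                   visit_order: Iterable[int], k: int,
                   num_candidates: int) -> Tuple[str, List[int]]:
    """Return (strategy_used, top-k doc ids) for one segment.

    `visit_order` is a hypothetical stand-in for HNSW graph traversal.
    """
    # Strategy 1: the filter matches no more than num_candidates documents,
    # so skip the graph entirely and score the filtered documents directly.
    if len(allowed) <= num_candidates:
        return "brute_force_upfront", brute_force(query, vectors, allowed, k)

    candidates: List[int] = []
    visited = 0
    for doc in visit_order:
        visited += 1
        # Strategy 2: more nodes explored than there are matching documents,
        # so an exact scan over the filtered documents is now the cheaper path.
        if visited > len(allowed):
            return "brute_force_fallback", brute_force(query, vectors, allowed, k)
        if doc in allowed:
            candidates.append(doc)
            if len(candidates) == num_candidates:
                break

    top_k = heapq.nsmallest(
        k, candidates, key=lambda doc: squared_l2(query, vectors[doc])
    )
    return "graph", top_k


if __name__ == "__main__":
    # Ten one-dimensional vectors: document i holds the vector [i].
    vectors = {i: [float(i)] for i in range(10)}
    print(search_segment([0.0], vectors, {7, 8, 9}, range(10), k=1, num_candidates=2))
```

In the usage example the filter matches only documents 7, 8, and 9, while the assumed traversal starts at document 0. After visiting more nodes than there are matching documents, the sketch gives up on the traversal and scans the three filtered documents exactly, which is the behavior the second bullet describes.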