Adds Documentation for dynamic query parameters for kNN search request #7761

Merged
merged 12 commits into from
Jul 22, 2024
51 changes: 49 additions & 2 deletions _search-plugins/knn/approximate-knn.md
@@ -141,7 +141,7 @@
10 | 1 | 1 | 4 | 4 | 1
10 | 10 | 1 | 4 | 10 | 10
10 | 1 | 2 | 4 | 8 | 2

The number of results returned by Faiss/NMSLIB differs from the number of results returned by Lucene only when `k` is smaller than `size`. If `k` and `size` are equal, all engines return the same number of results.

Starting in OpenSearch 2.14, you can use `k`, `min_score`, or `max_distance` for [radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/).
@@ -253,7 +253,54 @@
...
```

After data is ingested, it can be searched like any other `knn_vector` field.

### Additional query parameters

Starting with OpenSearch 2.16, you can provide `method_parameters` in a search request:

```json
GET my-knn-index-1/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector2": {
        "vector": [2, 3, 5, 6],
        "k": 2,
        "method_parameters": {
          "ef_search": 100
        }
      }
    }
  }
}
```

These parameters depend on the combination of engine and method used to create the index. The following sections provide information about the supported `method_parameters`.
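The engine and method are fixed in the field mapping when the index is created. As a minimal sketch, the following request creates a hypothetical index `my-hnsw-index` whose `my_vector` field uses the `hnsw` method with the `faiss` engine, so searches against that field can pass `ef_search` in `method_parameters`. The index name, field name, `space_type`, and `ef_construction`/`m` values are illustrative assumptions, not values required by the plugin.

```json
PUT my-hnsw-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 4,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "faiss",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}
```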

#### `ef_search`

You can provide the `ef_search` parameter when searching an index created using the `hnsw` method. The `ef_search` parameter specifies the number of vectors to examine in order to find the top `k` nearest neighbors. A higher `ef_search` value improves recall at the cost of increased search latency. The value must be positive.
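`ef_search` can also be passed in a radial query on an engine that supports it; the table in this section lists `faiss` as supporting radial queries with `ef_search`. The following sketch assumes that the `my_vector2` field in `my-knn-index-1` was created with the `hnsw` method and the `faiss` engine. The `max_distance` and `ef_search` values are illustrative only.

```json
GET my-knn-index-1/_search
{
  "query": {
    "knn": {
      "my_vector2": {
        "vector": [2, 3, 5, 6],
        "max_distance": 2,
        "method_parameters": {
          "ef_search": 100
        }
      }
    }
  }
}
```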

The following table provides information about the `ef_search` parameter for the supported engines.

Engine | Radial query support | Notes
:--- | :--- | :---
`nmslib` | No | If `ef_search` is present in a query, it overrides the `index.knn.algo_param.ef_search` index setting.
`faiss` | Yes | If `ef_search` is present in a query, it overrides the `index.knn.algo_param.ef_search` index setting.
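`lucene` | No | You must specify `k` in the query. The engine cannot accept both `k` and `ef_search`, so if `ef_search` is also provided, the k-NN plugin passes `max(k, ef_search)` to the engine. If `ef_search` is larger than `k`, each shard can return more than `k` results; use `size` to control the final number of results.

Collaborator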

Could you explain this more? Is this saying that users can either specify k or ef_search but not both? How is size affecting the final number of results? Please clarify the last sentence.

Contributor Author

Not really. Users have to specify `k`; validation fails otherwise. At the engine level, we cannot pass both `k` and `ef_search`. If `ef_search` is present in `method_parameters`, the plugin picks the larger of the two.

As a side effect, there will be more results per shard if the `ef_search` value is higher than `k`, so users will have to use `size` to control the final results. Thinking more about it, we could remove the `size` sentence, since even now the final results are controlled by `size` if `k` > `size`.

Collaborator

Thanks for the explanation! I think it's helpful to provide the last sentence. Made a suggestion to include this info.
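The `lucene` behavior described in the table can be sketched with the following request. It assumes a hypothetical index `my-lucene-index` whose `my_vector` field was created with the `hnsw` method and the `lucene` engine. Because `ef_search` (`10`) is larger than `k` (`2`), the plugin passes `max(k, ef_search)`, which is `10`, to the engine, so each shard can return more than `k` results, and `"size": 2` limits the final response to 2 hits.

```json
GET my-lucene-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector": {
        "vector": [2, 3, 5, 6],
        "k": 2,
        "method_parameters": {
          "ef_search": 10
        }
      }
    }
  }
}
```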

#### `nprobes`

You can provide the `nprobes` parameter when searching an index created using the `ivf` method. The `nprobes` parameter specifies the number of clusters to examine in order to find the top `k` nearest neighbors. A higher `nprobes` value improves recall at the cost of increased search latency. The value must be positive.
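As a sketch, assuming a hypothetical index `my-ivf-index` whose `my_vector` field was created with the `ivf` method and the `faiss` engine, a query that overrides `nprobes` follows the same `method_parameters` pattern as `ef_search`. The index name, field name, query vector, and `nprobes` value are assumptions for illustration.

```json
GET my-ivf-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector": {
        "vector": [2, 3, 5, 6],
        "k": 2,
        "method_parameters": {
          "nprobes": 10
        }
      }
    }
  }
}
```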

The following table provides information about the `nprobes` parameter for the supported engines.

Engine | Notes
:--- | :---
`faiss` | If `nprobes` is present in a query, it overrides the value provided when creating the index.

### Using approximate k-NN with filters
