Add documentation changes for disk-based k-NN #8246
@@ -22,23 +22,18 @@ PUT test-index | |
{ | ||
"settings": { | ||
"index": { | ||
"knn": true, | ||
"knn.algo_param.ef_search": 100 | ||
"knn": true | ||
} | ||
}, | ||
"mappings": { | ||
"properties": { | ||
"my_vector": { | ||
"type": "knn_vector", | ||
"dimension": 3, | ||
"space_type": "l2", | ||
"method": { | ||
"name": "hnsw", | ||
"space_type": "l2", | ||
"engine": "lucene", | ||
"parameters": { | ||
"ef_construction": 128, | ||
"m": 24 | ||
} | ||
"engine": "faiss" | ||
} | ||
} | ||
} | ||
|
@@ -47,6 +42,92 @@ PUT test-index | |
``` | ||
{% include copy-curl.html %} | ||
|
||
## Vector workload modes | ||
Review comment: Can we have a table of mode, compression, and which engine will be used in the docs?
||
|
||
Vector search involves trade-offs between low-latency and low-cost search. Specify the `mode` mapping parameter of the `knn_vector` type to indicate which search mode you want to prioritize. The `mode` dictates the default values for k-NN parameters. You can further fine-tune your index by overriding the default parameter values in the k-NN field mapping. | ||
|
||
The following modes are currently supported. | ||
|
||
| Mode | Default engine | Description | | ||
|:---|:---|:---| | ||
| `in_memory` (Default) | `nmslib` | Prioritizes low-latency search. This mode uses the `nmslib` engine without any quantization applied. It is configured with the default parameter values for vector search in OpenSearch. | | ||
| `on_disk` | `faiss` | Prioritizes low-cost vector search while maintaining strong recall. By default, the `on_disk` mode uses quantization and rescoring to execute a two-pass approach to retrieve the top neighbors. The `on_disk` mode supports only `float` vector types. | | ||
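For comparison, the following is a minimal sketch that explicitly requests the default `in_memory` mode (the index name is illustrative; because `in_memory` is the default, omitting `mode` entirely produces the same behavior):

```json
PUT in-memory-test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 3,
        "space_type": "l2",
        "mode": "in_memory"
      }
    }
  }
}
```
{% include copy-curl.html %}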
|
||
To create a k-NN index that uses the `on_disk` mode for low-cost search, send the following request: | ||
|
||
```json | ||
PUT test-index | ||
{ | ||
"settings": { | ||
"index": { | ||
"knn": true | ||
} | ||
}, | ||
"mappings": { | ||
"properties": { | ||
"my_vector": { | ||
"type": "knn_vector", | ||
"dimension": 3, | ||
"space_type": "l2", | ||
"mode": "on_disk" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
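Because `mode` only supplies default values, you can fine-tune the index by overriding individual parameters in the field mapping. The following sketch assumes that a `method` definition can be combined with `mode` in the field mapping (the index name and parameter values are illustrative); it keeps the `on_disk` mode but raises `ef_construction` for higher-quality graph construction:

```json
PUT test-index-tuned
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 3,
        "space_type": "l2",
        "mode": "on_disk",
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "ef_construction": 256
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}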
|
||
## Compression levels | ||
|
||
The `compression_level` mapping parameter selects a quantization encoder that reduces vector memory consumption by the given factor. The following table lists the available `compression_level` values. | ||
|
||
| Compression level | Supported engines | | ||
|:------------------|:-------------------------------| | ||
| `1x` | `faiss`, `lucene`, and `nmslib` | | ||
| `2x` | `faiss` | | ||
| `4x` | `lucene` | | ||
| `8x` | `faiss` | | ||
| `16x` | `faiss` | | ||
| `32x` | `faiss` | | ||
|
||
For example, if a `compression_level` of `32x` is passed for a `float32` index of 768-dimensional vectors, the per-vector memory is reduced from `4 * 768 = 3072` bytes to `3072 / 32 = 96` bytes. Internally, binary quantization (which maps a `float` to a `bit`) may be used to achieve this compression. | ||
|
||
If you set the `compression_level` parameter, then you cannot specify an `encoder` in the `method` mapping. Compression levels greater than `1x` are only supported for `float` vector types. | ||
{: .note} | ||
|
||
The following table lists the default `compression_level` values for the available workload modes. | ||
|
||
| Mode | Default compression level | | ||
|:------------------|:-------------------------------| | ||
| `in_memory` | `1x` | | ||
| `on_disk` | `32x` | | ||
|
||
|
||
To create a vector field with a `compression_level` of `16x`, specify the `compression_level` parameter in the mappings. This parameter overrides the default compression level for the `on_disk` mode from `32x` to `16x`, producing higher recall and accuracy at the expense of a larger memory footprint: | ||
|
||
```json | ||
PUT test-index | ||
{ | ||
"settings": { | ||
"index": { | ||
"knn": true | ||
} | ||
}, | ||
"mappings": { | ||
"properties": { | ||
"my_vector": { | ||
"type": "knn_vector", | ||
"dimension": 3, | ||
"space_type": "l2", | ||
"mode": "on_disk", | ||
"compression_level": "16x" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
{% include copy-curl.html %} | ||
|
||
## Method definitions | ||
|
||
[Method definitions]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions) are used when the underlying [approximate k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) algorithm does not require training. For example, the following `knn_vector` field specifies that *nmslib*'s implementation of *hnsw* should be used for approximate k-NN search. During indexing, *nmslib* will build the corresponding *hnsw* segment files. | ||
|
@@ -55,13 +136,13 @@ PUT test-index | |
"my_vector": { | ||
"type": "knn_vector", | ||
"dimension": 4, | ||
"space_type": "l2", | ||
"method": { | ||
"name": "hnsw", | ||
"space_type": "l2", | ||
"engine": "nmslib", | ||
"parameters": { | ||
"ef_construction": 128, | ||
"m": 24 | ||
"ef_construction": 100, | ||
"m": 16 | ||
} | ||
} | ||
} | ||
|
@@ -73,13 +154,15 @@ Model IDs are used when the underlying Approximate k-NN algorithm requires a tra | |
model contains the information needed to initialize the native library segment files. | ||
|
||
```json | ||
"my_vector": { | ||
"type": "knn_vector", | ||
"model_id": "my-model" | ||
} | ||
``` | ||
|
||
However, if you intend to use Painless scripting or a k-NN score script, you only need to pass the dimension. | ||
```json | ||
"my_vector": { | ||
"type": "knn_vector", | ||
"dimension": 128 | ||
} | ||
|
@@ -123,13 +206,13 @@ PUT test-index | |
"type": "knn_vector", | ||
"dimension": 3, | ||
"data_type": "byte", | ||
"space_type": "l2", | ||
"method": { | ||
"name": "hnsw", | ||
"space_type": "l2", | ||
"engine": "lucene", | ||
"parameters": { | ||
"ef_construction": 128, | ||
"m": 24 | ||
"ef_construction": 100, | ||
"m": 16 | ||
} | ||
} | ||
} | ||
|
@@ -465,14 +548,10 @@ PUT /test-binary-hnsw | |
"type": "knn_vector", | ||
"dimension": 8, | ||
"data_type": "binary", | ||
"space_type": "hamming", | ||
"method": { | ||
"name": "hnsw", | ||
"space_type": "hamming", | ||
"engine": "faiss", | ||
"parameters": { | ||
"ef_construction": 128, | ||
"m": 24 | ||
} | ||
"engine": "faiss" | ||
} | ||
} | ||
} | ||
|
@@ -695,12 +774,12 @@ POST _plugins/_knn/models/test-binary-model/_train | |
"dimension": 8, | ||
"description": "model with binary data", | ||
"data_type": "binary", | ||
"space_type": "hamming", | ||
"method": { | ||
"name": "ivf", | ||
"engine": "faiss", | ||
"space_type": "hamming", | ||
"parameters": { | ||
"nlist": 1, | ||
"nlist": 16, | ||
"nprobes": 1 | ||
} | ||
} | ||
|
@@ -234,7 +234,7 @@ Response field | Description | |||||
`timestamp` | The date and time when the model was created. | ||||||
`description` | A user-provided description of the model. | ||||||
`error` | An error message explaining why the model is in a failed state. | ||||||
`space_type` | The space type for which this model is trained, for example, Euclidean or cosine. | ||||||
`space_type` | The space type for which this model is trained, for example, Euclidean or cosine. Note: This value can also be set at the top level of the request. | ||||||
||||||
`dimension` | The dimensionality of the vector space for which this model is designed. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Above: Is "in" the top level the right preposition? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Changed to "at" |
||||||
`engine` | The native library used to create the model, either `faiss` or `nmslib`. | ||||||
|
||||||
|
@@ -351,6 +351,7 @@ Request parameter | Description | |||||
`search_size` | The training data is pulled from the training index using scroll queries. This parameter defines the number of results to return per scroll query. Default is `10000`. Optional. | ||||||
`description` | A user-provided description of the model. Optional. | ||||||
`method` | The configuration of the approximate k-NN method used for search operations. For more information about the available methods, see [k-NN index method definitions]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions). The method requires training to be valid. | ||||||
`space_type` | The space type for which this model is trained, for example, Euclidean or cosine. Note: This value can also be set inside the `method` object instead of at the top level. | ||||||
|
||||||
Review comment: Above: Is "in" the right preposition? In what method?
||||||
#### Usage | ||||||
|
||||||
|
@@ -365,10 +366,10 @@ POST /_plugins/_knn/models/{model_id}/_train?preference={node_id} | |||||
"max_training_vector_count": 1200, | ||||||
"search_size": 100, | ||||||
"description": "My model", | ||||||
"space_type": "l2", | ||||||
"method": { | ||||||
"name":"ivf", | ||||||
"engine":"faiss", | ||||||
"space_type": "l2", | ||||||
"parameters":{ | ||||||
"nlist":128, | ||||||
"encoder":{ | ||||||
|
@@ -395,10 +396,10 @@ POST /_plugins/_knn/models/_train?preference={node_id} | |||||
"max_training_vector_count": 1200, | ||||||
"search_size": 100, | ||||||
"description": "My model", | ||||||
"space_type": "l2", | ||||||
"method": { | ||||||
"name":"ivf", | ||||||
"engine":"faiss", | ||||||
"space_type": "l2", | ||||||
"parameters":{ | ||||||
"nlist":128, | ||||||
"encoder":{ | ||||||
|
@@ -49,9 +49,9 @@ PUT my-knn-index-1 | |||||
"my_vector1": { | ||||||
"type": "knn_vector", | ||||||
"dimension": 2, | ||||||
"space_type": "l2", | ||||||
"method": { | ||||||
"name": "hnsw", | ||||||
"space_type": "l2", | ||||||
"engine": "nmslib", | ||||||
"parameters": { | ||||||
"ef_construction": 128, | ||||||
|
@@ -62,9 +62,9 @@ PUT my-knn-index-1 | |||||
"my_vector2": { | ||||||
"type": "knn_vector", | ||||||
"dimension": 4, | ||||||
"space_type": "innerproduct", | ||||||
"method": { | ||||||
"name": "hnsw", | ||||||
"space_type": "innerproduct", | ||||||
"engine": "faiss", | ||||||
"parameters": { | ||||||
"ef_construction": 256, | ||||||
|
@@ -199,10 +199,10 @@ POST /_plugins/_knn/models/my-model/_train | |||||
"training_field": "train-field", | ||||||
"dimension": 4, | ||||||
"description": "My model description", | ||||||
"space_type": "l2", | ||||||
"method": { | ||||||
"name": "ivf", | ||||||
"engine": "faiss", | ||||||
"space_type": "l2", | ||||||
"parameters": { | ||||||
"nlist": 4, | ||||||
"nprobes": 2 | ||||||
|
@@ -308,6 +308,72 @@ Engine | Notes | |||||
:--- | :--- | ||||||
`faiss` | If `nprobes` is present in a query, it overrides the value provided when creating the index. | ||||||
|
||||||
### Rescoring quantized results using full precision | ||||||
|
||||||
Quantization can be used to significantly reduce the memory footprint of a k-NN index. For more information about quantization, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization). Because some precision in the vector representation is lost during quantization, the computed distances are approximate, which reduces the overall recall of the search. | ||||||
|
||||||
To improve recall while maintaining the memory savings of quantization, you can use a two-phase search approach. In the first phase, `oversample_factor * k` results are retrieved from an index using quantized vectors and the scores are approximated. In the second phase, the full-precision vectors of those `oversample_factor * k` results are loaded into memory from disk, and scores are recomputed against the full-precision query vector. The results are then reduced to the top k. | ||||||
|
||||||
The default rescoring behavior is determined by the `mode` and `compression_level` of the backing k-NN vector field: | ||||||
|
||||||
- For `in_memory` mode, no rescoring is applied by default. | ||||||
- For `on_disk` mode, default rescoring is based on the configured `compression_level`. Each `compression_level` provides a default `oversample_factor`, specified in the following table. | ||||||
|
||||||
| Compression level | Default rescore `oversample_factor` | | ||||||
|:------------------|:----------------------------------| | ||||||
| `32x` (default) | 3.0 | | ||||||
| `16x` | 2.0 | | ||||||
| `8x` | 2.0 | | ||||||
| `4x` | No default rescoring | | ||||||
| `2x` | No default rescoring | | ||||||
|
||||||
To explicitly apply rescoring, provide the `rescore` parameter in a query on a quantized index and specify the `oversample_factor`: | ||||||
||||||
|
||||||
```json | ||||||
GET my-knn-index-1/_search | ||||||
{ | ||||||
"size": 2, | ||||||
"query": { | ||||||
"knn": { | ||||||
"target-field": { | ||||||
"vector": [2, 3, 5, 6], | ||||||
"k": 2, | ||||||
"rescore" : { | ||||||
"oversample_factor": 1.2 | ||||||
} | ||||||
} | ||||||
} | ||||||
} | ||||||
} | ||||||
``` | ||||||
{% include copy-curl.html %} | ||||||
|
||||||
Alternatively, set the `rescore` parameter to `true` to use a default `oversample_factor` of `1.0`: | ||||||
|
||||||
```json | ||||||
GET my-knn-index-1/_search | ||||||
{ | ||||||
"size": 2, | ||||||
"query": { | ||||||
"knn": { | ||||||
"target-field": { | ||||||
"vector": [2, 3, 5, 6], | ||||||
"k": 2, | ||||||
"rescore" : true | ||||||
} | ||||||
} | ||||||
} | ||||||
} | ||||||
``` | ||||||
{% include copy-curl.html %} | ||||||
|
||||||
The `oversample_factor` is a floating-point number between 1.0 and 100.0, inclusive. The number of results in the first pass is calculated as `oversample_factor * k` and is guaranteed to be between 100 and 10,000, inclusive. If the calculated number of results is smaller than 100, then the number of results is set to 100. If the calculated number of results is greater than 10,000, then the number of results is set to 10,000. For example, with `k` set to `10` and an `oversample_factor` of `5.0`, the first pass requests `50` results, which is below the lower bound, so `100` results are retrieved instead. | ||||||
|
||||||
Rescoring is only supported for the `faiss` engine. | ||||||
|
||||||
Rescoring is not needed if quantization is not used because the scores returned are already fully precise. | ||||||
{: .note} | ||||||
|
||||||
### Using approximate k-NN with filters | ||||||
|
||||||
To learn about using filters with k-NN search, see [k-NN search with filters]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/). | ||||||
|
Review comment: I think in this example we should give the best default experience, which is no mode, no compression, just spaceType, dim, and type attributes. What do you think?
Reply: Sure. The only thing is that I believe defaults will be picked up from index_settings in this case.