Commit

Add documentation changes for disk-based k-NN (#8246)
* Add space type as top level

Signed-off-by: John Mazanec <[email protected]>

* Add new rescore parameter

Signed-off-by: John Mazanec <[email protected]>

* Add new rescore parameter

Signed-off-by: John Mazanec <[email protected]>

* add docs for compression and mode

Signed-off-by: John Mazanec <[email protected]>

* Clean up compression docs

Signed-off-by: John Mazanec <[email protected]>

* Doc review

Signed-off-by: Fanit Kolchina <[email protected]>

* Update a few things

Signed-off-by: John Mazanec <[email protected]>

* Doc review

Signed-off-by: Fanit Kolchina <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>

---------

Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Fanit Kolchina <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
4 people authored Sep 16, 2024
1 parent 8c74b88 commit 967f257
Showing 11 changed files with 190 additions and 48 deletions.
1 change: 1 addition & 0 deletions .github/vale/styles/Vocab/OpenSearch/Words/accept.txt
Original file line number Diff line number Diff line change
@@ -105,6 +105,7 @@ p\d{2}
[Rr]eprovision(ed|ing)?
[Rr]erank(er|ed|ing)?
[Rr]epo
[Rr]escor(e|ed|ing)?
[Rr]ewriter
[Rr]ollout
[Rr]ollup
123 changes: 101 additions & 22 deletions _field-types/supported-field-types/knn-vector.md
@@ -22,23 +22,18 @@ PUT test-index
{
"settings": {
"index": {
"knn": true,
"knn.algo_param.ef_search": 100
"knn": true
}
},
"mappings": {
"properties": {
"my_vector": {
"type": "knn_vector",
"dimension": 3,
"space_type": "l2",
"method": {
"name": "hnsw",
"space_type": "l2",
"engine": "lucene",
"parameters": {
"ef_construction": 128,
"m": 24
}
"engine": "faiss"
}
}
}
@@ -47,6 +42,92 @@ PUT test-index
```
{% include copy-curl.html %}

## Vector workload modes

Vector search involves trade-offs between low-latency and low-cost search. Specify the `mode` mapping parameter of the `knn_vector` type to indicate which search mode you want to prioritize. The `mode` dictates the default values for k-NN parameters. You can further fine-tune your index by overriding the default parameter values in the k-NN field mapping.

The following modes are currently supported.

| Mode | Default engine | Description |
|:---|:---|:---|
| `in_memory` (Default) | `nmslib` | Prioritizes low-latency search. This mode uses the `nmslib` engine without any quantization applied. It is configured with the default parameter values for vector search in OpenSearch. |
| `on_disk` | `faiss` | Prioritizes low-cost vector search while maintaining strong recall. By default, the `on_disk` mode uses quantization and rescoring to execute a two-pass approach to retrieve the top neighbors. The `on_disk` mode supports only `float` vector types. |

To create a k-NN index that uses the `on_disk` mode for low-cost search, send the following request:

```json
PUT test-index
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"my_vector": {
"type": "knn_vector",
"dimension": 3,
"space_type": "l2",
"mode": "on_disk"
}
}
}
}
```
{% include copy-curl.html %}

## Compression levels

The `compression_level` mapping parameter selects a quantization encoder that reduces vector memory consumption by the given factor. The following table lists the available `compression_level` values.

| Compression level | Supported engines |
|:------------------|:-------------------------------|
| `1x` | `faiss`, `lucene`, and `nmslib` |
| `2x` | `faiss` |
| `4x` | `lucene` |
| `8x` | `faiss` |
| `16x` | `faiss` |
| `32x` | `faiss` |

For example, if a `compression_level` of `32x` is passed for a `float32` index of 768-dimensional vectors, the per-vector memory is reduced from `4 * 768 = 3072` bytes to `3072 / 32 = 96` bytes. Internally, binary quantization (which maps a `float` to a `bit`) may be used to achieve this compression.
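
The arithmetic above can be checked with a short Python sketch. The function name `bytes_per_vector` and its parameters are hypothetical and are not part of any OpenSearch API. The calculation covers only the raw vector payload and ignores any index overhead, such as graph structures:

```python
def bytes_per_vector(dimension, compression_level=1, bytes_per_float=4):
    """Estimate per-vector memory (vector payload only) at a compression level."""
    # Compression levels listed in the table above.
    if compression_level not in (1, 2, 4, 8, 16, 32):
        raise ValueError(f"unsupported compression level: {compression_level}x")
    uncompressed = bytes_per_float * dimension
    return uncompressed / compression_level


print(bytes_per_vector(768))                        # 3072.0 bytes at 1x
print(bytes_per_vector(768, compression_level=32))  # 96.0 bytes at 32x
```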

If you set the `compression_level` parameter, then you cannot specify an `encoder` in the `method` mapping. Compression levels greater than `1x` are only supported for `float` vector types.
{: .note}

The following table lists the default `compression_level` values for the available workload modes.

| Mode | Default compression level |
|:------------------|:-------------------------------|
| `in_memory` | `1x` |
| `on_disk` | `32x` |


To create a vector field with a `compression_level` of `16x`, specify the `compression_level` parameter in the mappings. This parameter overrides the default compression level for the `on_disk` mode from `32x` to `16x`, producing higher recall and accuracy at the expense of a larger memory footprint:

```json
PUT test-index
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"my_vector": {
"type": "knn_vector",
"dimension": 3,
"space_type": "l2",
"mode": "on_disk",
"compression_level": "16x"
}
}
}
}
```
{% include copy-curl.html %}

## Method definitions

[Method definitions]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions) are used when the underlying [approximate k-NN]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/) algorithm does not require training. For example, the following `knn_vector` field specifies that *nmslib*'s implementation of *hnsw* should be used for approximate k-NN search. During indexing, *nmslib* will build the corresponding *hnsw* segment files.
@@ -55,13 +136,13 @@ PUT test-index
"my_vector": {
"type": "knn_vector",
"dimension": 4,
"space_type": "l2",
"method": {
"name": "hnsw",
"space_type": "l2",
"engine": "nmslib",
"parameters": {
"ef_construction": 128,
"m": 24
"ef_construction": 100,
"m": 16
}
}
}
@@ -73,13 +154,15 @@ Model IDs are used when the underlying Approximate k-NN algorithm requires a tra
model contains the information needed to initialize the native library segment files.

```json
"my_vector": {
"type": "knn_vector",
"model_id": "my-model"
}
```

However, if you intend to use Painless scripting or a k-NN score script, you only need to pass the dimension.
```json
"my_vector": {
"type": "knn_vector",
"dimension": 128
}
@@ -123,13 +206,13 @@ PUT test-index
"type": "knn_vector",
"dimension": 3,
"data_type": "byte",
"space_type": "l2",
"method": {
"name": "hnsw",
"space_type": "l2",
"engine": "lucene",
"parameters": {
"ef_construction": 128,
"m": 24
"ef_construction": 100,
"m": 16
}
}
}
@@ -465,14 +548,10 @@ PUT /test-binary-hnsw
"type": "knn_vector",
"dimension": 8,
"data_type": "binary",
"space_type": "hamming",
"method": {
"name": "hnsw",
"space_type": "hamming",
"engine": "faiss",
"parameters": {
"ef_construction": 128,
"m": 24
}
"engine": "faiss"
}
}
}
@@ -695,12 +774,12 @@ POST _plugins/_knn/models/test-binary-model/_train
"dimension": 8,
"description": "model with binary data",
"data_type": "binary",
"space_type": "hamming",
"method": {
"name": "ivf",
"engine": "faiss",
"space_type": "hamming",
"parameters": {
"nlist": 1,
"nlist": 16,
"nprobes": 1
}
}
2 changes: 2 additions & 0 deletions _query-dsl/specialized/neural.md
@@ -35,6 +35,8 @@ Field | Data type | Required/Optional | Description
`min_score` | Float | Optional | The minimum score threshold for the search results. Only one variable, either `k`, `min_score`, or `max_distance`, can be specified. For more information, see [k-NN radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/).
`max_distance` | Float | Optional | The maximum distance threshold for the search results. Only one variable, either `k`, `min_score`, or `max_distance`, can be specified. For more information, see [k-NN radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/).
`filter` | Object | Optional | A query that can be used to reduce the number of documents considered. For more information about filter usage, see [k-NN search with filters]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/). **Important**: Filter can only be used with the `faiss` or `lucene` engines.
`method_parameters` | Object | Optional | Parameters passed to the k-NN index during search. See [Additional query parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#additional-query-parameters).
`rescore` | Object | Optional | Parameters for configuring rescoring functionality for k-NN indexes built using quantization. See [Rescoring]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#rescoring-quantized-results-using-full-precision).

#### Example request

7 changes: 4 additions & 3 deletions _search-plugins/knn/api.md
@@ -234,7 +234,7 @@ Response field | Description
`timestamp` | The date and time when the model was created.
`description` | A user-provided description of the model.
`error` | An error message explaining why the model is in a failed state.
`space_type` | The space type for which this model is trained, for example, Euclidean or cosine.
`space_type` | The space type for which this model is trained, for example, Euclidean or cosine. Note: This value can also be set at the top level of the request.
`dimension` | The dimensionality of the vector space for which this model is designed.
`engine` | The native library used to create the model, either `faiss` or `nmslib`.

@@ -351,6 +351,7 @@ Request parameter | Description
`search_size` | The training data is pulled from the training index using scroll queries. This parameter defines the number of results to return per scroll query. Default is `10000`. Optional.
`description` | A user-provided description of the model. Optional.
`method` | The configuration of the approximate k-NN method used for search operations. For more information about the available methods, see [k-NN index method definitions]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#method-definitions). The method requires training to be valid.
`space_type` | The space type for which this model is trained, for example, Euclidean or cosine. Note: This value can also be set in the `method` parameter.

#### Usage

@@ -365,10 +366,10 @@ POST /_plugins/_knn/models/{model_id}/_train?preference={node_id}
"max_training_vector_count": 1200,
"search_size": 100,
"description": "My model",
"space_type": "l2",
"method": {
"name":"ivf",
"engine":"faiss",
"space_type": "l2",
"parameters":{
"nlist":128,
"encoder":{
@@ -395,10 +396,10 @@ POST /_plugins/_knn/models/_train?preference={node_id}
"max_training_vector_count": 1200,
"search_size": 100,
"description": "My model",
"space_type": "l2",
"method": {
"name":"ivf",
"engine":"faiss",
"space_type": "l2",
"parameters":{
"nlist":128,
"encoder":{
72 changes: 69 additions & 3 deletions _search-plugins/knn/approximate-knn.md
@@ -49,9 +49,9 @@ PUT my-knn-index-1
"my_vector1": {
"type": "knn_vector",
"dimension": 2,
"space_type": "l2",
"method": {
"name": "hnsw",
"space_type": "l2",
"engine": "nmslib",
"parameters": {
"ef_construction": 128,
@@ -62,9 +62,9 @@ PUT my-knn-index-1
"my_vector2": {
"type": "knn_vector",
"dimension": 4,
"space_type": "innerproduct",
"method": {
"name": "hnsw",
"space_type": "innerproduct",
"engine": "faiss",
"parameters": {
"ef_construction": 256,
@@ -199,10 +199,10 @@ POST /_plugins/_knn/models/my-model/_train
"training_field": "train-field",
"dimension": 4,
"description": "My model description",
"space_type": "l2",
"method": {
"name": "ivf",
"engine": "faiss",
"space_type": "l2",
"parameters": {
"nlist": 4,
"nprobes": 2
@@ -308,6 +308,72 @@ Engine | Notes
:--- | :---
`faiss` | If `nprobes` is present in a query, it overrides the value provided when creating the index.

### Rescoring quantized results using full precision

Quantization can be used to significantly reduce the memory footprint of a k-NN index. For more information about quantization, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization). Because quantization discards some of the information in each vector, the computed distances are approximate, which reduces the overall recall of the search.

To improve recall while maintaining the memory savings of quantization, you can use a two-phase search approach. In the first phase, `oversample_factor * k` results are retrieved from an index using quantized vectors and the scores are approximated. In the second phase, the full-precision vectors of those `oversample_factor * k` results are loaded into memory from disk, and scores are recomputed against the full-precision query vector. The results are then reduced to the top k.
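
The two-phase approach can be illustrated with a minimal, self-contained Python sketch. This is not how OpenSearch or Faiss implements rescoring: the sign-based `binarize` quantizer, the Hamming first-pass score, and all function names are assumptions chosen only to keep the example small. It also omits any bounds that the engine applies to the first-pass result count:

```python
import heapq
import math


def l2_squared(a, b):
    """Exact squared Euclidean distance between two full-precision vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def binarize(vector):
    """Toy 1-bit quantizer (an assumption for illustration): keep only the sign."""
    return [1 if x > 0 else 0 for x in vector]


def hamming(a, b):
    """Approximate distance between two binarized vectors."""
    return sum(x != y for x, y in zip(a, b))


def two_phase_search(query, vectors, k, oversample_factor):
    """Return the indexes of the top-k vectors using quantized search plus rescoring."""
    quantized = [binarize(v) for v in vectors]  # stands in for the quantized index
    quantized_query = binarize(query)

    # Phase 1: retrieve oversample_factor * k candidates using approximate scores.
    first_pass_k = min(len(vectors), math.ceil(oversample_factor * k))
    candidates = heapq.nsmallest(
        first_pass_k,
        range(len(vectors)),
        key=lambda i: hamming(quantized_query, quantized[i]),
    )

    # Phase 2: rescore the candidates with full-precision vectors, keep the top k.
    return heapq.nsmallest(k, candidates, key=lambda i: l2_squared(query, vectors[i]))


vectors = [[1.0, 1.0], [0.9, 0.1], [-1.0, 2.0], [5.0, 5.0], [-3.0, -3.0]]
print(two_phase_search([1.0, 0.9], vectors, k=2, oversample_factor=2.0))  # [0, 1]
```

In this toy data set, three vectors tie in the first phase because they share the same sign pattern as the query. The full-precision second phase is what separates the two nearest vectors from the distant `[5.0, 5.0]`.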

The default rescoring behavior is determined by the `mode` and `compression_level` of the backing k-NN vector field:

- For `in_memory` mode, no rescoring is applied by default.
- For `on_disk` mode, default rescoring is based on the configured `compression_level`. Each `compression_level` provides a default `oversample_factor`, specified in the following table.

| Compression level | Default rescore `oversample_factor` |
|:------------------|:----------------------------------|
| `32x` (default) | 3.0 |
| `16x` | 2.0 |
| `8x` | 2.0 |
| `4x` | No default rescoring |
| `2x` | No default rescoring |
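
The defaults in the preceding table can be expressed as a small lookup. The following Python sketch is illustrative only: the names `DEFAULT_COMPRESSION`, `DEFAULT_OVERSAMPLE`, and `default_oversample_factor` are hypothetical, and returning `None` for levels not listed in the table (such as `1x`) is an assumption:

```python
# Default compression level for each workload mode.
DEFAULT_COMPRESSION = {"in_memory": "1x", "on_disk": "32x"}

# Default rescore oversample_factor by compression level in on_disk mode.
# 4x and 2x have no default rescoring, so they are intentionally absent.
DEFAULT_OVERSAMPLE = {"32x": 3.0, "16x": 2.0, "8x": 2.0}


def default_oversample_factor(mode="in_memory", compression_level=None):
    """Return the default oversample_factor, or None if no rescoring is applied."""
    if mode not in DEFAULT_COMPRESSION:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "in_memory":
        return None  # no rescoring by default
    level = compression_level or DEFAULT_COMPRESSION[mode]
    return DEFAULT_OVERSAMPLE.get(level)


print(default_oversample_factor("on_disk"))         # 3.0
print(default_oversample_factor("on_disk", "16x"))  # 2.0
print(default_oversample_factor("on_disk", "4x"))   # None
```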

To explicitly apply rescoring, provide the `rescore` parameter in a query on a quantized index and specify the `oversample_factor`:

```json
GET my-knn-index-1/_search
{
"size": 2,
"query": {
"knn": {
"target-field": {
"vector": [2, 3, 5, 6],
"k": 2,
"rescore" : {
"oversample_factor": 1.2
}
}
}
}
}
```
{% include copy-curl.html %}

Alternatively, set the `rescore` parameter to `true` to use a default `oversample_factor` of `1.0`:

```json
GET my-knn-index-1/_search
{
"size": 2,
"query": {
"knn": {
"target-field": {
"vector": [2, 3, 5, 6],
"k": 2,
"rescore" : true
}
}
}
}
```
{% include copy-curl.html %}

The `oversample_factor` is a floating-point number between 1.0 and 100.0, inclusive. The number of results in the first pass is calculated as `oversample_factor * k` and is guaranteed to be between 100 and 10,000, inclusive. If the calculated number of results is smaller than 100, then the number of results is set to 100. If the calculated number of results is greater than 10,000, then the number of results is set to 10,000.
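
The bounds described above can be sketched as follows. The function name `first_pass_size` is hypothetical, and rounding the product up with `math.ceil` is an assumption because the rounding behavior is not stated here:

```python
import math


def first_pass_size(k, oversample_factor):
    """Number of first-pass results, clamped to the range [100, 10000]."""
    if not 1.0 <= oversample_factor <= 100.0:
        raise ValueError("oversample_factor must be between 1.0 and 100.0")
    requested = math.ceil(oversample_factor * k)  # rounding up is an assumption
    return max(100, min(10_000, requested))


print(first_pass_size(10, 3.0))    # 100 (30 is raised to the minimum)
print(first_pass_size(200, 3.0))   # 600
print(first_pass_size(5000, 3.0))  # 10000 (15000 is capped at the maximum)
```

One consequence of the lower bound is that, for small values of `k`, changing the `oversample_factor` has no effect on the number of first-pass results until `oversample_factor * k` exceeds 100.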

Rescoring is only supported for the `faiss` engine.

Rescoring is not needed if quantization is not used because the scores returned are already fully precise.
{: .note}

### Using approximate k-NN with filters

To learn about using filters with k-NN search, see [k-NN search with filters]({{site.url}}{{site.baseurl}}/search-plugins/knn/filter-search-knn/).