opensearch-project · vagimeli · Apr 16, 2024 · Apr 9, 2024 · Apr 10, 2024 · Apr 10, 2024
@@ -204,7 +204,7 @@ Encoder name | Requires training | Description
 :--- | :--- | :---
 `flat` (Default) | false | Encode vectors as floating-point arrays. This encoding does not reduce memory footprint.
 `pq` | true | An abbreviation for _product quantization_, it is a lossy compression technique that uses clustering to encode a vector into a fixed size of bytes, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388).
-`sq` | false | An abbreviation for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. In version 2.13, the built-in `sq` encoder is the SQFP16 Faiss encoder. The encoder reduces memory footprint with a minimal loss of precision and improves performance by using SIMD optimization (using AVX2 on x86 architecture or Neon on ARM64 architecture). For more information, see [Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization).
+`sq` | false | An abbreviation for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. In version 2.13, the built-in `sq` encoder is the SQFP16 Faiss encoder. The encoder reduces memory footprint with a minimal loss of precision and improves performance by using SIMD optimization (using AVX2 on x86 architecture or Neon on ARM64 architecture). For more information, see [Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-16-bit-scalar-quantization).
 
 #### PQ parameters
 
@@ -322,7 +322,7 @@ If you want to use less memory and index faster than HNSW, while maintaining sim
 
 If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop.
 
-You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) in order to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/). 
+You can reduce the memory footprint by a factor of 2, with a minimal loss in search quality, by using the [`fp_16` encoder]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#faiss-16-bit-scalar-quantization). If your vector dimensions are within the [-128, 127] byte range, we recommend using the [byte quantizer]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/#lucene-byte-vector) in order to reduce the memory footprint by a factor of 4. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/). 
 
 ### Memory estimation
 

@@ -10,22 +10,42 @@
 
 # k-NN vector quantization
 
-By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` engines). To reduce the memory footprint, you can use vector quantization.
+By default, the k-NN plugin supports the indexing and querying of vectors of type `float`, where each dimension of the 
+vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors can be
+expensive because OpenSearch needs to construct, load, save, and search graphs (for native `nmslib` and `faiss` 
+engines). To reduce the memory footprint, you can use vector quantization.
+
+OpenSearch supports several varieties of quantization. In general, the level of quantization determines the tradeoff 
+between the accuracy of the nearest neighbor search and the memory footprint of the vector search system. The 
+supported types are byte vectors, 16-bit scalar quantization, and product quantization (PQ).
 
 ## Lucene byte vector
 
-Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
+Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount
+of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch 
+index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
 
-## Faiss scalar quantization 
+## Faiss 16-bit scalar quantization 
 
-Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector values are large compared to the error introduced by eliminating their two least significant bits. When used with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing throughput. 
-
-SIMD optimization is not supported on Windows. Using Faiss scalar quantization on Windows can lead to a significant drop in performance, including decreased indexing throughput and increased search latencies.
+Starting with version 2.13, the k-NN plugin supports performing scalar quantization for the Faiss engine within 
+OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit 
+vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 
+16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the 
+vector values back into 32-bit floating-point values for distance computation. The SQfp16 quantization can decrease the 
+memory footprint by a factor of 2. Additionally, it leads to a minimal loss in recall when differences between vector 
+values are large compared to the error introduced by eliminating their two least significant bits. When used with 
+[SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), SQfp16 quantization can also significantly reduce search latencies and improve indexing 
+throughput. 
+
+SIMD optimization is not supported on Windows. Using Faiss scalar quantization on Windows can lead to a significant drop
+in performance, including decreased indexing throughput and increased search latencies.
 {: .warning} 
 
 ### Using Faiss scalar quantization
 
-To use Faiss scalar quantization, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a k-NN index:
+To use Faiss scalar quantization, set the k-NN vector field's `method.parameters.encoder.name` to `sq` when creating a 
+k-NN index:
 
 ```json
 PUT /test-index
@@ -60,14 +80,22 @@
 ```
 {% include copy-curl.html %}
 
-Optionally, you can specify the parameters in `method.parameters.encoder`. For more information about `encoder` object parameters, see [SQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#sq-parameters).
+Optionally, you can specify the parameters in `method.parameters.encoder`. For more information about `encoder` object 
+parameters, see [SQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#sq-parameters).
 
-The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must be in the [-65504.0, 65504.0] range. To define how to handle out-of-range values, the preceding request specifies the `clip` parameter. By default, this parameter is `false`, and any vectors containing out-of-range values are rejected. When `clip` is set to `true` (as in the preceding request), out-of-range vector values are rounded up or down so that they are in the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector will be indexed as a 16-bit vector `[65504.0, -65504.0]`.
+The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must
+be in the [-65504.0, 65504.0] range. To define how to handle out-of-range values, the preceding request specifies the 
+`clip` parameter. By default, this parameter is `false`, and any vectors containing out-of-range values are rejected. 
+When `clip` is set to `true` (as in the preceding request), out-of-range vector values are rounded up or down so that 
+they are in the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector will 
+be indexed as a 16-bit vector `[65504.0, -65504.0]`.
 
-We recommend setting `clip` to `true` only if very few elements lie outside of the supported range. Rounding the values may cause a drop in recall.
+We recommend setting `clip` to `true` only if very few elements lie outside of the supported range. Rounding the values 
+may cause a drop in recall.
 {: .note}
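
The clipping behavior described above can be sketched in a few lines of Python. This is only an illustration of clamping values to the supported range, not the plugin's implementation: the function names are hypothetical, and the sketch does not model the rounding that happens when a value is stored as a 16-bit float.

```python
# Largest finite magnitude in the supported range, taken from the text above.
FP16_MAX = 65504.0


def has_out_of_range_values(vector):
    """Return True if any component falls outside [-65504.0, 65504.0]."""
    return any(abs(value) > FP16_MAX for value in vector)


def clip_to_fp16_range(vector):
    """Clamp every component to [-65504.0, 65504.0]; in-range values are unchanged."""
    return [max(-FP16_MAX, min(FP16_MAX, value)) for value in vector]


original = [65510.82, -65504.1]
print(has_out_of_range_values(original))  # True: rejected when clip is false
print(clip_to_fp16_range(original))       # [65504.0, -65504.0]
```

A check like `has_out_of_range_values` can be run over a sample of your data before indexing to judge how many elements clipping would alter.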
 
-The following example method definition specifies the Faiss SQfp16 encoder, which rejects any indexing request that contains out-of-range vector values (because the `clip` parameter is `false` by default):
+The following example method definition specifies the Faiss SQfp16 encoder, which rejects any indexing request that 
+contains out-of-range vector values (because the `clip` parameter is `false` by default):
 
 ```json
 PUT /test-index
@@ -133,15 +161,17 @@
 ```
 {% include copy-curl.html %}
 
-## Memory estimation
+### Memory estimation
 
-In the best-case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer require 50% of the memory that 32-bit vectors require. 
+In the best-case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer require 50% of the memory that 32-bit 
+vectors require. 
 
 #### HNSW memory estimation
 
 The memory required for HNSW is estimated to be `1.1 * (2 * dimension + 8 * M)` bytes/vector.
 
-As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows:
+As an example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be
+estimated as follows:
 
 ```bash
 1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB
@@ -151,9 +181,72 @@
 
 The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_vectors) + (4 * nlist * d))` bytes/vector.
 
-As an example, assume that you have 1 million vectors with a dimension of 256 and `nlist` of 128. The memory requirement can be estimated as follows:
+As an example, assume that you have 1 million vectors with a dimension of 256 and `nlist` of 128. The memory requirement
+can be estimated as follows:
 
 ```bash
 1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256))  ~= 0.525 GB
 ```
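
The two SQfp16 estimates above can be reproduced with a short Python sketch. The function names are hypothetical, and the sketch assumes that the "GB" figures in the examples are obtained by dividing the byte count by 2^30, which is what makes the arithmetic match the stated results.

```python
GIB = 2 ** 30  # assumption: the examples above report bytes / 2^30 as "GB"


def hnsw_fp16_bytes(dimension, m, num_vectors):
    """Total bytes for HNSW with SQfp16: 1.1 * (2 * dimension + 8 * M) per vector."""
    return 1.1 * (2 * dimension + 8 * m) * num_vectors


def ivf_fp16_bytes(dimension, nlist, num_vectors):
    """Total bytes for IVF with SQfp16, following the formula above."""
    return 1.1 * ((2 * dimension) * num_vectors + 4 * nlist * dimension)


print(round(hnsw_fp16_bytes(256, 16, 1_000_000) / GIB, 3))  # 0.656
print(round(ivf_fp16_bytes(256, 128, 1_000_000) / GIB, 3))  # 0.525
```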
 
+## Faiss product quantization
+
+Product quantization is a technique that allows users to represent a vector in a configurable number of bits. In 
+general, it can achieve a higher level of compression than byte or scalar quantization. Product 
+quantization works by breaking up vectors into _m_ subvectors and encoding each subvector with _code_size_ bits. Thus,
+the total amount of memory for the vector is `m*code_size` bits, plus overhead. For more information about the 
+parameters of product quantization, see 
+[PQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#pq-parameters). Product quantization is only 
+supported for the _Faiss_ engine and can be used with either the _HNSW_ or the _IVF_ ANN algorithm.
+
+### Using Faiss product quantization
+
+In order to minimize the loss in accuracy, product quantization requires a _training_ step that builds a model based on 
+the distribution of the data that will be searched over.
+
+Under the hood, the product quantizer is trained by running k-means clustering on a set of training vectors for each 
+subvector space and extracting the centroids to be used for the encoding. The training vectors can either be a subset 
+of the vectors to be ingested or vectors that have the same distribution and dimension as the vectors to be ingested. 
+In OpenSearch, the training vectors need to be present in an index. In general, the amount of training data depends 
+on which ANN algorithm is used and how much data will go into the index. For IVF-based indices, a good number of 
+training vectors to use is `max(1000*nlist, 2^code_size * 1000)`. For HNSW-based indices, a good number is 
+`2^code_size*1000` training vectors. See [Faiss's documentation](https://github.com/facebookresearch/faiss/wiki/FAQ#how-many-training-points-do-i-need-for-k-means) 
+for more details about how these numbers are derived.
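
As a quick illustration of the rule of thumb above, the following Python sketch computes the suggested number of training vectors. The function name and the convention of passing `nlist=None` for HNSW are hypothetical choices made for this example only.

```python
def recommended_training_vectors(code_size, nlist=None):
    """Suggested training-set size, following the rule of thumb above.

    Pass nlist for IVF-based indices; leave it as None for HNSW-based indices.
    """
    pq_points = (2 ** code_size) * 1000
    if nlist is None:
        return pq_points                 # HNSW: 2^code_size * 1000
    return max(1000 * nlist, pq_points)  # IVF: max(1000 * nlist, 2^code_size * 1000)


print(recommended_training_vectors(8))             # 256000
print(recommended_training_vectors(8, nlist=512))  # 512000
print(recommended_training_vectors(8, nlist=128))  # 256000
```

For IVF, the `nlist` term only dominates once `nlist` exceeds `2^code_size`, as the last two calls show.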
+
+For product quantization, the two parameters that need to be selected are _m_ and _code_size_. _m_ determines how many 
+subvectors the vectors are broken up into, each of which is encoded separately. Consequently, the _dimension_ must be 
+divisible by _m_. _code_size_ determines how many bits each subvector is encoded with. In general, a good place to
+start is setting `code_size = 8` and then tuning _m_ to get the desired tradeoff between memory footprint and recall.
+
+For an example of setting up an index with product quantization, see [this tutorial]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model).
+
+### Memory estimation
+
+While product quantization is meant to represent individual vectors with `m*code_size` bits, in practice the indices 
+take up more space than this, mainly because of the overhead of storing certain code tables and auxiliary data 
+structures.
+
+Some of the memory formulas depend on the number of segments present. Typically, this number is not known beforehand, 
+but a good default value is 300.
+{: .note}
+
+#### HNSW memory estimation
+
+The memory required for HNSW with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24 + 8 * hnsw_m) * num_vectors + num_segments * (2^pq_code_size * 4 * d))` bytes.
+
+As an example, assume that you have 1 million vectors with a dimension of 256, `hnsw_m` of 16, `pq_m` of 32, 
+`pq_code_size` of 8 and 100 segments. The memory requirement can be estimated as follows:
+
+```bash
+1.1*((8 / 8 * 32 + 24 + 8 * 16) * 1000000 + 100 * (2^8 * 4 * 256)) ~= 0.215 GB
+```
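
The HNSW with PQ estimate above can be checked with a small Python sketch. The function name is hypothetical, `dimension` stands for `d` in the formula, and the sketch assumes that the "GB" figure is the byte count divided by 2^30, which is what reproduces the stated result.

```python
GIB = 2 ** 30  # assumption: the example above reports bytes / 2^30 as "GB"


def hnsw_pq_bytes(dimension, hnsw_m, pq_m, pq_code_size, num_vectors, num_segments):
    """Total bytes for HNSW with PQ, following the formula above."""
    per_vector = (pq_code_size / 8) * pq_m + 24 + 8 * hnsw_m
    per_segment = (2 ** pq_code_size) * 4 * dimension  # code table overhead per segment
    return 1.1 * (per_vector * num_vectors + num_segments * per_segment)


estimate = hnsw_pq_bytes(256, hnsw_m=16, pq_m=32, pq_code_size=8,
                         num_vectors=1_000_000, num_segments=100)
print(round(estimate / GIB, 3))  # 0.215
```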
+
+#### IVF memory estimation
+
+The memory required for IVF with PQ is estimated to be `1.1*(((pq_code_size / 8) * pq_m + 24) * num_vectors + num_segments * (2^pq_code_size * 4 * d + 4 * ivf_nlist * d))` bytes.
+
+As an example, assume that you have 1 million vectors with a dimension of 256, `ivf_nlist` of 512, `pq_m` of 64, 
+`pq_code_size` of 8, and 100 segments. The memory requirement can be estimated as follows:
+
+```bash
+1.1*((8 / 8 * 64 + 24) * 1000000  + 100 * (2^8 * 4 * 256 + 4 * 512 * 256))  ~= 0.171 GB
+```
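
The IVF with PQ estimate can be checked in the same way. In this sketch the function name is hypothetical, `dimension` stands for `d` in the formula, and the "GB" figure is assumed to be the byte count divided by 2^30. The call uses `pq_m = 64`, which is the value that appears in the calculation above and that yields the stated result.

```python
GIB = 2 ** 30  # assumption: the example above reports bytes / 2^30 as "GB"


def ivf_pq_bytes(dimension, ivf_nlist, pq_m, pq_code_size, num_vectors, num_segments):
    """Total bytes for IVF with PQ, following the formula above."""
    per_vector = (pq_code_size / 8) * pq_m + 24
    # Per-segment overhead: PQ code table plus IVF centroids.
    per_segment = (2 ** pq_code_size) * 4 * dimension + 4 * ivf_nlist * dimension
    return 1.1 * (per_vector * num_vectors + num_segments * per_segment)


estimate = ivf_pq_bytes(256, ivf_nlist=512, pq_m=64, pq_code_size=8,
                        num_vectors=1_000_000, num_segments=100)
print(round(estimate / GIB, 3))  # 0.171
```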