[FEATURE] Support for Faiss byte vector #1659
Comments
@naveentatikonda what quantization technique is used?
Scalar Quantization, like SQfp16.
Right, but how do they implement the 8-bit quantization? I don't think they can quantize into fp8, because too much precision would be lost.
Yes, basically they are serializing fp32 values into uint8 (0 to 255), which loses the fractional precision when the values are deserialized back into float. This feature helps to optimize memory at the cost of recall. Also, if the vector dimension is a multiple of 16 they process 16 values in each iteration (unlike the 8 values we have seen with fp16), so I'm hoping it might boost performance and help reduce search latencies.
But how are they doing this? Would they take 0.2212 -> 0?
Yes, you are right.
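To make the precision loss concrete, here is a minimal standalone C++ sketch (an illustration of the behavior discussed above, not the actual Faiss codec code): a direct 8-bit encode is essentially a cast of each component to a byte, so the fractional part is dropped on the round trip.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Illustration only: a "direct" 8-bit codec casts each float component to a
// byte on encode and casts it back on decode, so 0.2212f comes back as 0.0f.
int main() {
    std::vector<float> original = {0.2212f, 1.7f, 42.9f, 127.0f};

    std::vector<uint8_t> code(original.size());
    for (size_t i = 0; i < original.size(); ++i) {
        code[i] = static_cast<uint8_t>(original[i]);   // encode: truncating cast
    }

    std::vector<float> decoded(original.size());
    for (size_t i = 0; i < original.size(); ++i) {
        decoded[i] = static_cast<float>(code[i]);      // decode: byte back to float
    }

    for (size_t i = 0; i < original.size(); ++i) {
        std::printf("%.4f -> %.1f\n", original[i], decoded[i]);
    }
    return 0;
}
```

Running this prints `0.2212 -> 0.0`, matching the example above: the memory savings come from giving up everything below integer resolution.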
From an interface perspective, I think this should then just be byte vector support for Faiss. Otherwise, it may confuse users who expect it to behave like Lucene's 8-bit scalar quantization.
**Why are we streaming vectors in batches from JNI to Faiss for byte vectors?** With recent changes, we stream vectors from Java to JNI in batches whose size is 1% of the JVM heap. But for byte vectors we still need to break this down further and stream smaller batches of 1000 vectors from JNI to Faiss to avoid a spike in memory consumption, because the ScalarQuantizer expects the input vectors as floats, so we need to cast these byte vectors into floats before ingesting them into the index.

**Why a batch size of 1000?** 1000 is not a magic number. I ran a test comparing a batch size of 1000 against a batch size of 1, and the RSS metrics (graph shown below) clearly show that force merge took longer with a batch size of 1: 12.7 min versus 8.8 min for 1000 on the Cohere-1M-768D-InnerProduct dataset. We can start with 1000 and bump it up to 10K (~30 MB of extra memory) later if we want to further reduce this latency.
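A hedged sketch of that JNI-side batching idea (the function and parameter names here are illustrative, not the actual k-NN JNI code): the bytes are widened to floats in a reusable buffer of at most `batchSize * dim` floats, so the temporary overhead stays around 3 MB for batches of 1000 768-dimensional vectors, and roughly 30 MB if the batch size were raised to 10K.

```cpp
#include <faiss/Index.h>

#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch: the ScalarQuantizer-backed index ingests floats, so signed-byte
// vectors are cast to float in a bounded temporary buffer and added to the
// index batch by batch instead of all at once.
void addByteVectorsInBatches(faiss::Index* index,
                             const int8_t* byteVectors,
                             size_t numVectors,
                             size_t dim,
                             size_t batchSize = 1000) {
    std::vector<float> floatBuffer(batchSize * dim);
    for (size_t start = 0; start < numVectors; start += batchSize) {
        size_t n = std::min(batchSize, numVectors - start);
        const int8_t* src = byteVectors + start * dim;
        for (size_t i = 0; i < n * dim; ++i) {
            floatBuffer[i] = static_cast<float>(src[i]);  // widen byte -> float
        }
        index->add(static_cast<int64_t>(n), floatBuffer.data());
    }
}
```

With `dim = 768`, the buffer is 1000 * 768 * 4 bytes ≈ 3 MB, which is the trade-off the comment above weighs against merge latency.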
Is your feature request related to a problem?
For the Lucene engine we have the Lucene byte vector feature, which accepts byte vectors in the range [-128, 127] and provides memory savings of up to 75% compared with fp32 vectors. But for large-scale workloads we usually prefer the Faiss engine, and as of today the Faiss engine only supports fp32 and fp16 vectors (using SQfp16). So, adding byte vector support to the Faiss engine helps reduce memory requirements, especially for users whose models, such as Cohere Embed, generate signed int8 embeddings in the range [-128, 127].
What solution would you like?
Add a new Faiss ScalarQuantizer type like QT_8bit_direct, which doesn't require training and quantizes fp32 vector values (values already within the signed byte range and without any fractional part) into byte-sized vectors, reducing the memory footprint by a factor of 4.
https://faiss.ai/cpp_api/struct/structfaiss_1_1ScalarQuantizer.html
facebookresearch/faiss#3488
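For reference, a hedged sketch of what building such an index could look like with the Faiss C++ API, assuming the existing training-free QT_8bit_direct quantizer type (the linked upstream issue tracks the exact variant needed to cover the signed [-128, 127] range):

```cpp
#include <faiss/IndexHNSW.h>
#include <faiss/IndexScalarQuantizer.h>

#include <vector>

// Sketch: an HNSW index whose storage uses the direct 8-bit scalar quantizer,
// storing each component as a single byte (1/4 of the fp32 footprint).
int main() {
    int dim = 768;    // e.g. Cohere embedding dimension
    int hnsw_m = 16;  // HNSW graph degree

    faiss::IndexHNSWSQ index(dim,
                             faiss::ScalarQuantizer::QT_8bit_direct,
                             hnsw_m,
                             faiss::METRIC_INNER_PRODUCT);

    // The direct quantizer has no learned parameters, so no explicit train()
    // step should be required here, unlike QT_8bit, which learns per-dimension
    // ranges from training data.
    std::vector<float> vecs(2 * dim, 1.0f);  // toy vectors already in byte range
    index.add(2, vecs.data());
    return 0;
}
```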