
[FEATURE] Support for Faiss byte vector #1659

Closed

naveentatikonda opened this issue Apr 26, 2024 · 8 comments
Assignees: naveentatikonda
Labels: enhancement; Features (introduces a new unit of functionality that satisfies a requirement); Roadmap:Vector Database/GenAI (project-wide roadmap label); v2.17.0

Comments

@naveentatikonda (Member) commented Apr 26, 2024

Is your feature request related to a problem?
For the Lucene engine we have the Lucene byte vector feature, which accepts byte vectors in the range [-128, 127] and provides memory savings of up to 75% compared with fp32 vectors. For large-scale workloads, however, we usually prefer the Faiss engine, and as of today Faiss supports only fp32 and fp16 vectors (via SQfp16). Adding byte vector support to the Faiss engine would reduce memory requirements, especially for users of models like Cohere Embed that generate signed int8 embeddings in the range [-128, 127].

What solution would you like?
Add support for the Faiss ScalarQuantizer type QT_8bit_direct, which requires no training and quantizes fp32 vector values (already within the signed byte range and carrying no fractional precision) into byte-sized vectors, reducing the memory footprint by a factor of 4.
https://faiss.ai/cpp_api/struct/structfaiss_1_1ScalarQuantizer.html

facebookresearch/faiss#3488
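
For illustration, here is a minimal sketch (assumed setup, not the k-NN plugin's actual integration code) of building a Faiss HNSW index whose stored vectors are scalar-quantized with QT_8bit_direct:

```cpp
#include <faiss/IndexHNSW.h>
#include <faiss/IndexScalarQuantizer.h>
#include <cstdint>
#include <vector>

int main() {
    int d = 768;      // vector dimension
    int64_t n = 1000; // number of vectors
    // Embeddings assumed to already be integral and within the uint8 range.
    std::vector<float> data(n * d, 42.0f);

    // IndexHNSWSQ: an HNSW graph whose stored vectors are scalar-quantized;
    // QT_8bit_direct casts each float component directly to uint8.
    // 16 is the HNSW connectivity parameter M.
    faiss::IndexHNSWSQ index(d, faiss::ScalarQuantizer::QT_8bit_direct, 16);
    index.train(n, data.data()); // effectively a no-op for the direct codec
    index.add(n, data.data());
    return 0;
}
```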

@naveentatikonda added the Features, enhancement, and v2.15.0 labels on Apr 26, 2024
@naveentatikonda self-assigned this on Apr 26, 2024
@naveentatikonda moved this from Backlog to 2.15.0 in Vector Search RoadMap on Apr 26, 2024
@jmazanec15 (Member) commented

@naveentatikonda what quantization technique is used?

@naveentatikonda (Member, Author) commented

> @naveentatikonda what quantization technique is used?

Scalar Quantization like SQfp16

@jmazanec15 (Member) commented

Right, but how do they implement the 8-bit encoding? I don't think they can quantize into fp8 because too much precision would be lost.

@naveentatikonda (Member, Author) commented

> Right, but how do they implement the 8-bit encoding? I don't think they can quantize into fp8 because too much precision would be lost.

Yes, basically they serialize fp32 values into uint8 (0 to 255), which discards all fractional precision when the values are deserialized back into floats. This feature optimizes memory at the cost of recall. Also, if the vector dimension is a multiple of 16, they process 16 values per iteration (instead of the 8 we saw with fp16), so I'm hoping it boosts performance and helps reduce search latencies.

@jmazanec15 (Member) commented May 1, 2024

> they serialize fp32 values into uint8 (0 to 255)

But how are they doing this? Would they take 0.2212 -> 0?

@naveentatikonda (Member, Author) commented

> But how are they doing this? Would they take 0.2212 -> 0?

Yes, you are right. For QT_8bit_direct (no training), they cast the float directly to uint8 to serialize it, so 0.2212 becomes 0. That's why I said it leads to a complete loss of fractional precision.
https://github.com/facebookresearch/faiss/blob/main/faiss/impl/ScalarQuantizer.cpp#L521-L531
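
In essence, the codec behind that link behaves like the following sketch (a paraphrase, not a verbatim copy of the Faiss source):

```cpp
// Direct-cast 8-bit codec, paraphrased: encoding is a plain float -> uint8
// cast and decoding a plain uint8 -> float cast, so 0.2212f encodes to 0
// and decodes back as 0.0f.
#include <cstdint>

inline uint8_t encode_direct(float x) {
    return static_cast<uint8_t>(x); // truncates the fractional part
}

inline float decode_direct(uint8_t code) {
    return static_cast<float>(code);
}
```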

For QT_8bit (non-uniform training) and QT_8bit_uniform (uniform training), they multiply and divide by a scaling factor during encoding and decoding. We might need to run more tests on all of these quantization types with newer datasets to see whether they preserve precision better than QT_8bit_direct. A few months back, when I ran tests on the sift and mnist datasets, QT_8bit_direct gave better recall than the other QTypes. But we shouldn't rely on those results, because the data in those two datasets already lies within the uint8 range and has no fractional component.
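
For reference, trained 8-bit scalar quantization in its generic affine form looks roughly like the sketch below (a simplified illustration; the exact Faiss implementation differs in details such as per-dimension ranges and rounding):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Generic affine 8-bit scalar quantizer: training learns the value range,
// and encode/decode map floats onto the 256 levels inside it.
struct Affine8bit {
    float vmin = 0.0f, vdiff = 1.0f; // learned during training

    void train(const float* x, size_t n) {
        float lo = x[0], hi = x[0];
        for (size_t i = 1; i < n; ++i) {
            lo = std::min(lo, x[i]);
            hi = std::max(hi, x[i]);
        }
        vmin = lo;
        vdiff = (hi > lo) ? (hi - lo) : 1.0f; // avoid division by zero
    }

    uint8_t encode(float x) const {
        float t = std::clamp((x - vmin) / vdiff, 0.0f, 1.0f);
        return static_cast<uint8_t>(std::round(t * 255.0f));
    }

    float decode(uint8_t code) const {
        return vmin + (code / 255.0f) * vdiff;
    }
};
```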

@jmazanec15 (Member) commented

From an interface perspective, I think this should then just be byte vector support for Faiss. Otherwise, it may confuse users who expect it to behave like Lucene's 8-bit scalar quantization.

@naveentatikonda changed the title from "[FEATURE] Support for Faiss Scalar Quantization uint8" to "[FEATURE] Support for Faiss byte vector" on May 30, 2024
@vamshin moved this from 2.15.0 to 2.17.0 in Vector Search RoadMap on Jul 25, 2024
@naveentatikonda (Member, Author) commented

Why are we streaming vectors in batches from JNI to Faiss for byte vectors?

With recent changes, we stream vectors from Java to JNI in batches whose size is 1% of the JVM heap. But for byte vectors we still need to break this down further and stream smaller batches of 1000 vectors from JNI to Faiss to avoid a spike in memory consumption: the ScalarQuantizer expects its input vectors as floats, so we must cast the byte vectors to floats before ingesting them into the index.
For example, without batching in JNI, a typical production cluster with a 32 GB heap (the practical maximum) would hand over about 0.32 GB of vector data as bytes in one batch. Casting those bytes to floats in JNI would spike memory usage to about 1.28 GB. With a batch size of 1000, even for 768-dimension vectors, each batch uses only about 3 MB (1000 × 768 × 4 bytes), and that buffer is reused. A minimal sketch of this batching pattern follows.
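
The sketch below uses assumed names (it is not the plugin's actual JNI code) to show the fixed-batch, reused-buffer pattern described above:

```cpp
#include <faiss/Index.h>
#include <algorithm>
#include <cstdint>
#include <vector>

// Stream signed-byte vectors into a Faiss index in fixed-size batches,
// reusing one float buffer so peak extra memory stays at
// batch_size * dim * sizeof(float) (~3 MB for 1000 x 768 vectors),
// independent of the total number of vectors.
void add_byte_vectors_batched(faiss::Index& index,
                              const int8_t* data, // n * dim signed bytes
                              int64_t n,
                              int64_t dim,
                              int64_t batch_size = 1000) {
    std::vector<float> buffer(batch_size * dim); // reused across batches
    for (int64_t start = 0; start < n; start += batch_size) {
        const int64_t count = std::min(batch_size, n - start);
        // The ScalarQuantizer consumes floats, so widen each byte first.
        for (int64_t i = 0; i < count * dim; ++i) {
            buffer[i] = static_cast<float>(data[start * dim + i]);
        }
        index.add(count, buffer.data());
    }
}
```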

Why a batch size of 1000?

1000 is not a magic number. I ran a test comparing a batch size of 1000 against a batch size of 1, and the RSS metrics (graph below) clearly show that force merge took longer with a batch size of 1: 12.7 min versus 8.8 min for 1000, on the Cohere-1M-768D-InnerProduct dataset. We can start with 1000 and bump it up to 10K (about 30 MB of extra memory) later if we want to reduce this latency further.

[Graph: RSS metrics during force merge, batch size 1 vs. 1000]

@vamshin added the Roadmap:Vector Database/GenAI (project-wide roadmap) label on Aug 26, 2024