[FEATURE] Lucene Inbuilt Scalar quantizer to convert float 32 bits to 7 bits #1277
Comments
Lucene recently added scalar quantization support inside the codec: apache/lucene#12582. Would this solve the use case?
@jmazanec15 thanks for pointing this out. We should expose this through the k-NN mappings/index settings or some other way.
@vamshin it's a newer feature and would require some testing on our end. Pretty interesting though - they found that they could avoid recomputing the scalar quantization parameters for every segment while preserving recall.
Looking forward to this one; hopefully it makes it into v2.14.0 without being pushed to a later release.
Does this feature work with the neural-search plugin out of the box, hassle free? If I understood correctly, this feature will enable the _predict API to return quantized embeddings in, say, int8, such that neural-search will automatically understand them and no manual quantization is needed.
Yes, this feature will work out of the box with the neural-search plugin when you use an ingest processor to convert text to embeddings during ingestion. It works the same way for neural queries as well.
No, that is not how this feature will work. The way it will work is that, at the lowest level (i.e. segments), we will quantize the 32-bit floats to int8. All you need to do is create the kNN vector field with the right quantizer during index creation; everything else will be taken care of. No changes to the _predict API are required.
Thank you @naveentatikonda for the clarification. It is crucial that neural queries work with this seamlessly, otherwise it won't be of much use. Also, for indices that don't use neural queries, the _predict API will have to produce quantized vectors to avoid manual intermediate quantization by end users. In my current use case, I use a kNN index, generate vectors with the _predict API, and configure a default search pipeline with the same model id used when calling _predict. The users then use a neural query to search the index. If the neural query does not understand quantized kNN indices and _predict does not produce quantized vectors, there is no way to know how the vectors are quantized and the kNN index won't be easy to search. You may ask why I am not using an ingest pipeline: it does not support synonym resolution, which is crucial in my use case, so I had to do the ingestion from outside to reflect synonyms in the generated vectors.
This is a great feature and I am looking forward to using it. Is this similar to the binary quantization technique described in https://huggingface.co/blog/embedding-quantization? It can produce 32x compression while maintaining accuracy above 90%. An example Java method for this would have the signature public static int[] binarize(float[] vec), which is equivalent to the sentence_transformers.quantization.quantize_embeddings function in Python (from sentence_transformers import SentenceTransformer).
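The body of the binarize method was cut off in the comment above. A minimal sketch of what such sign-based binarization typically looks like, packing one bit per dimension into ints, is shown below; this is an assumed reconstruction for illustration, not the original snippet from the comment or the exact code from the blog post.

```java
// Illustrative sketch of sign-based binary quantization (assumed reconstruction,
// not the original snippet): each dimension becomes 1 bit, packed 32 per int.
public static int[] binarize(float[] vec) {
    int[] packed = new int[(vec.length + 31) / 32];
    for (int i = 0; i < vec.length; i++) {
        if (vec[i] > 0f) {
            packed[i / 32] |= 1 << (i % 32); // set bit i when the component is positive
        }
    }
    return packed;
}
```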
@asfoorial binary quantization will come in #1779
It seems like the most straightforward way to expose Lucene's built-in scalar quantization within OpenSearch would be to allow for an
The default encoder would be There are three additional configuration options made available by Lucene which could possibly also be exposed:
I think if you were to expose additional options, control of the
I also think, though, that this feature could be exposed without exposing the additional configuration, just using the Lucene defaults.
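For context on what the plugin would be wrapping, below is a rough sketch of how Lucene's scalar-quantized HNSW vectors format is wired in at the codec level with the Lucene defaults. Class and method names are as they appear in Lucene 9.9 to the best of my recollection, and the anonymous codec override is only an illustration of the Lucene side, not the eventual k-NN mapping syntax.

```java
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.lucene99.Lucene99Codec;
import org.apache.lucene.codecs.lucene99.Lucene99HnswScalarQuantizedVectorsFormat;
import org.apache.lucene.index.IndexWriterConfig;

public class ScalarQuantizedCodecSketch {
    // Build an IndexWriterConfig whose vector fields use the scalar-quantized
    // HNSW format with Lucene's default parameters.
    public static IndexWriterConfig quantizedConfig() {
        Codec codec = new Lucene99Codec() {
            @Override
            public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
                return new Lucene99HnswScalarQuantizedVectorsFormat();
            }
        };
        return new IndexWriterConfig().setCodec(codec);
    }
}
```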
Yes @jhinch, we are planning to do something similar to keep it consistent with the Faiss UX.
nit: accepted values for
Currently, if I configure a knn-vector index to have a type of "byte" instead of "float", do I have to supply byte-quantized vectors, or can I supply float32 vectors to OpenSearch and expect it to perform the quantization itself?
@Garth-brick if you specify But, after adding this feature, you can provide fp32 vectors and it takes care of the quantization.
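As background on what quantizing vectors yourself involves today with a byte field, here is a minimal sketch of client-side scalar quantization into the signed-byte range. The per-vector min-max scaling is just one common choice and an assumption of this example; real pipelines often use dataset-level statistics instead.

```java
// Illustrative client-side quantization for a byte knn_vector field (assumed
// per-vector min-max scaling into [-128, 127]; not part of the plugin).
public static byte[] quantizeToBytes(float[] vec) {
    float min = Float.MAX_VALUE;
    float max = -Float.MAX_VALUE;
    for (float v : vec) {
        min = Math.min(min, v);
        max = Math.max(max, v);
    }
    float range = Math.max(max - min, 1e-12f);
    byte[] out = new byte[vec.length];
    for (int i = 0; i < vec.length; i++) {
        int q = Math.round((vec[i] - min) / range * 255f) - 128;
        out[i] = (byte) Math.max(-128, Math.min(127, q)); // clamp into byte range
    }
    return out;
}
```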
@naveentatikonda can we close this issue?
Yes, closing it.
Is your feature request related to a problem?
An inbuilt scalar quantizer to convert 32-bit floats to 7 bits using the Lucene engine.
Related to https://forum.opensearch.org/t/byte-size-vector-with-neural-search-on-the-fly/16416/2
What solution would you like?
An inbuilt byte quantizer to convert 32-bit floats to 7 bits.
What alternatives have you considered?
Another approach could be an ingest processor ("byte quantizer") that takes 32-bit float vectors and scalar quantizes them to byte vectors.
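To make the request concrete, the sketch below illustrates what scalar quantization from 32-bit floats to 7 bits means numerically: values are mapped onto 128 levels in [0, 127] using a scale derived from lower and upper bounds. This is a simplified illustration only; Lucene derives its bounds from quantiles of the data controlled by a confidence interval rather than taking them from the caller.

```java
// Simplified 7-bit scalar quantizer for illustration (not Lucene's implementation).
// The caller supplies the value range; Lucene instead estimates it from data quantiles.
public final class SevenBitQuantizer {
    private final float lowerBound;
    private final float step; // value covered by one of the 127 quantization steps

    public SevenBitQuantizer(float lowerBound, float upperBound) {
        this.lowerBound = lowerBound;
        this.step = Math.max((upperBound - lowerBound) / 127f, 1e-12f);
    }

    // Map each float onto an integer code in [0, 127] (7 bits, stored in a byte).
    public byte[] quantize(float[] vec) {
        byte[] codes = new byte[vec.length];
        for (int i = 0; i < vec.length; i++) {
            int q = Math.round((vec[i] - lowerBound) / step);
            codes[i] = (byte) Math.max(0, Math.min(127, q));
        }
        return codes;
    }

    // Approximate reconstruction of the original floats from the 7-bit codes.
    public float[] dequantize(byte[] codes) {
        float[] out = new float[codes.length];
        for (int i = 0; i < codes.length; i++) {
            out[i] = codes[i] * step + lowerBound;
        }
        return out;
    }
}
```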