
[FEATURE] Lucene Inbuilt Scalar quantizer to convert float 32 bits to 7 bits #1277

Closed

vamshin opened this issue Oct 25, 2023 · 15 comments

Labels: enhancement, Features (Introduces a new unit of functionality that satisfies a requirement), k-NN, v2.16.0

@vamshin
Member

vamshin commented Oct 25, 2023

Is your feature request related to a problem?
An inbuilt scalar quantizer to convert 32-bit floats to 7-bit values using the Lucene engine.
Related to https://forum.opensearch.org/t/byte-size-vector-with-neural-search-on-the-fly/16416/2

What solution would you like?
An inbuilt byte quantizer that converts 32-bit floats to 7-bit values.

What alternatives have you considered?
Another approach could be an ingestion processor ("byte quantizer") that takes 32-bit float vectors and scalar quantizes them to byte vectors.

@vamshin vamshin added the Features (Introduces a new unit of functionality that satisfies a requirement) label and removed the untriaged label Oct 25, 2023
@jmazanec15
Member

Lucene recently added scalar quantization support inside the codec: apache/lucene#12582. Would this solve the use case?
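
For reference, a minimal sketch of what codec-level scalar quantization looks like on the Lucene side, assuming Lucene 9.9+ where Lucene99HnswScalarQuantizedVectorsFormat was introduced (this is an illustrative sketch, not the k-NN plugin's actual integration):

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.lucene99.Lucene99Codec;
import org.apache.lucene.codecs.lucene99.Lucene99HnswScalarQuantizedVectorsFormat;

public class QuantizedVectorsCodecSketch {
    public static Codec scalarQuantizedCodec() {
        // Per-field codec override: store HNSW graphs over scalar-quantized (7-bit)
        // vectors instead of raw float32 vectors.
        return new Lucene99Codec() {
            @Override
            public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
                return new Lucene99HnswScalarQuantizedVectorsFormat();
            }
        };
        // Pass the returned codec to IndexWriterConfig#setCodec(...) when building the index.
    }
}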

@vamshin
Member Author

vamshin commented Oct 25, 2023

@jmazanec15 thanks for pointing this out. We should expose this through the k-NN mappings/index settings or some other way.
Given that Lucene already has support, we could prioritize it for the 2.11 launch.

@vamshin vamshin moved this from Backlog to 2.12.0 in Vector Search RoadMap Oct 25, 2023
@jmazanec15
Member

@vamshin it's a newer feature, so it would require some testing on our end. Pretty interesting, though: they found that they could avoid recomputing the scalar quantization parameters for every segment while preserving recall.

@Galilyou

Looking forward to this one; hopefully it makes it into v2.14.0 without being pushed to later releases.

@asfoorial

asfoorial commented Mar 26, 2024

Does this feature work with the neural-search plugin out of the box, hassle free? If I understood correctly, this feature will enable the _predict API to return quantized embeddings (say, int8) such that neural-search will automatically understand them, with no need for manual quantization.

@naveentatikonda naveentatikonda moved this from 2.13.0 to 2.14.0 in Vector Search RoadMap Mar 27, 2024
@naveentatikonda
Member

naveentatikonda commented Mar 27, 2024

Does this feature work with the neural-search plugin out of the box, hassle free?

Yes, this feature will work out of the box with the neural-search plugin, where an ingest processor converts the text to embeddings during ingestion. It works the same way for neural query as well.

If I understood correctly, this feature will enable the _predict API to return quantized embeddings (say, int8) such that neural-search will automatically understand them, with no need for manual quantization.

No, that is not how this feature will work. The quantization happens at the lowest level (i.e., in the segments), where we quantize the 32-bit floats to int8. All you need to do is create the kNN vector field with the right quantizer during index creation; the rest is taken care of. No changes to the _predict API are required.

@asfoorial

asfoorial commented Mar 27, 2024

Thank you @naveentatikonda for the clarification. It is crucial to have neural query work with it seamlessly; otherwise it won't be of much use. Also, for indices that don't use neural query, the _predict API will have to produce quantized vectors to avoid manual intermediate quantization by end users.

In my current use case, I am using a kNN index, calling the _predict API to generate vectors, and configuring a default search pipeline with the same model ID used when calling the _predict API. After that, users use neural query to search the index. If neural query does not understand quantized kNN indices and _predict does not produce quantized vectors, then there is no way to know how the vectors are quantized, and the kNN index won't be easy to search.

You may ask why I am not using an ingest pipeline: it does not support synonym resolution, which is crucial in my use case. I had to do the ingestion from outside to reflect the synonyms in the generated vectors.

@naveentatikonda naveentatikonda moved this from 2.14.0 to 2.15.0 in Vector Search RoadMap Apr 23, 2024
@naveentatikonda naveentatikonda moved this from 2.15.0 to 2.16.0 in Vector Search RoadMap May 31, 2024
@asfoorial

asfoorial commented Jul 4, 2024

This is a great feature and I am looking forward to using it. Is this similar to the binary quantization technique mentioned in https://huggingface.co/blog/embedding-quantization? It can produce 32x compression while maintaining accuracy above 90%.

Below is an example Java code snippet.

public static int[] binarize(float[] vec)
{
    // Pack one bit per dimension: ceil(vec.length / 8) signed-byte values.
    int[] bvec = new int[(vec.length + 7) / 8];
    int byteIndex = 0;
    int bitIndex = 7;
    int byteValue = 0;
    for (int i = 0; i < vec.length; i++)
    {
        // Sign-based binarization: positive components map to 1, the rest to 0.
        int bitValue = vec[i] > 0 ? 1 : 0;
        byteValue |= bitValue << bitIndex;
        if (bitIndex == 0)
        {
            // Shift the unsigned byte range [0, 255] into the signed range [-128, 127].
            bvec[byteIndex++] = (byteValue & 0xff) - 128;
            bitIndex = 7;
            byteValue = 0;
        }
        else
            bitIndex--;
    }
    // Flush the trailing partial byte when the dimension count is not a multiple of 8.
    if (bitIndex != 7)
        bvec[byteIndex] = (byteValue & 0xff) - 128;
    return bvec;
}

The above is equivalent to the sentence_transformers.quantization.quantize_embeddings function used in the Python code below.

from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
embeddings = model.encode(["I am driving to the lake.", "It is a beautiful day."])
binary_embeddings = quantize_embeddings(embeddings, precision="binary")
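
For completeness, a small usage sketch of the binarize helper above (the embedding values are made up for illustration):

public static void main(String[] args)
{
    // 9 dimensions pack into ceil(9 / 8) = 2 signed-byte values.
    float[] embedding = {0.12f, -0.53f, 0.07f, -0.81f, 0.44f, 0.29f, -0.10f, 0.66f, 0.03f};
    int[] packed = binarize(embedding);
    System.out.println(java.util.Arrays.toString(packed)); // prints [45, 0]
}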

@heemin32
Collaborator

heemin32 commented Jul 5, 2024

@asfoorial binary quantization will come in #1779

@jhinch

jhinch commented Jul 12, 2024

It seems like the most straightforward way to expose Lucene's built-in scalar quantization within OpenSearch would be to allow an encoder to be configured on the knn_vector field, similar to how sq is exposed for the faiss engine:

"method": {
  "name":"hnsw",
  "engine":"lucene",
  "space_type": "l2",
  "parameters":{
    "encoder": {
      "name": "sq"
    },    
    "ef_construction": 256,
    "m": 8
  }
}

The default encoder would be flat, which is the current behaviour.

There are three additional configuration options made available by Lucene which could possibly also be exposed:

  • confidenceInterval (float) - This allows for control of the confidence interval during quantization. It allows for two special modes
    • null - Indicates that the confidence interval is dependent on the number of dimensions, the interval increasing the higher the number of dimensions. This is the default
    • 0 - Indicates that the interval should be dynamically determined based on sampling
  • bits (int) - the number of bits to use for the quantization (between 1 and 8 inclusively). Defaults to 7
  • compress (boolean) - controls whether to compress values to a single byte when bits <= 4. Defaults to true

I think if you were to expose additional options, control of confidenceInterval and bits makes the most sense; compress could just always be kept as true. The defaults for confidenceInterval and bits seem reasonable and could be used as the OpenSearch defaults as well when sq is enabled. Below would be my suggested way of exposing it:

"encoder": {
  "name": "sq",
  "parameters": {
    "confidence_interval": "dimension"
  }
}
"encoder": {
  "name": "sq",
  "parameters": {
    "confidence_interval": "dynamic"
  }
}
"encoder": {
  "name": "sq",
  "parameters": {
    "confidence_interval": 0.3
  }
}
"encoder": {
  "name": "sq",
  "parameters": {
    "bits": 7
  }
}

That said, this feature could also be exposed without the additional configuration, simply using the Lucene defaults.

@naveentatikonda
Member

naveentatikonda commented Jul 12, 2024

It seems like the most straightforward way to expose Lucene's built-in scalar quantization within OpenSearch would be to allow an encoder to be configured on the knn_vector field, similar to how sq is exposed for the faiss engine

Yes @jhinch, planning to do something similar to keep it consistent with the Faiss UX.

"encoder": {
  "name": "sq",
  "parameters": {
    "confidence_interval": 0.3
  }
}

nit: the accepted values for confidenceInterval are null, 0, or a value between 0.9 and 1.0 (inclusive)
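
For illustration, that constraint could be expressed as a small check (a hypothetical helper, not the plugin's actual validation code):

static boolean isValidConfidenceInterval(Float ci)
{
    // null -> dimension-based default, 0 -> dynamic sampling, otherwise must lie in [0.9, 1.0].
    return ci == null || ci == 0f || (ci >= 0.9f && ci <= 1.0f);
}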

@Garth-brick

Currently, if I configure a knn-vector index to have a type of "byte" instead of "float" then do I have to supply byte-quantized vectors or can I supply float32 to OpenSearch and expect it to perform the quantization itself?

@naveentatikonda
Member

Currently, if I configure a knn-vector index to have a type of "byte" instead of "float" then do I have to supply byte-quantized vectors or can I supply float32 to OpenSearch and expect it to perform the quantization itself?

@Garth-brick if you specify data_type as byte, then you need to provide byte-quantized vectors as input; this is the documentation for your reference.

But after adding this feature, you can provide fp32 vectors and it will take care of the quantization.

@vamshin vamshin added k-NN and removed backlog labels Jul 15, 2024
@naveentatikonda naveentatikonda changed the title from "[FEATURE] Inbuilt Byte quantizer to convert float 32 bits to 8 bit" to "[FEATURE] Lucene Inbuilt Scalar quantizer to convert float 32 bits to 7 bits" Jul 26, 2024
@vamshin
Member Author

vamshin commented Jul 29, 2024

@naveentatikonda can we close this issue?

@naveentatikonda
Member

@naveentatikonda can we close this issue?

yes, closing it
