
[FEATURE] Lucene Inbuilt Scalar quantizer to convert float 32 bits to 7 bits #1277

Closed

vamshin opened this issue Oct 25, 2023 · 15 comments

Labels: enhancement, Features (Introduces a new unit of functionality that satisfies a requirement), k-NN, v2.16.0

@vamshin
Member

vamshin commented Oct 25, 2023

Is your feature request related to a problem?
An inbuilt scalar quantizer to convert 32-bit floats to 7-bit values using the Lucene engine.
Related to https://forum.opensearch.org/t/byte-size-vector-with-neural-search-on-the-fly/16416/2

What solution would you like?
An inbuilt byte quantizer that converts 32-bit floats to 7-bit values.

What alternatives have you considered?
Another approach could be an ingestion processor ("byte quantizer") that takes 32-bit float vectors and scalar quantizes them to byte vectors.

@vamshin vamshin added the Features (Introduces a new unit of functionality that satisfies a requirement) label and removed the untriaged label Oct 25, 2023
@jmazanec15
Member

Lucene recently added scalar quantization support inside the codec: apache/lucene#12582. Would this solve the use case?
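
For reference, a minimal sketch of what codec-level scalar quantization looks like on the Lucene side, assuming Lucene 9.9+ where Lucene99HnswScalarQuantizedVectorsFormat was introduced (this is an illustrative sketch, not the k-NN plugin's actual integration):

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.lucene99.Lucene99Codec;
import org.apache.lucene.codecs.lucene99.Lucene99HnswScalarQuantizedVectorsFormat;

public class QuantizedVectorsCodecSketch {
    public static Codec scalarQuantizedCodec() {
        // Per-field codec override: store HNSW graphs over scalar-quantized (7-bit)
        // vectors instead of raw float32 vectors.
        return new Lucene99Codec() {
            @Override
            public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
                return new Lucene99HnswScalarQuantizedVectorsFormat();
            }
        };
        // Pass the returned codec to IndexWriterConfig#setCodec(...) when building the index.
    }
}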

@vamshin
Member Author

vamshin commented Oct 25, 2023

@jmazanec15 thanks for pointing this out. We should expose this through the k-NN mappings/index settings or some other way.
Given that Lucene already has support, we could prioritize it for the 2.11 launch.

@vamshin vamshin moved this from Backlog to 2.12.0 in Vector Search RoadMap Oct 25, 2023
@jmazanec15
Member

@vamshin it's a newer feature, so it would require some testing on our end. Pretty interesting, though: they found that they could avoid recomputing the scalar quantization parameters for every segment while preserving recall.

@Galilyou

Looking forward to this one; hopefully it makes it into v2.14.0 without being pushed to later releases.

@asfoorial

asfoorial commented Mar 26, 2024

Does this feature work with the neural-search plugin out of the box, hassle free? If I understood correctly, this feature will enable the _predict API to return quantized embeddings (say, int8) such that neural-search will automatically understand them, with no need for manual quantization.

@naveentatikonda naveentatikonda moved this from 2.13.0 to 2.14.0 in Vector Search RoadMap Mar 27, 2024
@naveentatikonda
Member

naveentatikonda commented Mar 27, 2024

Does this feature work with the neural-search plugin out of the box, hassle free?

Yes, this feature will work out of the box with the neural-search plugin, where an ingest processor converts the text to embeddings during ingestion. It works the same way for neural query as well.

If I understood correctly, this feature will enable the _predict API to return quantized embeddings (say, int8) such that neural-search will automatically understand them, with no need for manual quantization.

No, that is not how this feature will work. The quantization happens at the lowest level (i.e., in the segments), where we quantize the 32-bit floats to int8. All you need to do is create the kNN vector field with the right quantizer during index creation; the rest is taken care of. No changes to the _predict API are required.

@asfoorial

asfoorial commented Mar 27, 2024

Thank you @naveentatikonda for the clarification. It is crucial to have neural query work with it seamlessly; otherwise it won't be of much use. Also, for indices that don't use neural query, the _predict API will have to produce quantized vectors to avoid manual intermediate quantization by end users.

In my current use case, I am using a kNN index, calling the _predict API to generate vectors, and configuring a default search pipeline with the same model ID used when calling the _predict API. After that, users use neural query to search the index. If neural query does not understand quantized kNN indices and _predict does not produce quantized vectors, then there is no way to know how the vectors are quantized, and the kNN index won't be easy to search.

You may ask why I am not using an ingest pipeline: it does not support synonym resolution, which is crucial in my use case. I had to do the ingestion from outside to reflect the synonyms in the generated vectors.

@naveentatikonda naveentatikonda moved this from 2.14.0 to 2.15.0 in Vector Search RoadMap Apr 23, 2024
@naveentatikonda naveentatikonda moved this from 2.15.0 to 2.16.0 in Vector Search RoadMap May 31, 2024
@asfoorial

asfoorial commented Jul 4, 2024

This is a great feature and I am looking forward to using it. Is this similar to the binary quantization technique mentioned in https://huggingface.co/blog/embedding-quantization? It can produce 32x compression while maintaining accuracy above 90%.

Below is an example Java code snippet.

public static int[] binarize(float[] vec)
{
    // Pack one bit per dimension: ceil(vec.length / 8) signed-byte values.
    int[] bvec = new int[(vec.length + 7) / 8];
    int byteIndex = 0;
    int bitIndex = 7;
    int byteValue = 0;
    for (int i = 0; i < vec.length; i++)
    {
        // Sign-based binarization: positive components map to 1, the rest to 0.
        int bitValue = vec[i] > 0 ? 1 : 0;
        byteValue |= bitValue << bitIndex;
        if (bitIndex == 0)
        {
            // Shift the unsigned byte range [0, 255] into the signed range [-128, 127].
            bvec[byteIndex++] = (byteValue & 0xff) - 128;
            bitIndex = 7;
            byteValue = 0;
        }
        else
            bitIndex--;
    }
    // Flush the trailing partial byte when the dimension count is not a multiple of 8.
    if (bitIndex != 7)
        bvec[byteIndex] = (byteValue & 0xff) - 128;
    return bvec;
}

The above is equivalent to the sentence_transformers.quantization.quantize_embeddings function used in the Python code below.

from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
embeddings = model.encode(["I am driving to the lake.", "It is a beautiful day."])
binary_embeddings = quantize_embeddings(embeddings, precision="binary")
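
For completeness, a small usage sketch of the binarize helper above (the embedding values are made up for illustration):

public static void main(String[] args)
{
    // 9 dimensions pack into ceil(9 / 8) = 2 signed-byte values.
    float[] embedding = {0.12f, -0.53f, 0.07f, -0.81f, 0.44f, 0.29f, -0.10f, 0.66f, 0.03f};
    int[] packed = binarize(embedding);
    System.out.println(java.util.Arrays.toString(packed)); // prints [45, 0]
}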

@heemin32
Collaborator

heemin32 commented Jul 5, 2024

@asfoorial binary quantization will come in #1779

@jhinch

jhinch commented Jul 12, 2024

It seems like the most straightforward way to expose Lucene's built-in scalar quantization within OpenSearch would be to allow an encoder to be configured on the knn_vector field, similar to how sq is exposed for the faiss engine:

"method": {
  "name":"hnsw",
  "engine":"lucene",
  "space_type": "l2",
  "parameters":{
    "encoder": {
      "name": "sq"
    },    
    "ef_construction": 256,
    "m": 8
  }
}

The default encoder would be flat, which is the current behaviour.

There are three additional configuration options made available by Lucene which could possibly also be exposed:

  • confidenceInterval (float) - This allows for control of the confidence interval during quantization. It allows for two special modes
    • null - Indicates that the confidence interval is dependent on the number of dimensions, the interval increasing the higher the number of dimensions. This is the default
    • 0 - Indicates that the interval should be dynamically determined based on sampling
  • bits (int) - the number of bits to use for the quantization (between 1 and 8 inclusively). Defaults to 7
  • compress (boolean) - controls whether to compress values to a single byte when bits <= 4. Defaults to true

I think if you were to expose additional options, control of confidenceInterval and bits makes the most sense; compress could just always be kept as true. The defaults for confidenceInterval and bits seem reasonable and could be used as the OpenSearch defaults as well when sq is enabled. Below would be my suggested way of exposing it:

"encoder": {
  "name": "sq",
  "parameters": {
    "confidence_interval": "dimension"
  }
}
"encoder": {
  "name": "sq",
  "parameters": {
    "confidence_interval": "dynamic"
  }
}
"encoder": {
  "name": "sq",
  "parameters": {
    "confidence_interval": 0.3
  }
}
"encoder": {
  "name": "sq",
  "parameters": {
    "bits": 7
  }
}

That said, this feature could also be exposed without the additional configuration, simply using the Lucene defaults.

@naveentatikonda
Member

naveentatikonda commented Jul 12, 2024

It seems like the most straightforward way to expose Lucene's built-in scalar quantization within OpenSearch would be to allow an encoder to be configured on the knn_vector field, similar to how sq is exposed for the faiss engine

Yes @jhinch, planning to do something similar to keep it consistent with the Faiss UX.

"encoder": {
  "name": "sq",
  "parameters": {
    "confidence_interval": 0.3
  }
}

nit: the accepted values for confidenceInterval are null, 0, or a value between 0.9 and 1.0 (inclusive)
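
For illustration, that constraint could be expressed as a small check (a hypothetical helper, not the plugin's actual validation code):

static boolean isValidConfidenceInterval(Float ci)
{
    // null -> dimension-based default, 0 -> dynamic sampling, otherwise must lie in [0.9, 1.0].
    return ci == null || ci == 0f || (ci >= 0.9f && ci <= 1.0f);
}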

@Garth-brick

Currently, if I configure a knn-vector index to have a type of "byte" instead of "float" then do I have to supply byte-quantized vectors or can I supply float32 to OpenSearch and expect it to perform the quantization itself?

@naveentatikonda
Member

Currently, if I configure a knn-vector index to have a type of "byte" instead of "float" then do I have to supply byte-quantized vectors or can I supply float32 to OpenSearch and expect it to perform the quantization itself?

@Garth-brick if you specify data_type as byte, then you need to provide byte-quantized vectors as input; this is the documentation for your reference.

But after adding this feature, you can provide fp32 vectors and it will take care of the quantization.

@vamshin vamshin added k-NN and removed backlog labels Jul 15, 2024
@naveentatikonda naveentatikonda changed the title from "[FEATURE] Inbuilt Byte quantizer to convert float 32 bits to 8 bit" to "[FEATURE] Lucene Inbuilt Scalar quantizer to convert float 32 bits to 7 bits" Jul 26, 2024
@vamshin
Member Author

vamshin commented Jul 29, 2024

@naveentatikonda can we close this issue?

@naveentatikonda
Member

@naveentatikonda can we close this issue?

yes, closing it
