Add support for bitwise inner-product in painless #116082

benwtrent · 2024-11-01T12:41:00Z

This adds bitwise inner product to painless.

The idea here is:

- For two bit arrays, which we determine to be a byte array whose dimensions match `dense_vector.dim/8`, we simply return the bit count of the bitwise `&`.
- For a stored bit array (remember, with `dense_vector.dim/8` bytes), sum up the provided byte or float array using the bit array as a mask.
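
As a rough illustration of the two cases above, here is a minimal Java sketch. The class and method names are hypothetical and this is not the code added by the PR. It also assumes that the most significant bit of each stored byte corresponds to the first dimension, which is an assumption of this sketch and not something stated in this thread.

```java
final class BitInnerProductSketch {

    /** Case 1: both vectors are packed bit arrays, so count the bits set in both. */
    static int andBitCount(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask to 8 bits so sign extension of negative bytes is not counted.
            sum += Integer.bitCount((a[i] & b[i]) & 0xFF);
        }
        return sum;
    }

    /**
     * Case 2: the stored vector is a packed bit array and the query has one
     * float per bit, so query.length must equal storedBits.length * 8.
     * The stored bits act as a mask selecting which query values are summed.
     */
    static float maskedSum(float[] query, byte[] storedBits) {
        if (query.length != storedBits.length * 8) {
            throw new IllegalArgumentException("query must have 8 values per stored byte");
        }
        float sum = 0f;
        for (int i = 0; i < storedBits.length; i++) {
            for (int j = 0; j < 8; j++) {
                // Assumption: bit 7 (MSB) of each byte is the first dimension.
                if ((storedBits[i] & (0x80 >> j)) != 0) {
                    sum += query[i * 8 + j];
                }
            }
        }
        return sum;
    }
}
```

With a stored byte of `0b10100000` and a query of `{1, 2, 3, 4, 5, 6, 7, 8}`, only the first and third query values are selected by the mask.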

This is effectively supporting asymmetric quantization, where the query and the stored vectors are quantized differently. A prime example of how this works is: https://github.com/cohere-ai/BinaryVectorDB

Basically, you do your initial search against the binary space and then rerank with a differently quantized query vector, which brings in more information without additional storage space.
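
The search-then-rerank flow described above could look roughly like the following Java sketch. Everything here is hypothetical and in-memory (the class, the method names, the brute-force first pass). In Elasticsearch the first pass would be a vector search and the rerank would run in a script. The bit order (MSB first) is again an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class BinaryRerankSketch {

    /** Hamming distance between two packed bit arrays of equal length. */
    static int hamming(byte[] a, byte[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) {
            d += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return d;
    }

    /** Sum of the float query values selected by the stored bits (MSB first, assumed). */
    static float maskedSum(float[] query, byte[] bits) {
        float sum = 0f;
        for (int i = 0; i < bits.length; i++) {
            for (int j = 0; j < 8; j++) {
                if ((bits[i] & (0x80 >> j)) != 0) {
                    sum += query[i * 8 + j];
                }
            }
        }
        return sum;
    }

    /**
     * First pass: keep the `candidates` documents closest to the binary query.
     * Second pass: reorder those candidates by the masked float score, highest first.
     * Returns document indices into `docs`.
     */
    static List<Integer> searchThenRerank(byte[][] docs, byte[] binaryQuery, float[] floatQuery, int candidates) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingInt((Integer i) -> hamming(docs[i], binaryQuery)));
        List<Integer> top = new ArrayList<>(ids.subList(0, Math.min(candidates, ids.size())));
        top.sort(Comparator.comparingDouble((Integer i) -> -maskedSum(floatQuery, docs[i])));
        return top;
    }
}
```

The point of the second pass is that the stored documents stay binary. Only the query carries the extra precision.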

closes: #111232

github-actions · 2024-11-01T12:41:11Z

Documentation preview:

✨ Changed pages

elasticsearchmachine · 2024-11-01T12:41:24Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

elasticsearchmachine · 2024-11-01T12:41:24Z

Hi @benwtrent, I've created a changelog YAML for you.

rjernst

A couple thoughts

rjernst · 2024-11-01T13:39:10Z

libs/simdvec/src/main/java/org/elasticsearch/simdvec/ESVectorUtil.java

+        }
+        // tail:
+        for (; i < a.length; i++) {
+            distance += Integer.bitCount((a[i] & b[i]) & 0xFF);


Could the tail be done with a single Long.bitCount call, if using a mask based on the number of remaining bytes?

Possibly? But I didn't want to bother with over-optimizing, especially since these methods are effectively copy-pastes of what exists in Lucene for XOR (just changing `^` to `&`).
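
For reference, one way the masked-tail idea could be sketched in Java is shown below. This is not what the PR does (it keeps the simple byte loop). The class name is hypothetical, and the sketch reads longs through `ByteBuffer` instead of the approach used in the actual code. When fewer than 8 bytes remain, it re-reads the last 8 bytes of each array and masks off the bytes that the main loop already counted.

```java
import java.nio.ByteBuffer;

final class AndBitCountTailSketch {

    static int andBitCount(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        // ByteBuffer is big-endian by default, so the last array byte
        // ends up in the lowest-order byte of the long.
        ByteBuffer ba = ByteBuffer.wrap(a);
        ByteBuffer bb = ByteBuffer.wrap(b);
        int distance = 0;
        int i = 0;
        for (; i + Long.BYTES <= a.length; i += Long.BYTES) {
            distance += Long.bitCount(ba.getLong(i) & bb.getLong(i));
        }
        int remaining = a.length - i; // 0..7 bytes left
        if (remaining > 0) {
            if (a.length >= Long.BYTES) {
                // Re-read the final 8 bytes and keep only the low-order
                // `remaining` bytes, which are the ones not yet counted.
                long mask = -1L >>> (Long.SIZE - Byte.SIZE * remaining);
                int last = a.length - Long.BYTES;
                distance += Long.bitCount(ba.getLong(last) & bb.getLong(last) & mask);
            } else {
                // Arrays shorter than 8 bytes: fall back to the byte loop.
                for (; i < a.length; i++) {
                    distance += Integer.bitCount((a[i] & b[i]) & 0xFF);
                }
            }
        }
        return distance;
    }
}
```

Whether this is faster than the byte loop would need benchmarking. The tail is at most 7 bytes, so the gain is likely small.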

john-wagster · 2024-11-04T17:39:55Z

libs/simdvec/src/main/java/org/elasticsearch/simdvec/ESVectorUtil.java

+
+    /**
+     * AND bit count computed over signed bytes.
+     * Copied from Lucene's XOR implementation


This is more for my education. What's the thinking behind putting this in ES rather than Lucene, given that we have XOR in Lucene's VectorUtil and this seems complementary?

@john-wagster there is no compelling reason for keeping it out of Lucene. But it's weird for there to be public utility methods in Lucene when nothing there directly utilizes them.

john-wagster

LGTM

benwtrent · 2024-11-04T17:50:38Z

@elasticmachine update branch

mayya-sharipova · 2024-11-04T18:39:45Z

docs/reference/vectors/vector-functions.asciidoc

On line 19 we also say that dot_product is not supported for bit vectors.

mayya-sharipova · 2024-11-04T18:41:23Z

docs/reference/vectors/vector-functions.asciidoc

@@ -332,6 +332,9 @@ When using `bit` vectors, not all the vector functions are available. The suppor
 * <<vector-functions-hamming,`hamming`>> – calculates Hamming distance, the sum of the bitwise XOR of the two vectors
 * <<vector-functions-l1,`l1norm`>> – calculates L^1^ distance, this is simply the `hamming` distance
 * <<vector-functions-l2,`l2norm`>> - calculates L^2^ distance, this is the square root of the `hamming` distance
+* <<vector-functions-dot-product,`dotProduct`>> – calculates dot product. When comparing two `bit` vectors,


Maybe we can add that `queryVector` can be a `byte[]` (with the same dims as the docs, or dims * 8), a string, or a `float[]`.

mayya-sharipova

@benwtrent Thanks Ben, great change! I've added a small docs comment.

benwtrent · 2024-11-04T20:44:34Z

@elasticmachine update branch

tteofili

LGTM (with minor comment)

tteofili · 2024-11-05T08:58:48Z

server/src/main/java/org/elasticsearch/script/VectorScoreScriptUtils.java

+                    isFloat = true;
+                }


I think we can break here

We need to build both vectors and then pick the right one; breaking would prevent us from finishing building the arrays.
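
A minimal Java sketch of why breaking early would be a problem. The class, field, and method names are hypothetical and this is not the code in `VectorScoreScriptUtils`. It assumes the query arrives as a list of numbers and that the vector is treated as float as soon as any value is not exactly representable as a byte.

```java
import java.util.List;

final class QueryVectorParseSketch {

    /** Holds both candidate representations plus the flag that picks one. */
    static final class Parsed {
        final byte[] bytes;
        final float[] floats;
        final boolean isFloat;

        Parsed(byte[] bytes, float[] floats, boolean isFloat) {
            this.bytes = bytes;
            this.floats = floats;
            this.isFloat = isFloat;
        }
    }

    static Parsed parse(List<Number> values) {
        byte[] bytes = new byte[values.size()];
        float[] floats = new float[values.size()];
        boolean isFloat = false;
        for (int i = 0; i < values.size(); i++) {
            float v = values.get(i).floatValue();
            floats[i] = v;
            bytes[i] = (byte) v;
            if (v != (byte) v) {
                // Not a whole number in byte range, so the float array wins.
                // No break here: the remaining float values must still be
                // filled in, otherwise the chosen array would be incomplete.
                isFloat = true;
            }
        }
        return new Parsed(bytes, floats, isFloat);
    }
}
```

The caller would then use `floats` when `isFloat` is true and `bytes` otherwise.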

elasticsearchmachine · 2024-11-05T22:23:36Z

💚 Backport successful

Status	Branch	Result
✅	8.x

Add support for bitwise inner-product in painless

5c2f974

benwtrent added the >enhancement, auto-backport, :Search Relevance/Vectors, v9.0.0, and v8.17.0 labels Nov 1, 2024

benwtrent requested a review from a team as a code owner November 1, 2024 12:41

elasticsearchmachine added the Team:Search Relevance label Nov 1, 2024

Update docs/changelog/116082.yaml

a196c9b

rjernst reviewed Nov 1, 2024

addressing PR comments

2c33bd9

john-wagster reviewed Nov 4, 2024

john-wagster approved these changes Nov 4, 2024

Merge branch 'main' into feature/allow-binary-dotproduct-in-scripts

7ba48fa

mayya-sharipova reviewed Nov 4, 2024

mayya-sharipova approved these changes Nov 4, 2024

benwtrent added 2 commits November 4, 2024 15:43

fixing tests and updating docs

d4b95be

Merge branch 'feature/allow-binary-dotproduct-in-scripts' of github.c…

f2d1660

…om:benwtrent/elasticsearch into feature/allow-binary-dotproduct-in-scripts

benwtrent added the auto-merge-without-approval label Nov 4, 2024

Merge branch 'main' into feature/allow-binary-dotproduct-in-scripts

c46da24

tteofili approved these changes Nov 5, 2024

benwtrent added 3 commits November 5, 2024 14:15

Merge remote-tracking branch 'upstream/main' into feature/allow-binar…

5519af8

…y-dotproduct-in-scripts

adjusting tests and such

808ed5e

formatting

5c23174

benwtrent added 3 commits November 5, 2024 14:16

Merge branch 'feature/allow-binary-dotproduct-in-scripts' of github.c…

2a36775

…om:benwtrent/elasticsearch into feature/allow-binary-dotproduct-in-scripts

fixing tests

14670f7

Merge remote-tracking branch 'upstream/main' into feature/allow-binar…

3732ae4

…y-dotproduct-in-scripts

elasticsearchmachine merged commit d33a03c into elastic:main Nov 5, 2024
16 checks passed

benwtrent deleted the feature/allow-binary-dotproduct-in-scripts branch November 5, 2024 22:22

benwtrent mentioned this pull request Nov 5, 2024

[8.x] Add support for bitwise inner-product in painless (#116082) #116285

Merged
