
Add the ability to compute vector similarity scores with the new ValuesSource API #12394

Closed
jpountz opened this issue Jun 25, 2023 · 6 comments · Fixed by #12548

Comments

@jpountz
Contributor

jpountz commented Jun 25, 2023

Description

#12253 introduced the ability to compute vector similarity with the legacy value source API. Should we introduce similar functionality for DoubleValuesSource?

@jpountz
Contributor Author

jpountz commented Jul 4, 2023

The two values source APIs are very different, so here's a proposal for new method signatures with the new API: DoubleValues distanceFromQueryVector(float[] queryVector, String vectorField) (for float[] vectors) and DoubleValues distanceFromQueryVector(byte[] queryVector, String vectorField) (for byte[] vectors).

@msokolov
Contributor

msokolov commented Jul 5, 2023

The idea makes sense to me, but I don't like the word "distance" in this context because not all of the similarities are distances in the sense of a metric space. That's why I prefer "similarity". Maybe we could call it vectorSimilarity, similarityToVector, or similarityToQueryVector?
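To make the naming concern concrete, here is a minimal plain-Java sketch contrasting squared Euclidean distance with the dot product. The class `RawSimilarity` is hypothetical and not part of Lucene. A distance is zero for identical vectors and grows as they move apart, while the dot product can rank another vector above the query vector itself, so its values cannot be read as distances.

```java
/** Hypothetical helper (not Lucene code): raw similarity measures on float vectors. */
final class RawSimilarity {
  private RawSimilarity() {}

  /** Dot product: larger means "more similar"; it is not a metric. */
  static double dot(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Squared Euclidean distance: zero for identical vectors, larger means "less similar". */
  static double squaredDistance(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  public static void main(String[] args) {
    float[] x = {1f, 0f};
    float[] y = {2f, 0f};
    // Distance behaves like a distance: x is closest to itself.
    System.out.println(squaredDistance(x, x)); // 0.0
    System.out.println(squaredDistance(x, y)); // 1.0
    // The dot product scores y above x itself, so it is not a distance.
    System.out.println(dot(x, x)); // 1.0
    System.out.println(dot(x, y)); // 2.0
  }
}
```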

@shubhamvishu
Contributor

I would like to take this up!

@jpountz Do you mean we could have something like ByteVectorSimilarityValuesSource and FloatVectorSimilarityValuesSource extending DoubleValuesSource, in the package org.apache.lucene.queries.function.valuesource, org.apache.lucene.search, or somewhere else? Could you please elaborate?

I was thinking we could have a DVS whose #getValues returns the per-document vector similarity scores (i.e. DoubleValues) computed with a VectorSimilarityFunction passed to the DVS?

@jpountz
Contributor Author

jpountz commented Jul 10, 2023

They would be pkg-private in org.apache.lucene.search and exposed via factory methods on DoubleValuesSource.

> I was thinking if we could have a DVS where #getValues returns the per document vector similarity scores(i.e. DoubleValues) based on the passed VectorSimilarityFunction to DVS ?

FWIW we don't need to take an input similarity function and could rely on the one that is configured on the FieldInfo of the provided field.
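The design described in this comment (pkg-private scoring logic exposed through a static factory, with the similarity function read from the field's configuration instead of being passed in) can be sketched in plain Java. Everything below is a hypothetical stand-in, not Lucene's API: `SimFunction` plays the role of `VectorSimilarityFunction`, `Segment` models a segment's per-field configuration and vectors, and `DocScores` loosely mirrors the `advanceExact`/`doubleValue` shape of `DoubleValues`. The score formulas are assumptions modeled on how we understand Lucene to turn each raw similarity into a non-negative score.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for Lucene's VectorSimilarityFunction (score formulas are assumptions). */
enum SimFunction {
  EUCLIDEAN,
  DOT_PRODUCT, // assumes unit-length vectors
  COSINE;      // zero-length vectors would yield NaN; not handled in this sketch

  /** Maps a raw similarity into a non-negative score where higher means more similar. */
  double score(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0, squaredDistance = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
      double d = a[i] - b[i];
      squaredDistance += d * d;
    }
    switch (this) {
      case EUCLIDEAN:
        return 1 / (1 + squaredDistance);
      case DOT_PRODUCT:
        return Math.max((1 + dot) / 2, 0);
      default: // COSINE
        return Math.max((1 + dot / Math.sqrt(normA * normB)) / 2, 0);
    }
  }
}

/** Hypothetical per-document cursor, loosely modeled on the shape of Lucene's DoubleValues. */
interface DocScores {
  boolean advanceExact(int doc);

  double doubleValue();
}

/** Hypothetical segment: per-field configured similarity plus per-field doc -> vector maps. */
final class Segment {
  final Map<String, SimFunction> fieldSimilarity = new HashMap<>();
  final Map<String, Map<Integer, float[]>> vectors = new HashMap<>();
}

/** Hypothetical factory; the scoring implementation stays an anonymous, non-public detail. */
final class VectorScores {
  private VectorScores() {}

  static DocScores similarityToQueryVector(Segment segment, float[] queryVector, String field) {
    // Read the similarity from the field's configuration instead of taking it as a parameter.
    SimFunction similarity = segment.fieldSimilarity.get(field);
    if (similarity == null) {
      throw new IllegalArgumentException("not a vector field: " + field);
    }
    Map<Integer, float[]> docs = segment.vectors.getOrDefault(field, Map.of());
    return new DocScores() {
      private float[] current;

      @Override
      public boolean advanceExact(int doc) {
        current = docs.get(doc); // null when the document has no vector
        return current != null;
      }

      @Override
      public double doubleValue() {
        return similarity.score(queryVector, current);
      }
    };
  }
}
```

Because the factory looks the function up from the field's configuration, a caller cannot score a field with a similarity it was not indexed with, which is the benefit of relying on the `FieldInfo`.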

@shubhamvishu
Contributor

I see, thanks for clarifying @jpountz!

@shubhamvishu
Contributor

@jpountz I have raised a PR #12548 that adds the required APIs to DVS for computing vector similarity scores. Thanks!

benwtrent pushed a commit that referenced this issue Oct 12, 2023
…12548)

### Description

This PR addresses issue #12394. It adds an API **`similarityToQueryVector`** to `DoubleValuesSource` that computes vector similarity scores between the query vector and each document's `KnnByteVectorField`/`KnnFloatVectorField` value, using two new DVS implementations (`ByteVectorSimilarityValuesSource` for byte vectors and `FloatVectorSimilarityValuesSource` for float vectors). Below are the method signatures added to DVS in this PR:

- `DoubleValues similarityToQueryVector(LeafReaderContext ctx, float[] queryVector, String vectorField)` *(uses FloatVectorSimilarityValuesSource)*
- `DoubleValues similarityToQueryVector(LeafReaderContext ctx, byte[] queryVector, String vectorField)` *(uses ByteVectorSimilarityValuesSource)*

Closes #12394
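Routing a `float[]` query to the float implementation and a `byte[]` query to the byte implementation falls out of ordinary Java overload resolution on the array type. The sketch below is hypothetical and simplified: `FloatVectorScorer` and `ByteVectorScorer` stand in for `FloatVectorSimilarityValuesSource` and `ByteVectorSimilarityValuesSource`, the real methods additionally take a `LeafReaderContext` and a field name and return `DoubleValues`, and the `1 / (1 + squaredDistance)` score is an assumed Euclidean-style normalization rather than Lucene's exact code.

```java
/** Hypothetical sketch: one overload per vector element type, each backed by its own scorer. */
final class SimilarityFactories {
  private SimilarityFactories() {}

  /** Stand-in for the float-vector implementation (hypothetical name). */
  static final class FloatVectorScorer {
    private final float[] query;

    FloatVectorScorer(float[] query) {
      this.query = query;
    }

    double score(float[] docVector) {
      double squaredDistance = 0;
      for (int i = 0; i < query.length; i++) {
        double d = query[i] - docVector[i];
        squaredDistance += d * d;
      }
      return 1 / (1 + squaredDistance); // assumed Euclidean-style normalization
    }
  }

  /** Stand-in for the byte-vector implementation (hypothetical name). */
  static final class ByteVectorScorer {
    private final byte[] query;

    ByteVectorScorer(byte[] query) {
      this.query = query;
    }

    double score(byte[] docVector) {
      long squaredDistance = 0;
      for (int i = 0; i < query.length; i++) {
        int d = query[i] - docVector[i]; // bytes are widened to int, so 127 - (-128) cannot overflow
        squaredDistance += (long) d * d;
      }
      return 1.0 / (1 + squaredDistance);
    }
  }

  /** A float[] query resolves to this overload and the float scorer. */
  static FloatVectorScorer similarityToQueryVector(float[] queryVector) {
    return new FloatVectorScorer(queryVector);
  }

  /** A byte[] query resolves to this overload and the byte scorer. */
  static ByteVectorScorer similarityToQueryVector(byte[] queryVector) {
    return new ByteVectorScorer(queryVector);
  }
}
```

Keeping one implementation per element type avoids converting byte vectors to floats on every comparison and keeps each scorer's arithmetic specific to its type.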
4 participants