

[Feature] k-NN Array support for Vector Field type #675

Closed
mausch opened this issue Dec 13, 2022 · 5 comments
Labels: enhancement, Features, untriaged


mausch commented Dec 13, 2022

In OpenSearch it has always been expected that you can pass an array of values to any field.
In fact, the documentation says "you can pass an array of values into any field."
However, when you try to do that with a field of type knn_vector, you get a mapper_parsing_exception.

navneet1v (Collaborator) commented

@mausch Can you please provide details about your use case?

navneet1v assigned navneet1v and vamshin and unassigned navneet1v on Jan 4, 2023

mausch (Author) commented Jan 4, 2023

I have a schema where every document has 1..N vectors, plus several other regular OpenSearch fields.
I need to run a k-NN search over all of the vectors and score each document by the highest similarity between any of its vectors and the query vector. Hopefully that makes sense 🙂
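As a point of reference, the scoring described above can be sketched as an exact (brute-force, not approximate) search in Python. This is a minimal sketch, not OpenSearch's implementation: it assumes the `l2` space and turns the squared L2 distance `d` into a score with `1 / (1 + d)`, assumed here to match the k-NN plugin's `l2` scoring; the function names and sample data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_score(a: Vector, b: Vector) -> float:
    """Map squared L2 distance to a score in (0, 1] (assumed formula 1 / (1 + d))."""
    d = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + d)


def doc_score(vectors: List[Vector], query: Vector) -> float:
    """Score a document by its best-matching vector out of 1..N."""
    return max(l2_score(v, query) for v in vectors)


def top_k_docs(
    docs: Dict[str, List[Vector]], query: Vector, k: int
) -> List[Tuple[str, float]]:
    """Exact top-k over documents; each document appears at most once."""
    scored = [(doc_id, doc_score(vecs, query)) for doc_id, vecs in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    docs = {
        "doc-a": [[400, 50, 665], [47, 589, 64]],  # hypothetical multi-vector doc
        "doc-b": [[434, 52, 678]],
    }
    print(top_k_docs(docs, [400, 50, 665], k=2))
```

Because `doc_score` takes the maximum over a document's vectors, each document is ranked once, by its closest vector, rather than once per vector.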

At the moment I'm working around this by representing every entity as 1..N documents, but that means I have to duplicate all the other regular fields in every document, and I also have to request more results than I need, because several hits can belong to the same entity.
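With one document per vector, the same entity can show up several times in the hits, so the client has to over-fetch and then collapse the results. A minimal sketch of that client-side step, assuming each hit has the usual `_score` and `_source` keys and that `_source` carries a hypothetical `entity_id` field linking the per-vector documents back to one entity; the helper name is also hypothetical.

```python
from typing import Any, Dict, List


def dedupe_hits(hits: List[Dict[str, Any]], size: int) -> List[Dict[str, Any]]:
    """Keep the best-scoring hit per entity, then trim to `size` entities.

    Assumes every hit's _source contains a hypothetical "entity_id" field.
    """
    best: Dict[str, Dict[str, Any]] = {}
    for hit in hits:
        entity = hit["_source"]["entity_id"]
        # Replace the stored hit only if this one scores higher.
        if entity not in best or hit["_score"] > best[entity]["_score"]:
            best[entity] = hit
    ranked = sorted(best.values(), key=lambda h: h["_score"], reverse=True)
    return ranked[:size]


if __name__ == "__main__":
    hits = [
        {"_id": "1", "_score": 0.9, "_source": {"entity_id": "e1"}},
        {"_id": "2", "_score": 0.8, "_source": {"entity_id": "e1"}},
        {"_id": "3", "_score": 0.7, "_source": {"entity_id": "e2"}},
    ]
    print([h["_id"] for h in dedupe_hits(hits, size=2)])
```

If the over-fetched result list was too short, de-duplication can leave fewer than `size` entities, which is why the requested size has to be inflated up front.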

vamshin added the Features and v2.7.0 labels and removed the untriaged label on Feb 10, 2023
vamshin changed the title from [FEATURE] Array support to [FEATURE] Array support for k-NN Vector Field type on Feb 23, 2023
vamshin changed the title from [FEATURE] Array support for k-NN Vector Field type to k-NN Array support for Vector Field type on Feb 23, 2023
vamshin changed the title from k-NN Array support for Vector Field type to [Feature] k-NN Array support for Vector Field type on Feb 23, 2023

naveentatikonda (Member) commented Mar 14, 2023

@mausch For now, we are not able to add array support for the k-NN vector field type because of an underlying limitation in Lucene. By default, a knn_vector field is stored as binary doc values, and Lucene does not support multi-valued binary doc values: its BinaryDocValuesWriter does not accept more than one binary value (the serialized bytes of a vector) for the same field in the same document, and throws an IllegalArgumentException if you try.
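To make the constraint concrete, here is a toy Python model; it is not Lucene's code. The hypothetical `ToyBinaryDocValuesWriter` accepts at most one binary value per document for its field, the way BinaryDocValuesWriter does, and raises on a second one. The little-endian float32 encoding is an assumption for illustration and is not necessarily the byte layout the k-NN plugin uses.

```python
import struct
from typing import Dict, List, Sequence


def encode_vector(vector: Sequence[float]) -> bytes:
    """Pack a vector as little-endian float32 values (illustrative encoding only)."""
    return struct.pack(f"<{len(vector)}f", *vector)


class ToyBinaryDocValuesWriter:
    """Toy stand-in for a single-valued binary doc values writer."""

    def __init__(self, field: str) -> None:
        self.field = field
        self.values: Dict[int, bytes] = {}

    def add_value(self, doc_id: int, value: bytes) -> None:
        # Single-valued: a second value for the same document is rejected,
        # mirroring the IllegalArgumentException described above.
        if doc_id in self.values:
            raise ValueError(
                f"field {self.field!r} already has a value for doc {doc_id}"
            )
        self.values[doc_id] = value


def index_doc(
    writer: ToyBinaryDocValuesWriter, doc_id: int, vectors: List[Sequence[float]]
) -> None:
    """Write every vector of a document; fails as soon as there are two."""
    for vector in vectors:
        writer.add_value(doc_id, encode_vector(vector))
```

Indexing a document with a single vector succeeds, while a document carrying an array of two or more vectors fails on the second write, which is the situation the comment describes.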

For your use case, we recommend using the nested field type, which is supported for approximate nearest neighbor (ANN) search.

For example:

PUT train-index
{
    "settings": {
        "index": {
            "knn": true,
            "knn.algo_param.ef_search": 100
        }
    },
    "mappings": {
        "properties": {
            "nested_field": {
                "type": "nested",
                "properties": {
                    "my_vector1": {
                        "type": "knn_vector",
                        "dimension": 3,
                        "method": {
                            "name": "hnsw",
                            "space_type": "l2",
                            "engine": "nmslib",
                            "parameters": {
                                "ef_construction": 128,
                                "m": 24
                            }
                        }
                    }
                }
            }
        }
    }
}

POST train-index/_doc/1
{
    "nested_field" : [
        {
            "my_vector1": [400,50,665],
            "test-field1" : 1
        },
        {
            "my_vector1": [47,589,64],
            "test-field1" : 10
        },
        {
            "my_vector1": [434,52,678],
            "test-field1" : 15
        }
    ]
}
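If your data arrives as a flat list of vectors per entity, a small helper can reshape it into the nested layout used above. A minimal sketch: `nested_field`, `my_vector1`, and `test-field1` come from the example, while the function name and its parameters are hypothetical. The returned dict is meant to be sent as the document body; the client call that sends it is not shown.

```python
import json
from typing import Any, Dict, List, Optional, Sequence


def to_nested_body(
    vectors: List[Sequence[float]],
    per_vector_fields: Optional[List[Dict[str, Any]]] = None,
    shared_fields: Optional[Dict[str, Any]] = None,
    dimension: int = 3,
) -> Dict[str, Any]:
    """Build a document whose nested_field holds one object per vector."""
    per_vector_fields = per_vector_fields or [{} for _ in vectors]
    if len(per_vector_fields) != len(vectors):
        raise ValueError("per_vector_fields must align with vectors")
    nested = []
    for vector, extra in zip(vectors, per_vector_fields):
        # Each vector must match the mapping's "dimension" (3 in the example).
        if len(vector) != dimension:
            raise ValueError(f"expected {dimension} values, got {len(vector)}")
        nested.append({"my_vector1": list(vector), **extra})
    # Parent-level fields are stored once instead of being copied per vector.
    return {**(shared_fields or {}), "nested_field": nested}


if __name__ == "__main__":
    body = to_nested_body(
        [[400, 50, 665], [47, 589, 64], [434, 52, 678]],
        [{"test-field1": 1}, {"test-field1": 10}, {"test-field1": 15}],
    )
    print(json.dumps(body, indent=4))
```

Keeping the shared fields at the parent level is what removes the per-vector duplication mentioned earlier in the thread.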

Please let us know if you have any other questions.

mausch (Author) commented Mar 17, 2023

Ah, forgot about nested fields! That's a great workaround, thanks 👍

heemin32 (Collaborator) commented

For reference, the query will look like this:

GET train-index/_search
{
  "query": {
    "nested": {
      "path": "nested_field",
      "query": {
        "knn": {
          "nested_field.my_vector1": {
            "vector": [400,50,665],
            "k": 2
          }
        }
      }
    }
  }
}
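A sketch of building that request body programmatically, so the path, field, and k are not hard-coded. The function name is hypothetical and the default output mirrors the query above. The optional `score_mode` is a standard nested-query parameter (for example `"max"`, to match the highest-similarity scoring requested earlier), but how it interacts with k-NN scoring is not confirmed in this thread, so verify it against your k-NN plugin version.

```python
import json
from typing import Any, Dict, Optional, Sequence


def nested_knn_query(
    path: str,
    field: str,
    vector: Sequence[float],
    k: int,
    score_mode: Optional[str] = None,
) -> Dict[str, Any]:
    """Wrap a k-NN clause on a nested vector field in a nested query."""
    nested: Dict[str, Any] = {
        "path": path,
        "query": {"knn": {f"{path}.{field}": {"vector": list(vector), "k": k}}},
    }
    if score_mode is not None:
        # Optional nested-query parameter; check its effect on k-NN scoring
        # for your plugin version before relying on it.
        nested["score_mode"] = score_mode
    return {"query": {"nested": nested}}


if __name__ == "__main__":
    body = nested_knn_query("nested_field", "my_vector1", [400, 50, 665], k=2)
    print(json.dumps(body, indent=2))
```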


5 participants