[faiss] Unable to create knn index from model #110

Closed
jmazanec15 opened this issue Oct 3, 2021 · 0 comments

When a user wants to create a k-NN index from a model, they need to pass a mapping like:

PUT /target_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "target_field": {
        "type": "knn_vector",
        "model_id": "my_model"
      }
    }
  }
}

Internally, we require other information about the model when initializing the index, such as its dimension, engine, and space type. We also need to check that the model exists.

From the KNNVectorFieldMapper, we cannot read that information from the model system index directly. To get around this, we added metadata to the model system index, and we retrieve that metadata from the cluster state during mapper creation.

The issue with this approach is that the cluster state is not always available when the mapper builder's build function is called. For instance, the mapper is sometimes created on the same thread that is applying a cluster state update. The resulting error looks something like this:

=== Standard error of node `node{::integTest-0}` ===
»   ↓ last 40 non error or warning messages from /home/test/k-NN-1/build/testclusters/integTest-0/logs/opensearch.stderr.log ↓
» WARNING clustering 1001 points to 128 centroids: please provide at least 4992 training points
»  fatal error in thread [opensearch[integTest-0][clusterApplierService#updateTask][T#1]], exiting
»  java.lang.AssertionError: should not be called by a cluster state applier. reason [the applied cluster state is not yet available]
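
To make the failure mode concrete, here is a toy model of it in Python (the plugin itself is Java). It is only a sketch under assumptions: `ToyClusterService`, `build_mapper`, and `run_in_thread` are hypothetical names, the `dimension` value is made up, and the guard is assumed to key off the thread name, which is what the `clusterApplierService#updateTask` thread in the log above suggests. It is not the actual OpenSearch implementation.

```python
import threading

# Marker taken from the thread name in the log above.
APPLIER_THREAD_MARKER = "clusterApplierService#updateTask"


class ToyClusterService:
    """Hypothetical stand-in for a service that exposes the applied cluster state."""

    def __init__(self, models):
        self._state = {"models": dict(models)}

    def state(self):
        # Assumed guard: refuse to hand out the state on an applier thread,
        # because the state being applied is not visible yet.
        if APPLIER_THREAD_MARKER in threading.current_thread().name:
            raise AssertionError(
                "should not be called by a cluster state applier. "
                "reason [the applied cluster state is not yet available]"
            )
        return self._state


def build_mapper(cluster_service, model_id):
    # Eager approach: look up the model metadata while building the mapper.
    return cluster_service.state()["models"].get(model_id)


def run_in_thread(name, fn):
    """Run fn on a thread with the given name and capture its value or error."""
    result = {}

    def target():
        try:
            result["value"] = fn()
        except AssertionError as exc:
            result["error"] = str(exc)

    worker = threading.Thread(target=target, name=name)
    worker.start()
    worker.join()
    return result
```

Calling `build_mapper` through `run_in_thread` with an ordinary thread name returns the stored metadata, while the same call on a thread named like the applier thread in the log ends in the assertion error. The builder cannot know in advance which thread it will run on, which is why the eager lookup is unsafe.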

As far as I understand, there is no safe way to access the cluster state from the field mapper builder. For us, that means we cannot get the dimension, space type, and engine, or check that the model exists, during index creation. The next best thing we can do is validate this information at indexing time, from the parseCreateField method.
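
The deferred check could look roughly like the following Python sketch (again, the plugin itself is Java). Everything here is illustrative: `ModelMetadata`, `ModelBackedVectorField`, the `lookup` callable, and the sample engine and space type values are assumptions made for this example, not the plugin's actual classes or API. The point is only that construction records the `model_id` without touching any state, and that the model is resolved when a document is parsed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class ModelMetadata:
    """Hypothetical container for what the mapper needs to know about a model."""
    dimension: int
    engine: str
    space_type: str


class ModelBackedVectorField:
    """Hypothetical field that defers model validation to indexing time."""

    def __init__(self, name: str, model_id: str,
                 lookup: Callable[[str], Optional[ModelMetadata]]):
        # Construction only stores references; it never calls lookup, so it is
        # safe even when the model metadata cannot be read yet.
        self.name = name
        self.model_id = model_id
        self._lookup = lookup

    def parse_create_field(self, vector: Sequence[float]) -> List[float]:
        # Validation happens here, when a document is actually indexed.
        metadata = self._lookup(self.model_id)
        if metadata is None:
            raise ValueError(f"model '{self.model_id}' does not exist")
        if len(vector) != metadata.dimension:
            raise ValueError(
                f"field '{self.name}' expects dimension {metadata.dimension}, "
                f"got {len(vector)}"
            )
        return [float(v) for v in vector]
```

The trade-off is that a mapping with a missing or mismatched model is accepted at index creation and only rejected when the first document is indexed.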
