
[BUG] Cluster crash when using script score #1647

Closed
ryanbogan opened this issue Apr 23, 2024 · 7 comments
Labels: bug (Something isn't working), untriaged

Comments

@ryanbogan (Member)

What is the bug?
The cluster crashes with the following error when the steps below are followed:

java.lang.AssertionError
    at org.apache.lucene.codecs.lucene90.IndexedDISI.advance(IndexedDISI.java:452)
    at org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$SparseBinaryDocValues.advance(Lucene90DocValuesProducer.java:725)
    at org.opensearch.knn.index.KNNVectorScriptDocValues.setNextDocId(KNNVectorScriptDocValues.java:31)
    at org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:103)
    at org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:56)
    at org.opensearch.knn.plugin.script.KNNScoreScript$KNNVectorType.execute(KNNScoreScript.java:140)
    at org.opensearch.common.lucene.search.function.ScriptScoreQuery$ScriptScorable.score(ScriptScoreQuery.java:349)
    at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:72)
    at org.apache.lucene.search.FilterLeafCollector.collect(FilterLeafCollector.java:42)
    at org.apache.lucene.search.MatchAllDocsQuery$1$1.score(MatchAllDocsQuery.java:63)
    at org.opensearch.common.lucene.search.function.ScriptScoreQuery$ScriptScoreBulkScorer.score(ScriptScoreQuery.java:390)
    at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:71)
    at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
    at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:327)
    at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:283)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:552)
    at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:356)
    at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:443)
    at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:427)
    at org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:60)
    at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282)
    at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:155)
    at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:558)
    at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:622)
    at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:591)
    at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
    at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)

How can one reproduce the bug?
Steps to reproduce the behavior:
Complete the following API calls:

PUT /train-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "train-field": {
        "type": "knn_vector",
        "dimension": 4
      }
    }
  }
}

POST /_bulk?pretty
{ "index": { "_index": "train-index"} }
{ "train-field": [1.5, 5.5, 4.5, 6.4]}
{ "index": { "_index": "train-index" } }
{ "train-field": [2.5, 3.5, 5.6, 6.7]}
{ "index": { "_index": "train-index"} }
{ "train-field": [4.5, 5.5, 6.7, 3.7]}
{ "index": { "_index": "train-index" } }
{ "train-field": [1.5, 5.5, 4.5, 6.4]}

POST /_plugins/_knn/models/my-model/_train
{
  "training_index": "train-index",
  "training_field": "train-field",
  "dimension": 4,
  "description": "My models description",
  "search_size": 500,
  "method": {
    "name": "ivf",
    "engine": "faiss",
    "parameters": {
      "nlist": 4,
      "nprobes": 2
    }
  }
}

PUT /my-knn-index-1
{
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 2
      },
      "my_vector2": {
        "type": "knn_vector",
        "dimension": 4
      },
      "target-field": {
        "type": "knn_vector",
        "model_id": "my-model"
      }
    }
  }
}

POST /_bulk?pretty
{ "index": { "_index": "my-knn-index-1", "_id": "1" } }
{ "my_vector1": [1.5, 2.5], "price": 12.2 }
{ "index": { "_index": "my-knn-index-1", "_id": "2" } }
{ "my_vector1": [2.5, 3.5], "price": 7.1 }
{ "index": { "_index": "my-knn-index-1", "_id": "3" } }
{ "my_vector1": [3.5, 4.5], "price": 12.9 }
{ "index": { "_index": "my-knn-index-1", "_id": "4" } }
{ "my_vector1": [5.5, 6.5], "price": 1.2 }
{ "index": { "_index": "my-knn-index-1", "_id": "5" } }
{ "my_vector1": [4.5, 5.5], "price": 3.7 }
{ "index": { "_index": "my-knn-index-1", "_id": "6" } }
{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 }
{ "index": { "_index": "my-knn-index-1", "_id": "7" } }
{ "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 }
{ "index": { "_index": "my-knn-index-1", "_id": "8" } }
{ "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 }
{ "index": { "_index": "my-knn-index-1", "_id": "9" } }
{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 }

GET /my-knn-index-1/_search
{
 "size": 4,
 "query": {
   "script_score": {
     "query": {
       "match_all": {}
     },
     "script": {
       "source": "knn_score",
       "lang": "knn",
       "params": {
         "field": "my_vector2",
         "query_value": [2.0, 3.0, 5.0, 6.0],
         "space_type": "cosinesimil"
       }
     }
   }
 }
}
ryanbogan added the bug and untriaged labels on Apr 23, 2024
@rishabh6788

Are you seeing a similar error in the OS logs?

Caused by: java.lang.UnsatisfiedLinkError: /home/ec2-user/opensearch/plugins/opensearch-knn/lib/libopensearchknn_faiss.so: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /home/ec2-user/opensearch/plugins/opensearch-knn/lib/libopensearchknn_faiss.so)

@ryanbogan (Member, Author)

@rishabh6788 That error isn't present in the logs for me.

ryanbogan changed the title from "[BUG] Cluster crash when using script score with model-based index and normal k-NN indices" to "[BUG] Cluster crash when using script score" on Apr 26, 2024
@ryanbogan (Member, Author)

I was able to confirm that it crashes even without a model-based index present.

@ryanbogan (Member, Author)

I was unable to reproduce the error on the 2.13 branch. An initial deep dive suggests that it might be caused by this PR (#1573), and reverting the PR locally fixed the bug. @bugmakerrrrrr, we are going to revert this PR for now to fix the bug for the 2.14 release.

@bugmakerrrrrr (Contributor)

@ryanbogan Sorry for introducing this bug. I'll look into this issue later.

@bugmakerrrrrr (Contributor)

@ryanbogan I think that I have figured out the root cause. According to the DocIdSetIterator#advance Javadoc:

The behavior of this method is undefined when called with target ≤ current, or after the iterator has exhausted. Both cases may result in unpredicted behavior.

When using script score on an index where some docs do not have the target vector field, the doc ids of those missing-field docs are still passed to KNNVectorScriptDocValues#setNextDocId, which then triggers this unpredicted behavior. For example, assume docs [1, 2] have no target field and docs [3, 4] do. When setNextDocId is called with doc id 1, DocIdSetIterator#advance is called and the iterator's current doc id becomes 3. After that, setNextDocId is called with doc id 2, which is less than the current doc id, violating the advance contract.
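
To make the contract violation concrete, here is a minimal, self-contained sketch (not code from the plugin; Lucene's FixedBitSet/BitSetIterator is used purely as a stand-in for the sparse doc-values iterator, and the doc ids mirror the example above):

import java.io.IOException;

import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BitSetIterator;
import org.apache.lucene.util.FixedBitSet;

public class AdvanceContractDemo {
    public static void main(String[] args) throws IOException {
        // Docs 3 and 4 have the vector field; docs 0-2 do not.
        FixedBitSet docsWithField = new FixedBitSet(5);
        docsWithField.set(3);
        docsWithField.set(4);
        DocIdSetIterator it = new BitSetIterator(docsWithField, 2);

        // Old behavior: advance() was called unconditionally for every collected doc.
        System.out.println(it.advance(1)); // skips ahead to doc 3
        // Doc 2 is collected next; its target (2) is now behind the iterator (3),
        // so this call is undefined behavior per the advance() contract.
        // BitSetIterator happens to tolerate it and returns 3 again, but IndexedDISI,
        // which backs the sparse doc values in the stack trace, enforces the contract
        // with an assertion, hence the AssertionError above.
        System.out.println(it.advance(2));
    }
}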

I think we can fix this bug by implementing setNextDocId as follows:

public void setNextDocId(int docId) throws IOException {
    int curDocID = vectorValues.docID();
    // Only advance when the target is ahead of the iterator; calling
    // advance() with target <= current doc id is undefined behavior.
    if (docId > curDocID) {
        curDocID = vectorValues.advance(docId);
    }
    // The doc has a vector only if the iterator landed exactly on docId.
    docExists = docId == curDocID;
}

I will open a PR at a later time.

@ryanbogan (Member, Author)

@bugmakerrrrrr No worries, thanks for the insight! I'll add an integration test that covers this exact scenario so we won't run into this error again.
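
For reference, a rough sketch of what such an integration test could look like, assuming a generic OpenSearchRestTestCase-style REST test (the actual test in the k-NN repo will likely use the plugin's own test helpers; the class name is hypothetical, and the index name, field names, and query simply mirror the repro above):

import org.opensearch.client.Request;
import org.opensearch.client.Response;
import org.opensearch.test.rest.OpenSearchRestTestCase;

public class ScriptScoreMissingFieldIT extends OpenSearchRestTestCase {

    // Some docs have my_vector1 only, others my_vector2 only, so the
    // my_vector2 doc-values iterator is sparse when the script runs.
    public void testScriptScoreWithDocsMissingVectorField() throws Exception {
        Request createIndex = new Request("PUT", "/my-knn-index-1");
        createIndex.setJsonEntity("{\"mappings\": {\"properties\": {"
            + "\"my_vector1\": {\"type\": \"knn_vector\", \"dimension\": 2},"
            + "\"my_vector2\": {\"type\": \"knn_vector\", \"dimension\": 4}}}}");
        client().performRequest(createIndex);

        indexDoc("1", "{\"my_vector1\": [1.5, 2.5]}");
        indexDoc("2", "{\"my_vector1\": [2.5, 3.5]}");
        indexDoc("3", "{\"my_vector2\": [1.5, 5.5, 4.5, 6.4]}");
        indexDoc("4", "{\"my_vector2\": [2.5, 3.5, 5.6, 6.7]}");

        // script_score over match_all visits every doc, including the ones
        // without my_vector2, which is what used to trigger the crash.
        Request search = new Request("GET", "/my-knn-index-1/_search");
        search.setJsonEntity("{\"size\": 4, \"query\": {\"script_score\": {"
            + "\"query\": {\"match_all\": {}},"
            + "\"script\": {\"source\": \"knn_score\", \"lang\": \"knn\","
            + "\"params\": {\"field\": \"my_vector2\","
            + "\"query_value\": [2.0, 3.0, 5.0, 6.0],"
            + "\"space_type\": \"cosinesimil\"}}}}}");
        Response response = client().performRequest(search);
        assertEquals(200, response.getStatusLine().getStatusCode());
    }

    private void indexDoc(String id, String body) throws Exception {
        Request request = new Request("PUT", "/my-knn-index-1/_doc/" + id + "?refresh=true");
        request.setJsonEntity(body);
        client().performRequest(request);
    }
}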
