
[BUG] Cluster crash when using script score #1647

Closed
ryanbogan opened this issue Apr 23, 2024 · 7 comments
Labels: bug (Something isn't working), untriaged

Comments

@ryanbogan (Member)

What is the bug?
The cluster crashes with the following error when the steps below are followed:

java.lang.AssertionError
    at org.apache.lucene.codecs.lucene90.IndexedDISI.advance(IndexedDISI.java:452)
    at org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$SparseBinaryDocValues.advance(Lucene90DocValuesProducer.java:725)
    at org.opensearch.knn.index.KNNVectorScriptDocValues.setNextDocId(KNNVectorScriptDocValues.java:31)
    at org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:103)
    at org.opensearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:56)
    at org.opensearch.knn.plugin.script.KNNScoreScript$KNNVectorType.execute(KNNScoreScript.java:140)
    at org.opensearch.common.lucene.search.function.ScriptScoreQuery$ScriptScorable.score(ScriptScoreQuery.java:349)
    at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:72)
    at org.apache.lucene.search.FilterLeafCollector.collect(FilterLeafCollector.java:42)
    at org.apache.lucene.search.MatchAllDocsQuery$1$1.score(MatchAllDocsQuery.java:63)
    at org.opensearch.common.lucene.search.function.ScriptScoreQuery$ScriptScoreBulkScorer.score(ScriptScoreQuery.java:390)
    at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:71)
    at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
    at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:327)
    at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:283)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:552)
    at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:356)
    at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:443)
    at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:427)
    at org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:60)
    at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:282)
    at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:155)
    at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:558)
    at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:622)
    at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:591)
    at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
    at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)

How can one reproduce the bug?
Steps to reproduce the behavior:
Complete the following API calls:

PUT /train-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "train-field": {
        "type": "knn_vector",
        "dimension": 4
      }
    }
  }
}

POST /_bulk?pretty
{ "index": { "_index": "train-index"} }
{ "train-field": [1.5, 5.5, 4.5, 6.4]}
{ "index": { "_index": "train-index" } }
{ "train-field": [2.5, 3.5, 5.6, 6.7]}
{ "index": { "_index": "train-index"} }
{ "train-field": [4.5, 5.5, 6.7, 3.7]}
{ "index": { "_index": "train-index" } }
{ "train-field": [1.5, 5.5, 4.5, 6.4]}

POST /_plugins/_knn/models/my-model/_train
{
  "training_index": "train-index",
  "training_field": "train-field",
  "dimension": 4,
  "description": "My models description",
  "search_size": 500,
  "method": {
    "name": "ivf",
    "engine": "faiss",
    "parameters": {
      "nlist": 4,
      "nprobes": 2
    }
  }
}

PUT /my-knn-index-1
{
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 2
      },
      "my_vector2": {
        "type": "knn_vector",
        "dimension": 4
      },
      "target-field": {
        "type": "knn_vector",
        "model_id": "my-model"
      }
    }
  }
}

POST /_bulk?pretty
{ "index": { "_index": "my-knn-index-1", "_id": "1" } }
{ "my_vector1": [1.5, 2.5], "price": 12.2 }
{ "index": { "_index": "my-knn-index-1", "_id": "2" } }
{ "my_vector1": [2.5, 3.5], "price": 7.1 }
{ "index": { "_index": "my-knn-index-1", "_id": "3" } }
{ "my_vector1": [3.5, 4.5], "price": 12.9 }
{ "index": { "_index": "my-knn-index-1", "_id": "4" } }
{ "my_vector1": [5.5, 6.5], "price": 1.2 }
{ "index": { "_index": "my-knn-index-1", "_id": "5" } }
{ "my_vector1": [4.5, 5.5], "price": 3.7 }
{ "index": { "_index": "my-knn-index-1", "_id": "6" } }
{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 }
{ "index": { "_index": "my-knn-index-1", "_id": "7" } }
{ "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 }
{ "index": { "_index": "my-knn-index-1", "_id": "8" } }
{ "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 }
{ "index": { "_index": "my-knn-index-1", "_id": "9" } }
{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 }

GET /my-knn-index-1/_search
{
 "size": 4,
 "query": {
   "script_score": {
     "query": {
       "match_all": {}
     },
     "script": {
       "source": "knn_score",
       "lang": "knn",
       "params": {
         "field": "my_vector2",
         "query_value": [2.0, 3.0, 5.0, 6.0],
         "space_type": "cosinesimil"
       }
     }
   }
 }
}
ryanbogan added the bug and untriaged labels on Apr 23, 2024
@rishabh6788

Are you seeing a similar error in the OS logs?

Caused by: java.lang.UnsatisfiedLinkError: /home/ec2-user/opensearch/plugins/opensearch-knn/lib/libopensearchknn_faiss.so: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /home/ec2-user/opensearch/plugins/opensearch-knn/lib/libopensearchknn_faiss.so)

@ryanbogan (Member, Author)

@rishabh6788 That error isn't present in the logs for me.

ryanbogan changed the title from "[BUG] Cluster crash when using script score with model-based index and normal k-NN indices" to "[BUG] Cluster crash when using script score" on Apr 26, 2024
@ryanbogan (Member, Author)

I was able to confirm that it crashes even without a model-based index present.

@ryanbogan (Member, Author)

I was unable to reproduce the error on the 2.13 branch. An initial deep dive suggests that it might be caused by this PR (#1573), and reverting the PR locally fixed the bug. @bugmakerrrrrr, we are going to revert this PR for now to fix the bug for the 2.14 release.

@bugmakerrrrrr (Contributor)

@ryanbogan Sorry for introducing this bug. I'll look into this issue later.

@bugmakerrrrrr (Contributor)

@ryanbogan I think that I have figured out the root cause. According to the DocIdSetIterator#advance Javadoc:

The behavior of this method is undefined when called with target ≤ current, or after the iterator has exhausted. Both cases may result in unpredicted behavior.

When using script score on an index where some docs do not have the target vector field, the doc ids of those missing-field docs are still passed to KNNVectorScriptDocValues#setNextDocId, which then triggers this unpredicted behavior. For example, assume docs [1, 2] have no target field and docs [3, 4] do. When setNextDocId is called with doc id 1, DocIdSetIterator#advance is called and the iterator's current doc id becomes 3. After that, setNextDocId is called with doc id 2, which is less than the current doc id, violating the advance contract.
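
To make the contract violation concrete, here is a minimal, self-contained sketch (not code from the plugin; Lucene's FixedBitSet/BitSetIterator is used purely as a stand-in for the sparse doc-values iterator, and the doc ids mirror the example above):

import java.io.IOException;

import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BitSetIterator;
import org.apache.lucene.util.FixedBitSet;

public class AdvanceContractDemo {
    public static void main(String[] args) throws IOException {
        // Docs 3 and 4 have the vector field; docs 0-2 do not.
        FixedBitSet docsWithField = new FixedBitSet(5);
        docsWithField.set(3);
        docsWithField.set(4);
        DocIdSetIterator it = new BitSetIterator(docsWithField, 2);

        // Old behavior: advance() was called unconditionally for every collected doc.
        System.out.println(it.advance(1)); // skips ahead to doc 3
        // Doc 2 is collected next; its target (2) is now behind the iterator (3),
        // so this call is undefined behavior per the advance() contract.
        // BitSetIterator happens to tolerate it and returns 3 again, but IndexedDISI,
        // which backs the sparse doc values in the stack trace, enforces the contract
        // with an assertion, hence the AssertionError above.
        System.out.println(it.advance(2));
    }
}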

I think we can fix this bug by implementing setNextDocId as follows:

public void setNextDocId(int docId) throws IOException {
    int curDocID = vectorValues.docID();
    // Only advance when the target is ahead of the iterator; calling
    // advance() with target <= current doc id is undefined behavior.
    if (docId > curDocID) {
        curDocID = vectorValues.advance(docId);
    }
    // The doc has a vector only if the iterator landed exactly on docId.
    docExists = docId == curDocID;
}

I will open a PR at a later time.

@ryanbogan (Member, Author)

@bugmakerrrrrr No worries, thanks for the insight! I'll add an integration test that covers this exact scenario so we won't run into this error again.
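
For reference, a rough sketch of what such an integration test could look like, assuming a generic OpenSearchRestTestCase-style REST test (the actual test in the k-NN repo will likely use the plugin's own test helpers; the class name is hypothetical, and the index name, field names, and query simply mirror the repro above):

import org.opensearch.client.Request;
import org.opensearch.client.Response;
import org.opensearch.test.rest.OpenSearchRestTestCase;

public class ScriptScoreMissingFieldIT extends OpenSearchRestTestCase {

    // Some docs have my_vector1 only, others my_vector2 only, so the
    // my_vector2 doc-values iterator is sparse when the script runs.
    public void testScriptScoreWithDocsMissingVectorField() throws Exception {
        Request createIndex = new Request("PUT", "/my-knn-index-1");
        createIndex.setJsonEntity("{\"mappings\": {\"properties\": {"
            + "\"my_vector1\": {\"type\": \"knn_vector\", \"dimension\": 2},"
            + "\"my_vector2\": {\"type\": \"knn_vector\", \"dimension\": 4}}}}");
        client().performRequest(createIndex);

        indexDoc("1", "{\"my_vector1\": [1.5, 2.5]}");
        indexDoc("2", "{\"my_vector1\": [2.5, 3.5]}");
        indexDoc("3", "{\"my_vector2\": [1.5, 5.5, 4.5, 6.4]}");
        indexDoc("4", "{\"my_vector2\": [2.5, 3.5, 5.6, 6.7]}");

        // script_score over match_all visits every doc, including the ones
        // without my_vector2, which is what used to trigger the crash.
        Request search = new Request("GET", "/my-knn-index-1/_search");
        search.setJsonEntity("{\"size\": 4, \"query\": {\"script_score\": {"
            + "\"query\": {\"match_all\": {}},"
            + "\"script\": {\"source\": \"knn_score\", \"lang\": \"knn\","
            + "\"params\": {\"field\": \"my_vector2\","
            + "\"query_value\": [2.0, 3.0, 5.0, 6.0],"
            + "\"space_type\": \"cosinesimil\"}}}}}");
        Response response = client().performRequest(search);
        assertEquals(200, response.getStatusLine().getStatusCode());
    }

    private void indexDoc(String id, String body) throws Exception {
        Request request = new Request("PUT", "/my-knn-index-1/_doc/" + id + "?refresh=true");
        request.setJsonEntity(body);
        client().performRequest(request);
    }
}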
