Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Knn Doc ingestion throws ClassCastException when used with OS plugin implementing IndexStorePlugin #1228

Open
jitendra-titaniam opened this issue Oct 10, 2023 · 6 comments
Labels
bug Something isn't working question Further information is requested

Comments

@jitendra-titaniam
Copy link

What is the bug?
Opensearch IndexStorePlugin allows plugin developers to provide custom implementation of org.apache.lucene.store.Directory
Plugin developer can write their own implementation of Directory.createOutput(String filename, IOContext context) and Directory.openInput(String filename, IOContext context) methods.
The current implementation in KNN80DocValuesConsumer.addKNNBinaryField gives ClassCastException if any other implementation of Directory is used instead of org.apache.lucene.store.FSDirectory.
Further more it does not use the Directory abstraction to createOutput rather writes to the IndexPath directly.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Create a Opensearch Plugin of type IndexStorePlugin. Implement getDirectoryFactories(). For index.store.type of titaniam this method returns an subclass of FsDirectoryFactory that creates custom implementation of Directory rather than an FSDirectory.
  2. Install the plugin and restart Opensearch
  3. Create an index
PUT housing-index1
{
  "settings": {
    "index.knn": true,
    "index.store.type": "titaniam"
  },
  "mappings": {
    "properties": {
      "housing-vector": {
        "type": "knn_vector",
        "dimension": 3
      },
      "title": {
        "type": "text"
      },
      "price": {
        "type": "long"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}
  1. Ingest a document
POST housing-index1/_doc
{
  "housing-vector": [
    10,
    20,
    30
  ],
  "title": "2 bedroom in downtown Seattle",
  "price": "2800",
  "location": "47.71, 122.00"
}

Following expection is shown in the opensearch.log

[2023-09-26T20:13:00,327][WARN ][o.o.i.e.Engine           ] [kanchipuram.local] [housing-index1][0] failed engine [refresh failed source[schedule]]
java.lang.ClassCastException: class com.titaniamlabs.lucene.store.TitaniamDirectory cannot be cast to class org.apache.lucene.store.FSDirectory (com.titaniamlabs.lucene.store.TitaniamDirectory is in unnamed module of loader java.net.FactoryURLClassLoader @d5ce97f; org.apache.lucene.store.FSDirectory is in unnamed module of loader 'app')
	at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer.addKNNBinaryField(KNN80DocValuesConsumer.java:131) ~[?:?]
	at org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer.addBinaryField(KNN80DocValuesConsumer.java:78) ~[?:?]
	at org.apache.lucene.index.BinaryDocValuesWriter.flush(BinaryDocValuesWriter.java:132) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.IndexingChain.writeDocValues(IndexingChain.java:400) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.IndexingChain.flush(IndexingChain.java:258) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:392) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:493) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:672) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:570) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:381) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:355) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:345) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:73) ~[opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:53) ~[opensearch-2.7.0.jar:2.7.0]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:432) ~[opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:412) ~[opensearch-2.7.0.jar:2.7.0]
	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:213) ~[lucene-core-9.5.0.jar:9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59]
	at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1846) [opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.index.engine.InternalEngine.maybeRefresh(InternalEngine.java:1825) [opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.index.shard.IndexShard.scheduledRefresh(IndexShard.java:4172) [opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.index.IndexService.maybeRefreshEngine(IndexService.java:983) [opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.index.IndexService$AsyncRefreshTask.runInternal(IndexService.java:1116) [opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.common.util.concurrent.AbstractAsyncTask.run(AbstractAsyncTask.java:159) [opensearch-2.7.0.jar:2.7.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:747) [opensearch-2.7.0.jar:2.7.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]

What is the expected behavior?
Knn DocValuesConsumer should use the lucene Directory abstraction to add Binary Field, so that OS plugins implementing IndexStorePlugin can work.

What is your host/environment?
Happening on all envs.

Do you have any screenshots?
NA

Do you have any additional context?

  1. Knn search on Elasticsearch 8.7.0 works with this plugin that implements IndexStorePlugin.

  2. Following is the failing line. It is casting to FSDirectory

indexPath = Paths.get(((FSDirectory) (FilterDirectory.unwrap(state.directory))).getDirectory().toString(), engineFileName)

Following codes then goes on to directly write to lucene file without going through any of the Directory abstractions.

indexCreator = () -> createKNNIndexFromTemplate(model.getModelBlob(), pair, knnEngine, indexPath);

and

These codes has to be resolved to write through Directory.

@jitendra-titaniam jitendra-titaniam added bug Something isn't working untriaged labels Oct 10, 2023
@vamshin
Copy link
Member

vamshin commented Oct 25, 2023

@jitendra-titaniam thanks for raising the issue. Would you be able to contribute the fix?

@jmazanec15
Copy link
Member

So this is tricky. The problem is that the way we access the "native" index files is by directly passing the file path to the native libraries (i.e. faiss and nmslib). This means we are not accessing the index via the IndexInput abstraction, which is returned from the directory. This becomes tricky when using things like remove stores.

Workarounds are kind of tough:

  1. First, we can ensure that the directory being used ends up being an FSDirectory. Im assuming any component implementing IndexStorePlugin will need something locally to search around. The basic idea would be to unwrap until we get there
  2. You could use lucene engine if you need something quicker.

Not sure what best route forward is for long-term solution. Implement native engines by reading from IndexInput would be very challenging if not impossible. We might be able to wrap this in more friendly abstraction that will work better with IndexStorePlugin though.

@vamshin vamshin added the question Further information is requested label Nov 2, 2023
@navneet1v navneet1v moved this from Backlog to Backlog (Hot) in Vector Search RoadMap Apr 1, 2024
@vamshin vamshin moved this from Backlog (Hot) to Now(This Quarter) in Vector Search RoadMap Apr 1, 2024
@vamshin vamshin moved this from Now(This Quarter) to Backlog (Hot) in Vector Search RoadMap Jul 2, 2024
@vamshin vamshin moved this from Backlog (Hot) to Backlog in Vector Search RoadMap Jul 2, 2024
@navneet1v
Copy link
Collaborator

navneet1v commented Aug 12, 2024

@jitendra-titaniam I have created this GH issue for using IndexInput for graph files. I am hoping it can solve the issue you faced: #2033

@github-project-automation github-project-automation bot moved this from Backlog to ✅ Done in Vector Search RoadMap Oct 16, 2024
@jmazanec15 jmazanec15 reopened this Oct 16, 2024
@jmazanec15 jmazanec15 moved this from ✅ Done to 2.18.0 in Vector Search RoadMap Oct 16, 2024
@jmazanec15
Copy link
Member

Will be completed in 2.18 once loading layer changes are completed.

@navneet1v
Copy link
Collaborator

@jitendra-titaniam this is the RFC for loading layer: #2033 please review and see if this can solve your issue. I believe it will

@dblock dblock removed the untriaged label Oct 21, 2024
@dblock
Copy link
Member

dblock commented Oct 21, 2024

[Catch All Triage - 1, 2]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working question Further information is requested
Projects
Status: 2.18.0
Development

No branches or pull requests

5 participants