Opensearch store does not support object in metadata and fail silently #1589

Closed
yeouchien opened this issue Jun 8, 2023 · 2 comments

Comments

@yeouchien
Contributor

OpenSearch bulk indexing fails silently because of the default index mapping:

https://github.com/hwchase17/langchainjs/blob/61e89dbaec07b4b50a55efe0f74faec5b2e701fd/langchain/src/vectorstores/opensearch.ts#L177-L186

Code to repro:

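  import { PDFLoader } from "langchain/document_loaders/fs/pdf";
  import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
  import { OpenAIEmbeddings } from "langchain/embeddings/openai";
  import { OpenSearchVectorStore } from "langchain/vectorstores/opensearch";
  // `client` is an existing @opensearch-project/opensearch Client instance
  const loader = new PDFLoader('/tmp/pdfloader.pdf');
  const docs = await loader.loadAndSplit(new RecursiveCharacterTextSplitter());
  await OpenSearchVectorStore.fromDocuments(docs, new OpenAIEmbeddings(), {
    client
  });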

The code above creates the index but fails to index the documents, because the documents produced by `loadAndSplit` with RecursiveCharacterTextSplitter carry object values in their metadata:

  metadata: { source: '/tmp/pdfloader.pdf', pdf: [Object], loc: [Object] }

Workaround: delete the object-valued fields from the metadata before indexing:

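  import { basename } from "path";

  // Keep only the file name in `source` and drop the object-valued fields
  docs.forEach((d) => {
    d.metadata.source = basename(d.metadata.source);
    delete d.metadata.pdf;
    delete d.metadata.loc;
  });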
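A more general variant of the workaround keeps the nested information instead of deleting it, by serializing every object-valued field to a JSON string. `stringifyNestedMetadata` is a hypothetical helper, not part of langchainjs. It assumes, as this report suggests, that only object values break indexing; it also stringifies arrays, which may be stricter than necessary.

```typescript
// Hypothetical helper (not a langchainjs API): turns any object-valued
// metadata field into a JSON string so that every value sent to OpenSearch
// is a primitive. Assumes only object values trigger the indexing failure.
type Metadata = Record<string, unknown>;

function stringifyNestedMetadata(metadata: Metadata): Metadata {
  const result: Metadata = {};
  for (const [key, value] of Object.entries(metadata)) {
    // typeof null === "object", so null is excluded explicitly;
    // arrays are objects too and are therefore stringified as well.
    result[key] =
      value !== null && typeof value === "object" ? JSON.stringify(value) : value;
  }
  return result;
}
```

It could be applied to each document before calling `fromDocuments`, e.g. `docs.forEach((d) => { d.metadata = stringifyNestedMetadata(d.metadata); });`.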

Potential solutions:

  1. Do not support objects in metadata, as the Pinecone store does:
    https://github.com/hwchase17/langchainjs/blob/61e89dbaec07b4b50a55efe0f74faec5b2e701fd/langchain/src/vectorstores/pinecone.ts#L59
  2. Throw an appropriate error so that the client knows what went wrong:
    https://github.com/opensearch-project/opensearch-js/blob/main/guides/bulk.md#handling-errors
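A rough TypeScript sketch of both proposals follows. `assertFlatMetadata` and `assertBulkSucceeded` are hypothetical names, not existing langchainjs functions. The bulk check assumes the response body shape documented for the OpenSearch Bulk API: a top-level `errors` flag plus one `items` entry per operation, keyed by the action name and carrying an optional `error`, as in the opensearch-js error-handling guide linked above.

```typescript
// Sketch of the two proposals; all names here are hypothetical.

// Option 1: reject object-valued metadata up front (arrays are allowed here).
function assertFlatMetadata(metadata: Record<string, unknown>): void {
  for (const [key, value] of Object.entries(metadata)) {
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      throw new Error(
        `Metadata field "${key}" is an object; only primitive values are supported`
      );
    }
  }
}

// Option 2: surface per-item failures from a bulk response instead of ignoring them.
interface BulkItemResult {
  _index?: string;
  _id?: string;
  status: number;
  error?: { type: string; reason: string };
}

interface BulkResponseBody {
  errors: boolean;
  // Each item has a single key naming the action ("index", "create", ...).
  items: Array<Record<string, BulkItemResult>>;
}

function assertBulkSucceeded(body: BulkResponseBody): void {
  if (!body.errors) return;
  const failures: string[] = [];
  body.items.forEach((item, i) => {
    const [action] = Object.keys(item);
    const result = item[action];
    if (result.error) {
      failures.push(`#${i} ${action} (status ${result.status}): ${result.error.type}: ${result.error.reason}`);
    }
  });
  throw new Error(
    `OpenSearch bulk request reported ${failures.length} failed item(s):\n${failures.join("\n")}`
  );
}
```

With the 2.x opensearch-js client, the parsed body should be available as `response.body` after `await client.bulk({ body: operations })`; where exactly the store would call these checks is left open here.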
@yeouchien yeouchien changed the title Opensearch store does not support object in metadata and failed silently Opensearch store does not support object in metadata and fail silently Jun 8, 2023
@raymondfeng

Yes, we ran into the same issue with the GitHub document loader.

@dosubot

dosubot bot commented Sep 10, 2023

Hi, @yeouchien! I'm Dosu, and I'm helping the langchainjs team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue you reported is related to a bug in the Opensearch store in langchainjs. The bug causes the store to fail silently during bulk indexing when objects are included in the metadata. A suggested workaround is to delete the object in the metadata. Raymondfeng also mentioned encountering a similar issue with the GitHub document loader.

Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the langchainjs repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself. If we don't receive any response within 7 days, the issue will be automatically closed.

Thank you for your understanding and contribution to langchainjs! Let us know if you have any further questions or concerns.

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Sep 10, 2023
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 17, 2023
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Sep 17, 2023
Labels: None yet · Projects: None yet · Development: No branches or pull requests · 2 participants