Documents are not saved in OpenSearch when using text splitters #2036

I am using CheerioWebBaseLoader to load the documents and save them in AWS OpenSearch. What I realised is that the documents are not getting saved, and there is no error either. But when I remove the TextSplitter, the document does get saved. Here is my code:

I cannot go ahead without using the TextSplitter, because if the document is huge, OpenAI returns 400 errors. Is there a solution to this?
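The original snippet is not reproduced above; the following is a minimal sketch of the setup described, with a placeholder URL, endpoint, and index name:

```js
import { Client } from "@opensearch-project/opensearch";
import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { OpenSearchVectorStore } from "langchain/vectorstores/opensearch";

// Load the page, split it into chunks, and index the chunks in OpenSearch.
const loader = new CheerioWebBaseLoader("https://example.com/some-page"); // placeholder URL
const rawDocs = await loader.load();

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 50,
});
const docs = await textSplitter.splitDocuments(rawDocs);

const client = new Client({ node: "https://my-opensearch-domain:443" }); // placeholder endpoint

await OpenSearchVectorStore.fromDocuments(docs, new OpenAIEmbeddings(), {
  client,
  indexName: "documents", // placeholder index name
});
```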
Comments
Answer generated by a 🤖

I understand that you're having trouble saving documents in AWS OpenSearch when using the RecursiveCharacterTextSplitter. The issue might be related to the size of the chunks that your documents are being split into. As a potential solution, I suggest increasing the chunkSize and chunkOverlap values:

```js
const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000, // increase chunkSize
  chunkOverlap: 200, // increase chunkOverlap
});
```

This modification will split your documents into larger chunks, which should meet the indexing requirements of OpenSearch and allow your documents to be saved. Please note that this is just one potential solution based on the information provided. If this doesn't resolve your issue, or if your experience contradicts this solution, I recommend checking whether you're using the latest version of the LangChain framework, as updates often include bug fixes and improvements. If you find that this solution is correct and you believe this is a bug that could impact other users, we encourage you to make a pull request. Your contribution would be greatly appreciated by the LangChain community.
Hey 🤖, I am already using those values:

```js
const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
```
Ok, after some debugging, I came across this error.
Answer generated by a 🤖

I found related issues that might be helpful. I did my best to summarize the solutions, but I recommend looking at the related issues yourself.

Open Issues

Opensearch store does not support object in metadata and fail silently

The suggested solution is:

As a workaround until the solution is implemented, you can delete the objects in metadata using the following code:

```js
import { basename } from "node:path";

docs.forEach((d) => {
  // Keep only the file name so `source` stays a plain string.
  d.metadata.source = basename(d.metadata.source);
  // Drop the nested objects that OpenSearch cannot map.
  delete d.metadata.pdf;
  delete d.metadata.loc;
});
```
The issue seems to be that OpenSearch can't handle certain types of metadata? I think the bot's response sounds reasonable here; adding a filter step to the vector store itself would be nice too, if you feel up for opening a PR?
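For reference, such a filter step could look something like the sketch below; `sanitizeMetadata` is a hypothetical helper, not part of the LangChain API:

```js
import { Document } from "langchain/document";

// Hypothetical helper: keep only primitive metadata values, since
// OpenSearch fails silently when metadata contains nested objects.
function sanitizeMetadata(docs) {
  return docs.map(
    (doc) =>
      new Document({
        pageContent: doc.pageContent,
        metadata: Object.fromEntries(
          Object.entries(doc.metadata).filter(([, value]) =>
            ["string", "number", "boolean"].includes(typeof value)
          )
        ),
      })
  );
}
```

Running split documents through a helper like this before calling fromDocuments would avoid the silent mapping failure.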
I ended up fixing this by adding indexing for Int.
What's Int, please?
It's Integer.
@nirajmchauhan Would you mind sharing how you added the indexing? I appreciate the help! I'm facing the same issue. Is there a way to add a general dynamic mapping for metadata that automatically handles this?
@zhjch05
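One general way to do this, sketched here as an assumption rather than the original answer, is to declare dynamic templates when creating the index, so that any metadata.* field gets a concrete type automatically (endpoint, index name, and template names are placeholders):

```js
import { Client } from "@opensearch-project/opensearch";

const client = new Client({ node: "https://my-opensearch-domain:443" }); // placeholder endpoint

// Dynamic templates map any metadata.* field to a concrete type
// without declaring each field up front.
await client.indices.create({
  index: "documents", // placeholder index name
  body: {
    mappings: {
      dynamic_templates: [
        {
          metadata_integers: {
            path_match: "metadata.*",
            match_mapping_type: "long",
            mapping: { type: "integer" },
          },
        },
        {
          metadata_strings: {
            path_match: "metadata.*",
            match_mapping_type: "string",
            mapping: { type: "keyword" },
          },
        },
      ],
    },
  },
});
```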
Thank you!