
gather chunks before uploading docs to index for correct ID count #1159

Closed
wants to merge 1 commit into from


@ShellyXueHan ShellyXueHan commented Nov 1, 2024

Motivation and Context

Description

When multiple data paths (data_path) are specified in config.json, the upload_documents_to_index function currently runs once per data path, and each run numbers its chunks starting from 0. As a result, chunks from later data paths always reuse IDs already taken by chunks from earlier ones. This was discovered when some data turned out to be missing from the resulting search index.

The fix is simple: gather the chunks from all data paths into one array first, and upload only once every chunk is ready.
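To illustrate the difference, here is a minimal sketch, not the actual code in this PR. The helpers `chunk_data_path`, `assign_ids`, `upload_per_path`, and `upload_after_gathering` are hypothetical names made up for this example, and the index is modeled as a plain dict keyed by ID, on the assumption that uploading a document with an existing ID replaces the earlier one.

```python
from typing import Dict, List


def chunk_data_path(path: str) -> List[str]:
    """Hypothetical stand-in for the chunking step: three chunks per data path."""
    return [f"{path}-chunk-{i}" for i in range(3)]


def assign_ids(chunks: List[str]) -> List[Dict[str, str]]:
    """Number the given chunks starting from 0, as happens on each upload call."""
    return [{"id": str(i), "content": chunk} for i, chunk in enumerate(chunks)]


def upload_per_path(paths: List[str], index: Dict[str, Dict[str, str]]) -> None:
    """Old behavior: upload once per data path, so IDs restart at 0 each time."""
    for path in paths:
        for doc in assign_ids(chunk_data_path(path)):
            # Assumption: a document with an existing ID replaces the old one.
            index[doc["id"]] = doc


def upload_after_gathering(paths: List[str], index: Dict[str, Dict[str, str]]) -> None:
    """Proposed behavior: gather every chunk first, then number and upload once."""
    all_chunks: List[str] = []
    for path in paths:
        all_chunks.extend(chunk_data_path(path))
    for doc in assign_ids(all_chunks):
        index[doc["id"]] = doc


if __name__ == "__main__":
    paths = ["docs_a", "docs_b"]

    old_index: Dict[str, Dict[str, str]] = {}
    upload_per_path(paths, old_index)
    print(len(old_index))  # fewer documents than chunks: IDs collided

    new_index: Dict[str, Dict[str, str]] = {}
    upload_after_gathering(paths, new_index)
    print(len(new_index))  # one document per chunk
```

With two data paths of three chunks each, the per-path version ends up with only the second path's chunks in the index, while the gathered version keeps all six.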

Contribution Checklist

  • I have built and tested the code locally and in a deployed app
  • For frontend changes, I have pulled the latest code from main, built the frontend, and committed all static files. (Does not apply.)
  • This is a change for all users of this app. No code or asset is specific to my use case or my organization.
  • I didn't break any existing functionality 😄

@github-actions github-actions bot added the stale label Jan 1, 2025
@github-actions github-actions bot closed this Jan 16, 2025