gather chunks before uploading docs to index for correct ID count #1159
Motivation and Context
Description
When there are multiple data paths (`data_path`) specified in config.json, the `upload_documents_to_index` function originally executes once per data path, with IDs starting from 0 each time. As a result, chunks from later data paths always collide with the IDs already assigned to earlier ones. This issue was found when some data went missing from the search index that was created. The fix is simple: don't upload until the chunks from all data paths are ready and gathered into one array, so IDs are assigned exactly once across the whole set.
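A minimal sketch of the change, for illustration only: `upload_documents_to_index` is the function named above, while the `chunk_directory` helper and the config shape are assumptions standing in for the repo's actual chunking code.

```python
def process_data_paths(config: dict) -> None:
    all_chunks = []

    # Gather chunks from every data path first, instead of uploading
    # inside the loop as before.
    for data_path in config["data_paths"]:
        # chunk_directory is a hypothetical stand-in for the existing
        # chunking step that produces documents for the index.
        all_chunks.extend(chunk_directory(data_path))

    # IDs are now assigned in one pass over 0..len(all_chunks)-1,
    # so chunks from different data paths can no longer collide.
    upload_documents_to_index(all_chunks, config["index_name"])
```

With a single upload call at the end, the ID counter covers the combined set of chunks, which is why the missing-document symptom disappears.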
Contribution Checklist
For frontend changes, I have pulled the latest code from main, built the frontend, and committed all static files: does not apply