bug: ingest pipeline with chunking and embedding does not persist data to the embedding step #1892

Closed
ahmetmeleq opened this issue Oct 26, 2023 · 0 comments · Fixed by #1893
Assignees
Labels
bug Something isn't working

Comments

@ahmetmeleq
Contributor

Describe the bug
When we run the ingest pipeline with chunking options and add another pipeline node after chunking (such as embedding), the element data does not persist to the next pipeline node.

To Reproduce
Run unstructured/examples/ingest/s3-small-batch/ingest.sh with additional chunking and embedding CLI params.

Expected behavior
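The embedding node should receive the elements produced by the chunking node. Instead, the embedding outputs are empty, and when the elements output of chunking is re-read into memory (to persist it to the embedding node) it is an empty list.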
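
The failure mode described here (a non-empty element list that comes back empty after being written out and re-read by the next node) can be checked with a small round-trip test. The sketch below is only an illustration using the standard library; `write_elements`, `read_elements`, and `check_round_trip` are hypothetical helpers, not functions from the unstructured ingest code, and the JSON-on-disk hand-off between nodes is an assumption.

```python
import json
import tempfile
from pathlib import Path


def write_elements(elements: list[dict], path: Path) -> None:
    """Hypothetical chunking-node output step: serialize elements to a JSON file."""
    path.write_text(json.dumps(elements, indent=2), encoding="utf-8")


def read_elements(path: Path) -> list[dict]:
    """Hypothetical embedding-node input step: re-read the persisted elements."""
    return json.loads(path.read_text(encoding="utf-8"))


def check_round_trip(elements: list[dict], path: Path) -> list[dict]:
    """Persist elements, re-read them, and fail loudly if data was lost."""
    write_elements(elements, path)
    reloaded = read_elements(path)
    if elements and not reloaded:
        # This is the symptom reported in the issue: an empty list after re-reading.
        raise ValueError(f"elements written to {path} were empty when re-read")
    return reloaded


if __name__ == "__main__":
    # Made-up sample elements standing in for chunking output.
    chunks = [
        {"type": "CompositeElement", "text": "first chunk"},
        {"type": "CompositeElement", "text": "second chunk"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        reloaded = check_round_trip(chunks, Path(tmp) / "chunked.json")
        print(f"{len(reloaded)} elements survived the hand-off")
```

A check of this kind between pipeline nodes would surface the problem at the chunking step rather than as empty embedding outputs later on.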

Environment Info
Base requirements and s3 requirements

@ahmetmeleq ahmetmeleq added the bug Something isn't working label Oct 26, 2023
@ahmetmeleq ahmetmeleq self-assigned this Oct 26, 2023
@ahmetmeleq ahmetmeleq changed the title bug/ingest-pipeline-with-chunking bug/ingest pipeline with chunking and embedding does not persist data to the embedding step Oct 27, 2023
@ahmetmeleq ahmetmeleq changed the title bug/ingest pipeline with chunking and embedding does not persist data to the embedding step bug: ingest pipeline with chunking and embedding does not persist data to the embedding step Oct 27, 2023
github-merge-queue bot pushed a commit that referenced this issue Oct 27, 2023
…data to the embedding step (#1893)

Closes: #1892 (check the issue for more info)
github-merge-queue bot pushed a commit that referenced this issue Nov 29, 2023
Closes #1414
Closes #2039 

This PR:
- Uses Pinecone python cli to implement a destination connector for Pinecone and provides the ingest readme requirements [(here)](https://github.com/Unstructured-IO/unstructured/tree/main/unstructured/ingest#the-checklist) for the connector
- Updates documentation for the s3 destination connector
- Alphabetically sorts setup.py contents
- Updates logs for the chunking node in the ingest pipeline
- Adds a baseline session handle implementation for destination connectors, to be able to parallelize their operations
- For the [bug](#1892) related to persisting element data to ingest embedding nodes, this PR tests the [solution](#1893) with its ingest test
- Solves a bug on ingest chunking params with [bugfix on chunking params and implementing related test](69e1949)

---------

Co-authored-by: Roman Isecke <[email protected]>