bug/ingest-chunking-parameters-not-passed #2039

Closed
ahmetmeleq opened this issue Nov 8, 2023 · 0 comments · Fixed by #1774

Describe the bug
When ingest is run with chunking parameters, only --chunk-elements reaches the Chunking node. The other chunking parameters are not passed through, so the node ignores the values supplied on the command line and behaves unexpectedly.

To Reproduce
Clone https://github.com/Unstructured-IO/unstructured/tree/ahmet/pinecone-connector, run /test_unstructured_ingest/dest/pinecone.sh with these modifications:

PYTHONPATH=. ./unstructured/ingest/main.py \
  local \
  ...
  --chunk-elements \
  --chunk-max-characters 10 \
  ...
  pinecone \
  ...

Observed behavior

  • The same number of vectors is upserted each time the test is run, regardless of the value of --chunk-max-characters.
  • When the CliChunkingConfig is logged, its values are always CliChunkingConfig(chunk_elements=True, multipage_sections=True, combine_text_under_n_chars=500, max_characters=1500), whatever chunking parameters are provided.
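
The symptom above (logged values that never change, whatever is passed on the command line) can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `ChunkingConfigSketch`, both `from_cli_options_*` helpers, and the assumption that CLI option names carry a `chunk_` prefix while the config fields do not. It is not the project's actual code or the actual cause of this bug. It only shows one way such a mismatch can silently drop values, and how a regression check could catch it.

```python
from dataclasses import dataclass, fields


@dataclass
class ChunkingConfigSketch:
    """Hypothetical stand-in for the logged CliChunkingConfig.

    Field names and the last three defaults are copied from the log line in
    this issue; the default for chunk_elements is an assumption.
    """

    chunk_elements: bool = False
    multipage_sections: bool = True
    combine_text_under_n_chars: int = 500
    max_characters: int = 1500


def from_cli_options_buggy(options: dict) -> ChunkingConfigSketch:
    # Keeps only keys that already match a field name, so prefixed keys such
    # as "chunk_max_characters" are silently dropped and defaults are used.
    names = {f.name for f in fields(ChunkingConfigSketch)}
    return ChunkingConfigSketch(**{k: v for k, v in options.items() if k in names})


def from_cli_options_fixed(options: dict) -> ChunkingConfigSketch:
    # Also accepts keys whose "chunk_"-stripped form matches a field name.
    names = {f.name for f in fields(ChunkingConfigSketch)}
    prefix = "chunk_"
    kwargs = {}
    for key, value in options.items():
        if key in names:
            kwargs[key] = value
        elif key.startswith(prefix) and key[len(prefix):] in names:
            kwargs[key[len(prefix):]] = value
    return ChunkingConfigSketch(**kwargs)


# Assumed shape of the parsed CLI options for the reproduction command.
cli_options = {"chunk_elements": True, "chunk_max_characters": 10}

buggy = from_cli_options_buggy(cli_options)
fixed = from_cli_options_fixed(cli_options)
print(buggy)
print(fixed)
```

In this sketch the buggy helper keeps `max_characters` at its default while still honouring `chunk_elements`, which matches the reported symptom. A test that asserts the configured value equals the value passed on the command line would fail against the buggy helper and pass against the fixed one.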

Environment Info
make install + pip install pinecone-client

@ahmetmeleq ahmetmeleq added the bug Something isn't working label Nov 8, 2023
@ahmetmeleq ahmetmeleq self-assigned this Nov 8, 2023
github-merge-queue bot pushed a commit that referenced this issue Nov 29, 2023
Closes #1414
Closes #2039 

This PR:
- Uses the Pinecone Python client to implement a destination connector for
Pinecone and provides the ingest readme requirements
[(here)](https://github.com/Unstructured-IO/unstructured/tree/main/unstructured/ingest#the-checklist)
for the connector
- Updates documentation for the s3 destination connector
- Alphabetically sorts setup.py contents
- Updates logs for the chunking node in the ingest pipeline
- Adds a baseline session handle implementation for destination
connectors, to be able to parallelize their operations
- For the
[bug](#1892)
related to persisting element data to ingest embedding nodes, this PR
tests the
[solution](#1893)
with its ingest test
- Solves a bug on ingest chunking params with [bugfix on chunking params
and implementing related
test](69e1949)

---------

Co-authored-by: Roman Isecke