build(deps): update unstructured requirement from <0.12 to <0.15 #91

Merged
amotl merged 1 commit into main from dependabot/pip/unstructured-lt-0.15 on Jun 7, 2024


dependabot[bot] (Contributor) commented on behalf of GitHub on May 27, 2024:

Updates the requirements on unstructured to permit the latest version.
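
The change in the title from `<0.12` to `<0.15` only moves the exclusive upper bound. The minimal sketch below assumes plain numeric release strings and compares them as zero-padded integer tuples; real installers such as pip follow PEP 440 (for example via `packaging.specifiers.SpecifierSet`), which also handles pre-releases and other segments this toy parser ignores. The helper names are hypothetical.

```python
from __future__ import annotations


def parse_release(version: str) -> tuple[int, ...]:
    """Split a plain numeric release such as "0.14.2" into (0, 14, 2).

    Pre-, post-, dev- and local-version segments are deliberately NOT handled;
    this is a toy stand-in for a full PEP 440 parser.
    """
    return tuple(int(part) for part in version.split("."))


def satisfies_upper_bound(version: str, bound: str) -> bool:
    """Return True if `version` satisfies the exclusive specifier `<bound`."""
    v, b = parse_release(version), parse_release(bound)
    width = max(len(v), len(b))
    # Zero-pad so that "0.15" and "0.15.0" compare as equal.
    v += (0,) * (width - len(v))
    b += (0,) * (width - len(b))
    return v < b


# 0.14.2 is the release named in the notes; 0.11.0 and 0.15.0 are illustrative.
for candidate in ("0.11.0", "0.14.2", "0.15.0"):
    old_ok = satisfies_upper_bound(candidate, "0.12")
    new_ok = satisfies_upper_bound(candidate, "0.15")
    print(f"{candidate}: <0.12 -> {old_ok}, <0.15 -> {new_ok}")
```

Under the old bound 0.14.2 was excluded; under the new one it is allowed, while 0.15.0 stays out of range.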

Changelog

Sourced from unstructured's changelog.

0.14.2

Enhancements

  • Bump unstructured-inference==0.7.33.

Features

  • Add attribution to the pinecone connector.

Fixes

0.14.1

Enhancements

  • Refactor code related to embedded text extraction. The embedded text extraction code has moved from unstructured-inference to unstructured.

Features

  • Large improvements to the ingest process:
    • Support for multiprocessing and async, with limits for both.
    • Streamlined the process of mapping CLI invocations to the underlying code.
    • More granular steps introduced to give better control over the process (e.g. a dedicated step to uncompress files already in the local filesystem, and a new optional staging step before upload).
    • Use the Python client when calling the unstructured API for partitioning or chunking.
    • Saving the final content is now a dedicated destination connector (local), set as the default if none is provided. This avoids adding new files locally when uploading elsewhere.
    • Leverage the last modified date when deciding whether new files should be downloaded and reprocessed.
  • Add attribution to the pinecone connector.
  • Add support for Python 3.12. unstructured now works with Python 3.12!
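
The last-modified check in the ingest list can be sketched as a single comparison, assuming the source connector reports a POSIX timestamp for the remote file. `should_download` is a hypothetical helper for illustration, not unstructured's implementation.

```python
import os
import tempfile
import time


def should_download(remote_last_modified: float, local_path: str) -> bool:
    """Return True if there is no local copy or the remote copy is newer.

    `remote_last_modified` is assumed to be a POSIX timestamp as a source
    connector might report it; this helper is hypothetical.
    """
    if not os.path.exists(local_path):
        return True
    return remote_last_modified > os.path.getmtime(local_path)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "report.pdf")
    print(should_download(time.time(), path))       # True: nothing downloaded yet
    with open(path, "wb") as fh:
        fh.write(b"%PDF-1.4")
    local_mtime = os.path.getmtime(path)
    print(should_download(local_mtime - 60, path))  # False: remote is older
    print(should_download(local_mtime + 60, path))  # True: remote changed since
```

Comparing against the local file's mtime means an unchanged file is neither downloaded nor reprocessed on the next run.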

Fixes

0.14.0

BREAKING CHANGES

  • Turn table extraction for PDFs and images off by default. This reverts the default table-extraction behavior for PDFs and images to "off": a number of users did not realize the default had changed and saw slower processing because of the extra model call for table extraction.
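
To make the cost argument concrete, here is a toy partitioner that counts model invocations. The parameter name mirrors `infer_table_structure`, which is, to our understanding, the opt-in flag on unstructured's `partition_pdf()`; everything else (the `PartitionStats` class, the simplifying assumption of one table-model call per page) is hypothetical and only illustrates why an "off" default skips the extra call.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PartitionStats:
    """Counts how many (hypothetical) model calls a partition run made."""
    layout_calls: int = 0
    table_calls: int = 0


def partition_pages(pages: list[str], infer_table_structure: bool = False) -> PartitionStats:
    """Toy stand-in for a PDF/image partitioner.

    Layout detection runs on every page; the table-structure model runs only
    when the caller opts in, matching a default of "off".
    """
    stats = PartitionStats()
    for _page in pages:
        stats.layout_calls += 1
        if infer_table_structure:
            stats.table_calls += 1
    return stats


print(partition_pages(["p1", "p2", "p3"]))
print(partition_pages(["p1", "p2", "p3"], infer_table_structure=True))
```

Callers who need tables pay for the extra model work explicitly; everyone else gets the faster path.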

Enhancements

  • Skip unnecessary element sorting in partition_pdf(). Skip element sorting when determining whether embedded text can be extracted.
  • Faster evaluation. Support for concurrent processing of documents during evaluation.
  • Add strategy parameter to partition_docx(). Behavior of future enhancements may be sensitive to the partitioning strategy. This parameter makes partition_docx() aware of the requested strategy.
  • Add GLOBAL_WORKING_DIR and GLOBAL_WORKING_PROCESS_DIR configuration parameters to control temporary storage.
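
Concurrent evaluation with a cap on parallelism can be sketched with the standard library's `ThreadPoolExecutor`, whose `max_workers` bounds how many documents are scored at once. The scoring function is a hypothetical token count, and the changelog does not say how unstructured schedules its work; for CPU-heavy metrics a `ProcessPoolExecutor` would be the more usual choice.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def score_document(text: str) -> int:
    """Hypothetical per-document metric: number of whitespace-separated tokens."""
    return len(text.split())


def evaluate_concurrently(documents: list[str], max_workers: int = 4) -> list[int]:
    """Score documents in parallel, running at most `max_workers` at a time.

    Executor.map yields results in input order, regardless of completion order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(score_document, documents))


docs = ["Title of the report", "A short paragraph of body text.", ""]
print(evaluate_concurrently(docs, max_workers=2))  # [4, 6, 0]
```

Because results come back in input order, the scores can be zipped with the original document list without extra bookkeeping.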

Features

  • Add form extraction basics (document elements and placeholder code in partition). This lays the groundwork for future work. Form extraction models are not currently available in the library, so attempting to use this functionality raises a NotImplementedError.
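
The placeholder pattern described here, where an entry point exists but raises `NotImplementedError`, can be sketched as follows. `extract_forms` and `partition_page` are hypothetical names; catching the error and recording a warning is one caller-side option for this sketch, not documented unstructured behavior.

```python
from __future__ import annotations


def extract_forms(page_text: str) -> list[dict]:
    """Hypothetical placeholder: the entry point exists, but no model backs it yet."""
    raise NotImplementedError("form extraction models are not available")


def partition_page(page_text: str, extract_forms_requested: bool = False) -> dict:
    """Toy partitioner that asks for forms only when requested.

    Catching NotImplementedError is a caller-side choice made for this sketch;
    the changelog only states that the call raises.
    """
    result: dict = {"text": page_text, "forms": []}
    if extract_forms_requested:
        try:
            result["forms"] = extract_forms(page_text)
        except NotImplementedError as exc:
            result["warning"] = f"form extraction skipped: {exc}"
    return result


print(partition_page("Name: ____", extract_forms_requested=True))
```

Code that does not catch the error will fail loudly, which is the behavior the changelog describes.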

Fixes

... (truncated)

Commits

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually.
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [unstructured](https://github.com/Unstructured-IO/unstructured) to permit the latest version.
- [Release notes](https://github.com/Unstructured-IO/unstructured/releases)
- [Changelog](https://github.com/Unstructured-IO/unstructured/blob/main/CHANGELOG.md)
- [Commits](Unstructured-IO/unstructured@0.2.0...0.14.2)

---
updated-dependencies:
- dependency-name: unstructured
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update Python code) labels on May 27, 2024.

codecov bot commented May 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.60%. Comparing base (7be9cb5) to head (fa1e8f4).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #91   +/-   ##
=======================================
  Coverage   89.60%   89.60%           
=======================================
  Files          22       22           
  Lines         760      760           
=======================================
  Hits          681      681           
  Misses         79       79           
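
The headline figure follows from the diff: 681 hits out of 760 lines, leaving 760 - 681 = 79 misses. Exact division gives about 89.605%, so the reported 89.60% is consistent with rounding down to two decimals, which we understand to be Codecov's default (`round: down`, `precision: 2`, both configurable in `codecov.yml`). A small check, where `codecov_percent` is a hypothetical helper:

```python
import math


def codecov_percent(hits: int, lines: int, precision: int = 2) -> str:
    """Return coverage as a percentage string, rounded *down* to `precision` decimals.

    Assumes Codecov's default rounding mode ("down") and precision (2);
    a repository's codecov.yml can override both.
    """
    scale = 10 ** precision
    pct = math.floor(hits / lines * 100 * scale) / scale
    return f"{pct:.{precision}f}%"


hits, lines = 681, 760               # values from the coverage diff
print(lines - hits)                  # 79 misses
print(codecov_percent(hits, lines))  # 89.60%, matching the report
print(f"{hits / lines * 100:.2f}%")  # 89.61% with ordinary rounding instead
```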
Flag Coverage Δ
main 75.39% <ø> (ø)
ngr 63.28% <ø> (ø)

Flags with carried forward coverage won't be shown.


amotl merged commit b56222a into main on Jun 7, 2024
6 checks passed
amotl deleted the dependabot/pip/unstructured-lt-0.15 branch on June 7, 2024 at 20:34