From c506bfb8667b2bca05b6a9ecd2bfa94e6184958f Mon Sep 17 00:00:00 2001
From: Roman Isecke
Date: Wed, 4 Oct 2023 17:24:42 -0400
Subject: [PATCH] Update Changelog

---
 CHANGELOG.md                | 3 ++-
 unstructured/__version__.py | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8a1f7b1110..0414ad5133 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,4 +1,4 @@
-## 0.10.19-dev10
+## 0.10.19-dev11
 
 ### Enhancements
 
@@ -9,6 +9,7 @@
 * **Adds Table support for the `add_chunking_strategy` decorator to partition functions.** In addition to combining elements under Title elements, user's can now specify the `max_characters=` argument to chunk Table elements into TableChunk elements with `text` and `text_as_html` of length characters. This means partitioned Table results are ready for use in downstream applications without any post processing.
 * **Expose endpoint url for s3 connectors** By allowing for the endpoint url to be explicitly overwritten, this allows for any non-AWS data providers supporting the s3 protocol to be supported (i.e. minio).
 * **change default `hi_res` model for pdf/image partition to `yolox`** Now partitioning pdf/image using `hi_res` strategy utilizes `yolox_quantized` model isntead of `detectron2_onnx` model. This new default model has better recall for tables and produces more detailed categories for elements.
+* **Refactor of ingest cli workflow** The new approach uses a dynamically set pipeline with a snapshot along each step to save progress and allow to continue from there if an error occurred. Also allows to dynamically set any number of steps to modify the partition content before it gets written to a destination.
 
 ### Features
 
diff --git a/unstructured/__version__.py b/unstructured/__version__.py
index 3d63527b85..5af4c987f0 100644
--- a/unstructured/__version__.py
+++ b/unstructured/__version__.py
@@ -1 +1 @@
-__version__ = "0.10.19-dev10" # pragma: no cover
+__version__ = "0.10.19-dev11" # pragma: no cover