rfctr(docx): extract DocxPartitionerOptions (#3018) · Unstructured-IO/unstructured@12b30d2

**Reviewers:** It's probably easier to review the first and second commits
separately: the first adds all the new code and tests (without installing
it), and the second installs it into the partitioner along with the
required changes to code and tests.

**Summary**
Enable communication of partitioning options to sub-partitioners, in
particular to the pluggable `PicturePartitioner` coming in a follow-up
PR that implements image extraction and OCR for the DOCX, DOC, and ODT
formats.

**Additional Context**
In general, validating partitioning options, assigning default values,
and computing derived partitioning settings can be extracted from a
partitioner into a separate, neatly encapsulated object. This simplifies
the core partitioning code by removing the noise associated with
computing metadata values, deciding how to access the source document,
and so on.
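As a rough illustration of that extraction, here is a minimal sketch of an options object that validates its inputs, supplies defaults, computes a derived setting, and decides how to open the source. The class `PartitionerOptions` and its fields are hypothetical and simplified; this is not the actual `DocxPartitionerOptions` implementation from this commit.

```python
from __future__ import annotations

import io
from dataclasses import dataclass
from typing import IO, Optional


@dataclass(frozen=True)
class PartitionerOptions:
    """Hypothetical options object; a simplified sketch, not unstructured's DocxPartitionerOptions."""

    filename: Optional[str] = None
    file: Optional[IO[bytes]] = None
    include_page_breaks: bool = True  # illustrative default value
    metadata_filename: Optional[str] = None

    def __post_init__(self) -> None:
        # Validation happens once, when the options are constructed,
        # instead of being scattered through the partitioning code.
        if (self.filename is None) == (self.file is None):
            raise ValueError("exactly one of `filename` or `file` must be provided")

    @property
    def document_filename(self) -> Optional[str]:
        # Derived setting: an explicit metadata filename overrides the real one.
        return self.metadata_filename or self.filename

    def open_source(self) -> IO[bytes]:
        # Deciding how to access the source document is encapsulated here too.
        if self.file is not None:
            self.file.seek(0)
            return self.file
        assert self.filename is not None
        return open(self.filename, "rb")


opts = PartitionerOptions(file=io.BytesIO(b"fake docx bytes"), metadata_filename="report.docx")
print(opts.document_filename)  # report.docx
print(opts.open_source().read())  # b'fake docx bytes'
```

With this shape, the partitioner body only reads `opts.document_filename` or calls `opts.open_source()` and never repeats the validation or fallback logic.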

Better factoring aside, having the partition-time "settings" available in
a single object allows partitioning of certain document features, such
as images, to be readily _delegated_ to a sub-partitioner while still
giving it access to all the relevant partitioning settings for the
current document. This is especially important when a sub-partitioner is
"pluggable" at runtime and must rely on a clearly defined (and as simple
as possible) interface to operate smoothly.
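The delegation described above can be sketched as follows, assuming the plug-in contract is a single method that receives the image and the whole options object. All names here (`PartitionOptions`, `PicturePartitionerT`, `StubOcrPicturePartitioner`, `SketchDocumentPartitioner`) are hypothetical; the real `PicturePartitioner` interface arrives in the follow-up PR and may differ, and the stub performs no actual OCR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Protocol


@dataclass(frozen=True)
class PartitionOptions:
    """Hypothetical partition-time settings for the current document."""

    extract_images: bool = False
    ocr_languages: tuple[str, ...] = ("eng",)


class PicturePartitionerT(Protocol):
    """Hypothetical interface a pluggable picture sub-partitioner must satisfy."""

    def iter_elements(self, image: bytes, opts: PartitionOptions) -> Iterator[str]: ...


class NullPicturePartitioner:
    """Default plug-in: ignores images entirely."""

    def iter_elements(self, image: bytes, opts: PartitionOptions) -> Iterator[str]:
        return iter(())


class StubOcrPicturePartitioner:
    """Stand-in for an image-extraction/OCR plug-in; no real OCR happens here."""

    def iter_elements(self, image: bytes, opts: PartitionOptions) -> Iterator[str]:
        # The sub-partitioner reads the settings it needs from the shared options object.
        if opts.extract_images:
            yield f"Image({len(image)} bytes, languages={'+'.join(opts.ocr_languages)})"


class SketchDocumentPartitioner:
    """Hypothetical host partitioner that delegates images to a plugged-in sub-partitioner."""

    def __init__(
        self,
        opts: PartitionOptions,
        picture_partitioner: Optional[PicturePartitionerT] = None,
    ) -> None:
        self._opts = opts
        self._pictures = (
            picture_partitioner if picture_partitioner is not None else NullPicturePartitioner()
        )

    def iter_elements(self, blocks: Iterable[tuple[str, bytes]]) -> Iterator[str]:
        for kind, payload in blocks:
            if kind == "text":
                yield payload.decode("utf-8")
            elif kind == "image":
                # Delegation: pass the whole options object, not individual flags.
                yield from self._pictures.iter_elements(payload, self._opts)


blocks = [("text", b"Quarterly report"), ("image", b"\x89PNG...")]
opts = PartitionOptions(extract_images=True)
print(list(SketchDocumentPartitioner(opts).iter_elements(blocks)))
# ['Quarterly report']
print(list(SketchDocumentPartitioner(opts, StubOcrPicturePartitioner()).iter_elements(blocks)))
# ['Quarterly report', 'Image(7 bytes, languages=eng)']
```

Because the plug-in only depends on one method signature plus the options object, swapping in a different picture sub-partitioner at runtime requires no change to the host partitioner.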

scanny authored May 15, 2024

1 parent db186dc commit 12b30d2

CHANGELOG.md

```diff
@@ -1,8 +1,8 @@
-## 0.13.8-dev7
+## 0.13.8-dev8
 
 ### Enhancements
 
-**Faster evaluation** Support for concurrent processing of documents during evaluation
+* **Faster evaluation** Support for concurrent processing of documents during evaluation
 
 ### Features
```
