rfctr(docx): extract DocxPartitionerOptions (#3018)
**Reviewers:** It is probably easier to review the first and second commits
separately: the first adds all the new code and tests (without
installing it), and the second installs it into the partitioner
along with the required changes to code and tests.

**Summary**
Enable communication of partitioning options to sub-partitioners, in
particular to the pluggable `PicturePartitioner` coming in a follow-up
PR that implements image extraction and OCR for the DOCX, DOC, and
ODT formats.

**Additional Context**
In general, validation of partitioning options as well as assigning
default values and computing derived partitioning settings can be
extracted from partitioners into a neatly encapsulated separate object.
This simplifies the core partitioning code by removing the noise
associated with computing metadata values and deciding how to access the
source document, etc.

However, better factoring aside, having the partition-time "settings"
available in a single object allows partitioning of certain document
features, for example images, to be readily _delegated_ to a
sub-partitioner while still giving it access to all the relevant
partitioning settings for the current document. This is particularly
important when a sub-partitioner is "pluggable" at runtime and must rely
on a clearly defined (and as simple as possible) interface to operate
smoothly.
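
**Illustrative sketch**
The pattern described above could look roughly like the following. This is a minimal sketch and not the code added in this commit: `SketchOptions`, `PictureSubPartitioner`, `NullPicturePartitioner`, `DescribingPicturePartitioner`, and `partition_pictures` are hypothetical names chosen for illustration, and the real `DocxPartitionerOptions` may expose different fields and methods.

```python
from __future__ import annotations

import io
from dataclasses import dataclass
from typing import IO, Iterable, Iterator, List, Optional, Protocol


@dataclass(frozen=True)
class SketchOptions:
    """Hypothetical options object: validation, defaults, and derived settings."""

    file_path: Optional[str] = None
    file: Optional[IO[bytes]] = None
    metadata_filename: Optional[str] = None
    include_page_breaks: bool = False  # default value lives here, not in the partitioner

    def __post_init__(self) -> None:
        # Validation happens once, at construction time.
        if self.file_path is None and self.file is None:
            raise ValueError("one of `file_path` or `file` must be provided")

    @property
    def filename_for_metadata(self) -> Optional[str]:
        # Derived setting: an explicit metadata filename wins over the source path.
        return self.metadata_filename or self.file_path

    def open_source(self) -> IO[bytes]:
        # Decide how to access the source document in one place.
        if self.file is not None:
            self.file.seek(0)
            return self.file
        assert self.file_path is not None
        return open(self.file_path, "rb")


class PictureSubPartitioner(Protocol):
    """Hypothetical interface a pluggable sub-partitioner would implement."""

    def iter_elements(self, picture_blob: bytes, opts: SketchOptions) -> Iterator[str]: ...


class NullPicturePartitioner:
    """Default plug-in: produces no elements for a picture."""

    def iter_elements(self, picture_blob: bytes, opts: SketchOptions) -> Iterator[str]:
        return iter(())


class DescribingPicturePartitioner:
    """Example plug-in that reads a setting from the shared options object."""

    def iter_elements(self, picture_blob: bytes, opts: SketchOptions) -> Iterator[str]:
        yield f"image ({len(picture_blob)} bytes) from {opts.filename_for_metadata}"


def partition_pictures(
    blobs: Iterable[bytes],
    opts: SketchOptions,
    picture_partitioner: PictureSubPartitioner = NullPicturePartitioner(),
) -> List[str]:
    # The core code delegates picture handling and simply passes the options along.
    elements: List[str] = []
    for blob in blobs:
        elements.extend(picture_partitioner.iter_elements(blob, opts))
    return elements
```

In this sketch the sub-partitioner receives a single `opts` argument rather than a growing list of keyword arguments, so adding a new setting would not change the plug-in interface.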
scanny authored May 15, 2024
1 parent db186dc commit 12b30d2
Showing 4 changed files with 543 additions and 200 deletions.
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -1,8 +1,8 @@
-## 0.13.8-dev7
+## 0.13.8-dev8

 ### Enhancements

-**Faster evaluation** Support for concurrent processing of documents during evaluation
+* **Faster evaluation** Support for concurrent processing of documents during evaluation

 ### Features