Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(pptx): partition_pptx() accepts strategy arg #2879

Merged
merged 2 commits into from
Apr 11, 2024

Conversation

scanny
Copy link
Collaborator

@scanny scanny commented Apr 11, 2024

Summary
As we move to adding pluggable sub-partitioners, partition_pptx() will need to become sensitive to the strategy argument, in particular when it is set to "hi_res". Up until now there were no expensive operations (inference, OCR, etc.) incurred while partitioning PPTX so this argument was ignored.

After this PR, partition_pptx() still won't do anything with that value, other than pass it along to _PptxPartitionerOptions for safe-keeping, but now its ready for use by a PicturePartitioner (to come in a subsequent PR).

It doesn't do anything with it yet, other than pass it along to
`_PptxPartitionerOptions` for safe-keeping, but now its ready for use by
a `PicturePartitioner` (to come in a subsequent PR).
Comment on lines +53 to +62
*,
file: Optional[IO[bytes]] = None,
date_from_file_object: bool = False,
detect_language_per_element: bool = False,
include_page_breaks: bool = True,
metadata_filename: Optional[str] = None,
metadata_last_modified: Optional[str] = None,
include_slide_notes: bool = False,
infer_table_structure: bool = True,
languages: Optional[list[str]] = ["auto"],
detect_language_per_element: bool = False,
date_from_file_object: bool = False,
metadata_filename: Optional[str] = None,
metadata_last_modified: Optional[str] = None,
Copy link
Collaborator Author

@scanny scanny Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the "*" just makes explicit that all arguments other than filename are keyword-only. This has always been the case because filename and file can't both be used, but explicit is better than implicit. The remaining line changes are just sorting the kwargs alphabetically for the sake of good order, making it easier to find the one you're looking for and tell whether one is missing.

Copy link
Collaborator

@Coniferish Coniferish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm :)

@scanny scanny added this pull request to the merge queue Apr 11, 2024
Merged via the queue into main with commit 2cba949 Apr 11, 2024
42 checks passed
@scanny scanny deleted the scanny/add-strategy-to-pptx branch April 11, 2024 23:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants