Summary

Remedy a disk-space leak where `partition_doc()` would leave a copy on disk of each `.doc` file passed as a file-like object.

Additional Context

`partition_doc()` creates a temporary file to which it writes each source document provided as a file-like object. This file is never deleted, so disk consumption grows without bound.

The `convert_office_doc()` function used to convert DOC->DOCX uses a command-line program provided with LibreOffice to do the conversion. Because this program runs as a separate process with its own memory space, the source file cannot be passed as an in-memory object and needs to be on the filesystem. When the DOC file is passed as a file-like object, it is written to disk so the conversion program has access to it. It is not deleted afterward.

Fix this by writing the temporary source DOC file in the `TemporaryDirectory` already being used to write the conversion-target DOCX file. That directory is automatically removed when `partition_doc()` completes.
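
The approach described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `process_doc_file()` and `fake_convert()` are hypothetical names, and `fake_convert()` merely copies bytes as a stand-in for the LibreOffice-based conversion so the sketch runs without LibreOffice installed.

```python
import io
import os
import tempfile
from typing import IO, Callable, List

# Paths the stand-in converter was given, kept so cleanup can be checked later.
seen_paths: List[str] = []


def fake_convert(source_path: str, target_dir: str) -> str:
    """Hypothetical stand-in for the DOC->DOCX conversion step (just copies bytes)."""
    seen_paths.append(source_path)
    target_path = os.path.join(target_dir, "document.docx")
    with open(source_path, "rb") as src, open(target_path, "wb") as dst:
        dst.write(src.read())
    return target_path


def process_doc_file(file: IO[bytes], convert: Callable[[str, str], str]) -> bytes:
    """Stage a file-like DOC on disk, convert it, and return the converted bytes."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Write the source into the SAME temporary directory as the conversion
        # target, so both files are removed together when the block exits.
        source_path = os.path.join(tmp_dir, "document.doc")
        with open(source_path, "wb") as f:
            f.write(file.read())

        target_path = convert(source_path, tmp_dir)
        with open(target_path, "rb") as f:
            return f.read()


converted = process_doc_file(io.BytesIO(b"doc-bytes"), fake_convert)
```

Because the staged source lives inside the temporary directory, no separate cleanup step is needed for it; leaving the `with` block removes the source and the conversion output together.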