English | 中文
A simple tool that use ImageOptim to compress images in docx/pptx/xlsx documents, and can reduce file size.
Clone this repo first:
git clone https://github.com/cometeme/compress-office.git
Then we need:
- ImageOptim: https://github.com/ImageOptim/ImageOptim
- Python 3.8 (and you need to install
rich
package viapip
) - trash: https://github.com/ali-rantakari/trash
- fd (Optional, can optimize searching speed): https://github.com/sharkdp/fd
If you have Homebrew,run these commands to install dependencies (or install manually):
brew install imageoptim [email protected] trash
pip3.8 install rich
Then start ImageOptim, open "Preferences" menu and modify the settings. You can choose to perform lossless or lossy compression on the pictures in the document. The former can maintain the image quality, while the latter can better reduce size. You can also choose whether to remove EXIF information. EXIF contains informations about the time, gps and camera settings. Removes EXIF in the picture can better protect your privacy.
Run compress-office.py
with files or folders you want to compress. You can pass in multiple paths. If you pass in a directory, the program will traverse in the directory and find files that can be compressed.
python3.8 compress-office.py [path1] [path2] ...
For example:
python3.8 compress-office.py ~/Documents ./test.docx
When program run for the first time, it will create a file called process_history.csv
, which records the path of the compressed file and its modify time. When running again, if program finds that the file has not changed (file path is in the history and its modify time is same as recorded), then the program will skip the file without re-compressing, because doing so is not meaningful. If you really need to recompress a file, just delete its record from csv file.
After the compression is complete, the original document will be moved to recycle bin. Please check your documents before empty the recycle bin.
The program uses python's built-in glob
for file traversal by default, but it is very slow when there are many files, so the program supports the use of fd to speed up the search.
If you have HomeBrew, enter brew install fd
in the console to install fd
.
A1: Yes, but you need to find a substitute for ImageOptim first, such as Trimage
. Then modify the function.py
in this project, and change the cache directory cache_folder
and the compression command compress_command
to make this program work.
A2: The docx/pptx/xlsx document is essentially a zip compressed package, in which the resources are packaged together. This program decompresses the documents passed in by user into a cache directory one by one, use ImageOptim to compress all the pictures in the cache directory, recompresses them, and puts new file back.
Therefore, in theory, using this program will not corrupt files when compressing, but in order to prevent unpredictable bugs, it is recommended to backup files before compress them. At the same time, this program will move the original document to the recycle bin after compression, and if you found a problem, you can restore the original file from the recycle bin.
This is because ImageOptim puts the original pictures in the recycle bin when compressing pictures, and there is no setting to cancel this, so this problem cannot be solved temporarily.