Document ability to remove all OCR
jbarlow83 committed Jan 24, 2024
1 parent 75bf8e4 commit cca04fd
Showing 1 changed file with 14 additions and 0 deletions.
14 changes: 14 additions & 0 deletions docs/cookbook.rst
@@ -245,6 +245,20 @@ if all you want to is to apply image processing or PDF/A conversion.
the case. Use ``--tesseract-non-ocr-timeout`` to control the timeout
for non-OCR operations, if needed.

Remove all text or OCR from my PDF
----------------------------------

This is getting ridiculous, but OCRmyPDF can completely strip all textual
information from a PDF and reconstruct it as a "bag of images" PDF.

.. code-block::

    ocrmypdf --tesseract-timeout 0 --force-ocr input.pdf output.pdf

Why would you want to do this? Perhaps you have a PDF where OCR
fails to produce useful results, and you just want to get rid of all OCR
information. This command also removes OCR generated by third-party tools.
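
To apply the same command to a whole folder of PDFs, it can be wrapped in a small script. The following is a minimal sketch, not part of OCRmyPDF: the helper names ``build_strip_text_command`` and ``strip_text_from_folder`` and the ``dry_run`` parameter are hypothetical, and the only OCRmyPDF options used are the two from the command above.

```python
import shutil
import subprocess
from pathlib import Path


def build_strip_text_command(input_pdf, output_pdf):
    """Return the argument list for the text-stripping command shown above.

    Hypothetical helper; it only assembles the documented command line.
    """
    return [
        "ocrmypdf",
        "--tesseract-timeout", "0",  # give OCR no time, so no new text layer is added
        "--force-ocr",               # rasterize every page, dropping existing text
        str(input_pdf),
        str(output_pdf),
    ]


def strip_text_from_folder(src_dir, dst_dir, dry_run=True):
    """Build one command per PDF in src_dir and run them unless dry_run is set.

    Returns the list of commands so they can be inspected or logged.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    commands = [
        build_strip_text_command(pdf, dst / pdf.name)
        for pdf in sorted(src.glob("*.pdf"))
    ]
    if not dry_run:
        if shutil.which("ocrmypdf") is None:
            raise RuntimeError("ocrmypdf not found on PATH")
        dst.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            # check=True raises CalledProcessError if ocrmypdf exits non-zero
            subprocess.run(cmd, check=True)
    return commands
```

Calling ``strip_text_from_folder("scans", "stripped")`` only returns the commands; pass ``dry_run=False`` to actually run them. Output files go to a separate folder so the originals are left untouched.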

Optimize images without performing OCR
--------------------------------------
