GitHub - ashipunov/img2djvu: Single-pass DjVu encoder based on DjVu Libre, minidjvu and ImageMagick

ashipunov / img2djvu Public

Notifications You must be signed in to change notification settings
Fork 3
Star 26

Single-pass DjVu encoder based on DjVu Libre, minidjvu and ImageMagick

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
NEWS		NEWS
README		README
TODO		TODO
img2djvu		img2djvu

Repository files navigation

This is a single-pass DjVu encoder based on DjVu Libre and ImageMagick,
it requires bash and getopt, optionally also tifftopnm, minidjvu and
ocrodjvu plus cuneiform

img2djvu is in a public domain. Many thanks to members of
forum.ru-board.com for ideas, suggestions and corrections, especially to
U235, anagnost96 and monday2000.

===

SIMPLE USE

> img2djvu out

(where "out" is a name of folder wich contains some images)

img2djvu will output some diagnostics: first, total number of files in
folder, then will show processing status of every file. Skipped
non-graphic files are labeled with dots (.), graphic files are labeled
with angle brackets (<>). Black and white pages are labeled with B or M
(if minidjvu used), color pages with C, F (if cpaldjvu used) or L (if
layer separation used). OCR labeled as an additional layer (+1) or
additional plus (+). Aggressivity is the number of enclosed brackets
(e.g., <<<>>> for -a 2).

===

ADVANCED USE

> img2djvu -d 600 out

Will set resolution to 600 dpi (default is 300 dpi)

> img2djvu -t 17 out

Among color files, only files with color count < 17 will be sent to
cpaldjvu DjVu coder

> img2djvu -t 0 out

Will skip ImageMagick color count (this is much faster!) and send all
color files to c44 DjVu coder; cpaldjvu will not run

> img2djvu -m 10 out

For black and white pages, will use minidjvu DjVu coder (with 10 pages
per dictionary) instead of default cjb2 coder

> img2djvu -l 1 out

For color pages, img2djvu will perform layer separation*** (assuming that
the file is the result of Scan Tailor* mixed mode output**), then blur
color part and start forced segmentation****; it is slow but usually
produce compact output.

===

*    Scan Tailor: http://scantailor.sourceforge.net/

**   Mixed mode outputs image colors in [1...254] rank whereas text is
pure black [0] and page background is pure white [255]

***  Layer separation after Scan Tailor:
http://forum.ru-board.com/topic.cgi?forum=5&topic=32945 (in Russian),
http://alexrey036.narod.ru/LayerTailor/LayerTailor.zip

**** Forced segmentation: http://djvu-soft.narod.ru/scan/back_glue.htm
(in Russian)

> img2djvu -a 2 -m 20 out

Will produce more compact output; however, check the result
carefully---artifacts could appear!

> img2djvu -l 4 out

Will use four-fold downsampling for color layers; result will be
extremely compact. Coefficient should be an integer between 1 and 12;
best choices are 2, 3, or 4.

> img2djvu -p "" out

Will NOT use blur and contrast for processing color layers

> img2djvu -l 1 -r rus -e cuneiform -j 2 -a 1 out

After creation of the final DjVu, will run two OCR jobs of cuneiform with
"-rus" language option via ocrodjvu and insert text layer in place

> img2djvu -c 1 out

Will create temporary files in current directory

===

CAVEATS

Absolute paths are not allowed

It is expected that all images inside a folder have the same resolution

Folder and image names are very important; they, for example, determine
the page sequence in future DjVu file. It is strongly recommended to
rename files sequentially before im2djvu run. Also, please avoid using
anything except [0-9a-z_-] in the file and folder names. No spaces and
non-ASCII letters, please!

It is NOT recommended to use -l option with files which did not come from
Scan Tailor output