flanglet edited this page Oct 31, 2024 · 10 revisions

How do I select specific transforms/entropy at runtime instead of using compression levels?

Provide the transform(s) using the -t (or --transform=) command line option, followed by the transform name(s). When using several transforms, chain them with '+':

Example: -t TEXT or -t RLT+TEXT+UTF+LZ

Provide -e (--entropy=) on the command line followed by the codec of your choice:

Example: -e ANS1
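If you drive kanzi from another Java program, the -t and -e options above can be assembled programmatically. The following is a minimal sketch that only builds the argument list for `java -jar kanzi.jar`. It does not call any kanzi API. The class and method names are hypothetical, and the jar location and file names are placeholders. Chaining with '+' follows the `-t RLT+TEXT+UTF+LZ` example.

```java
import java.util.ArrayList;
import java.util.List;

public class KanziArgs {
    // Builds a compression command line that selects transforms and the entropy codec
    // explicitly instead of a compression level. Hypothetical helper, for illustration only.
    static List<String> compressArgs(String input, String output,
                                     List<String> transforms, String entropy) {
        List<String> args = new ArrayList<>(List.of("java", "-jar", "kanzi.jar", "-c"));
        args.add("-i");
        args.add(input);
        args.add("-o");
        args.add(output);
        if (!transforms.isEmpty()) {
            // Several transforms are chained with '+', e.g. RLT+TEXT+UTF+LZ
            args.add("-t");
            args.add(String.join("+", transforms));
        }
        if (entropy != null) {
            args.add("-e");
            args.add(entropy);
        }
        return args;
    }

    public static void main(String[] a) {
        List<String> cmd = compressArgs("book1", "book1.knz",
                List.of("RLT", "TEXT", "UTF", "LZ"), "ANS1");
        System.out.println(String.join(" ", cmd));
        // To actually launch it: new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```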

How many threads are used during (de)compression?

By default, kanzi detects the number of CPU cores and uses half of them. The maximum number of parallel jobs is hard-coded to 64.

Providing -j 1 on the command line makes (de)compression use one core.

Providing -j 0 on the command line makes (de)compression use all available cores.
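The job-count rules above can be summarized as a small function. This is a sketch of the documented behaviour, not kanzi's source. The rounding of odd core counts and the floor of one job are assumptions, and `effectiveJobs` is a hypothetical name.

```java
public class JobCount {
    // Hard-coded ceiling on parallel jobs, as stated in the FAQ.
    static final int MAX_JOBS = 64;

    // Sketch of the documented rule (not kanzi's code).
    // requested < 0 means "-j not given"; 0 means "-j 0" (all cores); N > 0 means "-j N".
    // Rounding down for odd core counts and the minimum of 1 job are assumptions.
    static int effectiveJobs(int requested, int cores) {
        int jobs;
        if (requested < 0) {
            jobs = Math.max(1, cores / 2);   // default: half of the cores
        } else if (requested == 0) {
            jobs = cores;                    // -j 0: all available cores
        } else {
            jobs = requested;                // -j N: explicit value
        }
        return Math.min(jobs, MAX_JOBS);     // never exceed the hard-coded maximum
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores
                + " default=" + effectiveJobs(-1, cores)
                + " -j 0=" + effectiveJobs(0, cores)
                + " -j 1=" + effectiveJobs(1, cores));
    }
}
```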

Can kanzi (de)compress full folders?

Yes. If the input source provided on the command line is a directory, all files under that folder are processed recursively.

The files will be processed in parallel if more than one core is available.

To avoid recursion and process only the top-level folder, use the dot syntax:

E.g. -i ~/myfolder/. on Linux
E.g. -i c:\users\programs\. on Windows
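For illustration, the two modes correspond to `java.nio.file.Files.walk` with an unbounded depth versus a depth of 1. This sketch only mimics the documented dot syntax. The class is hypothetical and does not describe how kanzi walks folders internally.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class InputFiles {
    // Lists the regular files to process for an input directory (sketch only).
    // A trailing "/." (or "\.") limits the walk to the top-level folder;
    // otherwise the whole tree is walked recursively.
    static List<Path> collect(String input) throws IOException {
        boolean topLevelOnly = input.endsWith("/.") || input.endsWith("\\.");
        Path root = Paths.get(topLevelOnly ? input.substring(0, input.length() - 2) : input);
        int depth = topLevelOnly ? 1 : Integer.MAX_VALUE;   // depth 1 = direct children only
        try (Stream<Path> s = Files.walk(root, depth)) {
            return s.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. java InputFiles /home/me/myfolder/.   (the shell expands ~, Java does not)
        for (Path p : collect(args[0])) {
            System.out.println(p);
        }
    }
}
```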

When processing a folder, can kanzi skip link files or dot files?

Yes. To skip link files, add the --no-link option to the command line.

To skip dot files, add the --no-dot-file option to the command line.
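Conceptually, the two options act as filters on the files found while walking the folder. The sketch below assumes that "link files" means symbolic links and that "dot files" means entries whose name starts with '.'. The FAQ does not say whether kanzi also skips the contents of dot directories, and this sketch does not. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileFilter {
    // Sketch of what --no-link and --no-dot-file conceptually do (not kanzi's implementation).
    static boolean keep(Path p, boolean noLink, boolean noDotFile) {
        if (noLink && Files.isSymbolicLink(p)) {
            return false;                       // skip symbolic links
        }
        Path name = p.getFileName();
        if (noDotFile && name != null && name.toString().startsWith(".")) {
            return false;                       // skip entries whose name starts with '.'
        }
        return Files.isRegularFile(p);          // follows links that were not filtered above
    }

    static List<Path> collect(Path root, boolean noLink, boolean noDotFile) throws IOException {
        // Files.walk does not follow links by default, so a link shows up as its own entry.
        try (Stream<Path> s = Files.walk(root)) {
            return s.filter(p -> keep(p, noLink, noDotFile)).sorted().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        collect(Paths.get(args[0]), true, true).forEach(System.out::println);
    }
}
```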

Does kanzi support pipes and input/output redirection?

Yes, one way to do it is to use STDIN/STDOUT as input/output on the command line:

 gunzip -c /tmp/kanzi.1.gz | java -jar kanzi.jar -c -i stdin -l 2 -o /tmp/kanzi.1.knz

 java -jar kanzi.jar -d -i /tmp/silesia.tar.knz -o stdout | tar -xf -

Or, using redirection:

java -jar kanzi.jar -c -f -l 2 < /tmp/enwik8 > /tmp/enwik8.knz

If -i is absent from the command line, the data is assumed to come from STDIN and go to STDOUT. Another example, processing a zero-length pseudo-file:

cat /proc/stat  | java -jar kanzi.jar -c -i stdin -l 0 -o /tmp/stat.knz 
java -jar kanzi.jar -d -i /tmp/stat.knz -o stdout

Notice that, during compression, kanzi stores the size of the input file (when it is available) so that the decompressor can verify the output size after decompression. The original size is also used by the decompressor to optimize internal resources. Thus, providing -i and -o is recommended over redirection.
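Redirection weakens the size check because a stream read from STDIN has no known length until its last byte has been read. The sketch below shows the decompressor-side idea in plain Java: count the bytes produced and compare that count with the stored size only when a size was recorded. `UNKNOWN_SIZE`, `copyCounting` and `sizeOk` are hypothetical names, not part of kanzi's API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class SizeCheck {
    // Hypothetical marker meaning "no original size was recorded" (e.g. input read from STDIN).
    static final long UNKNOWN_SIZE = -1;

    // Copies all bytes from in to out and returns how many were copied.
    static long copyCounting(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    // A size mismatch can only be detected when the original size is known.
    static boolean sizeOk(long storedSize, long producedSize) {
        return storedSize == UNKNOWN_SIZE || storedSize == producedSize;
    }

    public static void main(String[] args) throws IOException {
        // e.g. cat /proc/stat | java SizeCheck > copy.txt
        long n = copyCounting(System.in, System.out);
        System.out.flush();
        // When reading from a pipe, n is only known after the last byte has been read.
        System.err.println("bytes copied: " + n);
    }
}
```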

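Does kanzi produce a seekable stream?

Yes. During decompression, you can decode a single block or a range of consecutive blocks with the --from and --to options. The --from block is included and the --to block is excluded. In the second run below, --from=4 --to=10 outputs blocks 4 to 9.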

java -jar kanzi.jar -d -i /tmp/book1.knz -v 4 -f

Block 1: 34451 => 36530 [0 ms] => 65536 [0 ms]
Block 2: 33295 => 35330 [0 ms] => 65536 [0 ms]
Block 3: 33702 => 35807 [0 ms] => 65536 [0 ms]
Block 4: 33555 => 35502 [0 ms] => 65536 [0 ms]
Block 5: 34057 => 36065 [0 ms] => 65536 [0 ms]
Block 6: 33556 => 35622 [0 ms] => 65536 [0 ms]
Block 7: 33357 => 35167 [0 ms] => 65536 [0 ms]
Block 8: 33460 => 35446 [0 ms] => 65536 [0 ms]
Block 9: 33428 => 35431 [0 ms] => 65536 [0 ms]
Block 10: 33177 => 35180 [0 ms] => 65536 [0 ms]
Block 11: 33218 => 35156 [0 ms] => 65536 [0 ms]
Block 12: 24871 => 26246 [0 ms] => 47875 [0 ms]

Decompressing:     1 ms
Input size:        394176
Output size:       768771
Throughput (KB/s): 750752


java -jar kanzi.jar -d -i /tmp/book1.knz -v 4 -f  --from=4 --to=10

Block 4: 33555 => 35502 [0 ms] => 65536 [0 ms]
Block 5: 34057 => 36065 [0 ms] => 65536 [0 ms]
Block 6: 33556 => 35622 [0 ms] => 65536 [0 ms]
Block 7: 33357 => 35167 [0 ms] => 65536 [0 ms]
Block 8: 33460 => 35446 [0 ms] => 65536 [0 ms]
Block 9: 33428 => 35431 [0 ms] => 65536 [0 ms]

Decompressing:     1 ms
Input size:        394176
Output size:       393216
Throughput (KB/s): 384000
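The two runs above imply that --from is inclusive and --to is exclusive. The sketch below reproduces both output sizes from the per-block sizes printed in the first run. It models only the block selection. The exclusive upper bound is inferred from that output rather than from kanzi's source, and the class name is hypothetical. The same rule explains the header-only trick in the next answer: --from=1 --to=1 selects no block.

```java
import java.util.Arrays;

public class BlockRange {
    // Sums the decompressed sizes of blocks [from, to), with 1-based block numbers,
    // matching the runs above where --from=4 --to=10 emitted blocks 4..9.
    static long outputSize(long[] blockSizes, int from, int to) {
        long total = 0;
        for (int b = Math.max(from, 1); b < to && b <= blockSizes.length; b++) {
            total += blockSizes[b - 1];
        }
        return total;
    }

    public static void main(String[] args) {
        // Decompressed block sizes of book1.knz taken from the first run above.
        long[] sizes = new long[12];
        Arrays.fill(sizes, 65536);
        sizes[11] = 47875;
        System.out.println(outputSize(sizes, 1, 13));   // all blocks: 768771
        System.out.println(outputSize(sizes, 4, 10));   // blocks 4..9: 393216
        System.out.println(outputSize(sizes, 1, 1));    // empty range: 0
    }
}
```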

Can I find information about a compressed file without decompressing?

Yes. Combine the verbosity option with --from and --to. With --from=1 --to=1, no block is decompressed, so only the header information is printed:

java -jar kanzi.jar -d -i /tmp/silesia.tar.knz -f -v 3 --from=1 --to=1

1 file to decompress

Verbosity: 3
Overwrite: true
Using 4 jobs
Input file name: '/tmp/silesia.tar.knz'
Output file name: '/tmp/silesia.tar.knz.bak'

Decompressing /tmp/silesia.tar.knz ...
Bitstream version: 5
Checksum: false
Block size: 4194304 bytes
Using HUFFMAN entropy codec (stage 1)
Using PACK+LZ transform (stage 2)
Original size: 211957760 bytes


Decompressing:     17 ms
Input size:        68350949
Output size:       0
Throughput (KB/s): 0

How does kanzi ensure data integrity?

  • The bitstream header is CRC checked during decompression.
  • All transforms sanitize parameters coming from the bitstream during decompression.
  • The decompressor checks the size of the output file against the original size stored in the bitstream (when available).
  • A 32-bit CRC is stored for each block when the -x/--checksum command line option is provided.
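The per-block check enabled by -x/--checksum works like this in principle: the compressor stores one 32-bit value per block, and the decompressor recomputes it after decoding. This sketch uses `java.util.zip.CRC32` as a stand-in. The function kanzi actually uses for block checksums may differ, and the class name is hypothetical.

```java
import java.util.Random;
import java.util.zip.CRC32;

public class BlockChecksum {
    // 32-bit checksum of one block; CRC32 is a stand-in for kanzi's actual function.
    static int checksum(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return (int) crc.getValue();
    }

    // Compressor side: one checksum per block of blockSize bytes (the last block may be shorter).
    static int[] checksums(byte[] data, int blockSize) {
        int count = (data.length + blockSize - 1) / blockSize;
        int[] sums = new int[count];
        for (int i = 0; i < count; i++) {
            int off = i * blockSize;
            sums[i] = checksum(data, off, Math.min(blockSize, data.length - off));
        }
        return sums;
    }

    // Decompressor side: 1-based index of the first mismatching block, or 0 if all match.
    static int firstBadBlock(byte[] decoded, int blockSize, int[] stored) {
        int[] actual = checksums(decoded, blockSize);
        int n = Math.max(actual.length, stored.length);
        for (int i = 0; i < n; i++) {
            // A missing block on either side also counts as a mismatch.
            if (i >= actual.length || i >= stored.length || actual[i] != stored[i]) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] data = new byte[200_000];
        new Random(42).nextBytes(data);
        int[] stored = checksums(data, 65536);
        data[70_000] ^= 1; // simulate a single bit flip inside block 2
        System.out.println("first bad block: " + firstBadBlock(data, 65536, stored));
    }
}
```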

There is no hash for the whole original file in the bitstream. However, adding a hash is possible with the following trick:

# compress and append MD5 of original file
java -jar kanzi.jar -c -i log -l 2
md5sum log | cut -d " " -f 1 >> log.knz

# decompress
java -jar kanzi.jar -d -i log.knz

# check MD5 of decompressed file vs stored one
tail -c 33 log.knz
be8ddef3d35483622f2fab9a4f812040

md5sum log.knz.bak | cut -d " " -f 1
be8ddef3d35483622f2fab9a4f812040
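The same trick can be written in Java, for example to check archives from a program instead of a shell. The sketch assumes the trailer is exactly what the commands above produce: 32 lowercase hex digits followed by a newline (33 bytes). The class name is hypothetical. The core methods work on byte arrays, and the example `main` reads the two files with `Files.readAllBytes`.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class Md5Trailer {
    static final int TRAILER_LEN = 33; // 32 hex digits + '\n', as appended by md5sum | cut

    // Lowercase hex MD5 of the data, matching md5sum's output format.
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Returns the compressed bytes followed by the MD5 trailer of the original data.
    static byte[] appendTrailer(byte[] compressed, byte[] original) throws NoSuchAlgorithmException {
        byte[] trailer = (md5Hex(original) + "\n").getBytes(StandardCharsets.US_ASCII);
        byte[] out = Arrays.copyOf(compressed, compressed.length + trailer.length);
        System.arraycopy(trailer, 0, out, compressed.length, trailer.length);
        return out;
    }

    // Equivalent of comparing `tail -c 33 log.knz` with `md5sum log.knz.bak`.
    static boolean matches(byte[] withTrailer, byte[] decompressed) throws NoSuchAlgorithmException {
        if (withTrailer.length < TRAILER_LEN) {
            return false;
        }
        String stored = new String(withTrailer, withTrailer.length - TRAILER_LEN,
                TRAILER_LEN - 1, StandardCharsets.US_ASCII); // drop the trailing '\n'
        return stored.equals(md5Hex(decompressed));
    }

    public static void main(String[] args) throws Exception {
        // e.g. java Md5Trailer log.knz log.knz.bak
        byte[] packed = Files.readAllBytes(Paths.get(args[0]));
        byte[] restored = Files.readAllBytes(Paths.get(args[1]));
        System.out.println(matches(packed, restored) ? "MD5 OK" : "MD5 MISMATCH");
    }
}
```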

Will the kanzi bitstream remain backward compatible in future releases?

Yes. The bitstream version is part of the bitstream header and is used during decompression to ensure that bitstreams produced by older versions can still be decompressed.
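A version field in the header lets a decoder choose the matching decoding rules or reject streams it does not understand. The following is a hypothetical illustration of that idea only. The supported range (1 to 5, with 5 taken from the sample output earlier on this page) and the "legacy" routing are assumptions, not kanzi's implementation.

```java
public class VersionGate {
    // Hypothetical: highest bitstream version this decoder understands.
    static final int MAX_SUPPORTED_VERSION = 5;

    // Hypothetical dispatch on the version read from the header.
    static String decoderFor(int version) {
        if (version < 1 || version > MAX_SUPPORTED_VERSION) {
            // A stream written by a newer (or unknown) format cannot be decoded safely.
            throw new IllegalArgumentException("Unsupported bitstream version: " + version);
        }
        // In this sketch, older streams are routed to a legacy decoding path.
        return version == MAX_SUPPORTED_VERSION ? "current" : "legacy-v" + version;
    }

    public static void main(String[] args) {
        System.out.println(decoderFor(Integer.parseInt(args[0])));
    }
}
```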