Please be sure to see HDF5 Filter Plugins, a convenience package that bundles many of the commonly used filters that users have created and registered.
Members of the HDF5 user community can create and register Third-Party (compression or other) filters for use with HDF5.
To register a filter, please contact The HDF Helpdesk with the following information:
- Contact information for the developer requesting a new identifier
- Short description of the new filter
- Links to any relevant information including licensing information
Here is the current policy regarding filter identifier assignment:
The filter identifier is designed to be a unique identifier for the filter. Values from zero through 32,767 are reserved for filters supported by The HDF Group in the HDF5 library and for filters requested and supported by third parties.
Values from 32,768 to 65,535 are reserved for non-distributed uses (e.g., internal company usage) or for application usage when testing a feature. The HDF Group does not track or document the usage of filters with identifiers from this range.
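The two identifier ranges in this policy can be checked programmatically. The following is a minimal Python sketch, not part of the HDF5 library; the function name `classify_filter_id` and the labels it returns are hypothetical and chosen only for illustration.

```python
# Boundaries taken from the identifier policy stated in the text.
RESERVED_RANGE = range(0, 32768)        # 0 through 32,767
UNTRACKED_RANGE = range(32768, 65536)   # 32,768 through 65,535


def classify_filter_id(filter_id: int) -> str:
    """Return a label for the policy range that contains filter_id.

    "reserved" and "untracked" are illustrative labels (assumptions),
    not terms defined by The HDF Group.
    """
    if filter_id in RESERVED_RANGE:
        return "reserved"
    if filter_id in UNTRACKED_RANGE:
        return "untracked"
    raise ValueError(f"{filter_id} is outside the range 0-65535")


if __name__ == "__main__":
    for fid in (307, 32015, 40000):
        print(fid, classify_filter_id(fid))
```

A check like this only tells you which range an identifier falls in; it does not tell you whether the identifier has actually been assigned to a filter.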
Please contact the maintainer of a filter for help with the filter/compression support in HDF5.
Filter Identifier | Name | Short Description |
---|---|---|
305 | LZO | LZO lossless compression used by PyTables |
307 | BZIP2 | BZIP2 lossless compression used by PyTables |
32000 | LZF | LZF lossless compression used by H5Py project |
32001 | BLOSC | Blosc lossless compression used by PyTables |
32002 | MAFISC | Modified LZMA compression filter, MAFISC (Multidimensional Adaptive Filtering Improved Scientific data Compression) |
32003 | Snappy | Snappy lossless compression |
32004 | LZ4 | LZ4 fast lossless compression algorithm |
32005 | APAX | Samplify’s APAX Numerical Encoding Technology |
32006 | CBF | All imgCIF/CBF compressions and decompressions, including Canonical, Packed, Packed Version 2, Byte Offset and Nibble Offset |
32007 | JPEG-XR | Enables images to be compressed/decompressed with JPEG-XR compression |
32008 | bitshuffle | Extreme version of shuffle filter that shuffles data at bit level instead of byte level |
32009 | SPDP | SPDP fast lossless compression algorithm for single- and double-precision floating-point data |
32010 | LPC-Rice | LPC-Rice multi-threaded lossless compression |
32011 | CCSDS-123 | ESA CCSDS-123 multi-threaded compression filter |
32012 | JPEG-LS | CharLS JPEG-LS multi-threaded compression filter |
32013 | zfp | Lossy & lossless compression of floating point and integer datasets to meet rate, accuracy, and/or precision targets. |
32014 | fpzip | Fast and Efficient Lossy or Lossless Compressor for Floating-Point Data |
32015 | Zstandard | Real-time compression algorithm with wide range of compression / speed trade-off and fast decoder |
32016 | B³D | GPU based image compression method developed for light-microscopy applications |
32017 | SZ | An error-bounded lossy compressor for scientific floating-point data |
32018 | FCIDECOMP | EUMETSAT CharLS compression filter for use with netCDF |
32019 | JPEG | JPEG compression filter |
32020 | VBZ | Compression filter for raw DNA signal data used by Oxford Nanopore |
32021 | FAPEC | Versatile and efficient data compressor supporting many kinds of data and using an outlier-resilient entropy coder |
32022 | BitGroom | The BitGroom quantization algorithm |
32023 | Granular BitRound (GBR) | The GBR quantization algorithm is a significant improvement to the BitGroom filter |
32024 | SZ3 | A modular error-bounded lossy compression framework for scientific datasets |
32025 | Delta-Rice | Lossless compression algorithm optimized for digitized analog signals based on delta encoding and rice coding |
32026 | BLOSC2 | The recent new-generation version of the Blosc compression library |
32027 | FLAC | FLAC audio compression filter in HDF5 |
32028 | H5Z-SPERR | H5Z-SPERR is the HDF5 filter for SPERR |
32029 | TERSE/PROLIX | A lossless and fast compression of the diffraction data |
32030 | FFMPEG | A lossy compression filter based on ffmpeg video library |
LZO is a portable lossless data compression library written in ANSI C. Reliable and thoroughly tested. High adoption - each second terabytes of data are compressed by LZO. No bugs since the first release back in 1996. Offers pretty fast compression and extremely fast decompression. Includes slower compression levels achieving a quite competitive compression ratio while still decompressing at this very high speed. Distributed under the terms of the GNU General Public License (GPL v2+). Commercial licenses are available on request. Military-grade stability and robustness.
http://www.oberhumer.com/opensource/lzo/ http://www.pytables.org
Francesc Alted Email: faltet at pytables dot org
bzip2 is a freely available, patent free, high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression.
http://www.bzip.org http://www.pytables.org
Francesc Alted Email: faltet at pytables dot org
The LZF filter is an alternative DEFLATE-style compressor for HDF5 datasets, using the free LZF library by Marc Alexander Lehmann. Its main benefit over the built-in HDF5 DEFLATE filter is speed; in memory-to-memory operation as part of the filter pipeline, it typically compresses 3x-5x faster than DEFLATE, and decompresses 2x faster, while maintaining 50% to 90% of the DEFLATE compression ratio.
LZF can be used to compress any data type, and requires no compile-time or run-time configuration. HDF5 versions 1.6.5 through 1.8.3 are supported. The filter is written in C and can be included directly in C or C++ applications; it has no external dependencies. The license is 3-clause BSD (virtually unrestricted, including commercial applications).
More information, downloads, and benchmarks are available at http://h5py.org/lzf/.
Additional Information:
The LZF filter was developed as part of the h5py project, which implements a general-purpose interface to HDF5 from Python.
The h5py homepage: http://h5py.org
The LZF library homepage: http://home.schmorp.de/marc/liblzf.html
Andrew Collette Web: http://h5py.org
Blosc is a high performance compressor optimized for binary data. It has been designed to compress data very fast, at the expense of achieving lower compression ratios than, say, zlib+shuffle. It is mainly meant to avoid introducing a significant delay when dealing with data that is stored in high-performance I/O systems (like large RAID cabinets, or even the OS filesystem memory cache).
It uses advanced cache-efficient techniques to reduce activity on the memory bus as much as possible. It also leverages the SIMD (SSE2) and multi-threading capabilities present in modern multi-core processors to accelerate the compression/decompression process as much as possible.
http://blosc.org/ http://www.pytables.org
Francesc Alted Email: faltet at pytables dot org
This compressing filter exploits the multidimensionality and smoothness characterizing many scientific data sets. It adaptively applies some filters to preprocess the data and uses lzma as the actual compression step. It significantly outperforms pure lzma compression on most datasets.
The software is currently under a rather unrestrictive two clause BSD style license.
http://wr.informatik.uni-hamburg.de/research/projects/icomex/mafisc
Nathanael Huebbe Email: nathanael.huebbe at informatik dot uni-hamburg dot de
Snappy-CUDA is a compression/decompression library that leverages GPU processing power to compress/decompress data. The Snappy compression algorithm does not aim for maximum compression or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, the reference implementation of Snappy on the CPU is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger.
https://github.com/lucasvr/snappy-cuda https://github.com/google/snappy
Lucas C. Villa Real Email: lucasvr at gmail dot com
LZ4 is a very fast lossless compression algorithm, providing compression speeds of 300 MB/s per core, scalable with multi-core CPUs. It also features an extremely fast decoder, with speeds up to and beyond 1 GB/s per core, typically reaching RAM speed limits on multi-core systems. For a format description of the LZ4 compression filter in HDF5, see HDF5_LZ4.pdf.
LZ4 Algorithm: https://github.com/nexusformat/HDF5-External-Filter-Plugins/tree/master/LZ4
LZ4 Code:
Although the LZ4 software is not supported by The HDF Group, it is included in The HDF Group SVN repository so that it can be tested regularly with HDF5. For convenience, users can obtain it from SVN with the following command: svn checkout https://svn.hdfgroup.org/hdf5_plugins/trunk/LZ4 LZ4
Michael Rissi (Dectris Ltd.) Email: michael dot rissi at dectris dot com
Appears to be no longer available
All imgCIF/CBF compressions and decompressions, including Canonical, Packed, Packed Version 2, Byte Offset and Nibble Offset. License Information: GPL and LGPL
Herbert J. Bernstein Email: yayahjb at gmail dot com
Filter that allows HDF5 image datasets to be compressed or decompressed using the JPEG-XR compression method.
JPEG-XR Compression Method JPEG-XR Filter for HDF5
Marvin Albert Email: marvin dot albert at gmail dot com
This filter shuffles data at the bit level to improve compression. CHIME uses this filter for data acquisition.
bitshuffle CHIME
Kiyoshi Masui Email: kiyo at physics dot ubc dot ca
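To illustrate what shuffling at the bit level means for the bitshuffle filter described above, the following Python sketch gathers bit 0 of every value, then bit 1 of every value, and so on. It is a conceptual illustration only and is not the bitshuffle implementation; the function names `bit_transpose` and `bit_untranspose` are hypothetical.

```python
def bit_transpose(values, width):
    """Group bit b of every value together (b = 0 is the least significant bit).

    Returns one integer per bit position; bit i of that integer is
    bit b of values[i].
    """
    planes = []
    for b in range(width):
        plane = 0
        for i, v in enumerate(values):
            plane |= ((v >> b) & 1) << i
        planes.append(plane)
    return planes


def bit_untranspose(planes, count):
    """Invert bit_transpose for a block of `count` values."""
    values = [0] * count
    for b, plane in enumerate(planes):
        for i in range(count):
            values[i] |= ((plane >> i) & 1) << b
    return values


if __name__ == "__main__":
    samples = [3, 2, 3, 2]              # small 8-bit values
    planes = bit_transpose(samples, 8)
    print(planes)                       # [5, 15, 0, 0, 0, 0, 0, 0]
    assert bit_untranspose(planes, len(samples)) == samples
```

In this example the six high bit positions, which are unused by every sample, end up as six consecutive zero planes. Long uniform runs of this kind are what a subsequent compressor can exploit.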
SPDP is a fast, lossless, unified compression/decompression algorithm designed for both 32-bit single-precision (float) and 64-bit double-precision (double) floating-point data. It also works on other data.
http://cs.txstate.edu/~burtscher/research/SPDP/
Martin Burtscher Email: burtscher at txstate dot edu
LPC-Rice is a fast lossless compression codec that employs Linear Predictive Coding together with Rice coding. It supports multi-threading and SSE2 vector instructions, enabling it to exceed compression and decompression speeds of 1 GB/s.
https://sourceforge.net/projects/lpcrice/
Frans van den Bergh Email: fvdbergh at csir dot co dot za
Derick Swanepoel Email: dswanepoel at gmail dot com
CCSDS-123 is a multi-threaded HDF5 compression filter using the ESA CCSDS-123 implementation.
https://sourceforge.net/projects/ccsds123-hdf-filter/
Frans van den Bergh Email: fvdbergh at csir dot co dot za
Derick Swanepoel Email: dswanepoel at gmail dot com
JPEG-LS is a multi-threaded HDF5 compression filter using the CharLS JPEG-LS implementation.
https://sourceforge.net/projects/jpegls-hdf-filter/
Frans van den Bergh Email: fvdbergh at csir dot co dot za
Derick Swanepoel Email: dswanepoel at gmail dot com
zfp is a BSD licensed open source C++ library for compressed floating-point arrays that support very high throughput read and write random access. zfp was designed to achieve high compression ratios and therefore uses lossy but optionally error-bounded compression. Although bit-for-bit lossless compression is not always possible, zfp is usually accurate to within machine epsilon in near-lossless mode, and is often orders of magnitude more accurate and faster than other lossy compressors.
https://github.com/LLNL/H5Z-ZFP
For more information see: http://computation.llnl.gov/projects/floating-point-compression/
Mark Miller Email: miller86 at llnl dot gov
Peter Lindstrom Email: pl at llnl dot gov
fpzip is a library for lossless or lossy compression of 2D or 3D floating-point scalar fields. Although written in C++, fpzip has a C interface. fpzip was developed by Peter Lindstrom at LLNL.
For more information see: http://computation.llnl.gov/projects/floating-point-compression/
Peter Lindstrom Email: pl at llnl dot gov
Zstandard is a real-time compression algorithm, providing high compression ratios. It offers a very wide range of compression / speed trade-offs, while being backed by a very fast decoder. The Zstandard library is provided as open source software using a BSD license.
https://github.com/aparamon/HDF5Plugin-Zstandard
Andrey Paramonov Email: paramon at acdlabs dot ru
B³D is a fast (~1 GB/s), GPU based image compression method, developed for light-microscopy applications. Alongside lossless compression, it offers a noise dependent lossy compression mode, where the loss can be tuned as a proportion of the inherent image noise (accounting for photon shot noise and camera read noise). It not only allows for fast compression during imaging, but can also achieve compression ratios of up to 100.
SZ is a fast and efficient error-bounded lossy compressor for floating-point data. It was developed for scientific applications producing large-scale HPC data sets. SZ supports C, Fortran, and Java and has been tested on Linux and Mac OS X.
Sheng Di Email: sdi1 at anl dot gov
Franck Cappello Email: cappello at mcs dot anl dot gov
FCIDECOMP is a third-party compression filter used at EUMETSAT for the compression of netCDF-4 files. It is a codec that implements JPEG-LS using CharLS and is used for satellite imagery.
All software and documentation can be found at this link:
ftp://ftp.eumetsat.int/pub/OPS/out/test-data/Test-data-for-External-Users/MTG_FCI_L1c_Compressed-Datasets_and_Decompression-Plugin_April2017/Decompression_Plugin/
Dr. Daniel Lee Email: daniel dot lee at eumetsat dot int
This is a lossy compression filter. It provides a user-specified "quality factor" to control the trade-off of size versus accuracy.
Information Github License
libjpeg: This library is available as a package for most Linux distributions, and source code is available from https://www.ijg.org/.
Restrictions:
Only 8-bit unsigned data arrays are supported.
Arrays must be either:
- 2-D monochromatic [NumColumns, NumRows]
- 3-D RGB [3, NumColumns, NumRows]
Chunking must be set to the size of one entire image so the filter is called once for each image.
Using the JPEG filter in your application:
HDF5 only supports compression for "chunked" datasets; this just means that you need to call H5Pset_chunk to specify a chunk size. The chunking must be set to the size of a single image for the JPEG filter to work properly.
When calling H5Pset_filter for compression it must be called with cd_nelmts=4 and cd_values as follows:
cd_values[0] = quality factor (1-100)
cd_values[1] = numColumns
cd_values[2] = numRows
cd_values[3] = 0=Mono, 1=RGB
Common h5repack parameter: UD=32019,0,4,q,c,r,t
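The parameter layout above can be assembled as in the following Python sketch. It only builds the values and does not call HDF5. The helper names `jpeg_cd_values`, `jpeg_chunk_shape`, and `h5repack_filter_arg` are hypothetical, and the second field of the h5repack string is simply copied as 0 from the pattern shown in the text.

```python
JPEG_FILTER_ID = 32019  # identifier listed for JPEG in the table of registered filters


def jpeg_cd_values(quality, num_columns, num_rows, rgb=False):
    """Build the four cd_values in the order given in the text."""
    if not 1 <= quality <= 100:
        raise ValueError("quality factor must be in the range 1-100")
    if num_columns < 1 or num_rows < 1:
        raise ValueError("image dimensions must be positive")
    return [quality, num_columns, num_rows, 1 if rgb else 0]


def jpeg_chunk_shape(num_columns, num_rows, rgb=False):
    """Chunk shape covering exactly one image, as the filter requires."""
    if rgb:
        return (3, num_columns, num_rows)
    return (num_columns, num_rows)


def h5repack_filter_arg(cd_values):
    """Format cd_values following the pattern UD=32019,0,4,q,c,r,t."""
    fields = [JPEG_FILTER_ID, 0, len(cd_values), *cd_values]
    return "UD=" + ",".join(str(f) for f in fields)


if __name__ == "__main__":
    values = jpeg_cd_values(quality=90, num_columns=1024, num_rows=768)
    print(jpeg_chunk_shape(1024, 768))   # (1024, 768)
    print(h5repack_filter_arg(values))   # UD=32019,0,4,90,1024,768,0
```

The same four values, in the same order, are what an application would pass as cd_values to H5Pset_filter with cd_nelmts=4, after setting the chunk size with H5Pset_chunk.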
Mark Rivers, University of Chicago (rivers at cars.uchicago.edu)
This filter is used by Oxford Nanopore specifically to compress raw DNA signal data (signed integer). To achieve this, it uses both:
streamvbyte (https://github.com/lemire/streamvbyte)
zstd (https://github.com/facebook/zstd)
George Pimm
FAPEC is a versatile and efficient data compressor, initially designed for satellite payloads but later extended for ground applications. It relies on an outlier-resilient entropy coding core with ratios and speeds similar to those of CCSDS 121.0 (adaptive Rice).
FAPEC has a large variety of pre-processing stages and options: images (greyscale, colour, hyperspectral); time series or waveforms (including interleaving, e.g. for multidimensional or interleaved time series or tabular data); floating point (single+double precision); text (including LZW compression and our faster FAPECLZ); tabulated text (CSV); genomics (FastQ); geophysics (Kongsberg's water column datagrams); etc.
Most stages support samples of 8 to 24 bits (big/little endian, signed/unsigned), and lossless/lossy options. It can be extended with new, tailored pre-processing stages. It includes encryption options (AES-256 based on OpenSSL, and our own XXTEA implementation).
The FAPEC library and CLI run on Linux, Windows and Mac. The HDF5 user must request and install the library separately, which allows it to be upgraded without requiring changes to your HDF5 code.
https://www.dapcom.es/fapec/ https://www.dapcom.es/get-fapec/ https://www.dapcom.es/resources/FAPEC_EndUserLicenseAgreement.pdf
Jordi Portell i de Mora (DAPCOM Data Services S.L.)
fapec at dapcom dot es
The BitGroom quantization algorithm is documented in:
Zender, C. S. (2016), Bit Grooming: Statistically accurate precision-preserving quantization with compression, evaluated in the netCDF Operators (NCO, v4.4.8+), Geosci. Model Dev., 9, 3199-3211, doi:10.5194/gmd-9-3199-2016.
The filter is documented and maintained in the Community Codec Repository (https://github.com/ccr/ccr).
Charlie Zender (University of California, Irvine)
The GBR quantization algorithm is a significant improvement to the BitGroom filter documented in:
Zender, C. S. (2016), Bit Grooming: Statistically accurate precision-preserving quantization with compression, evaluated in the netCDF Operators (NCO, v4.4.8+), Geosci. Model Dev., 9, 3199-3211, doi:10.5194/gmd-9-3199-2016.
This filter is documented, implemented, and maintained in the Community Codec Repository (https://github.com/ccr/ccr).
Charlie Zender (University of California, Irvine)
SZ3 is a modular error-bounded lossy compression framework for scientific datasets, which allows users to customize their own compression pipeline to adapt to diverse datasets and user requirements. Compared with SZ2 (filter id: 32017), SZ3 integrates more effective prediction, so its compression quality and ratios are much higher than those of SZ2 in most cases.
This filter is documented, implemented, and maintained in github: https://github.com/szcompressor/SZ3.
License: https://github.com/szcompressor/SZ/blob/master/copyright-and-BSD-license.txt
Sheng Di Email: sdi1 at anl dot gov
Franck Cappello Email: cappello at mcs dot anl dot gov
Lossless compression algorithm optimized for digitized analog signals based on delta encoding and rice coding.
This filter is documented, implemented, and maintained at: https://gitlab.com/dgma224/deltarice.
David Mathews Email: david dot mathews dot 1994 at gmail dot com
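The two techniques named in the Delta-Rice description, delta encoding and Rice coding, can be sketched in a few lines of Python. This is a textbook illustration under stated assumptions, not the Delta-Rice filter's actual bitstream: the zigzag mapping of signed deltas, the fixed Rice parameter `k`, the string-of-bits output, and all function names are choices made here for readability.

```python
def delta_encode(samples):
    """Replace each sample by its difference from the previous one."""
    prev, deltas = 0, []
    for s in samples:
        deltas.append(s - prev)
        prev = s
    return deltas


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    total, samples = 0, []
    for d in deltas:
        total += d
        samples.append(total)
    return samples


def zigzag(n):
    """Map signed to non-negative integers: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def unzigzag(z):
    return (z >> 1) if z % 2 == 0 else -((z + 1) >> 1)


def rice_encode(values, k):
    """Rice-code non-negative integers: unary quotient, then k remainder bits."""
    parts = []
    for v in values:
        parts.append("1" * (v >> k) + "0")
        if k:
            parts.append(format(v & ((1 << k) - 1), "0{}b".format(k)))
    return "".join(parts)


def rice_decode(bits, k, count):
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1  # skip the terminating "0" of the unary part
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append((q << k) | r)
    return values


if __name__ == "__main__":
    samples = [10, 12, 11, 13]  # slowly varying signal
    bits = rice_encode([zigzag(d) for d in delta_encode(samples)], k=2)
    restored = delta_decode([unzigzag(z) for z in rice_decode(bits, 2, len(samples))])
    assert restored == samples
    print(len(bits), "bits")    # 19 bits
```

Slowly varying signals produce small deltas, and Rice coding spends few bits on small values, which is why the combination suits digitized analog signals.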
Blosc is a high performance compressor optimized for binary data (i.e. floating point numbers, integers and booleans). It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc's main goal is not just to reduce the size of large datasets on-disk or in-memory, but also to accelerate memory-bound computations.
C-Blosc2 is the new major version of C-Blosc, and tries hard to be backward compatible with both the C-Blosc1 API and its in-memory format.
Blosc project: https://www.blosc.org
C-Blosc2 docs: https://www.blosc.org/c-blosc2/c-blosc2.html
License: https://github.com/Blosc/c-blosc2/blob/main/LICENSE.txt
Francesc Alted Email: faltet at gmail dot org (BDFL for the Blosc project)
FLAC is an audio compression filter in HDF5. (Our ultimate goal is to use it via h5py in the hdf5plugin library: https://github.com/silx-kit/hdf5plugin).
The FLAC filter is open source: https://github.com/xiph/flac
libFLAC has a BSD-like license: https://github.com/xiph/flac/blob/master/CONTRIBUTING.md
Laurie Stephey Email: lastephey at lbl dot gov
SPERR is a wavelet-based lossy compressor for floating-point scientific data; it achieves one of the best compression ratios given a user-prescribed error tolerance (i.e., maximum point-wise error). SPERR also supports two distinctive decoding modes, namely "flexible-rate decoding" and "multi-resolution decoding," that facilitate data analysis with various constraints. More details are available in the SPERR GitHub repository: https://github.com/NCAR/SPERR.
H5Z-SPERR is the HDF5 filter for SPERR. It is also available on Github: https://github.com/NCAR/H5Z-SPERR
Samuel Li Email: shaomeng at ucar dot edu
A new compression algorithm (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10626653/) specifically tailored for fast, lossless compression of diffraction data.
GitHub repo of the algorithm: https://github.com/Senikm/trpx
Jan Pieter Abrahams Email: jp.abrahams at unibas dot ch
Senik Matinyan Email: senik.matinyan at unibas dot ch
A lossy compression filter based on ffmpeg video library.
https://github.com/Cai-Lab-at-University-of-Michigan/ffmpeg_HDF5_filter
License: MIT License
Cai Lab at University of Michigan: https://www.cai-lab.org