OZBC provides an efficent compressed bitmap to create bitmap indexes on high-cardinality columns.
This library is scheduled for deprecation in favor of ozbcbitmap in bitrush-index. All new features development is happening in bitrush-index.
Bitmap indexes have traditionally been considered to work well for low-cardinality columns, which have a modest number of distinct values. The simplest and most common method of bitmap indexing on attribute A with K cardinality associates a bitmap with every attribute value V then the Vth bitmap rapresent the predicate A=V. This approach ensures an efficient solution for performing search but on high-cardinality attributes the size of the bitmap index increase dramatically.
OZBC is a run-length-encoded hybrid compressed bitmap designed exclusively to create a bitmap indexes on L cardinality attributes where L>=16 and provide bitwise logical operations in running time complexity proportianl to the compressed bitmap size.
The fundamental property of a bitmap index composed from L bitmap is that for each row there is only one bitmap setted and then for N rows there are N bits=1 independently of the size of L. This implies that for L bitmaps and N rows there are in total (L-1)*N bits=0 and N bits=1. In this scenario to minimize the size of the index (composed from L bitmap) is very important compress bits=0 sequences in efficient mode.
Unlike WAH, EWAH, and others compressed bitmap which can compress also sequences of bits=1 and ensure that in the worse case the size of a compressed bitmap is the same of a uncompressed bitmap, OZBC can compress only sequences of bits=0 and in the worse case the size of a compressed bitmap is twice of a uncompressed bitmap.
For this reason OZBC isn't designed for general purposes but in "bitmap indexes on L-cardinality scenario" there are L bitmaps and if there is a bitmap with a high number of bits setted and then double-size compared to the corresponding uncompressed bitmap then there are others bitmaps with all bits unsetted and then with a minimal size.
The benchmark compares the size of Roaring, EWAH and OZBC bitmap index that indexes N=1000000 (1M) random and sorted values included in 0,L-1 range.
Each index is composed from L bitmaps.
The index size rapresent the effective cost to save L bitmaps on disk.
To compile and generate static library "lib/libOZBCBitmap" and test:
- make
To run index test:
- ./index
To run perf test:
- ./perf
To run serialize test:
- ./serialize
See Index_example.
OZBCBitmap encodes bits in 16bits words. There are two types of words:
|1bit word_type=0|7bit bytes_zero|8bit dirty_byte|
|1bit word_type=1| 15bit 128_bytes_zero |
where:
- bytes_zero = number of consecutive sequence of 8bit of zeros.
- dirty_byte = sequence of 8bit uncompressed.
- 128_bytes_zero = number of consecutive sequence of 1024bit of zeros.
The max size of this compressed bitmap is twice the size of the same uncompressed bitmap.
The max number of consecutive zero bits that can be rapresented from a single word is ((2^15) - 1) * (2^10) = (2^25 - 2^10) bits.
Gnu Lesser General Public Licensev3.0.
See OZBCBitmap.