Skip to content

uccidibuti/OZBCBitmap

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

44 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

OZBCBitmap

OZBC provides an efficent compressed bitmap to create bitmap indexes on high-cardinality columns.

Deprecation notice

This library is scheduled for deprecation in favor of ozbcbitmap in bitrush-index. All new features development is happening in bitrush-index.

Introduction

Bitmap indexes have traditionally been considered to work well for low-cardinality columns, which have a modest number of distinct values. The simplest and most common method of bitmap indexing on attribute A with K cardinality associates a bitmap with every attribute value V then the Vth bitmap rapresent the predicate A=V. This approach ensures an efficient solution for performing search but on high-cardinality attributes the size of the bitmap index increase dramatically.

OZBC is a run-length-encoded hybrid compressed bitmap designed exclusively to create a bitmap indexes on L cardinality attributes where L>=16 and provide bitwise logical operations in running time complexity proportianl to the compressed bitmap size.

Why OZBC?

The fundamental property of a bitmap index composed from L bitmap is that for each row there is only one bitmap setted and then for N rows there are N bits=1 independently of the size of L. This implies that for L bitmaps and N rows there are in total (L-1)*N bits=0 and N bits=1. In this scenario to minimize the size of the index (composed from L bitmap) is very important compress bits=0 sequences in efficient mode.

Unlike WAH, EWAH, and others compressed bitmap which can compress also sequences of bits=1 and ensure that in the worse case the size of a compressed bitmap is the same of a uncompressed bitmap, OZBC can compress only sequences of bits=0 and in the worse case the size of a compressed bitmap is twice of a uncompressed bitmap.

For this reason OZBC isn't designed for general purposes but in "bitmap indexes on L-cardinality scenario" there are L bitmaps and if there is a bitmap with a high number of bits setted and then double-size compared to the corresponding uncompressed bitmap then there are others bitmaps with all bits unsetted and then with a minimal size.

OZBC compared with EWAH and uncompressed Bitmap

The benchmark compares the size of Roaring, EWAH and OZBC bitmap index that indexes N=1000000 (1M) random and sorted values included in 0,L-1 range.

Each index is composed from L bitmaps.

The index size rapresent the effective cost to save L bitmaps on disk.

Alt text

Alt text

How to compile

To compile and generate static library "lib/libOZBCBitmap" and test:

  • make

To run index test:

  • ./index

To run perf test:

  • ./perf

To run serialize test:

  • ./serialize

Example

See Index_example.

Encoding

OZBCBitmap encodes bits in 16bits words. There are two types of words:

|1bit word_type=0|7bit    bytes_zero|8bit     dirty_byte|
|1bit word_type=1|         15bit 128_bytes_zero         |

where:

  • bytes_zero = number of consecutive sequence of 8bit of zeros.
  • dirty_byte = sequence of 8bit uncompressed.
  • 128_bytes_zero = number of consecutive sequence of 1024bit of zeros.

The max size of this compressed bitmap is twice the size of the same uncompressed bitmap.

The max number of consecutive zero bits that can be rapresented from a single word is ((2^15) - 1) * (2^10) = (2^25 - 2^10) bits.

Licensing

Gnu Lesser General Public Licensev3.0.

Structure and API of OZBCBitmap

See OZBCBitmap.

About

Compressed Bitmap in C++ for bitmap Indexes.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published