C++: Serialization of bit arrays (both fixed and variable-length) is slow #285

Open
pavel-kirienko opened this issue Mar 3, 2023 · 0 comments
Labels
under consideration Not ready to accept this issue but will consider.

Large bit arrays can be found even in the standard data type set, so this issue may affect common applications.

In C, the memory storage format of bit arrays is the same as their wire representation, which enables serialization via memcpy. The application can manipulate the contents using nunavutCopyBits, nunavutSetBit, and nunavutGetBit.

In Python, bit arrays are stored using NumPy arrays and serialized using numpy.packbits.

In C++, the current implementation (de)serializes arrays bit by bit, which is likely to cause performance issues. Sadly, the C++ implementation cannot enforce a wire-compatible memory storage format because it has to be compatible with standard containers like std::vector<bool> and std::bitset. Are there any ideas on how to improve bit array serialization without requiring custom bit containers whose memory storage format is known?
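One possible direction for standard containers, shown here only as a rough sketch: read the bits through the container's public interface but assemble each output byte in a register, so the destination buffer is written once per eight bits rather than once per bit. The function name `packBitsBytewise` is hypothetical and not part of Nunavut. The sketch assumes a byte-aligned destination and LSB-first bit order within each byte, and I have not measured whether it helps in practice.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Nunavut API): packs a std::vector<bool> into bytes.
// Assumptions: `out` is byte-aligned and holds at least (bits.size() + 7) / 8 bytes;
// element i lands in bit (i % 8) of byte (i / 8), i.e. LSB-first.
inline std::size_t packBitsBytewise(const std::vector<bool>& bits, std::uint8_t* out)
{
    const std::size_t n = bits.size();
    std::size_t written = 0;
    for (std::size_t i = 0; i < n; i += 8U)
    {
        // The last chunk may hold fewer than 8 bits; the unused high bits stay zero.
        const std::size_t chunk = (n - i < 8U) ? (n - i) : 8U;
        unsigned acc = 0U;
        for (std::size_t k = 0; k < chunk; ++k)
        {
            acc |= (bits[i + k] ? 1U : 0U) << k;
        }
        // One store per output byte instead of a read-modify-write per bit.
        out[written++] = static_cast<std::uint8_t>(acc);
    }
    return written;  // number of bytes produced
}
```

This still touches every element through `operator[]`, so it only removes the per-bit buffer updates. It does not reach memcpy speed, because the internal layout of std::vector<bool> is not specified.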

In the specialization of VariableLengthArray<bool> that I implemented in #284, the memory storage format is the same as the wire format, but the serialization methods currently cannot benefit from that.
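To illustrate how the serializer might take advantage of a container with known storage, here is a minimal sketch that uses overloading to pick a memcpy path for such a container and falls back to the bit-by-bit path for everything else. `PackedBits` is a hypothetical stand-in and does not reflect the actual VariableLengthArray<bool> interface; `serializeBits` is likewise an invented name. The sketch assumes a byte-aligned destination and LSB-first bit order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical container whose storage is assumed to match the wire format (LSB-first).
class PackedBits
{
public:
    explicit PackedBits(std::size_t size) : size_(size), bytes_((size + 7U) / 8U, 0U) {}
    std::size_t size() const { return size_; }
    std::size_t byteSize() const { return bytes_.size(); }
    const std::uint8_t* data() const { return bytes_.data(); }
    void set(std::size_t i, bool v)
    {
        const std::uint8_t mask = static_cast<std::uint8_t>(1U << (i % 8U));
        bytes_[i / 8U] = static_cast<std::uint8_t>(v ? (bytes_[i / 8U] | mask) : (bytes_[i / 8U] & ~mask));
    }

private:
    std::size_t size_;
    std::vector<std::uint8_t> bytes_;
};

// Generic fallback for std::vector<bool>, std::bitset, etc.: one bit at a time.
template <typename Container>
std::size_t serializeBits(const Container& c, std::uint8_t* out)
{
    const std::size_t n_bytes = (c.size() + 7U) / 8U;
    for (std::size_t b = 0; b < n_bytes; ++b) { out[b] = 0U; }
    for (std::size_t i = 0; i < c.size(); ++i)
    {
        if (c[i]) { out[i / 8U] = static_cast<std::uint8_t>(out[i / 8U] | (1U << (i % 8U))); }
    }
    return n_bytes;
}

// Fast path: a non-template overload wins overload resolution for PackedBits.
inline std::size_t serializeBits(const PackedBits& c, std::uint8_t* out)
{
    if (c.byteSize() > 0U) { std::memcpy(out, c.data(), c.byteSize()); }
    return c.byteSize();
}
```

Both paths produce the same bytes for the same logical contents, so user code that keeps using standard containers would continue to work and simply would not get the faster path.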

@pavel-kirienko pavel-kirienko added the under consideration Not ready to accept this issue but will consider. label Mar 3, 2023