Enhance performance of rz_bv_copy_nbits and rz_bv_set_range #4716

Open
Rot127 opened this issue Nov 13, 2024 · 2 comments
Labels
enhancement · good first issue · performance · RzUtil

Comments

Rot127 (Member) commented Nov 13, 2024

Is your feature request related to a problem? Please describe.

Performance could be better, and saving energy would make us better humans.

Describe the solution you'd like

Both functions do something like this:

for (ut32 i = 0; i < nbit; ++i) {
   bool c = rz_bv_get(src, src_start_pos + i); // Not in case of rz_bv_set_range()
   rz_bv_set(dst, dst_start_pos + i, c);
}

At least for small bit vectors (<= 64 bits) we could do this with a shift, an AND and an OR.
This would save a bunch of calls and the per-bit shifts and ORs.
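
A minimal sketch of that fast path, assuming small vectors are backed by a single ut64 word (the bits.small_u field of RzBitVector) and that the range checks were already done; the helper name and placement are hypothetical:

// Hypothetical fast path for vectors that fit into a single 64-bit word.
// Assumes rizin's rz_util types (ut32/ut64, UT64_MAX) and the bits.small_u field.
static ut32 bv_copy_nbits_small(const RzBitVector *src, ut32 src_start_pos,
    RzBitVector *dst, ut32 dst_start_pos, ut32 nbit) {
    // Mask with the lowest `nbit` bits set; a shift by 64 is undefined,
    // so handle the full-width case explicitly.
    ut64 mask = (nbit >= 64) ? UT64_MAX : (((ut64)1 << nbit) - 1);
    // Extract the source range and align it to bit 0.
    ut64 chunk = (src->bits.small_u >> src_start_pos) & mask;
    // Clear the destination range, then OR the chunk in at the target offset.
    dst->bits.small_u = (dst->bits.small_u & ~(mask << dst_start_pos)) | (chunk << dst_start_pos);
    return nbit;
}

rz_bv_set_range() could reuse the same mask: OR it in to set the range, or AND its complement to clear it.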

For larger bit vectors we could check if the range is large enough (e.g. 8 bits or more).
Then set the first bits one by one until the remaining range is aligned to a byte boundary (for RzBitVector->large_a) and copy the rest via memcpy.
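
A rough sketch of that byte-aligned path, assuming bit i lives in large_a[i / 8] at bit position i % 8 (matching what rz_bv_get()/rz_bv_set() do) and that source and destination ranges share the same offset modulo 8; otherwise a bit-by-bit fallback is still needed. The helper name is hypothetical:

// Hypothetical byte-aligned copy for large vectors; assumes memcpy from <string.h>
// and that both ranges have the same sub-byte offset.
static void bv_copy_bytes_aligned(const RzBitVector *src, ut32 src_start_pos,
    RzBitVector *dst, ut32 dst_start_pos, ut32 nbit) {
    // Copy single bits until the destination position reaches a byte boundary.
    while ((dst_start_pos % 8) != 0 && nbit > 0) {
        rz_bv_set(dst, dst_start_pos++, rz_bv_get(src, src_start_pos++));
        nbit--;
    }
    // Both positions are now byte-aligned (same sub-byte offset was required),
    // so whole bytes can be copied at once.
    ut32 nbytes = nbit / 8;
    memcpy(dst->bits.large_a + dst_start_pos / 8, src->bits.large_a + src_start_pos / 8, nbytes);
    src_start_pos += nbytes * 8;
    dst_start_pos += nbytes * 8;
    nbit -= nbytes * 8;
    // Copy the remaining tail bits one by one.
    while (nbit > 0) {
        rz_bv_set(dst, dst_start_pos++, rz_bv_get(src, src_start_pos++));
        nbit--;
    }
}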

Describe alternatives you've considered

None really.

Additional context

This would give us back quite some computation. In my use case this loop accounts for around 2.3% of the runtime.
I also think this is a required improvement before we merge resource-intensive analysis algorithms based on it.

Rot127 added the enhancement, good first issue, RzUtil and performance labels on Nov 13, 2024
Rot127 (Member, Author) commented Nov 23, 2024

Possible implementation: https://graphics.stanford.edu/~seander/bithacks.html#MaskedMerge.
It is simple for ut64 values, but would need to be generalized to everything > 64 bits.
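
For reference, the merge from that page combines two words under a mask (bits come from b where the mask is 1 and from a where it is 0):

// Masked merge from the bithacks page: result takes bits from b where
// mask bits are 1, and from a where mask bits are 0.
static inline ut64 bv_masked_merge(ut64 a, ut64 b, ut64 mask) {
    return a ^ ((a ^ b) & mask);
}

Here a would be the destination word, b the source bits shifted to the destination offset, and mask the range being written.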

rajRishi22 commented Nov 25, 2024

Hello,
I would like to work on this issue and believe that a change in rz_bv_copy_nbits might improve performance. Could you please provide information on the architecture that this function is intended to run on?
Thank you.

RZ_API ut32 rz_bv_copy_nbits(RZ_NONNULL const RzBitVector *src, ut32 src_start_pos, RZ_NONNULL RzBitVector *dst, ut32 dst_start_pos, ut32 nbit) {
    rz_return_val_if_fail(src && dst, 0);

    // Chunk size (word size) in bits; CHAR_BIT comes from <limits.h>.
    const ut32 chunk_size = sizeof(unsigned long) * CHAR_BIT;
    ut32 max_nbit = RZ_MIN((src->len - src_start_pos), (dst->len - dst_start_pos));

    if (max_nbit < nbit) {
        return 0;
    }

    ut32 bits_copied = 0;

    // Handle the unaligned prefix bit by bit. Note: if the two start positions
    // differ modulo chunk_size, this loop copies everything bit by bit, which is
    // correct but falls back to the slow path.
    while ((src_start_pos % chunk_size != 0 || dst_start_pos % chunk_size != 0) && nbit > 0) {
        bool bit = rz_bv_get(src, src_start_pos++);
        rz_bv_set(dst, dst_start_pos++, bit);
        --nbit;
        ++bits_copied;
    }

    // Process aligned chunks. Within this loop nbit >= chunk_size, so an entire
    // word is replaced and no masked merge is needed (a mask of (1UL << chunk_size) - 1
    // would also be undefined behavior, since the shift equals the word width).
    // rz_bv_get_chunk()/rz_bv_set_chunk() are proposed helpers for reading and
    // writing one word of the bit vector; they are not part of the current API.
    while (nbit >= chunk_size) {
        unsigned long src_chunk = rz_bv_get_chunk(src, src_start_pos / chunk_size);
        rz_bv_set_chunk(dst, dst_start_pos / chunk_size, src_chunk);

        src_start_pos += chunk_size;
        dst_start_pos += chunk_size;
        nbit -= chunk_size;
        bits_copied += chunk_size;
    }

    // Handle the remaining suffix bits one by one. A masked merge
    // (dst ^ ((dst ^ src) & mask)) on the last partial chunk could replace
    // this loop as a further optimization.
    while (nbit > 0) {
        bool bit = rz_bv_get(src, src_start_pos++);
        rz_bv_set(dst, dst_start_pos++, bit);
        --nbit;
        ++bits_copied;
    }

    return bits_copied;
}
