Enhance performance of rz_bv_copy_nbits and rz_bv_set_range #4716

Open
Rot127 opened this issue Nov 13, 2024 · 2 comments
Labels
enhancement · good first issue · performance · RzUtil

Comments

Rot127 (Member) commented Nov 13, 2024

Is your feature request related to a problem? Please describe.

Performance could be better, and saving energy would make us better humans.

Describe the solution you'd like

Both functions do something like this:

for (ut32 i = 0; i < nbit; ++i) {
   bool c = rz_bv_get(src, src_start_pos + i); // Not in case of rz_bv_set_range()
   rz_bv_set(dst, dst_start_pos + i, c);
}

At least for small bit vectors (<= 64 bits) we could do this with a shift, an AND and an OR.
This would save a bunch of calls and the per-bit shifts and ORs.
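
A minimal sketch of that fast path, assuming small vectors are backed by a single ut64 word (the bits.small_u field of RzBitVector) and that the range checks were already done; the helper name and placement are hypothetical:

// Hypothetical fast path for vectors that fit into a single 64-bit word.
// Assumes rizin's rz_util types (ut32/ut64, UT64_MAX) and the bits.small_u field.
static ut32 bv_copy_nbits_small(const RzBitVector *src, ut32 src_start_pos,
    RzBitVector *dst, ut32 dst_start_pos, ut32 nbit) {
    // Mask with the lowest `nbit` bits set; a shift by 64 is undefined,
    // so handle the full-width case explicitly.
    ut64 mask = (nbit >= 64) ? UT64_MAX : (((ut64)1 << nbit) - 1);
    // Extract the source range and align it to bit 0.
    ut64 chunk = (src->bits.small_u >> src_start_pos) & mask;
    // Clear the destination range, then OR the chunk in at the target offset.
    dst->bits.small_u = (dst->bits.small_u & ~(mask << dst_start_pos)) | (chunk << dst_start_pos);
    return nbit;
}

rz_bv_set_range() could reuse the same mask: OR it in to set the range, or AND its complement to clear it.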

For larger bit vectors we could check if the range is large enough (e.g. 8 bits or more).
Then set the first bits one by one until the remaining range is aligned to a byte boundary (for RzBitVector->large_a) and copy the rest via memcpy.
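
A rough sketch of that byte-aligned path, assuming bit i lives in large_a[i / 8] at bit position i % 8 (matching what rz_bv_get()/rz_bv_set() do) and that source and destination ranges share the same offset modulo 8; otherwise a bit-by-bit fallback is still needed. The helper name is hypothetical:

// Hypothetical byte-aligned copy for large vectors; assumes memcpy from <string.h>
// and that both ranges have the same sub-byte offset.
static void bv_copy_bytes_aligned(const RzBitVector *src, ut32 src_start_pos,
    RzBitVector *dst, ut32 dst_start_pos, ut32 nbit) {
    // Copy single bits until the destination position reaches a byte boundary.
    while ((dst_start_pos % 8) != 0 && nbit > 0) {
        rz_bv_set(dst, dst_start_pos++, rz_bv_get(src, src_start_pos++));
        nbit--;
    }
    // Both positions are now byte-aligned (same sub-byte offset was required),
    // so whole bytes can be copied at once.
    ut32 nbytes = nbit / 8;
    memcpy(dst->bits.large_a + dst_start_pos / 8, src->bits.large_a + src_start_pos / 8, nbytes);
    src_start_pos += nbytes * 8;
    dst_start_pos += nbytes * 8;
    nbit -= nbytes * 8;
    // Copy the remaining tail bits one by one.
    while (nbit > 0) {
        rz_bv_set(dst, dst_start_pos++, rz_bv_get(src, src_start_pos++));
        nbit--;
    }
}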

Describe alternatives you've considered

None really.

Additional context

This would give us back quite some computation. In my use case this loop accounts for around 2.3% of the runtime.
I also think this is a required improvement before we merge resource-intensive analysis algorithms based on it.

Rot127 added the enhancement, good first issue, RzUtil and performance labels on Nov 13, 2024
Rot127 (Member, Author) commented Nov 23, 2024

Possible implementation: https://graphics.stanford.edu/~seander/bithacks.html#MaskedMerge.
It is simple for ut64 values, but would need to be generalized to everything > 64 bits.
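
For reference, the merge from that page combines two words under a mask (bits come from b where the mask is 1 and from a where it is 0):

// Masked merge from the bithacks page: result takes bits from b where
// mask bits are 1, and from a where mask bits are 0.
static inline ut64 bv_masked_merge(ut64 a, ut64 b, ut64 mask) {
    return a ^ ((a ^ b) & mask);
}

Here a would be the destination word, b the source bits shifted to the destination offset, and mask the range being written.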

rajRishi22 commented Nov 25, 2024

Hello,
I would like to work on this issue and believe that a change in rz_bv_copy_nbits might improve performance. Could you please provide information on the architecture that this function is intended to run on?
Thank you.

RZ_API ut32 rz_bv_copy_nbits(RZ_NONNULL const RzBitVector *src, ut32 src_start_pos, RZ_NONNULL RzBitVector *dst, ut32 dst_start_pos, ut32 nbit) {
    rz_return_val_if_fail(src && dst, 0);

    // Chunk size (word size) in bits; CHAR_BIT comes from <limits.h>.
    const ut32 chunk_size = sizeof(unsigned long) * CHAR_BIT;
    ut32 max_nbit = RZ_MIN((src->len - src_start_pos), (dst->len - dst_start_pos));

    if (max_nbit < nbit) {
        return 0;
    }

    ut32 bits_copied = 0;

    // Handle the unaligned prefix bit by bit. Note: if the two start positions
    // differ modulo chunk_size, this loop copies everything bit by bit, which is
    // correct but falls back to the slow path.
    while ((src_start_pos % chunk_size != 0 || dst_start_pos % chunk_size != 0) && nbit > 0) {
        bool bit = rz_bv_get(src, src_start_pos++);
        rz_bv_set(dst, dst_start_pos++, bit);
        --nbit;
        ++bits_copied;
    }

    // Process aligned chunks. Within this loop nbit >= chunk_size, so an entire
    // word is replaced and no masked merge is needed (a mask of (1UL << chunk_size) - 1
    // would also be undefined behavior, since the shift equals the word width).
    // rz_bv_get_chunk()/rz_bv_set_chunk() are proposed helpers for reading and
    // writing one word of the bit vector; they are not part of the current API.
    while (nbit >= chunk_size) {
        unsigned long src_chunk = rz_bv_get_chunk(src, src_start_pos / chunk_size);
        rz_bv_set_chunk(dst, dst_start_pos / chunk_size, src_chunk);

        src_start_pos += chunk_size;
        dst_start_pos += chunk_size;
        nbit -= chunk_size;
        bits_copied += chunk_size;
    }

    // Handle the remaining suffix bits one by one. A masked merge
    // (dst ^ ((dst ^ src) & mask)) on the last partial chunk could replace
    // this loop as a further optimization.
    while (nbit > 0) {
        bool bit = rz_bv_get(src, src_start_pos++);
        rz_bv_set(dst, dst_start_pos++, bit);
        --nbit;
        ++bits_copied;
    }

    return bits_copied;
}
