Sign-extending signed integer types? #236

korrat · 2021-09-30T15:04:31Z

Currently, deku deserializes signed integers with limited bit width like unsigned integers, as evidenced by the following (passing) test:

use deku::{bitvec::BitSlice, ctx::Size, DekuRead as _};

#[test]
fn sign_extending() {
    let bits = &BitSlice::from_slice(&[0x1Fu8]).unwrap()[3..];
    let (_rest, num) = i16::read(bits, Size::Bits(5)).unwrap();

    assert_eq!(num, 31);
}

I would expect num to be -1 instead, similar to how bit fields in C behave (Compiler Explorer example).

Is this something that could be supported? Perhaps using another context argument?

The text was updated successfully, but these errors were encountered:

sharksforarms · 2021-10-05T02:44:10Z

Essentially, we're reading from a byte. Do you have an example on how to get the result you're looking for in rust/bitvec? It may be possible!

https://github.com/sharksforarms/deku/blob/master/src/impls/primitive.rs#L88

korrat · 2021-10-07T05:30:22Z

You can use Rust's arithmetic right shifts for sign-extension:

fn sign_extend(num: i16, bits: u32) -> i16 {
    let shift = i16::BITS - bits;
    (num << shift) >> shift
}

Playground

This can be combined with the map attribute to achieve the desired behavior, but I think it would be more ergonomic, if deku supported the use case directly.

I can also open a PR for this. Just let me know if this is something you want to support and whether this should be the default for signed integers.

myrrlyn · 2021-10-11T16:34:47Z

This is now the second time I have been asked for signed-integer support in BitField, and I think I have a way to make it work. If I do, I'll provide a patch release to the minor(s) that deku uses.

The behavior as written is valid as long as bits is in the range 1 ..= T::BITS; since bitvec already enforces this requirement by crashing on violation, it can be assumed to be true in your code.

Implementation notes: my experimental solution was to compute a sign mask, then conditionally apply it. Almost all calls into BitField use bit-slices with compile-time-constant lengths, and so that length information is available for const-propagation into values that do not have data dependencies. My suspicion was that the shl; sar sequence in korrat's example caused a data dependency whose opcodes would be slower to evaluate than a test and conditional or.

This comparison code:

pub fn shifty_sext(val: i32, bits: u32) -> i32 {
    check(bits);
    let shamt = 32 - bits;
    (val << shamt) >> shamt
}

pub fn masky_sext(mut val: i32, bits: u32) -> i32 {
    check(bits);
    let mask = !0 << bits;
    if val & (1 << (bits - 1)) != 0 {
        val |= mask;
    }
    val
}

pub fn shifty_sext_12(val: i32) -> i32 {
    shifty_sext(val, 12)
}

pub fn masky_sext_12(val: i32) -> i32 {
    masky_sext(val, 12)
}

fn check(bits: u32) {
    if bits == 0 || bits > 32 {
        panic!("invalid shamt");
    }
}

when compiled under -O produces this assembler:

example::shifty_sext:
        mov     ecx, esi
        lea     eax, [rcx - 1]
        cmp     eax, 32
        jae     .LBB7_1
        neg     cl
        shl     edi, cl
        sar     edi, cl
        mov     eax, edi
        ret
.LBB7_1:
        push    rax
        call    std::panicking::begin_panic
        ud2

example::masky_sext:
        lea     eax, [rsi - 1]
        cmp     eax, 32
        jae     .LBB8_1
        lea     eax, [rsi - 1]
        movzx   r8d, al
        mov     edx, -1
        mov     ecx, esi
        shl     edx, cl
        xor     eax, eax
        bt      edi, r8d
        cmovb   eax, edx
        or      eax, edi
        ret
.LBB8_1:
        push    rax
        call    std::panicking::begin_panic
        ud2

example::shifty_sext_12:
        mov     eax, edi
        shl     eax, 20
        sar     eax, 20
        ret

example::masky_sext_12:
        mov     eax, edi
        shl     eax, 20
        sar     eax, 31
        and     eax, -4096
        or      eax, edi
        ret

While the unpropagated mask signer does not have a data dependency, the version with const-propagated length information does; this metric is now a tie and the shift signer has a shorter instruction stream (and no large immediates), so it wins.

It is a bug in bitvec for unsigned loads to be incorrect, and I have a great deal of confidence that they are not. I am also fairly sure by inspection that your shl; sar implementation is also correct under the invariants that bitvec imposes, so if you want to whip up a test case for signing two integers, one where the top loaded bit is 0 and one where it is 1, and demonstrate that they correctly sign-extend, that should suffice.

I believe that changing the BitField API to increase generality from M: IsUnsigned to M: IsInteger does not constitute an API breaking change, but I am not 100% confident in this. Since I am already working on a full crate rewrite for 1.0, which I hope to complete this calendar year, I would tentatively recommend placing this behavior in deku as soon as you are ready, rather than waiting for me to produce a bitvec release of any version that contains it.

Thanks korrat for the ping, and for tipping me over the edge into providing this support directly!

sharksforarms added the enhancement New feature or request label Oct 5, 2021

korrat mentioned this issue Oct 16, 2021

Sign extend signed integers #238

Merged

sharksforarms closed this as completed in #238 Feb 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sign-extending signed integer types? #236

Sign-extending signed integer types? #236

korrat commented Sep 30, 2021

sharksforarms commented Oct 5, 2021

korrat commented Oct 7, 2021

myrrlyn commented Oct 11, 2021

Sign-extending signed integer types? #236

Sign-extending signed integer types? #236

Comments

korrat commented Sep 30, 2021

sharksforarms commented Oct 5, 2021

korrat commented Oct 7, 2021

myrrlyn commented Oct 11, 2021