Specialize bytes #278
Conversation
I'm getting errors on the following lines. For these tests, not sure why:

```rust
# use deku::prelude::*;
# use std::convert::{TryInto, TryFrom};
# #[derive(PartialEq, Debug, DekuRead, DekuWrite)]
#[deku(type = "u32", bytes = "2")]
enum DekuTest {
    #[deku(id = "0xBEEF")]
    VariantA(u8),
}

let data: Vec<u8> = vec![0xEF, 0xBE, 0xFF];
let value = DekuTest::try_from(data.as_ref()).unwrap();

assert_eq!(
    DekuTest::VariantA(0xFF),
    value
);

let value: Vec<u8> = value.try_into().unwrap();
assert_eq!(data, value);
```

Example:

```rust
# use deku::prelude::*;
# use std::convert::{TryInto, TryFrom};
# #[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct DekuTest {
    #[deku(bytes = 2)]
    field_a: u32,
    field_b: u8, // defaults to size_of<u8>
}

let data: Vec<u8> = vec![0xAB, 0xCD, 0xFF];
let value = DekuTest::try_from(data.as_ref()).unwrap();

assert_eq!(
    DekuTest {
        field_a: 0xCDAB,
        field_b: 0xFF,
    },
    value
);

let value: Vec<u8> = value.try_into().unwrap();
assert_eq!(data, value);
```
@sharksforarms @constfold lmk what you think
src/impls/primitive.rs (Outdated)

```rust
let (bit_slice, rest) = input.split_at(bit_size);
let bytes: &[u8] = bit_slice.as_raw_slice();
```
This is exactly what I want in that comment. But... the removal of this branch for bytes:

```rust
if pad == 0 && bit_slice.len() == max_type_bits && bit_slice.as_raw_slice().len() * 8 == max_type_bits
```

Can we really do this? What if `bit_slice` is not byte aligned?
We should be good given:

- `let bit_size: usize = size.0 * 8;` will always be byte aligned
- we check `bit_size > max_type_bits` and `input.len() < bit_size` before calling `split_at`
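To illustrate that ordering, here is a minimal sketch that assumes a plain byte slice instead of deku's bit slice (the names `max_type_bits` and `bit_size` mirror the snippets above; the function itself is hypothetical, not deku code):

```rust
// Sketch of the guards discussed above: both checks happen before split_at,
// and bit_size comes from a byte count, so it is always a multiple of 8.
fn split_checked(input: &[u8], size_bytes: usize) -> Result<(&[u8], &[u8]), String> {
    let max_type_bits = 32; // e.g. when the target type is u32
    let bit_size = size_bytes * 8;

    if bit_size > max_type_bits {
        return Err(format!("requested {bit_size} bits, target holds {max_type_bits}"));
    }
    if input.len() * 8 < bit_size {
        return Err(format!("requested {bit_size} bits, only {} available", input.len() * 8));
    }

    // Only reached when both checks pass, so this cannot panic.
    Ok(input.split_at(size_bytes))
}

fn main() {
    assert_eq!(
        split_checked(&[0xAB, 0xCD, 0xFF], 2).unwrap(),
        (&[0xAB, 0xCD][..], &[0xFF][..])
    );
    assert!(split_checked(&[0xAB], 2).is_err());
}
```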
The two error messages both said the same thing. It looks like the same issue as #224, caused by an alignment issue (might be the one I mentioned above).
Hey @wcampbell0x2a, nice work! And 👋 @constfold, nice to hear from you too. This is looking great (thanks for the benchmarks!). I don't have any comments apart from the test cases; I can offer my understanding of why they're failing.

So, the issue with these tests is that the changes no longer consider padding when reading. Original code:

```rust
let value = if pad == 0
    && bit_slice.len() == max_type_bits
    && bit_slice.as_raw_slice().len() * 8 == max_type_bits
{
    // if everything is aligned, just read the value
```

This section of code is an optimisation for when we have exactly what we want. The above test, using this PR, will read 2 bytes and try to convert that slice to a u32 and fail (a u32 needs 4 bytes). To fix this, we'd need to bring padding back some way or another. I'm thinking it may be possible to pad using arithmetic instead of allocation too. Maybe a later optimisation.
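On the "pad using arithmetic instead of allocation" idea, a rough sketch of what that could look like for the failing case above (the helper name and shape are made up, not deku code): zero-extend the little-endian bytes into a u32 with shifts instead of first copying them into a padded buffer.

```rust
// Sketch only: build a u32 from an N-byte little-endian prefix (N <= 4) by
// shifting, so no padded temporary buffer has to be allocated.
fn u32_from_le_prefix(bytes: &[u8]) -> Option<u32> {
    if bytes.len() > 4 {
        return None;
    }
    let mut value: u32 = 0;
    for (i, b) in bytes.iter().enumerate() {
        value |= u32::from(*b) << (8 * i);
    }
    Some(value)
}

fn main() {
    // Mirrors `field_a` in the failing doctest: two bytes 0xAB, 0xCD -> 0xCDAB.
    assert_eq!(u32_from_le_prefix(&[0xAB, 0xCD]), Some(0xCDAB));
}
```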
Could we add a …? Just trying to reduce as many branches as I can for performance.
If we can do that at compile time that'd be great; I'm not sure we can though...
Current benchmarking 📈
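For reference, a minimal sketch of how a criterion benchmark for this kind of read could be set up (the struct, benchmark name, and data here are assumptions, not deku's actual bench suite):

```rust
use std::convert::TryFrom;

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use deku::prelude::*;

// Hypothetical type under test, mirroring the doctest earlier in the thread.
#[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct BenchTest {
    #[deku(bytes = "2")]
    field_a: u32,
    field_b: u8,
}

fn bench_read(c: &mut Criterion) {
    let data: Vec<u8> = vec![0xAB, 0xCD, 0xFF];
    let bytes: &[u8] = data.as_ref();
    c.bench_function("deku_read_bytes", |b| {
        b.iter(|| BenchTest::try_from(black_box(bytes)).unwrap())
    });
}

criterion_group!(benches, bench_read);
criterion_main!(benches);
```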
One of the CI failures is from clippy, which looks to be a duplicate of rust-lang/rust-clippy#9413.
Thanks! There were a couple more TODOs I added regarding code duplication that should be addressed before merging.
I think I fixed those now.
Remove Size in favor of a BitSize and ByteSize. This allows the Deku{Read/Write}(Endian, BitSize) and Deku{Read/Write}(Endian, ByteSize) impls to be created and allows the ByteSize impls to be faster. Most of the assumption I made (and the perf that is gained) is from the removal of this branch for bytes:

```rust
let value = if pad == 0
    && bit_slice.len() == max_type_bits
    && bit_slice.as_raw_slice().len() * 8 == max_type_bits
{
```

See sharksforarms#44

I added some benchmarks in order to get a good idea of what perf this allows. The benchmarks look pretty good, but I'm on my old thinkpad. These results are vs the benchmarks running from master:

```
deku_read_vec           time:   [744.39 ns 751.13 ns 759.58 ns]
                        change: [-23.681% -21.358% -19.142%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
```

See sharksforarms#25
Remove DekuRead for BitSize, and use same as other through macro generation

56b8e07 to bbd3663
Thanks! Release pending :)
Remove Size in favor of a BitSize and ByteSize. This allows the
Deku{Read/Write}(Endian, BitSize) and Deku{Read/Write}(Endian, ByteSize)
impls to be created and allows the ByteSize impls to be faster. Most of
the assumption I made (and the perf that is gained) is from the removal
of the padding branch for bytes quoted above.

See #44

I added some benchmarks in order to get a good idea of what perf this
allows. The benchmarks look pretty good, but I'm on my old thinkpad.
These results are vs the benchmarks running from master.

See #25
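As a rough illustration of the BitSize/ByteSize split described above (every name and trait shape here is hypothetical, not deku's real API), keeping the two sizes as distinct context types lets the byte-sized impl skip per-bit work entirely:

```rust
// Hypothetical sketch: two context newtypes standing in for deku's BitSize /
// ByteSize, with a little-endian u32 reader specialized for each.
#[derive(Clone, Copy)]
struct BitSize(usize);
#[derive(Clone, Copy)]
struct ByteSize(usize);

trait ReadLe<Ctx>: Sized {
    fn read_le(input: &[u8], ctx: Ctx) -> Result<Self, String>;
}

// Bit-sized path: must walk individual bits (bit order simplified to LSB-first).
impl ReadLe<BitSize> for u32 {
    fn read_le(input: &[u8], ctx: BitSize) -> Result<Self, String> {
        let bits = ctx.0;
        if bits > 32 || input.len() * 8 < bits {
            return Err("bad bit size".into());
        }
        let mut value = 0u32;
        for i in 0..bits {
            let bit = (input[i / 8] >> (i % 8)) & 1;
            value |= u32::from(bit) << i;
        }
        Ok(value)
    }
}

// Byte-sized path: no per-bit work, just shift whole bytes in.
impl ReadLe<ByteSize> for u32 {
    fn read_le(input: &[u8], ctx: ByteSize) -> Result<Self, String> {
        let bytes = ctx.0;
        if bytes > 4 || input.len() < bytes {
            return Err("bad byte size".into());
        }
        let mut value = 0u32;
        for (i, b) in input[..bytes].iter().enumerate() {
            value |= u32::from(*b) << (8 * i);
        }
        Ok(value)
    }
}

fn main() {
    assert_eq!(u32::read_le(&[0xAB, 0xCD], ByteSize(2)).unwrap(), 0xCDAB);
    assert_eq!(u32::read_le(&[0xAB, 0xCD], BitSize(16)).unwrap(), 0xCDAB);
}
```

With a single Size value, one impl has to branch on bits-vs-bytes at run time; with two context types, the compiler selects the byte path whenever the attribute specifies whole bytes, which is where the measured speedup comes from.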