
Specialize bytes #278

Merged · 13 commits · Oct 6, 2022

Conversation

wcampbell0x2a
Collaborator

Remove Size in favor of a BitSize and ByteSize. This allows the
Deku{Read/Write}(Endian, BitSize) and Deku{Read/Write}(Endian, ByteSize)
impls to be created, and allows the ByteSize impls to be faster. Most of
the assumptions I made (and the perf that is gained) come from the removal
of this branch for bytes:

let value = if pad == 0
                      && bit_slice.len() == max_type_bits
                      && bit_slice.as_raw_slice().len() * 8 == max_type_bits
                  {

See #44

I added some benchmarks in order to get a good idea of what perf this
allows. The benchmarks look pretty good, but I'm on my old ThinkPad.
These results are vs. the benchmarks run from master.

deku_read_vec           time:   [744.39 ns 751.13 ns 759.58 ns]
                        change: [-23.681% -21.358% -19.142%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

See #25
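As a rough sketch of why the ByteSize specialization pays off (plain Rust with a hypothetical helper name, not deku's actual internals): once the input is known to be byte aligned, a u32 can be read with a slice-to-array conversion and from_le_bytes instead of a bit-by-bit walk through a bit slice.

```rust
use std::convert::TryInto;

// Hypothetical helper, not deku's API: the byte-aligned fast path.
// Returns the parsed value and the remaining input.
fn read_u32_bytes_le(input: &[u8]) -> Option<(u32, &[u8])> {
    if input.len() < 4 {
        return None;
    }
    let (head, rest) = input.split_at(4);
    // Slice-to-array conversion, then a single endian-aware load.
    let bytes: [u8; 4] = head.try_into().ok()?;
    Some((u32::from_le_bytes(bytes), rest))
}

fn main() {
    let data = [0xEF, 0xBE, 0xAD, 0xDE, 0xFF];
    let (value, rest) = read_u32_bytes_le(&data).unwrap();
    assert_eq!(value, 0xDEADBEEF);
    assert_eq!(rest, &data[4..]);
}
```

The bit-level path has to account for padding and partial bytes on every read; this path is a bounds check plus one load, which is where the benchmark wins above come from.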

@wcampbell0x2a
Collaborator Author

I'm getting errors on the following lines:

failures:
    src/attributes.rs - attributes (line 1130)
    src/attributes.rs - attributes (line 270)

For these tests, I'm not sure why:
Example:

# use deku::prelude::*;
# use std::convert::{TryInto, TryFrom};
# #[derive(PartialEq, Debug, DekuRead, DekuWrite)]
#[deku(type = "u32", bytes = "2")]
enum DekuTest {
    #[deku(id = "0xBEEF")]
    VariantA(u8),
}

let data: Vec<u8> = vec![0xEF, 0xBE, 0xFF];

let value = DekuTest::try_from(data.as_ref()).unwrap();

assert_eq!(
    DekuTest::VariantA(0xFF),
    value
);

let value: Vec<u8> = value.try_into().unwrap();
assert_eq!(data, value);

Example:

# use deku::prelude::*;
# use std::convert::{TryInto, TryFrom};
# #[derive(Debug, PartialEq, DekuRead, DekuWrite)]
struct DekuTest {
    #[deku(bytes = 2)]
    field_a: u32,
    field_b: u8, // defaults to size_of<u8>
}

let data: Vec<u8> = vec![0xAB, 0xCD, 0xFF];

let value = DekuTest::try_from(data.as_ref()).unwrap();

assert_eq!(
    DekuTest {
       field_a: 0xCDAB,
       field_b: 0xFF,
    },
    value
);

let value: Vec<u8> = value.try_into().unwrap();
assert_eq!(data, value);

@wcampbell0x2a
Collaborator Author

@sharksforarms @constfold lmk what you think


let (bit_slice, rest) = input.split_at(bit_size);

let bytes: &[u8] = bit_slice.as_raw_slice();
Contributor

This is exactly what I wanted in that comment. But...

the removal of this branch for bytes

if pad == 0
&& bit_slice.len() == max_type_bits
&& bit_slice.as_raw_slice().len() * 8 == max_type_bits

Can we really do this? What if bit_slice is not byte aligned?

Owner

We should be good given:

  • let bit_size: usize = size.0 * 8; -- will always be byte aligned
  • we check bit_size > max_type_bits and input.len() < bit_size before calling split_at
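The two bullet points can be sketched in plain Rust (hypothetical ByteSize stand-in and check values, not deku's actual code): because bit_size is derived by multiplying a byte count by 8, the split index is always a multiple of 8, so the prefix returned by split_at covers whole bytes.

```rust
// Sketch of the byte-alignment invariant (hypothetical names).
struct ByteSize(usize);

fn checked_split(input: &[u8], size: ByteSize) -> Option<(&[u8], &[u8])> {
    let bit_size = size.0 * 8; // always a multiple of 8 => byte aligned
    let max_type_bits = 32;    // e.g. when reading a u32
    // The checks performed before calling split_at:
    if bit_size > max_type_bits || input.len() * 8 < bit_size {
        return None;
    }
    Some(input.split_at(bit_size / 8))
}

fn main() {
    let data = [0xEF, 0xBE, 0xAD, 0xDE, 0xFF];
    let (head, rest) = checked_split(&data, ByteSize(4)).unwrap();
    assert_eq!(head, &data[..4]);
    assert_eq!(rest, &data[4..]);
    // Oversized requests are rejected before split_at can panic.
    assert!(checked_split(&data, ByteSize(8)).is_none());
}
```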

@constfold
Contributor

constfold commented Sep 30, 2022

failures:
   src/attributes.rs - attributes (line 1130)
   src/attributes.rs - attributes (line 270)

Both error messages said:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Parse("error parsing from slice: could not convert slice to array")', src/attributes.rs:15:47

It looks the same as #224, caused by an alignment issue (might be the one I mentioned above).

@sharksforarms
Owner

sharksforarms commented Sep 30, 2022

Hey @wcampbell0x2a, nice work! And 👋 @constfold, nice to hear from you too. This is looking great (thanks for the benchmarks!)

I don't have any comments apart from the test cases; I can offer my understanding of why they're failing.

#[deku(type = "u32", bytes = "2")]
enum DekuTest {
    #[deku(id = "0xBEEF")]
    VariantA(u8),
}

let data: Vec<u8> = vec![0xEF, 0xBE, 0xFF];
let value = DekuTest::try_from(data.as_ref()).unwrap();

So, the issue with these tests is that the changes no longer consider padding when reading:

Original code:

                let value = if pad == 0
                    && bit_slice.len() == max_type_bits
                    && bit_slice.as_raw_slice().len() * 8 == max_type_bits
                {
                    // if everything is aligned, just read the value

This section of code is an optimisation for when we have exactly what we want.

With this PR, the above test will read 2 bytes and try to convert that slice to a u32, which fails (a u32 needs 4 bytes).

To fix this, we'd need to bring padding back one way or another. I'm thinking it may be possible to pad using arithmetic instead of allocation. Maybe a later optimisation.
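The arithmetic-padding idea can be sketched like this (a hypothetical helper, not the code deku ended up with): fold however many input bytes exist into the accumulator, and the missing high bytes simply stay zero, so no padded 4-byte buffer is ever allocated.

```rust
// Sketch: zero-extending 1..=4 input bytes into a u32 arithmetically,
// instead of copying into a freshly allocated 4-byte buffer.
// Hypothetical helper, not deku's eventual implementation.
fn read_u32_le_padded(input: &[u8]) -> Option<u32> {
    if input.is_empty() || input.len() > 4 {
        return None;
    }
    // Fold the bytes in little-endian order; absent high bytes stay zero.
    Some(input
        .iter()
        .rev()
        .fold(0u32, |acc, &b| (acc << 8) | u32::from(b)))
}

fn main() {
    // The failing doc-test reads a 2-byte id 0xBEEF into a u32.
    assert_eq!(read_u32_le_padded(&[0xEF, 0xBE]), Some(0xBEEF));
    assert_eq!(read_u32_le_padded(&[0xEF, 0xBE, 0xAD, 0xDE]), Some(0xDEADBEEF));
}
```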

@wcampbell0x2a
Collaborator Author

So, the issue with these tests is that the changes no longer consider padding when reading:

Could we add a Padding struct for this, for when bytes is larger than the type? Are there other cases where this would happen for a field in a struct as well as this enum usage? This also wouldn't happen for type = u8 as I see it.

Just trying to reduce as many branches as I can for performance.

@sharksforarms
Owner

So, the issue with these tests is that the changes no longer consider padding when reading:

Could we add a Padding struct for this, for when bytes is larger than the type? Are there other cases where this would happen for a field in a struct as well as this enum usage? This also wouldn't happen for type = u8 as I see it.

Just trying to reduce as many branches as I can for performance.

If we can do that at compile time that'd be great, I'm not sure we can though...
Also, you're right about u8: this is a simple case which can be specialized (and needs to be, else rustc complains about shift overflow). Here's some food for thought: wcampbell0x2a#2
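A minimal sketch of the shift-overflow point (hypothetical helpers, not deku's actual impls): the generic byte-folding read shifts its accumulator left by 8 per byte, which rustc rejects at compile time when the accumulator is itself a u8, so the u8 case has to be its own impl that just copies the byte.

```rust
// Generic-style read for a multi-byte type: the accumulator (u16 here)
// is shifted left by 8 for each byte, which is fine for u16 and wider.
fn read_u16_le(input: &[u8; 2]) -> u16 {
    input.iter().rev().fold(0u16, |acc, &b| (acc << 8) | u16::from(b))
}

// For u8 the same fold would need `acc << 8` on a u8, which rustc
// rejects as a shift overflow, so the specialized impl is trivial.
fn read_u8(input: &[u8; 1]) -> u8 {
    input[0]
}

fn main() {
    assert_eq!(read_u16_le(&[0xEF, 0xBE]), 0xBEEF);
    assert_eq!(read_u8(&[0xFF]), 0xFF);
}
```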

@wcampbell0x2a
Collaborator Author

wcampbell0x2a commented Oct 1, 2022

Current benchmarking 📈

deku_read_byte          time:   [8.3902 ns 8.4387 ns 8.4959 ns]
                        change: [-17.300% -16.110% -14.983%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) low mild
  8 (8.00%) high mild
  9 (9.00%) high severe

deku_write_byte         time:   [63.329 ns 63.648 ns 64.012 ns]
                        change: [-7.6643% -5.4937% -3.9645%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  1 (1.00%) low severe
  8 (8.00%) low mild
  6 (6.00%) high mild
  8 (8.00%) high severe

deku_read_bits          time:   [500.82 ns 504.55 ns 509.19 ns]
                        change: [-0.9629% +1.7699% +4.4820%] (p = 0.21 > 0.05)
                        No change in performance detected.
Found 18 outliers among 100 measurements (18.00%)
  4 (4.00%) high mild
  14 (14.00%) high severe

deku_write_bits         time:   [95.921 ns 96.374 ns 96.879 ns]
                        change: [-11.428% -9.0292% -7.2123%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe

deku_read_enum          time:   [14.375 ns 14.418 ns 14.466 ns]
                        change: [-21.776% -19.593% -17.363%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  7 (7.00%) high severe

deku_write_enum         time:   [101.08 ns 101.55 ns 102.10 ns]
                        change: [-10.996% -8.9539% -7.4107%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  7 (7.00%) high mild
  11 (11.00%) high severe

deku_read_vec           time:   [777.98 ns 779.94 ns 782.09 ns]
                        change: [-13.747% -12.440% -10.565%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  2 (2.00%) high severe

deku_write_vec          time:   [4.2898 µs 4.3064 µs 4.3268 µs]
                        change: [-14.469% -12.650% -10.762%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  12 (12.00%) high severe

deku_read_vec_perf      time:   [684.70 ns 687.68 ns 690.87 ns]

deku_write_vec_perf     time:   [4.6496 µs 4.6687 µs 4.6894 µs]

@wcampbell0x2a
Collaborator Author

One of the CI failures is from

warning: you are deriving `PartialEq` and can implement `Eq`
  --> src/ctx.rs:76:30
   |
76 | #[derive(Debug, Copy, Clone, PartialEq, Eq, Ord, PartialOrd)]
   |                              ^^^^^^^^^ help: consider deriving `Eq` as well: `PartialEq, Eq`
77 | pub enum Limit<T, Predicate: FnMut(&T) -> bool> {

which looks to be a duplicate of: rust-lang/rust-clippy#9413

@sharksforarms
Owner

Thanks! There were a couple more TODOs I added regarding code duplication that should be addressed before merging

@wcampbell0x2a
Collaborator Author

Thanks! There were a couple more TODOs I added regarding code duplication that should be addressed before merging

I think I fixed those now.

wcampbell0x2a and others added 11 commits October 6, 2022 09:40
Remove Size in favor of a BitSize and ByteSize
Remove DekuRead for BitSize, and use same as other through macro
generation
@sharksforarms sharksforarms merged commit 1934406 into sharksforarms:master Oct 6, 2022
@sharksforarms
Owner

Thanks! Release pending :)

@wcampbell0x2a wcampbell0x2a deleted the specialize-bytes branch October 6, 2022 14:07
3 participants