
Pad shreds on retrieval from blockstore only when needed #17204

Closed
wants to merge 11 commits into master from lazy_shred_zero_padding

Conversation

@steviez (Contributor) commented May 13, 2021

Problem

#16602 introduced a change where shreds have their zero-padding stripped upon insertion into the blockstore and re-added upon retrieval. However, discussion here led us to find that the zero-padding wasn't consistently re-applied; whether it was depended on which of the several retrieval methods was used to pull shreds out of the blockstore.

Summary of Changes

This PR introduces a change in philosophy that @carllin and I discussed: shreds will intentionally NOT be zero-padded when pulled out of the blockstore unless explicitly needed. There are a couple of instances where the zero padding is necessary:

  • Erasure
  • Duplicate shred detection (could potentially be optimized by only comparing the actual payload and not the zero padding of the shred bytestreams).

In all other cases, the idea is to not add the zero-padding back, since we don't explicitly need it. This saves us some bytestream resizing; however, it comes at the cost of added complexity, since callers need to know whether a given bytestream has had its zero-padding stripped or not.
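
As a rough illustration of the two operations involved (a minimal sketch assuming the SHRED_PAYLOAD_SIZE constant from ledger/src/shred.rs; this is not the PR's actual code):

// Minimal sketch, not the PR's code: strip trailing zero padding before
// inserting into the blockstore, and re-add it only when a consumer
// (e.g. erasure recovery) needs the full fixed-size payload.
const SHRED_PAYLOAD_SIZE: usize = 1228; // value referenced later in this thread

fn strip_zero_padding(payload: &mut Vec<u8>, data_len: usize) {
    // Keep only the meaningful bytes; the trailing zeros can be reconstructed.
    payload.truncate(data_len);
}

fn pad_to_payload_size(payload: &mut Vec<u8>) {
    // Grow back to the fixed shred size, filling with zeros.
    payload.resize(SHRED_PAYLOAD_SIZE, 0);
}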

Fixes #

@steviez (Contributor Author) commented May 13, 2021

There are several different "stages" for a data shred (the size relationships are sketched after this list):

  • A packet has the shred payload + the nonce
  • A "proper" shred has the shred payload, including any zero padding
  • A shred that has been inserted/retrieved from the blockstore has any zero padding removed
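
In terms of sizes, the stages relate roughly as follows (a hedged sketch; the authoritative constants live in sdk/src/packet.rs and ledger/src/shred.rs, and the exact values here are illustrative):

// Hedged sketch of how the sizes relate for the stages listed above; values
// are as of this era of the codebase and are illustrative, not normative.
const PACKET_DATA_SIZE: usize = 1232; // full packet buffer
const SIZE_OF_NONCE: usize = 4;       // repair nonce carried at the end of the packet
const SHRED_PAYLOAD_SIZE: usize = PACKET_DATA_SIZE - SIZE_OF_NONCE; // 1228

// A "proper" shred occupies SHRED_PAYLOAD_SIZE bytes, including zero padding;
// the blockstore stores only the leading, non-padded portion of those bytes.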

@sakridge @behzadnouri (and anyone else with thoughts on the matter) - I'd be interested to get some opinions on this. It seemed like a good idea when Carl and I discussed it, but after having implemented it, I'm semi-jaded, as keeping track of these stages seems kind of error-prone. I could finish things off and get it working well, but it seems a little brittle.

The other option, which I pitched in https://github.com/solana-labs/solana/pull/17175/files, would be to always add the zero-padding back to a shred when it gets pulled from the blockstore. However, that wasn't the cleanest solution either.

@steviez steviez force-pushed the lazy_shred_zero_padding branch from 4a44404 to 303827d on May 13, 2021 18:14
@codecov bot commented May 13, 2021

Codecov Report

Merging #17204 (ea67c1d) into master (a5c2067) will decrease coverage by 0.0%.
The diff coverage is 92.0%.

@@            Coverage Diff            @@
##           master   #17204     +/-   ##
=========================================
- Coverage    82.6%    82.6%   -0.1%     
=========================================
  Files         425      425             
  Lines      118842   118889     +47     
=========================================
+ Hits        98271    98306     +35     
- Misses      20571    20583     +12     

@steviez steviez marked this pull request as ready for review May 17, 2021 18:58
core/src/serve_repair.rs (outdated review thread, resolved)
core/src/window_service.rs (outdated review thread, resolved)
ledger/src/blockstore.rs (3 outdated review threads, resolved)
@behzadnouri (Contributor) commented:

@steviez asked me to take a look at this PR. My thoughts (mostly copy-pasted from a Slack chat):

  • As already mentioned here, this adds some extra complexity, which feels like it is going to be very fragile and bug-prone.
  • It is also not a change that can be done once and moved on from; it will probably need ongoing caution and patching any time the shred/blockstore code changes.

So my suggestion is to avoid this unless there is a demonstrable performance gain. If there is such a gain, then we can probably at least add some type safety, something like changing the shred data type to:

enum ShredData {
  Padded(Vec<u8>),
  Trimmed(Vec<u8>),
}

and then explicitly track where things are padded vs. trimmed. It will require some work around implementing serialization and deserialization for backward compatibility, but it is much better than the current situation, where a function that obtains a shred cannot tell whether its data is padded or not.
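
For illustration only, a conversion method on such a type might look like this (a hypothetical sketch, not proposed code):

// Hypothetical sketch building on the ShredData enum above; not code from the PR.
const SHRED_PAYLOAD_SIZE: usize = 1228;

impl ShredData {
    /// Return the padded variant, resizing only when the data is trimmed.
    fn into_padded(self) -> ShredData {
        match self {
            ShredData::Padded(bytes) => ShredData::Padded(bytes),
            ShredData::Trimmed(mut bytes) => {
                bytes.resize(SHRED_PAYLOAD_SIZE, 0);
                ShredData::Padded(bytes)
            }
        }
    }
}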

Even with that type-safety addition, this requires some work and diligence, so it might be better to hold off until there is a performance benchmark justifying the effort.

@steviez (Contributor Author) commented May 19, 2021

@carllin - Per our conversation in DM, I threw together a quick benchmark to understand what kind of performance bump we're actually looking at. I pushed it to this branch here.

Here is the output if you run that (these numbers are from my MacBook Pro):

test bench_allocate_1000_vec_500            ... bench:     113,060 ns/iter (+/- 23,011)
test bench_allocate_and_resize_1000_vec_500 ... bench:     272,783 ns/iter (+/- 17,942)
test bench_allocate_and_resize_vec_1000     ... bench:         197 ns/iter (+/- 20)
test bench_allocate_and_resize_vec_250      ... bench:         188 ns/iter (+/- 12)
test bench_allocate_and_resize_vec_500      ... bench:         190 ns/iter (+/- 19)
test bench_allocate_and_resize_vec_750      ... bench:         192 ns/iter (+/- 40)
test bench_allocate_vec_1000                ... bench:           0 ns/iter (+/- 0)
test bench_allocate_vec_250                 ... bench:           0 ns/iter (+/- 0)
test bench_allocate_vec_500                 ... bench:           0 ns/iter (+/- 0)
test bench_allocate_vec_750                 ... bench:           0 ns/iter (+/- 0)

With the built-in bencher, anything inside bencher.iter() gets measured over some number of iterations. However, since Vec::resize() modifies the vector, we need to perform the allocation inside the measurement loop too (otherwise we'd be calling resize() on an already-resized vector for all but the first bench iteration).

So, the bench_allocate_vec_* tests just allocate a vector of the indicated size in bytes, while the bench_allocate_and_resize_vec_* tests allocate and then resize to SHRED_PAYLOAD_SIZE (1228), as the names suggest. The top two tests do the same thing, but for 1,000 vectors within one iteration (I'll explain why in a minute).
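
For reference, the allocate-and-resize benches have roughly this shape (a hedged sketch; the actual code is in the branch linked above and may differ in its details):

#![feature(test)] // the built-in bencher requires nightly
extern crate test;
use test::Bencher;

const SHRED_PAYLOAD_SIZE: usize = 1228;

#[bench]
fn bench_allocate_and_resize_vec_500(bencher: &mut Bencher) {
    bencher.iter(|| {
        // The allocation lives inside the closure because resize() mutates
        // the vector; reusing one vector would only measure real work on the
        // first iteration.
        let mut v = vec![0u8; 500];
        v.resize(SHRED_PAYLOAD_SIZE, 0);
        v
    });
}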

Notes:

  • I'm not really sure why I'm getting 0 ns for the tests that only allocate; the only thing I can think of is that the vector creation is being compiled out since it is unused. Modifying a random element of the vector (so that it is used) had no effect and still yielded 0 ns.
  • Starting with a different-sized buffer for the allocate-and-resize tests had a negligible effect (a 9 ns difference between starting with a 250-byte vector vs. a 1000-byte vector).
  • Because of the previous point, I added the two tests that perform the operations on 1,000 vectors per iteration. The numbers reported are ns/iteration, so dividing by 1,000 vectors/iteration gives ns/vector.

So, allocating the vector takes ~113 ns and allocating plus resizing takes ~272 ns, so we can surmise that the resize is taking ~159 ns, or less than 0.2 us.

A 0.2 us savings that we only get in a few cases (erasure/broadcast require the resize) seems pretty minimal; what do you think?

@behzadnouri (Contributor) commented:

I'm not really sure why I'm getting 0 ns for the tests that only allocate, ... getting compiled out since it is unused

I think one way to prevent that is to have the closure you pass to bencher.iter(|| { ... }) return the vector it allocates.
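
Concretely, with the same scaffolding as the bench sketch earlier in the thread, that suggestion would look something like this (hypothetical):

// Returning the vector from the closure keeps the allocation observable,
// so the optimizer cannot discard it and report 0 ns.
#[bench]
fn bench_allocate_vec_500(bencher: &mut Bencher) {
    bencher.iter(|| vec![0u8; 500]);
}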

A 0.2 us savings that we only get in a few cases (erasure/broadcast require the resize) seems pretty minimal; what do you think?

yeah, does not seem significant to me either; at least considering the code complexity needed to avoid it. I do not expect that this is the bottleneck on any code path.

@steviez steviez force-pushed the lazy_shred_zero_padding branch from fe631ab to 1108075 on May 20, 2021 16:25
@steviez (Contributor Author) commented May 20, 2021

yeah, does not seem significant to me either; at least considering the code complexity needed to avoid it. I do not expect that this is the bottleneck on any code path.

Hey, I was just about to post an update but you beat me to it @behzadnouri 😄. @carllin and I dove into this a bit yesterday. For starters, we replicated similar numbers on a validator, so the tests on my local machine are valid (same orders of magnitude). That being said, Carl did observe that on a validator under load, the number

Next, we realized that it probably makes more sense to think about this number in relation to deserialization. Also, the bench I put up yesterday was a bit convoluted, so I added a duplicate deserialize function that does a resize; this lets us better compare the cost of the resize against the cost of deserialization. That can be found here.

$ cargo +nightly bench bench_deserialize
...
test bench_deserialize_shred_no_padding ... bench:         249 ns/iter (+/- 25)
test bench_deserialize_shred_padding    ... bench:         354 ns/iter (+/- 11)

The top one skips the resize; the bottom one does the resize. So, this shows the resize costs about 100 ns = 0.1 us, and that it results in roughly a 40% increase in deserialize time.

Carl mentioned he thought we may see up to around 5,000 shreds/slot at the moment. Using that number:
5000 shreds/slot * 100 ns/shred = 500,000 ns/slot = 500 us/slot = 0.5 ms/slot

So, about 0.125% of a 400 ms slot. If we bumped up to 10,000 shreds/slot and 200 ms slots, that would get us to 1 ms/slot, or 0.5% of slot duration.

@steviez steviez force-pushed the lazy_shred_zero_padding branch 2 times, most recently from fcc137b to e93e602 on May 25, 2021 16:42
@steviez steviez force-pushed the lazy_shred_zero_padding branch 2 times, most recently from 97ffab1 to 34ee441 on May 25, 2021 17:54
@steviez steviez force-pushed the lazy_shred_zero_padding branch from 34ee441 to d5cf747 on May 25, 2021 22:40
Comment on lines +360 to +370
fn get_padded_cf(&self, cf: &ColumnFamily, key: &[u8], size: usize) -> Result<Option<Vec<u8>>> {
    let opt = self.0.get_cf(cf, key)?.map(|db_vec| {
        let mut bytes = vec![0; size];
        {
            let (data, _padding) = bytes.split_at_mut(db_vec.len());
            data.copy_from_slice(&db_vec[..]);
        }
        bytes
    });
    Ok(opt)
}
@steviez (Contributor Author) commented May 26, 2021

The reason for breaking this out into a separate function is get_cf()'s implementation:

fn get_cf(&self, cf: &ColumnFamily, key: &[u8]) -> Result<Option<Vec<u8>>> {
    let opt = self.0.get_cf(cf, key)?.map(|db_vec| db_vec.to_vec());
    Ok(opt)
}

Note the db_vec.to_vec(); this clones db_vec into a new vector. If we did the resize later, a second allocation and memcpy would occur in Vec::resize(); doing the padding here shaves that off.
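
For comparison, the two paths amount to roughly the following (an illustrative sketch, not code from the PR; db_vec stands in for the bytes rocksdb returns):

const SHRED_PAYLOAD_SIZE: usize = 1228;

// Padding after get_cf(): two allocations and two copies.
fn pad_after_get_cf(db_vec: &[u8]) -> Vec<u8> {
    let mut bytes = db_vec.to_vec();     // allocation + memcpy inside get_cf()
    bytes.resize(SHRED_PAYLOAD_SIZE, 0); // reallocation + memcpy to grow
    bytes
}

// Padding inside get_padded_cf(): one full-size allocation and one copy.
fn pad_inside_get_padded_cf(db_vec: &[u8]) -> Vec<u8> {
    let mut bytes = vec![0; SHRED_PAYLOAD_SIZE];
    bytes[..db_vec.len()].copy_from_slice(db_vec);
    bytes
}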

@@ -247,6 +247,13 @@ impl Shred {
        packet.meta.size = len;
    }

    pub fn copy_from_packet(packet: &Packet) -> Result<Self> {
        let mut serialized_shred = vec![0; SHRED_PAYLOAD_SIZE];
        // TODO: assert packet.data.len() >= SHRED_PAYLOAD_SIZE / == PACKET_DATA_SIZE ?
@steviez (Contributor Author)

Thoughts on this assertion? We had something like this in window service; we could move it here to avoid repeating the check in every caller of this function.

@stale bot commented Jun 3, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Jun 3, 2021
@stale bot commented Jun 16, 2021

This stale pull request has been automatically closed. Thank you for your contributions.

@stale stale bot closed this Jun 16, 2021
@steviez steviez deleted the lazy_shred_zero_padding branch December 20, 2022 08:53