
Pad shreds on retrieval from blockstore only when needed #17204

Closed
wants to merge 11 commits into master from lazy_shred_zero_padding

Conversation

@steviez (Contributor) commented May 13, 2021

Problem

#16602 introduced a change where shreds have their zero-padding stripped upon insertion into the blockstore and re-added upon retrieval. However, discussion here led us to find that the zero-padding wasn't consistently re-applied; whether it was depended on which of the several retrieval methods was used to pull shreds out of the blockstore.

Summary of Changes

This PR introduces a change in philosophy that @carllin and I discussed: shreds will intentionally NOT be zero-padded when pulled out of the blockstore unless explicitly needed. There are a couple of instances where the zero padding is necessary:

  • Erasure
  • Duplicate shred detection (could potentially be optimized by only comparing the actual payload and not the zero padding of the shred bytestreams).

In all other cases, the idea is to not add the zero-padding back, since we don't explicitly need it. This saves us some bytestream resizing; however, it comes at the cost of added complexity, since callers need to know whether a given bytestream has had its zero-padding stripped or not.
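
As a rough illustration of the two operations involved (a minimal sketch assuming the SHRED_PAYLOAD_SIZE constant from ledger/src/shred.rs; this is not the PR's actual code):

// Minimal sketch, not the PR's code: strip trailing zero padding before
// inserting into the blockstore, and re-add it only when a consumer
// (e.g. erasure recovery) needs the full fixed-size payload.
const SHRED_PAYLOAD_SIZE: usize = 1228; // value referenced later in this thread

fn strip_zero_padding(payload: &mut Vec<u8>, data_len: usize) {
    // Keep only the meaningful bytes; the trailing zeros can be reconstructed.
    payload.truncate(data_len);
}

fn pad_to_payload_size(payload: &mut Vec<u8>) {
    // Grow back to the fixed shred size, filling with zeros.
    payload.resize(SHRED_PAYLOAD_SIZE, 0);
}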

Fixes #

@steviez (Contributor Author) commented May 13, 2021

There are several different "stages" for a data shred (the size relationships are sketched after this list):

  • A packet has the shred payload + the nonce
  • A "proper" shred has the shred payload, including any zero padding
  • A shred that has been inserted/retrieved from the blockstore has any zero padding removed
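
In terms of sizes, the stages relate roughly as follows (a hedged sketch; the authoritative constants live in sdk/src/packet.rs and ledger/src/shred.rs, and the exact values here are illustrative):

// Hedged sketch of how the sizes relate for the stages listed above; values
// are as of this era of the codebase and are illustrative, not normative.
const PACKET_DATA_SIZE: usize = 1232; // full packet buffer
const SIZE_OF_NONCE: usize = 4;       // repair nonce carried at the end of the packet
const SHRED_PAYLOAD_SIZE: usize = PACKET_DATA_SIZE - SIZE_OF_NONCE; // 1228

// A "proper" shred occupies SHRED_PAYLOAD_SIZE bytes, including zero padding;
// the blockstore stores only the leading, non-padded portion of those bytes.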

@sakridge @behzadnouri (and anyone else with thoughts on the matter) - I'd be interested to get some opinions on this. It seemed like a good idea when Carl and I discussed it, but after having implemented it, I'm semi-jaded, as keeping track of these stages seems kind of error-prone. I could finish things off and get it working well, but it seems a little brittle.

The other option, which I pitched in https://github.com/solana-labs/solana/pull/17175/files, would be to always add the zero-padding back to a shred when it gets pulled from the blockstore. However, that wasn't the cleanest solution either.

@steviez steviez force-pushed the lazy_shred_zero_padding branch from 4a44404 to 303827d on May 13, 2021 18:14
@codecov bot commented May 13, 2021

Codecov Report

Merging #17204 (ea67c1d) into master (a5c2067) will decrease coverage by 0.0%.
The diff coverage is 92.0%.

@@            Coverage Diff            @@
##           master   #17204     +/-   ##
=========================================
- Coverage    82.6%    82.6%   -0.1%     
=========================================
  Files         425      425             
  Lines      118842   118889     +47     
=========================================
+ Hits        98271    98306     +35     
- Misses      20571    20583     +12     

@steviez steviez marked this pull request as ready for review May 17, 2021 18:58
core/src/serve_repair.rs (outdated review thread, resolved)
core/src/window_service.rs (outdated review thread, resolved)
ledger/src/blockstore.rs (3 outdated review threads, resolved)
@behzadnouri (Contributor) commented:

@steviez asked me to take a look at this PR. My thoughts (mostly copy-pasted from a Slack chat):

  • As already mentioned here, this adds some extra complexity, which feels like it is going to be very fragile and bug-prone.
  • It is also not a change that can be done once and moved on from; it will probably need ongoing caution and patching any time the shred/blockstore code changes.

So my suggestion is to avoid this unless there is a demonstrable performance gain. If there is such a gain, then we can probably at least add some type safety, something like changing the shred data type to:

enum ShredData {
  Padded(Vec<u8>),
  Trimmed(Vec<u8>),
}

and then explicitly track where things are padded vs. trimmed. It will require some work around implementing serialization and deserialization for backward compatibility, but it is much better than the current situation, where a function that obtains a shred cannot tell whether its data is padded or not.
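
For illustration only, a conversion method on such a type might look like this (a hypothetical sketch, not proposed code):

// Hypothetical sketch building on the ShredData enum above; not code from the PR.
const SHRED_PAYLOAD_SIZE: usize = 1228;

impl ShredData {
    /// Return the padded variant, resizing only when the data is trimmed.
    fn into_padded(self) -> ShredData {
        match self {
            ShredData::Padded(bytes) => ShredData::Padded(bytes),
            ShredData::Trimmed(mut bytes) => {
                bytes.resize(SHRED_PAYLOAD_SIZE, 0);
                ShredData::Padded(bytes)
            }
        }
    }
}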

Even with that type-safety addition, this requires some work and diligence, so it might be better to hold off until there is a performance benchmark justifying the effort.

@steviez (Contributor Author) commented May 19, 2021

@carllin - Per our conversation in DM, I threw together a quick benchmark to understand what kind of performance bump we're actually looking at. I pushed it to this branch here.

Here is the output if you run that (these numbers are from my MacBook Pro):

test bench_allocate_1000_vec_500            ... bench:     113,060 ns/iter (+/- 23,011)
test bench_allocate_and_resize_1000_vec_500 ... bench:     272,783 ns/iter (+/- 17,942)
test bench_allocate_and_resize_vec_1000     ... bench:         197 ns/iter (+/- 20)
test bench_allocate_and_resize_vec_250      ... bench:         188 ns/iter (+/- 12)
test bench_allocate_and_resize_vec_500      ... bench:         190 ns/iter (+/- 19)
test bench_allocate_and_resize_vec_750      ... bench:         192 ns/iter (+/- 40)
test bench_allocate_vec_1000                ... bench:           0 ns/iter (+/- 0)
test bench_allocate_vec_250                 ... bench:           0 ns/iter (+/- 0)
test bench_allocate_vec_500                 ... bench:           0 ns/iter (+/- 0)
test bench_allocate_vec_750                 ... bench:           0 ns/iter (+/- 0)

With the built-in bencher, anything inside bencher.iter() gets measured over some number of iterations. However, since Vec::resize() modifies the vector, we need to perform the allocation inside the measurement loop too (otherwise we'd be calling resize() on an already-resized vector for all but the first bench iteration).

So, the bench_allocate_vec_* tests just allocate a vector of the indicated size in bytes, while the bench_allocate_and_resize_vec_* tests allocate and then resize to SHRED_PAYLOAD_SIZE (1228), as the names suggest. The top two tests do the same thing, but for 1,000 vectors within one iteration (I'll explain why in a minute).
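
For reference, the allocate-and-resize benches have roughly this shape (a hedged sketch; the actual code is in the branch linked above and may differ in its details):

#![feature(test)] // the built-in bencher requires nightly
extern crate test;
use test::Bencher;

const SHRED_PAYLOAD_SIZE: usize = 1228;

#[bench]
fn bench_allocate_and_resize_vec_500(bencher: &mut Bencher) {
    bencher.iter(|| {
        // The allocation lives inside the closure because resize() mutates
        // the vector; reusing one vector would only measure real work on the
        // first iteration.
        let mut v = vec![0u8; 500];
        v.resize(SHRED_PAYLOAD_SIZE, 0);
        v
    });
}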

Notes:

  • I'm not really sure why I'm getting 0 ns for the tests that only allocate; the only thing I can think of is that the vector creation is being compiled out since it is unused. Modifying a random element of the vector (so that it is used) had no effect and still yielded 0 ns.
  • Starting with a different-sized buffer for the allocate-and-resize tests had a negligible effect (a 9 ns difference between starting with a 250-byte vector vs. a 1000-byte vector).
  • Because of the previous point, I added the two tests that perform the operations on 1,000 vectors per iteration. The numbers reported are ns/iteration, so dividing by 1,000 vectors/iteration gives ns/vector.

So, allocating the vector takes ~113 ns and allocating plus resizing takes ~272 ns, so we can surmise that the resize is taking ~159 ns, or less than 0.2 us.

A 0.2 us savings that we only get in a few cases (erasure/broadcast require the resize) seems pretty minimal; what do you think?

@behzadnouri (Contributor) commented:

I'm not really sure why I'm getting 0 ns for the tests that only allocate, ... getting compiled out since it is unused

I think one way to prevent that is to have the closure you pass to bencher.iter(|| { ... }) return the vector it allocates.
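
Concretely, with the same scaffolding as the bench sketch earlier in the thread, that suggestion would look something like this (hypothetical):

// Returning the vector from the closure keeps the allocation observable,
// so the optimizer cannot discard it and report 0 ns.
#[bench]
fn bench_allocate_vec_500(bencher: &mut Bencher) {
    bencher.iter(|| vec![0u8; 500]);
}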

A 0.2 us savings that we only get in a few cases (erasure/broadcast require the resize) seems pretty minimal; what do you think?

yeah, does not seem significant to me either; at least considering the code complexity needed to avoid it. I do not expect that this is the bottleneck on any code path.

@steviez steviez force-pushed the lazy_shred_zero_padding branch from fe631ab to 1108075 on May 20, 2021 16:25
@steviez (Contributor Author) commented May 20, 2021

yeah, does not seem significant to me either; at least considering the code complexity needed to avoid it. I do not expect that this is the bottleneck on any code path.

Hey, I was just about to post an update but you beat me to it @behzadnouri 😄. @carllin and I dove into this a bit yesterday. For starters, we replicated similar numbers on a validator, so the tests on my local machine are valid (same orders of magnitude). That being said, Carl did observe that on a validator under load, the number

Next, we realized that it probably makes more sense to think about this number in relation to deserialization. Also, the bench I put up yesterday was a bit convoluted, so I added a duplicate deserialize function that does a resize; this lets us better compare the cost of the resize against the cost of deserialization. That can be found here.

$ cargo +nightly bench bench_deserialize
...
test bench_deserialize_shred_no_padding ... bench:         249 ns/iter (+/- 25)
test bench_deserialize_shred_padding    ... bench:         354 ns/iter (+/- 11)

The top one skips the resize; the bottom one does the resize. So, this shows the resize costs about 100 ns = 0.1 us, and that it results in roughly a 40% increase in deserialize time.

Carl mentioned he thought we may see up to around 5,000 shreds/slot at the moment. Using that number:
5000 shreds/slot * 100 ns/shred = 500,000 ns/slot = 500 us/slot = 0.5 ms/slot

So, about 0.125% of a 400 ms slot. If we bumped up to 10,000 shreds/slot and 200 ms slots, that would get us to 1 ms/slot, or 0.5% of slot duration.

@steviez steviez force-pushed the lazy_shred_zero_padding branch 2 times, most recently from fcc137b to e93e602 on May 25, 2021 16:42
@steviez steviez force-pushed the lazy_shred_zero_padding branch 2 times, most recently from 97ffab1 to 34ee441 on May 25, 2021 17:54
@steviez steviez force-pushed the lazy_shred_zero_padding branch from 34ee441 to d5cf747 on May 25, 2021 22:40
Comment on lines +360 to +370
fn get_padded_cf(&self, cf: &ColumnFamily, key: &[u8], size: usize) -> Result<Option<Vec<u8>>> {
    let opt = self.0.get_cf(cf, key)?.map(|db_vec| {
        let mut bytes = vec![0; size];
        {
            let (data, _padding) = bytes.split_at_mut(db_vec.len());
            data.copy_from_slice(&db_vec[..]);
        }
        bytes
    });
    Ok(opt)
}
@steviez (Contributor Author) commented May 26, 2021

The reason for breaking this out into a separate function is get_cf()'s implementation:

fn get_cf(&self, cf: &ColumnFamily, key: &[u8]) -> Result<Option<Vec<u8>>> {
    let opt = self.0.get_cf(cf, key)?.map(|db_vec| db_vec.to_vec());
    Ok(opt)
}

Note the db_vec.to_vec(); this clones db_vec into a new vector. If we did the resize later, a second allocation and memcpy would occur in Vec::resize(); doing the padding here shaves that off.
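
For comparison, the two paths amount to roughly the following (an illustrative sketch, not code from the PR; db_vec stands in for the bytes rocksdb returns):

const SHRED_PAYLOAD_SIZE: usize = 1228;

// Padding after get_cf(): two allocations and two copies.
fn pad_after_get_cf(db_vec: &[u8]) -> Vec<u8> {
    let mut bytes = db_vec.to_vec();     // allocation + memcpy inside get_cf()
    bytes.resize(SHRED_PAYLOAD_SIZE, 0); // reallocation + memcpy to grow
    bytes
}

// Padding inside get_padded_cf(): one full-size allocation and one copy.
fn pad_inside_get_padded_cf(db_vec: &[u8]) -> Vec<u8> {
    let mut bytes = vec![0; SHRED_PAYLOAD_SIZE];
    bytes[..db_vec.len()].copy_from_slice(db_vec);
    bytes
}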

@@ -247,6 +247,13 @@ impl Shred {
        packet.meta.size = len;
    }

    pub fn copy_from_packet(packet: &Packet) -> Result<Self> {
        let mut serialized_shred = vec![0; SHRED_PAYLOAD_SIZE];
        // TODO: assert packet.data.len() >= SHRED_PAYLOAD_SIZE / == PACKET_DATA_SIZE ?
@steviez (Contributor Author)

Thoughts on this assertion? We had something like this in window service; we could move it here to avoid repeating the check in every caller of this function.

@stale bot commented Jun 3, 2021

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Jun 3, 2021
@stale bot commented Jun 16, 2021

This stale pull request has been automatically closed. Thank you for your contributions.

@stale stale bot closed this Jun 16, 2021
@steviez steviez deleted the lazy_shred_zero_padding branch December 20, 2022 08:53