-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
embeds versioning into shred binary #25631
embeds versioning into shred binary #25631
Conversation
const SIZE_OF_DATA_SHRED_HEADER: usize = 5; | ||
const SIZE_OF_CODING_SHRED_HEADER: usize = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing these two because they are not used individually, and ..._HEADER
vs ..._HEADERS
is error-prone.
pub enum ShredCode { | ||
Legacy(legacy::ShredCode), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#25237 will add merkle variant:
pub enum ShredCode {
Legacy(legacy::ShredCode),
+ Merkle(merkle::ShredCode),
}
pub enum ShredData { | ||
Legacy(legacy::ShredData), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#25237 will add merkle variant:
pub enum ShredData {
Legacy(legacy::ShredData),
+ Merkle(merkle::ShredData),
}
e3e06e0
to
e277597
Compare
enum ShredVariant { | ||
LegacyCode, // 0b0101_1010 | ||
LegacyData, // 0b1010_0101 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#25237 adds merkle variants:
enum ShredVariant {
LegacyCode, // 0b0101_1010
LegacyData, // 0b1010_0101
+ MerkleCode(/*proof_size:*/ u8), // 0b0100_????
+ MerkleData(/*proof_size:*/ u8), // 0b1000_????
}
e277597
to
e1c4f09
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing comments from the original review.
|
||
/// The following constants are computed by hand, and hardcoded. | ||
/// `test_shred_constants` ensures that the values are correct. | ||
/// Constants are used over lazy_static for performance reasons. | ||
const SIZE_OF_COMMON_SHRED_HEADER: usize = 83; | ||
const SIZE_OF_DATA_SHRED_HEADER: usize = 5; | ||
const SIZE_OF_CODING_SHRED_HEADER: usize = 6; | ||
const SIZE_OF_DATA_SHRED_HEADERS: usize = 88; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is useful to have these constants defined as a sum of their constituent parts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well they are not used anywhere, and as mentioned on the side comment, I think SIZE_OF_..._HEADER
vs ..._HEADERS
is confusing and easy to miss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I guess I'm looking for breaking it down into the sum of basic types (type/slot/index/nonce/...) but I'm ok with this. I'm happy that we won't have both the HEADER
and HEADERS
consts.
impl From<ShredVariant> for u8 { | ||
fn from(shred_variant: ShredVariant) -> u8 { | ||
match shred_variant { | ||
ShredVariant::LegacyCode => u8::from(ShredType::Code), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will ShredType
be used to distinguish data/code for both legacy and merkle shreds? If yes, then it seems that this should be an internal type which isn't serialized. I think the value for the wire encoding should be tied to ShredVariant
and not as part of the ShredType
enum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will ShredType be used to distinguish data/code for both legacy and merkle shreds?
yes
then it seems that this should be an internal type
ShredVariant
is the internal type. The code outside this module should not know/care that there are different variants. But ShredType
is public; There is logic outside this module based on if the shred is code or data.
which isn't serialized
ShredType
is not serialized in this crate, but, elsewhere:
https://github.com/solana-labs/solana/blob/3ed2e0ce2/gossip/src/duplicate_shred.rs#L36
I think the value for the wire encoding should be tied to ShredVariant and not as part of the ShredType enum.
How would you tie the value on ShredVariant
?
The merkle variants will be tuple variants, and it errors out:
error[E0658]: custom discriminant values are not allowed in enums with tuple or struct variants
There is a tracking issue which might allow this one-day:
rust-lang/rust#60553
but even then I am using lower 4 bits to encode proof size, so not sure if that RFC would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm suggesting that we define the following for the shred module.
const SHRED_VARIANT_LEGACY_DATA: u8 = 0b1010_0101;
const SHRED_VARIANT_LEGACY_CODE: u8 = 0b0101_1010;
and we remove the custom values from the ShredType
enum.
Then we handle the u8
to/from conversion without relying on u8::from(ShredType::Code)
within this function.
ShredType
doesn't seem like the right place to have the wire format now that we have ShredVariant
. Ideally we'd be able to use custom enum values for ShredVariant
but as you note it's not possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the commit cannot change ShredType
for backward compatibility:
https://github.com/solana-labs/solana/blob/3ed2e0ce2/gossip/src/duplicate_shred.rs#L36
Anyways, this seems like a non blocker issue which can be revisited later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's not going to introduce incorrect behavior.
Backwards compatibility just requires that it serializes/deserializes from u8
. It seems strange to make ShredVariant
private, expose ShredType
as a generic value to indicate only data/code distinction, but keep the specific legacy wire value as a property of the generic type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backwards compatibility just requires that it serializes/deserializes from u8.
It requires to serialize to and deserialize from those specific constant u8
values; not any u8
. Effectively those specific u8
values are already part of the abi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will duplicate shred need to understand merkle variant? If so, we'll need to extend ShredType
for that. At which point I'm not sure what purpose it serves to have both ShredType
and ShredVariant
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will duplicate shred need to understand merkle variant?
It should not.
The code outside this module should not know or care if a code or data shred is merkle variant or legacy variant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems some distinction must be made to account for batch signing and retransmit behavior of merkle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can move any code logic that needs to dispatch implementation based on shred-variant into ledger/src/shred.rs
.
e.g. something like below if that turns out to be needed
impl Shred {
fn sign_erasure_batch(keypair: &Keypair, shreds: &mut [Shred]) {
...
}
}
@@ -894,6 +934,7 @@ mod tests { | |||
assert_matches!(bincode::deserialize::<ShredType>(&[1u8]), Err(_)); | |||
// data shred | |||
assert_eq!(ShredType::Data as u8, 0b1010_0101); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tie u8
wire representation to ShredVariant
rather than ShredType
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented above
ledger/src/shred.rs
Outdated
@@ -1012,7 +1090,7 @@ mod tests { | |||
let seed = <[u8; 32]>::try_from(bs58_decode(SEED)).unwrap(); | |||
ChaChaRng::from_seed(seed) | |||
}; | |||
let mut parity_shard = vec![0u8; /*ENCODED_PAYLOAD_SIZE:*/ 1139]; | |||
let mut parity_shard = vec![0u8; /*SIZE_OF_ERASURE_ENCODED_SLICE:*/ 1139]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can 1139
be defined as a const at the beginning of mod tests
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to legacy::SIZE_OF_ERASURE_ENCODED_SLICE
ledger/src/shred/legacy.rs
Outdated
&self.payload[..self.data_header.size as usize] | ||
self.payload | ||
.get(SIZE_OF_DATA_SHRED_HEADERS..size) | ||
.ok_or(Error::InvalidPayloadSize(self.payload.len())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this duplicating the prior check but returning a different error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted.
|
||
#[inline] | ||
pub(super) fn erasure_shard_index<T: ShredCodeTrait>(shred: &T) -> Option<usize> { | ||
// Assert that the last shred index in the erasure set does not |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change "Assert" to "return None if ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an old code just moved here:
https://github.com/solana-labs/solana/blob/v1.9.10/ledger/src/shred.rs#L546-L547
The comment is referring to sanity checks being done, not the implementation.
ledger/src/sigverify_shreds.rs
Outdated
let msg = shred::layout::get_signed_message_range(packet); | ||
let msg = add_offset(msg, pubkeys_end); | ||
let shred = shred::layout::get_shred(packet); | ||
// TODO: signature verification should fail for empty messages! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there something to do for the "TODO"? Or just let the verification fail if the buffer is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed the comment
let msg = shred::layout::get_signed_message_range(packet); | ||
let msg = shred::layout::get_shred(packet) | ||
.and_then(shred::layout::get_signed_message_range) | ||
.unwrap(); | ||
assert!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this assert superfluous after the unwrap()
above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assert is on sig.end
, the unwrap is on msg
.
Here it does not make any assumptions about the underlying layout of shreds.
In preparation of solana-labs#25237 which adds a new shred variant with merkle tree branches, the commit embeds versioning into shred binary by encoding a new ShredVariant type at byte 65 of payload replacing previously ShredType at this offset. enum ShredVariant { LegacyCode, // 0b0101_1010 LegacyData, // 0b0101_1010 } * 0b0101_1010 indicates a legacy coding shred, which is also equal to ShredType::Code for backward compatibility. * 0b1010_0101 indicates a legacy data shred, which is also equal to ShredType::Data for backward compatibility. Following commits will add merkle variants to this type: enum ShredVariant { LegacyCode, // 0b0101_1010 LegacyData, // 0b1010_0101 MerkleCode(/*proof_size:*/ u8), // 0b0100_???? MerkleData(/*proof_size:*/ u8), // 0b1000_???? }
The commit implements two new types: pub enum ShredCode { Legacy(legacy::ShredCode), } pub enum ShredData { Legacy(legacy::ShredData), } Following commits will extend these types by adding merkle variants: pub enum ShredCode { Legacy(legacy::ShredCode), Merkle(merkle::ShredCode), } pub enum ShredData { Legacy(legacy::ShredData), Merkle(merkle::ShredData), }
e1c4f09
to
411f1b2
Compare
impl ShredCode { | ||
pub(super) const SIZE_OF_PAYLOAD: usize = PACKET_DATA_SIZE - SIZE_OF_NONCE; | ||
|
||
dispatch!(fn coding_header(&self) -> &CodingShredHeader); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@behzadnouri just curious, why enum_dispatch isn't used here?: https://docs.rs/enum_dispatch/latest/enum_dispatch/
maybe, there was some shortcomings in the crate? (in fact, i haven't used it but I'm wondering we could use it for cleaning lots of blockstore's ColumnFaimily types and hand-written dispatching there).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just found this dispatch!
macro simpler and more explicit.
enum_dispatch
requires a trait; this does not.
Problem
Need backward compatible way to change binary layout of shreds;
which requires some form of versioning embedded into shred binary
Summary of Changes
First commit:
In preparation of #25237
which adds a new shred variant with merkle tree branches, the commit
embeds versioning into shred binary by encoding a new
ShredVariant
type at byte 65 of payload replacing previously
ShredType
at thisoffset.
0b0101_1010
indicates a legacy coding shred, which is also equal toShredType::Code
for backward compatibility.0b1010_0101
indicates a legacy data shred, which is also equal toShredType::Data
for backward compatibility.Following commits will add merkle variants to this type:
Second commit:
Adds support for different variants for ShredCode and ShredData by adding two
new types:
#25237 will extend these types by adding merkle variants: