Remove key_size() method from Column trait #34021
This helper simply called std::mem::size_of::<Self::Index>(). However, all of the underlying functions that create keys manually copy fields into a byte array. The fields are copied in end-to-end, whereas size_of() might include alignment padding bytes. That is, a (u64, u32) only has 12 bytes of "data", but it would have size 16 due to the 4 padding bytes added after the u32 (size 4) so that the tuple's size is a multiple of the u64's alignment (8).
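To make the padding argument concrete, here is a minimal sketch using only the standard library. The helper names `padded_size` and `packed_key` are made up for illustration and are not part of the codebase. The padded size is 16 only on targets where u64 is 8-byte aligned (typical 64-bit platforms), so the sketch prints it rather than assuming it.

```rust
use std::mem::size_of;

/// In-memory size of the tuple, including any alignment padding.
/// (Hypothetical helper, for illustration only.)
fn padded_size() -> usize {
    size_of::<(u64, u32)>()
}

/// Build a key the way the column code does: fields copied end-to-end,
/// big-endian, with no padding. (Hypothetical helper, for illustration only.)
fn packed_key(slot: u64, index: u32) -> Vec<u8> {
    let mut key = Vec::with_capacity(size_of::<u64>() + size_of::<u32>());
    key.extend_from_slice(&slot.to_be_bytes());
    key.extend_from_slice(&index.to_be_bytes());
    key
}

fn main() {
    println!("size_of::<(u64, u32)>() = {}", padded_size());
    println!("packed key length       = {}", packed_key(1, 2).len());
}
```

The packed key is always 8 + 4 = 12 bytes, while the value reported by size_of depends on the target's alignment rules.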
We iterate through key-value pairs anyways, so just get the key size from there.
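As a rough sketch of that idea: the commit message does not say where the key size is consumed, so the function below is an assumption, with `total_bytes` a hypothetical name. It reads each key's length from the entries being iterated instead of asking the column type for a fixed size.

```rust
/// Sum the stored bytes of a set of entries by walking them and using each
/// key's actual length. (Hypothetical helper; not taken from the PR.)
fn total_bytes<'a, I>(entries: I) -> usize
where
    I: IntoIterator<Item = (&'a [u8], &'a [u8])>,
{
    entries
        .into_iter()
        .map(|(key, value)| key.len() + value.len())
        .sum()
}

fn main() {
    // Two made-up entries with 12-byte keys.
    let entries: Vec<(Vec<u8>, Vec<u8>)> = vec![
        (vec![0; 12], vec![1, 2, 3]),
        (vec![0; 12], vec![4, 5]),
    ];
    let total = total_bytes(entries.iter().map(|(k, v)| (k.as_slice(), v.as_slice())));
    println!("total bytes = {}", total);
}
```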
Codecov Report
@@            Coverage Diff            @@
##           master   #34021    +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%
=========================================
  Files         818      818
  Lines      219939   219936     -3
=========================================
- Hits       180219   180163    -56
- Misses      39720    39773    +53
Would it actually be useful to have? If so, we could probably rework the default implementation to be correct by constructing an actual key and getting the len.
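A minimal sketch of what such a reworked default could look like. The trait below is a simplified stand-in, not the real Column definition, and `ExampleColumn` is a hypothetical column. It uses only the standard library for byte encoding.

```rust
/// Simplified stand-in for the Column trait (hypothetical; not the real definition).
trait Column {
    type Index;

    /// Serialize an index into its key bytes.
    fn key(index: Self::Index) -> Vec<u8>;

    /// Build an index from a slot alone.
    fn as_index(slot: u64) -> Self::Index;

    /// Key length derived from an actual key rather than size_of::<Self::Index>().
    /// Note that this allocates a key on every call.
    fn key_size() -> usize {
        Self::key(Self::as_index(0)).len()
    }
}

/// Hypothetical column whose index is (slot, fec_set_index).
struct ExampleColumn;

impl Column for ExampleColumn {
    type Index = (u64, u32);

    fn key((slot, fec_set_index): Self::Index) -> Vec<u8> {
        // Fields are copied end-to-end: 8 bytes + 4 bytes.
        let mut key = Vec::with_capacity(12);
        key.extend_from_slice(&slot.to_be_bytes());
        key.extend_from_slice(&fec_set_index.to_be_bytes());
        key
    }

    fn as_index(slot: u64) -> Self::Index {
        (slot, 0)
    }
}

fn main() {
    println!("key_size = {}", ExampleColumn::key_size());
}
```

Because the length comes from key() itself, it always matches whatever key() produces, including any stray bytes key() might contain. That makes it consistent, but not a check on key() being right.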
Can we instead keep the And for each column-family, we implement its |
One spot for sure would be the hard-coded key length here: solana/ledger/src/blockstore_db.rs Line 1121 in 45290c4
We had a near miss in another PR where the index was updated for a new column, but the size of the key array was not, and the key vector had extra bytes before getting fixed. It was here if you're interested: #33979 (comment)
As we were talking it through, we agreed that it'd be nice to have a way to avoid that. If the
    let mut key = vec![0; Self::key_size()];
That's a good point that doing something like this would allow us to compute the value:
    let key_len = Self::key(Self::as_index(0)).len();
However, I don't think we can compute this at compile time, and I think this would add overhead to what is a pretty fundamental function in creating a key. And creating a key and computing the length would not have prevented the bug described above, where the key vector was created with extra bytes. Solutions that came to mind were something like:
So, in the absence of a solution that could calculate |
@@ -719,10 +719,6 @@ impl Rocks {
 pub trait Column {
     type Index;

-    fn key_size() -> usize {
-        std::mem::size_of::<Self::Index>()
I think we should just simply remove this problematic default implementation.
we decided not to do anything as this could also be error-prone
Yeah, all of the alternatives do have gaps, and none of them would really help the case you linked, where a key decreases in size.
As such, I'm fine with this.
Ha, you beat me to posting by a couple seconds. I thought of doing this in the past; however, the argument against it is that I could still see it being prone to the same bug from Ashwin's PR where:
Any unit test that checked the length of |
I think I must be missing something. So if we write 8+4 bytes but read only 8+8 bytes, and we compare what we wrote with what we read, will there be a mismatch, or will there not because it's based on
The code was originally something like this:

impl Column for columns::MerkleRootMeta {
    type Index = (Slot, /*fec_set_index:*/ u64);

    fn index(key: &[u8]) -> Self::Index {
        let slot = BigEndian::read_u64(&key[..8]);
        let fec_set_index = BigEndian::read_u64(&key[8..]);
        (slot, fec_set_index)
    }

    fn key((slot, fec_set_index): Self::Index) -> Vec<u8> {
        let mut key = vec![0; 16];
        BigEndian::write_u64(&mut key[..8], slot);
        BigEndian::write_u64(&mut key[8..], fec_set_index);
        key
    }

Note that the second element of the Index is a u64. It was later changed to a u32:

impl Column for columns::MerkleRootMeta {
    type Index = (Slot, /*fec_set_index:*/ u32);

    fn index(key: &[u8]) -> Self::Index {
        let slot = BigEndian::read_u64(&key[..8]);
        let fec_set_index = BigEndian::read_u32(&key[8..]);
        (slot, fec_set_index)
    }

    fn key((slot, fec_set_index): Self::Index) -> Vec<u8> {
        let mut key = vec![0; 16];
        BigEndian::write_u64(&mut key[..8], slot);
        BigEndian::write_u32(&mut key[8..], fec_set_index);
        key
    }

The bug was that the following line was not updated:

    // Buggy
    let mut key = vec![0; 16];
    // Proper
    let mut key = vec![0; 12];
Within |
Let me see if I understand it correctly. So it's the mismatch between the length of the returned Is my understanding correct? |
Yep, you got it. We could implement In this case, our unit test would contain the same value as the actual constant in the source code. The unit test would only fail if you've updated the source code constant. But if you already updated the source code constant, then you remembered to do the right thing, and the unit test didn't help you remember to update the source code value.
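For reference, a minimal sketch of why a write-then-read comparison does not expose the oversized key. The functions `key_buggy` and `index` are simplified stand-ins that use the standard library instead of the byteorder crate; they are not the actual column code.

```rust
use std::convert::TryInto;

/// Buggy encoder: the buffer is still 16 bytes although only 12 are written.
/// (Simplified stand-in for the key() function discussed above.)
fn key_buggy(slot: u64, fec_set_index: u32) -> Vec<u8> {
    let mut key = vec![0u8; 16];
    key[..8].copy_from_slice(&slot.to_be_bytes());
    key[8..12].copy_from_slice(&fec_set_index.to_be_bytes());
    key
}

/// Decoder: reads 8 bytes for the slot and 4 bytes for fec_set_index,
/// ignoring anything after them.
fn index(key: &[u8]) -> (u64, u32) {
    let slot = u64::from_be_bytes(key[..8].try_into().unwrap());
    let fec_set_index = u32::from_be_bytes(key[8..12].try_into().unwrap());
    (slot, fec_set_index)
}

fn main() {
    let key = key_buggy(7, 3);
    // The round trip succeeds even though the key carries 4 stray zero bytes.
    assert_eq!(index(&key), (7, 3));
    println!("key length = {}", key.len());
}
```

The decoded index matches what was written, so only a check on the key length itself would notice the extra bytes, and that check would have to hard-code the expected length.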
Let's remove it, given that this function doesn't provide much value and isn't always consistent with the actual code.
Problem
This helper simply called std::mem::size_of::<Self::Index>(). However, all of the underlying functions that create keys manually copy fields into a byte array. The fields are copied in end-to-end whereas size_of() might include alignment bytes.
That is, a (u64, u32) only has 12 bytes of "data", but it would have size 16 due to the 4 alignment padding bytes that would be added to get the u32 (size 4) aligned with the u64 (size 8).
Summary of Changes
The helper could be useful, but in its current state it is incorrect and dangerous to leave around, since someone might make the incorrect assumption above regarding alignment bytes.
Also, here is a Rust playground link to demonstrate that std::mem::size_of::<(u64, u32)>() == 16: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=b1848a2e974119930cc7e59c0b662274