Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ledger/blockstore: Use u32 for shred indices. #2452

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ilya-bobyr
Copy link

@ilya-bobyr ilya-bobyr commented Aug 6, 2024

Shred indices, within a slot, can not exceed a u32 value.
Unfortunately, at the moment, there is no common type that is used when a shred index needs to be stored.
Part of the code use u32, another considerable part uses u64 and a few places use usize.

Ideally, it would be better to have a dedicated type, wrapping the primitive u32 value.

This change is trying to unify the types used, moving most of the code that works with shred indices to u32.
u64 is used in a few locations that are serialized into the blockstore or are part of the wire protocol.
Those are left as is for now.

As the conversion is not 100%, there are still a few conversions left, that could be avoided should both the serialization and the wire protocol code use u32.
I also was not completely sure about the usize types used by the Merkel tree code - those might be convertible to u32 as well.

Stats (roughly):

  • Removed as u{32,64}:
$ git diff --cached | grep '^-.*as u\(32\|64\)' | wc -l
50
  • Removed unwrap()/expect():
$ git diff --cached | grep '^-.*\(unwrap()\|expect()\)' | wc -l
20
  • Added as u{32,64}:
$ git diff --cached | grep '^+.*as u\(32\|64\)' | wc -l
7
  • Added unwrap()/expect():
$ git diff --cached | grep '^+.*\(unwrap()\|expect()\)' | wc -l
32

Total is 50 + 20 - 7 - 32 = 31 conversions removed.
And a lot of the removed conversion are unchecked, while a lot of the added ones are checked.
The current version would still panic if a value outside the u32 scope is used - it will just panic further away from the bug location, complicating the debugging efforts.

This change also prepares the grounds for removal of most of the conversion that are still present.
We would need to convert the serialization and the wire code to u32 as well.

There are a number of as usize conversions added in the new code.
Those should be safe, as they convert from u32.
Most of them are used when indexing into Vecs.

Shred indices, within a slot, can not exceed a `u32` value.
Unfortunately, at the moment, there is no common type that is used when
a shred index needs to be stored.  Part of the code use `u32`, another
considerable part uses `u64` and a few places use `usize`.

Ideally, it would be better to have a dedicated type, wrapping the
primitive `u32` value.

This change is trying to unify the types used, moving most of the code
that works with shred indices to `u32`.  `u64` is used in a few
locations that are serialized into the blockstore or are part of the
wire protocol.  Those are left as is for now.

As the conversion is not 100%, there are still a few conversions left,
that could be avoided should both the serialization and the wire
protocol code use `u32`.  I also was not completely sure about the
`usize` types used by the Merkel tree code - those might be convertible
to `u32` as well.

Stats (roughly):

  * Removed `as u{32,64}`:

  $ git diff --cached | grep '^-.*as u\(32\|64\)' | wc -l
  50

  * Removed `unwrap()`/`expect()`:

  $ git diff --cached | grep '^-.*\(unwrap()\|expect()\)' | wc -l
  20

  * Added `as u{32,64}`:

  $ git diff --cached | grep '^+.*as u\(32\|64\)' | wc -l
  7

  * Added `unwrap()`/`expect()`:

  $ git diff --cached | grep '^+.*\(unwrap()\|expect()\)' | wc -l
  32

Total is 50 + 20 - 8 - 32 = 30 conversions removed.  And a lot of the
removed conversion are unchecked, while a lot of the added ones are
checked.  Current version would still panic, if a value outside the
`u32` scope is used - it will just panic further away from the bug
location, complicating the debugging efforts.

This change also prepares the grounds for removal of most of the
conversion that are still present.  We would need to convert the
serialization and the wire code to `u32` as well.

There are a number of `as usize` conversions added in the new code.
Those should be safe, as they convert from `u32`.  Most of them are used
when indexing into `Vec`s.
@ilya-bobyr ilya-bobyr marked this pull request as ready for review August 6, 2024 05:01
@ilya-bobyr
Copy link
Author

@behzadnouri @AshwinSekar
You both seems to have been cleaning some u64/u32 conversions for the shred indices: #1841 and solana-labs#34268.
Would either or both of you be interested in reviewing this change?

Behzad, I think I've showed you this change more than a year ago.
But at that point, you told me you think that usize should be used for the shreds indices.
Seems like now there is a consensus that it should indeed be u32 and I can stop, hopefully, rebasing my change and merge it :)

@behzadnouri
Copy link

Thanks @ilya-bobyr for making this change and I do agree having consistent types is generally preferred.

However this change is pretty big and intrusive and I am worried something pretty subtle will be missed out in the review. Some of the changes might also have performance implications as u64 is machine word size and u32 is not.

I do also see a new ShredIndexExceeded error type and several new fallible ::try_from and unwrap/expect instances in the commit, which might mean that though some fallible type conversions are removed, new ones are added elsewhere. So it is not clear that this change is net positive in terms of reducing fallible type conversions and also because each of these new try_from/unwrap/expect instances needs diligent scrutiny which is impossible in such a large change.

That said, smaller parts of this change might be ok to merge if they are done in separate much much smaller commits.

You both seems to have been cleaning some u64/u32 conversions for the shred indices: #1841 and solana-labs#34268.

The goal there was to patch some very recently added changes and make blockstore type match the shred struct and as you see the changes are pretty small and very limited in scope.

@ilya-bobyr
Copy link
Author

However this change is pretty big and intrusive and I am worried something pretty subtle will be missed out in the review

I absolutely agree here.
It is a big change, and I would also be worried if someone would send me such a change :)

So it is not clear that this change is net positive in terms of reducing fallible type conversions and also because each of these new try_from/unwrap/expect instances needs diligent scrutiny which is impossible in such a large change.

I've done the math in the PR description that suggests that it is a net positive.
And it is also a net positive in the sense that it reduces the number of cases where u64 or usize is used for shred indices.
Meaning that it is a step towards a state where u32 is the only type used for shred indices.
Removing unnecessary conversions, and making the code more readable.
Which is good both for performance and for the code security.

[...] smaller parts of this change might be ok to merge if they are done in separate much much smaller commits.

I can try splitting this change into smaller chunks.
But I can not be certain upfront that all the changes that compile would be both considerably smaller and would not introduce unnecessary checks.

Notice that in this change, I've only updated function arguments and return types.
(So, I do not think it could introduce any new performance issues)
I did start with just a single function, and followed the compiler errors.
And this is what it turned into.

That said, smaller parts of this change might be ok to merge if they are done in separate much much smaller commits.

If either of you are interested in working with me on this, I could see what chunks I can cut it into.

@ilya-bobyr ilya-bobyr self-assigned this Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants