Proposal: Tiered Accounts DB Storage #30551
Still reading through; here are a few thoughts.
@@ -0,0 +1,466 @@
---
Some discussion around the relative cost of accessing a hot vs. cold account in a transaction would be useful.
If there's a large delta, then I could imagine somebody wanting to modify their validator to accept only transactions that touch hot accounts, to make it more likely that the next leader will build on their block. Or protocols might modify themselves to write unnecessarily into accounts to prevent them from getting cold.
> If there's a large delta, then I could imagine somebody wanting to modify their validator to accept only transactions that touch hot accounts, to make it more likely that the next leader will build on their block. Or protocols might modify themselves to write unnecessarily into accounts to prevent them from getting cold.
That's a good insight. The plan is to make the sizes of hot and cold storage configurable so that validators with different specs/goals can each be tuned for their own needs.
In the end, it is a trade-off between storage size and run-time performance.
To keep validators from simply declining to answer RPC queries for cold accounts, do we want some type of reward for validators that answer them?
@yhchiang-sol I think it would be good to revive this proposal and bring it up to date with the current implementation. We'll want this proposal to be completed/accepted as a spec for our implementation. For example, this proposal still says, for the index entries, that the pubkeys and the offsets are together. Per our January meeting, we decided to have these stored separately.
Sure thing. Let me go over the proposal and make everything up to date.
|                                    | Entries are either sorted or hashed, depending |
|                                    | on the index format specified in the footer.   |
+------------------------------------+------------------------------------------------+
| offsets (8-byte each)              | Indices to access the account entry.           |
This will need to change to u32. Reading it requires packed (u32, u32).
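One way to read such a packed pair without depending on memory alignment is to decode it directly from raw bytes. This is only a sketch: the comment above says the entry is a packed (u32, u32) but not what each value means, so the function name `read_u32_pair`, the placeholder names `first` and `second`, and the little-endian byte order are all assumptions.

```rust
/// Size in bytes of one packed (u32, u32) entry: two u32 values, no padding.
const PAIR_SIZE: usize = 8;

/// Reads the `index`-th packed (u32, u32) pair from `buf`.
/// Hypothetical helper; little-endian encoding is an assumption.
/// Returns None if the pair would run past the end of the buffer.
fn read_u32_pair(buf: &[u8], index: usize) -> Option<(u32, u32)> {
    let start = index.checked_mul(PAIR_SIZE)?;
    let end = start.checked_add(PAIR_SIZE)?;
    let b = buf.get(start..end)?;
    // Decoding byte by byte avoids taking an unaligned reference.
    let first = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
    let second = u32::from_le_bytes([b[4], b[5], b[6], b[7]]);
    Some((first, second))
}
```

Decoding from a byte slice keeps the on-disk layout at exactly 8 bytes per entry regardless of how the compiler would lay out an in-memory struct.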
| data_block_offset (8 bytes)        | The offset to the account's data block.        |
| intra_block_offset (2 bytes)       | The inner-block offset to the accounts' data   |
|                                    | after decompressing its data block.            |
|                                    |                                                |
| uncompressed_data_len (2 bytes)    | The length of the uncompressed_data.           |
|                                    | If this value is u16::MAX, it means the cold   |
|                                    | account has its own account block              |
|                                    | In this case, its block size can be derived    |
|                                    | by comparing the offset of the next cold index |
|                                    | entry which has a different data_block_offset. |
This will be 4 + 2 + 2.
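To illustrate the 4 + 2 + 2 layout, a cold index entry could be decoded from 8 bytes roughly as follows. This is a sketch under assumptions: the type name `ColdIndexEntry`, its methods, the field order (taken from the table above), and the little-endian encoding are hypothetical and may not match the actual implementation.

```rust
/// Hypothetical 8-byte cold index entry laid out as 4 + 2 + 2 bytes.
#[derive(Debug, PartialEq, Eq)]
struct ColdIndexEntry {
    /// Offset to the account's data block (4 bytes).
    data_block_offset: u32,
    /// Offset inside the decompressed data block (2 bytes).
    intra_block_offset: u16,
    /// Length of the uncompressed data (2 bytes); u16::MAX is a sentinel.
    uncompressed_data_len: u16,
}

impl ColdIndexEntry {
    /// Total encoded size: 4 + 2 + 2.
    const SIZE: usize = 8;

    /// Decodes one entry from the front of `bytes` (little-endian assumed).
    /// Returns None if fewer than SIZE bytes are available.
    fn decode(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < Self::SIZE {
            return None;
        }
        Some(Self {
            data_block_offset: u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]),
            intra_block_offset: u16::from_le_bytes([bytes[4], bytes[5]]),
            uncompressed_data_len: u16::from_le_bytes([bytes[6], bytes[7]]),
        })
    }

    /// Per the table above, u16::MAX marks an account that has its own block.
    fn has_own_block(&self) -> bool {
        self.uncompressed_data_len == u16::MAX
    }
}
```

When `has_own_block` returns true, the block size is not stored in the entry; as the table describes, it would be derived from the next entry that has a different `data_block_offset`.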
This repository is no longer in use. Please re-open this pull request in the agave repo: https://github.com/anza-xyz/agave
Summary of Changes
This proposal presents a hierarchical storage architecture that enables
the accounts database to efficiently manage a large number of accounts.
It stores active accounts in a format that is optimized for performance
while accommodating the storage of inactive accounts in a compressed
format that conserves disk space.
The WIP prototype can be found at #30626