-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SIMD-0186: Loaded Transaction Data Size Specification #186
base: main
Are you sure you want to change the base?
Changes from 4 commits
092a67c
04e283b
c694155
078a6f9
8c0603b
a015ab7
64db94f
ec59003
173f092
194c751
59bf637
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,163 @@ | ||||||||||
--- | ||||||||||
simd: '0186' | ||||||||||
title: Loaded Transaction Data Size Specification | ||||||||||
authors: | ||||||||||
- Hanako Mumei | ||||||||||
category: Standard | ||||||||||
type: Core | ||||||||||
status: Review | ||||||||||
created: 2024-10-20 | ||||||||||
feature: (fill in with feature tracking issues once accepted) | ||||||||||
--- | ||||||||||
|
||||||||||
## Summary | ||||||||||
|
||||||||||
Before a transaction can be executed, every account it may read from or write to | ||||||||||
must be loaded, including any programs it may call. The amount of data a | ||||||||||
transaction is allowed to load is capped, and if it exceeds that limit, loading | ||||||||||
is aborted. This functionality is already implemented in the validator. The | ||||||||||
purpose of this SIMD is to explicitly define how loaded transaction data size is | ||||||||||
calculated. | ||||||||||
|
||||||||||
## Motivation | ||||||||||
|
||||||||||
Transaction data size accounting is currently unspecified, and the | ||||||||||
implementation-defined algorithm used in the Agave client exhibits some | ||||||||||
surprising behaviors: | ||||||||||
|
||||||||||
* BPF loaders required by instructions' program IDs are counted against | ||||||||||
transaction data size. BPF loaders required by CPI programs are not. If a | ||||||||||
required BPF loader is also included in the accounts list, it is counted twice. | ||||||||||
* The size of a program owned by LoaderV3 may or may not include the size of its | ||||||||||
programdata depending on how the program account is used on the transaction. | ||||||||||
Programdata is also itself counted if included in the transaction accounts list. | ||||||||||
This means programdata may be counted zero, one, or two times per transaction. | ||||||||||
* Due to certain quirks of implementation, loader-owned accounts which do not | ||||||||||
contain valid programs for execution may or may not be counted against the | ||||||||||
transaction data size total depending on how they are used on the transaction. | ||||||||||
This includes, but is not limited to, LoaderV3 buffer accounts, and accounts | ||||||||||
which fail ELF validation. | ||||||||||
* Accounts can be included on a transaction account list without being an | ||||||||||
instruction account, fee-payer, or program ID. These accounts are presently | ||||||||||
loaded and counted against transaction data size, although they can never be | ||||||||||
used for any purpose by the transaction. | ||||||||||
|
||||||||||
All validator clients must arrive at precisely the same transaction data size | ||||||||||
for all transactions because a difference of one byte can determine whether a | ||||||||||
transaction is executed or failed, and thus affects consensus. Also, we want the | ||||||||||
calculated transaction data size to correspond well with the actual amount of | ||||||||||
data the transaction requests. | ||||||||||
|
||||||||||
Therefore, this SIMD seeks to specify an algorithm that is straightforward to | ||||||||||
implement in a client-agnostic way, while also accurately accounting for all | ||||||||||
account data required by the transaction. | ||||||||||
|
||||||||||
## New Terminology | ||||||||||
|
||||||||||
No new terms are introduced by this SIMD, however we define these for clarity: | ||||||||||
|
||||||||||
* Instruction account: an account passed to an instruction in its accounts | ||||||||||
array, which allows the program to view the actual bytes contained in the | ||||||||||
account. CPI can only happen through programs provided as instruction accounts. | ||||||||||
* Transaction accounts list: all accounts for the transaction, which includes | ||||||||||
instruction accounts, the fee-payer, program IDs, and any extra accounts added | ||||||||||
to the list but not used for any purpose. | ||||||||||
* LoaderV3 program account: an account owned by | ||||||||||
`BPFLoaderUpgradeab1e11111111111111111111111` which contains in its account data | ||||||||||
the first four bytes `02 00 00 00` followed by a pubkey which points to an | ||||||||||
account which is defined as the program's programdata account. | ||||||||||
|
||||||||||
For the purposes of this SIMD, we make no assumptions about the contents of the | ||||||||||
programdata account. | ||||||||||
|
||||||||||
## Detailed Design | ||||||||||
|
||||||||||
The proposed algorithm is as follows: | ||||||||||
|
||||||||||
1. Given a transaction, take the unique set of account keys which are used as: | ||||||||||
* An instruction account. | ||||||||||
* A program ID for an instruction. | ||||||||||
* The fee-payer. | ||||||||||
2. Each account's size is determined solely by the byte length of its data prior | ||||||||||
to transaction execution. | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we include an overhead amount of bytes as well? There is still accounts db / svm overhead when loading a bunch of accounts with no data. Rent calculations use this constant: pub const ACCOUNT_STORAGE_OVERHEAD: u64 = 128; but the size of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I feel like we should include everything that is serialized, not just data. @tao-stones any reason why we shouldn't just make this a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. iirc, original motivation is primarily focus on memory footprint and how that chunk of memory is used, loading from accountdb and serializing wasn't discussed much then, it might make sense now There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about 96 bytes? From https://github.com/anza-xyz/agave/blob/4e7f7f76f453e126b171c800bbaca2cb28637535/programs/bpf_loader/src/serialization.rs#L417, I see we have to serialize 3 bytes for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes that is the size of the serialized account metadata. But have you thought about the resize / realloc padding ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, serialization will stay ABIv1 in loader-v4. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. realloc headroom is quite different conceptually from "loaded data" tho isnt it? memory is reserved in case its needed, but no data is loaded There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds like we need a (loose?) definition for what loaded data is intended to mean. It could be the amount of bytes read from accounts db (in that case maybe programdata is only counted once?) or it could be roughly the amount of bytes loaded into memory before tx execution? In either case I think it makes sense to include the account metadata overhead of approx. 64 bytes and exclude the overhead of the account key itself and the realloc buffer since they're not really loaded in any sense. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. im equally fine counting programdata once or twice, and after this pr lets us refactor we should only have to actually load it once. main concerns are that the algorithm is described unambiguously, is very simple to implement correctly, and always counts programdata if you use the program maybe a new definition (counting programdata once) like:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm also fine either way, slight bias to counting it once now if we only need to load it once from accounts-db. Do we agree on 64 bytes overhead? |
||||||||||
3. For any `LoaderV3` program account, add the size of the programdata account | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: On the first read I missed the fact that any loaded account which happened to be a
|
||||||||||
it references, if it exists. | ||||||||||
4. The total transaction size is the sum of these sizes. | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
|
||||||||||
Transactions may include a | ||||||||||
`ComputeBudgetInstruction::SetLoadedAccountsDataSizeLimit` instruction to define | ||||||||||
a data size limit for the transaction. Otherwise, the default limit is 64MiB | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Might be worth calling out that the default limit of 64MiB is also the max limit. The limit can only be lowered right now
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just fyi, anza-xyz/agave#1355 attempted to introduce DEFAULT_LOADED_ACCOUNTS_DATA_SIZE_BYTES. Leader currently used "actual" loaded accounts size after execution in block packing, so a lower DEFAULT is critical atm There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. im definitely on board with reducing the default, but it should be done separately from this simd. after we implement the new algorithm, we can run it against ledger history to collect new data about what transaction sizes would look like with it enabled. calculated sizes will increase nontrivially and we dont want to pick a number that will cause significant breakage for example, in #1355 2MiB is suggested as a good number that only causes 5% breakage. however, the jupiter v4 program is 2.5MiB and jupiter v6 is close to 3MiB. this means they must be passing their program as an instruction account, which due to the current algorithm, means the program is only counted as 36 bytes. it also means under the new algorithm which properly counts programs, a 2MiB default limit would break 100% of jupiter transactions once we fix the size calculations, we will be in a better position to judge what the real sizes of transactions actually are |
||||||||||
(`64 * 1024 * 1024` bytes). | ||||||||||
|
||||||||||
If a transaction exceeds its data size limit, the transaction is failed. Fees | ||||||||||
will be charged once `enable_transaction_loading_failure_fees` is enabled. | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Where will this be checked? Do transactions that fail this check make it into blocks? Do you mind clarifying this in the SIMD? 🙏 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok i updated the wording to:
the idea is we arent changing any of the logic about how loading works here, the existing flow stays the same (loading happens at the same time in the same place of transaction processing, what kind of error you get if you exceed the limit, how that error is handled and how its reflected in ledger history, etc) the only thing that changes is what the number of bytes you arrive at will be and how you determine what that number is |
||||||||||
|
||||||||||
Adding required loaders to transaction data size is abolished. They are treated | ||||||||||
the same as any other account: counted if used in a manner described by 1, not | ||||||||||
counted otherwise. | ||||||||||
|
||||||||||
No account that falls outside of the three categories listed by 1 is counted | ||||||||||
against transaction data size. Validator clients are free to decline to load | ||||||||||
them. | ||||||||||
|
||||||||||
Read-only and writable accounts are treated the same. In the future, when direct | ||||||||||
mapping is enabled, this SIMD may be amended to count them differently. | ||||||||||
|
||||||||||
As a consequence of 1 and 3, for LoaderV3 programs, programdata is counted twice | ||||||||||
if a transaction explicitly references the program account and its programdata | ||||||||||
account. This is done partly for simplicity, and partly to account for the cost | ||||||||||
of maintaining the compiled program in addition to the actual bytes of | ||||||||||
the programdata account. | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we settled on just counting them once right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes i forgot to delete this |
||||||||||
|
||||||||||
We include programdata size in account size for LoaderV3 programs because using | ||||||||||
the program account on a transaction forces an unconditional load of programdata | ||||||||||
to compile the program for execution. We always count it, even when the program | ||||||||||
is an instruction account, because the program must be available for CPI. | ||||||||||
|
||||||||||
There is no special handling for any account owned by the native loader, | ||||||||||
LoaderV1, or LoaderV2. | ||||||||||
|
||||||||||
Account size for programs owned by LoaderV4 is left undefined. This SIMD should | ||||||||||
be amended to define the required semantics before LoaderV4 is enabled on any | ||||||||||
network. | ||||||||||
|
||||||||||
## Alternatives Considered | ||||||||||
|
||||||||||
* Transaction data size accounting is already enabled, so the null option is to | ||||||||||
enshrine the current Agave behavior in the protocol. This is undesirable because | ||||||||||
the current behavior is highly idiosyncratic, and LoaderV3 program sizes are | ||||||||||
routinely undercounted. | ||||||||||
* Builtin programs are backed by accounts that only contain the program name as | ||||||||||
a string, typically making them 15-40 bytes. We could impose a larger fixed cost | ||||||||||
for these. However, they must be made available for all programs anyway, and | ||||||||||
most of them are likely to be ported to BPF eventually, so this adds complexity | ||||||||||
for no real benefit. | ||||||||||
* Several slightly different algorithms were considered for handling LoaderV3 | ||||||||||
programs in particular, for instance only counting programs that are valid for | ||||||||||
execution in the current slot. However, this would implicitly couple transaction | ||||||||||
data size with the results of ELF validation, which is highly undesirable. | ||||||||||
* We considered loading and counting sizes for accounts on the transaction | ||||||||||
account list which are not used for any purpose. This is the current behavior, | ||||||||||
but there is no reason to load such accounts at all. | ||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This section is no longer relevant |
||||||||||
|
||||||||||
## Impact | ||||||||||
|
||||||||||
The primary impact is this SIMD makes correctly implementing transaction data | ||||||||||
size accounting much easier for other validator clients. | ||||||||||
|
||||||||||
It makes the calculated size of transactions which include program accounts for | ||||||||||
CPI somewhat larger, but given the generous 64MiB limit, it is unlikely that any | ||||||||||
existing users will be affected. Based on an investigation of a 30-day window, | ||||||||||
transactions larger than 30MiB are virtually never seen. | ||||||||||
|
||||||||||
## Security Considerations | ||||||||||
|
||||||||||
Security impact is minimal because this SIMD merely simplifies an existing | ||||||||||
feature. Care must be taken to implement the rules exactly. | ||||||||||
|
||||||||||
This SIMD requires a feature gate. | ||||||||||
|
||||||||||
## Backwards Compatibility | ||||||||||
|
||||||||||
Transactions that currently have a total transaction data size close to the | ||||||||||
64MiB limit, which call LoaderV3 programs via CPI, may now exceed it and fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may have to include every account in the transaction actually if this SIMD is accepted: #163 (cc @Lichtso)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is generally bad to try to have extra conditions for TX accounts depending on how they are used. That is how we got to the write lock demotion and only top-level-instructions counting loader-v3 programdata account mess we have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, ill revert that change
also i love this simd, i was thinking the other week it would be so nice if you could specify accounts you might want to execute but dont need to see instruction data for
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!