-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SIMD-0186: Loaded Transaction Data Size Specification #186
base: main
Are you sure you want to change the base?
Changes from 2 commits
092a67c
04e283b
c694155
078a6f9
8c0603b
a015ab7
64db94f
ec59003
173f092
194c751
59bf637
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,144 @@ | ||||||
--- | ||||||
simd: '0186' | ||||||
title: Transaction Data Size Specification | ||||||
authors: | ||||||
- Hanako Mumei | ||||||
category: Standard | ||||||
type: Core | ||||||
status: Review | ||||||
created: 2024-10-20 | ||||||
feature: (fill in with feature tracking issues once accepted) | ||||||
--- | ||||||
|
||||||
## Summary | ||||||
|
||||||
Before a transaction can be executed, every account it may read from or write to | ||||||
must be loaded, including any programs it may call. The amount of data a | ||||||
transaction is allowed to load is capped, and if it exceeds that limit, loading | ||||||
is aborted. This functionality is already implemented in the validator. The | ||||||
purpose of this SIMD is to explicitly define how loaded transaction data size is | ||||||
calculated. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Last 2 sentences here make it sound like the SIMD just documents the existing behavior. |
||||||
|
||||||
## Motivation | ||||||
|
||||||
Transaction data size accounting is currently unspecified, and the | ||||||
implementation-defined algorithm used in the Agave client exhibits some | ||||||
surprising behaviors: | ||||||
|
||||||
* BPF loaders required by top-level programs are counted against transaction | ||||||
data size. BPF loaders required by CPI programs are not. If a required BPF | ||||||
loader is also included in the accounts list, it is counted twice. | ||||||
* The size of a program owned by LoaderV3 may or may not include the size of its | ||||||
programdata depending on how the program account is used on the transaction. | ||||||
Programdata is also itself counted if included in the transaction accounts list. | ||||||
This means programdata may be counted zero, one, or two times per transaction. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
🫠 |
||||||
|
||||||
All validator clients must arrive at precisely the same transaction data size | ||||||
for all transactions because a difference of one byte can determine whether a | ||||||
transaction is executed or failed, and thus affects consensus. Also, we want the | ||||||
calculated transaction data size to correspond well with the actual amount of | ||||||
data the transaction requests. | ||||||
|
||||||
Therefore, this SIMD seeks to specify an algorithm that is straightforward to | ||||||
implement in a client-agnostic way, while also accurately accounting for the | ||||||
total data required by the transaction. | ||||||
|
||||||
## New Terminology | ||||||
|
||||||
One term is defined within the scope of this SIMD: | ||||||
|
||||||
* Valid program: a program that has been loaded, or a builtin. This definition | ||||||
excludes programs that have failed verification, or LoaderV3 programs that have | ||||||
been closed or have delayed visibility due to being deployed or modified in the | ||||||
current slot. | ||||||
|
||||||
These terms are not new, however we define them for clarity: | ||||||
|
||||||
* Top-level program: the program corresponding to the program id on a given | ||||||
instruction. | ||||||
* Instruction account: an account passed to an instruction, which allows its | ||||||
program to view the actual bytes of the account. CPI can only happen through | ||||||
programs provided as instruction accounts. | ||||||
* Transaction accounts list: all accounts for the transaction, which includes | ||||||
top-level programs, the fee-payer, all instruction accounts, and any extra | ||||||
accounts added to the list but not used for any purpose. | ||||||
|
||||||
## Detailed Design | ||||||
|
||||||
The proposed algorithm is as follows: | ||||||
|
||||||
1. Every account explicitly included on the transaction accounts list is counted | ||||||
once and only once. | ||||||
2. A valid program owned by LoaderV3 also includes the size of its programdata. | ||||||
3. Other than point 2, no accounts are implicitly added to the total data size. | ||||||
|
||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. edge-case for consideration. Does it matter if the account is used as a top-level or instruction account? Realistically we should not see these if people construct txs reasonably, but if an account is:
we could reasonably choose to not load it at all? With the proposed algorithm we would still need to load it in order to find the size. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. as-written it wouldnt matter, all accounts are treated the same but thats an interesting idea, i wasnt thinking about account loading with this simd. i was only focused on how to arrive at a correct total given a set of accounts. but youre right, i cant think of a reason why we should load these kinds of accounts at all, other than the fact that we already do it now if we do that (im in favor), there are several formulas that seem plausible to me Option 1a: sum the sizes of the unique set of accounts Option 1b: the same as above but programdata is only added to program size if the program is valid for execution in the current slot Option 2: sum the sizes of the unique set of accounts i think Option 1 is preferable. Option 2 fails to count cpi programs, but it sidesteps any issues with tombstoned programs because (after fee-only transactions... i say that a lot lately) the sizes dont matter, transactions that use them as program ids would fail anyway Option 1a and 1b behave the same for closed programs since there is no programdata. they differ on handling programs which fail validation and programs that are changed in-slot. this is especially germaine to simd83 since transactions will be able to mess with programs in the same batch all the options can double-count LoaderV3-owned accounts but i think this is appropriate, the compiled program and the account data if accessed in an instruction are basically their own separate things im undecided between 1a and 1b. do you have an opinion? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Difference of a and b is just with valid programs v3 program's programdata? A bit of a detail, but can we determine if it's valid or not without loading the programdata? In the general case** assuming not in cache already. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no, you need programdata because once the program account is deployed it never changes 1a is probably safer than 1b on reflection, i was thinking 1b is easy because we can just check program cache tombstones (and firedancer has to have something similar to conform) but the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah 1a seems better option to me in that case There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed on 1a. And I think I agree that it's best to double count programdata accounts if they are included in the ix account list since the SVM would effectively keep 2 copies in memory anyways. |
||||||
Transactions may include a | ||||||
`ComputeBudgetInstruction::SetLoadedAccountsDataSizeLimit` instruction to define | ||||||
a data size limit for the transaction. Otherwise, the default limit is 64MiB | ||||||
(`64 * 1024 * 1024` bytes). | ||||||
|
||||||
If a transaction exceeds its data size limit, account loading is aborted and the | ||||||
transaction is failed. Fees will be charged once | ||||||
`enable_transaction_loading_failure_fees` is enabled. | ||||||
|
||||||
Adding required loaders to transaction data size is abolished. They are treated | ||||||
the same as any other account: counted if on the transaction accounts list, not | ||||||
counted otherwise. | ||||||
|
||||||
Read-only and writable accounts are treated the same. In the future, when direct | ||||||
mapping is enabled, this SIMD may be amended to count them differently. | ||||||
|
||||||
As a consequence of 1 and 2, for LoaderV3 programs, programdata is counted twice | ||||||
if a transaction includes both programdata and the program account itself in the | ||||||
accounts list, unless the program is not valid for execution. This is partly | ||||||
done for ease of implementation: we always want to count programdata when the | ||||||
program is included, and there is no reason for any transaction to include both | ||||||
accounts except during initial deployment, in which case the program is not yet | ||||||
valid. | ||||||
|
||||||
We include programdata size in program size for LoaderV3 programs because in | ||||||
nearly all cases a transaction will include the program account (the only way to | ||||||
invoke the program) and will not include the programdata account because | ||||||
including it serves no purpose. Including the program account forces an | ||||||
unconditional load of the programdata account because it is required to compile | ||||||
the program for execution. Therefore we always count it, even when the program | ||||||
is an instruction account. | ||||||
|
||||||
There is no special handling for programs owned by the native loader, LoaderV1, | ||||||
or LoaderV2. | ||||||
|
||||||
Account size for programs owned by LoaderV4 is left undefined. This SIMD should | ||||||
be amended to define the required semantics before LoaderV4 is enabled on any | ||||||
network. | ||||||
|
||||||
## Alternatives Considered | ||||||
|
||||||
* Transaction data size accounting is already enabled, so the null option is to | ||||||
enshrine the current Agave behavior in the protocol. This is undesirable because | ||||||
the current behavior is highly idiosyncratic, and LoaderV3 program sizes are | ||||||
routinely undercounted. | ||||||
* Builtin programs are backed by accounts that only contain the program name as | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it's also in the works to make (some/most) builtins just normal programs, so adding any special cases here would probably be nullified in the future anyway |
||||||
a string, typically making them 15-40 bytes. We could make them free when not | ||||||
instruction accounts, since they're part of the validator. However this | ||||||
adds complexity for no real benefit. | ||||||
|
||||||
## Impact | ||||||
|
||||||
The primary impact is this SIMD makes correctly implementing transaction data | ||||||
size accounting much easier for other validator clients. | ||||||
|
||||||
It makes transactions which include program accounts for CPI somewhat larger, | ||||||
but given the generous 64MiB limit, it is unlikely that any existing users will | ||||||
be affected. | ||||||
|
||||||
## Security Considerations | ||||||
|
||||||
Security impact is minimal because this SIMD merely simplifies an existing | ||||||
feature. | ||||||
|
||||||
This SIMD requires a feature gate. | ||||||
|
||||||
## Backwards Compatibility | ||||||
|
||||||
Transactions that currently have a total transaction data size close to the | ||||||
64MiB limit, which call LoaderV3 programs via CPI, may now exceed it and fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This title reads as if the proposal specifies the size of a serialized transaction rather than size of the account data loaded for processing a transaction. How about
Transaction Account Data Size Specification
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
went with
Loaded Transaction Data Size Specification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
went with
Loaded Transaction Data Size Specification