Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

block writeable account data limit #184

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

bw-solana
Copy link

No description provided.

@jacobcreech
Copy link
Contributor

@bw-solana do we know the current impact if this change was live the past 30 days?

@bw-solana
Copy link
Author

@bw-solana do we know the current impact if this change was live the past 30 days?

Good question, I don't know. I'll get some data

transaction being handled by the block producer (for inclusion in a block) or
validator (replaying transactions in a block), add it to a running total of
total writeable account data for the block, and compare it against some
threshold (recommend 2GB to start) to decide if the transaction is allowed to be

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have statistics on what is the current distribution of writable data on the mainnet?
I wonder if we could start with an even lower value.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have perfect stats, but I posted comment with some analysis below (#184 (comment))

2GiB feels right as a starting point. It should constrain worst case reasonably well without impacting current typical behavior on mainnet. We could ratchet things down (or up) after observing behavior

Comment on lines +65 to +66
`MAX_BLOCK_ACCOUNTS_DATA_SIZE_WRITEABLE` and set this to `2_000_000_000`
(2GB).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2GB is 2_147_483_648 :P

Copy link
Member

@2501babe 2501babe Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`MAX_BLOCK_ACCOUNTS_DATA_SIZE_WRITEABLE` and set this to `2_000_000_000`
(2GB).
`MAX_BLOCK_ACCOUNTS_DATA_SIZE_WRITEABLE` and set this to 2GiB
(`2 * 1024 * 1024 * 1024`).

Comment on lines +69 to +70
5. Treat this new error type (`WouldExceedAccountDataWriteableBlockLimit`) as
retryable in execute_and_commit_transactions_locked

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
5. Treat this new error type (`WouldExceedAccountDataWriteableBlockLimit`) as
retryable in execute_and_commit_transactions_locked
5. Treat this new error type (`WouldExceedAccountDataWriteableBlockLimit`) as
retryable in `execute_and_commit_transactions_locked()`.

Comment on lines +72 to +76
There is an edge case where if a single transaction tries to consume more than
the entire per block writeable account data budget, it should not be marked as
retryable. This might not be possible today given tx size limits and maximum
account size limits, but this could be exploitable in the future if either of
this constraints change.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to introduce a separate constraint that limits the amount of writable data a single transaction can access?
We have MAX_LOADED_ACCOUNTS_DATA_SIZE_BYTES equal to 64MB which limits the total amount of data a transaction can load.
We could add MAX_LOADED_ACCOUNTS_WRITEABLE_DATA_SIZE_BYTES (with an obvious semantics).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. We'll effectively be constrained to the 64MiB per tx for updated account data and 100MiB per block for newly allocated account data, so we already have some guard rails here, but we could be more explicit

Comment on lines +56 to +59
1. Update the cost model to compute the aggregate size of all accounts marked
writeable for each transaction. This can be done similarly to how
`data_bytes_len_total` is computed inside `get_transaction_cost` but limited
to accounts marked writeable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is inaccurate.

Looking at the get_transaction_cost() in cost-model, I think, data_bytes_len_total only calculates the size of the instruction data passed into instructions.
It does not look at the account sizes:

            data_bytes_len_total =
                data_bytes_len_total.saturating_add(instruction.data.len() as u64);

https://github.com/anza-xyz/agave/blob/f9f8b60ca15fa721c6cdd816c99dfd4e9123fd77/cost-model/src/cost_model.rs#L187-L188

The loaded accounts data size is only included in the transaction cost after the transaction execution.
Actual value is passed into the calculate_cost_for_executed_transaction() as the actual_loaded_accounts_data_size_bytes argument.
And it is populated and checked in the blockstore_processor.rs:

                Some(CostModel::calculate_cost_for_executed_transaction(
                    tx,
                    committed_tx.executed_units,
                    committed_tx.loaded_account_stats.loaded_accounts_data_size,
                    &bank.feature_set,
                ))

https://github.com/anza-xyz/agave/blob/f9f8b60ca15fa721c6cdd816c99dfd4e9123fd77/ledger/src/blockstore_processor.rs#L246-L251

We could record have a similar counter for the amount of writable data that the transaction touched.

So at the high level it can be computed similarly to what we are already doing.
It is just that the details would be different.

Not sure if there are any implications for the DX, due to the fact that we can only compute this value after a transaction execution.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you're right. We don't have any precedent for understanding actual account data before loading/executing transactions. We rely on either scraping top level instructions (e.g. for allocate) or estimating and then adjusting.

This would seemingly have to fall into the estimate + adjust case (unless we want to do something radical like grab account data size before execution), which would likely impact DX if we push this onto the user similar to how we ask for CU/Data budget estimates.

One option is to leverage the existing Data budget value for writes as well (defaults to 2MiB per tx or can be set by user). I think this is likely small enough of a default to not bottleneck scheduling before the data budget gets refunded post execution. We would need to study this.

Comment on lines +45 to +46
or if the slot needs to be marked dead (if sum is above the limit for the case
of validator).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

andrew will have to look at this because if it can invalidate blocks then it may interfere with asynchronous execution

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout. I would naively expect this to be handled in the same manner that we are handling account data load limits, but someone with a bigger brain should confirm the details

Copy link
Member

@2501babe 2501babe Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its unfortunately quite different, transaction data size (sum of all account data sizes required for a transaction) can be calculated on the transaction level, and if the transaction exceeds its limit, it is treated as invalid. so in a world with async execution, the transaction could be included in a block before verifying it, but all nodes will see that it violates its limits and treat it the same (charge fees and not execute it, once the feature gate for fee-only transactions is active)

but if we added a data size limit at the block level, you would need to know the running total of all previous transactions to determine whether you can put the transaction in the block. theres no way to do this without executing because accounts can be created, deallocated, or reallocated. the purpose of removing constraints to enable async execution is that block producers which put "bad" transactions in a block might make suboptimal blocks, but they can always be sure the block is valid if they follow constraints that dont depend on execution (are transactions syntactically valid, are blockhashes valid, etc)

i believe we could have a block-level constraint as proposed here if instead of "if you exceed the block data size limit, the block is invalid" it was like "if you exceed the block data size limit, no further transactions in the block will be executed." this could be user-hostile tho, because it lets block producers invalidate transactions that should otherwise succeed. especially if we charge fees, the block producer now has an incentive to exceed the block limit and farm fees without having to execute transactions. this is different from a transaction exceeding its own data size because in that case the transaction is invalid as constructed on its own

alternatively we could very easily charge more for writable data at the transaction level, but im not sure that accomplishes the goals of this simd. andrew may have better suggestions than me tho

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"if you exceed the block data size limit, no further transactions in the block will be executed."

I like the direction of this, but I believe it leads to divergence, as transaction replay order is not guaranteed (iiuc). I tried to do something related a while ago, and wrote up my findings here: solana-labs/solana#27029

Copy link
Author

@bw-solana bw-solana Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation!

IIUC, the key difference between the strategy for loaded account data vs written account data is in how we know/estimate the amounts for a given tx.

We can know how much data a tx loads by either: (1) inspecting the tx and loading the accounts or (2) using the data budget requested / default data budget. The latter obviously is prone to under-packing, but with the current rates (48M CUs per block, default data budget of 2MiB, charging 8 CUs per 32kB), we could still pack >90k txs per block using default data budget.

For understanding the amount of data written, we either (1) actually execute the tx or (2) rely on some data written budget requested. Method 1 is a showstopper for async execution, so it seems like 2 is our only option. A default budget of 2MiB would allow only 1k default budget txs with a 2GiB write limit. Maybe we should do some profiling to see how low we could reasonably push this limit without impacting most existing txs.

Copy link
Member

@2501babe 2501babe Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, think we are on the same page! the way transaction data size works now is theres a default budget (64MiB), adjustable by compute budget instruction, and at execution time the accounts are totalled and the transaction is invalid if it exceeds its limit. but this doesnt affect the block, so it doesnt interfere with async execution

we could add a transaction-level writable accounts budget with a smaller default limit (assume 2MiB like you suggest), and it would likewise be async-safe. we could easily have block-level constraints based on those requested budgets, which would also be async-safe. the main risk is people requesting enormous budgets and crowding other transactions out of the block, which we could defray with either a per-transaction maximum limit** or by exponentially ramping the cost per byte

if you make changes to the transaction data size calculation code, note that the formula is quite tricky. i have an open simd to fix this #186

**(16-20MiB would allow writing to a 10Mib account plus plenty of breathing room, but i dont actually know how big the largest accounts are onchain. 10MiB is the max allocation size but i believe you can pump accounts larger than that with realloc)

@bw-solana
Copy link
Author

TL;DR - I think we would see essentially zero impact to mainnet behavior with a 2GiB write limit per block.

We don't have existing metrics for writeable account data per slot, but I was able to collect some relevant data that should help this conversation. Data is looking back at metrics over the past 2+ weeks.

We track the amount of account data stored per second (accounts_db_store_timings.total_data). Note this does not necessarily mean writing to disk (may go to cache). This should roughly map to 2 slots worth of writeable data or at least be a pessimistic upper bound. Note this encompasses writes beyond the user transactions included in a block such as stake account updates and fee collection.

I am observing writes typically in the 100MiB range with spikes as high as ~1.9GiB. We'll see spikes over 1GiB ~a few times per day.

We also track newly allocated account data per block (cost_tracker_stats.allocated_accounts_data_size). This is typically a handful of kBs on average with spikes up to the 4.5MiB range.

I also added some metrics to track the largest append vec writes. Most of these are in the 1-2MiB range, but we see spikes up to 400MiB on epoch boundary due to saving stake accounts. These spikes will come down significantly and be spread out when PER is enabled. I've also observed non-epoch-boundary spikes up to the ~20MiB range.

@brooksprumo - It would be helpful if you or someone could fact check me here

Copy link
Contributor

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of having block-wide limits. I attempted something similar a while ago, when (iirc) we did not, and thus had to hijack transaction errors to indicate block-wide issues. That model didn't work, and I wrote about it here: solana-labs/solana#27029. I've heard the TVU is smarter now, and can accommodate block-wide limit, which is great!

Comment on lines +45 to +46
or if the slot needs to be marked dead (if sum is above the limit for the case
of validator).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"if you exceed the block data size limit, no further transactions in the block will be executed."

I like the direction of this, but I believe it leads to divergence, as transaction replay order is not guaranteed (iiuc). I tried to do something related a while ago, and wrote up my findings here: solana-labs/solana#27029

block being produced + the amount of account data that would be written with
the current transaction against the total block limit in the would_fit
function
3. Store the write limit in a constant such as
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Store the write limit in a constant

From past experience, I'd recommend against a constant that we intend to change later (or minimally want to support allowing to be changed). IMO a function that takes in the feature set and returns the value is more future-proof.

But, this is pretty agave-implementation specific, so may not be germane to a SIMD.

@brooksprumo
Copy link
Contributor

We track the amount of account data stored per second (accounts_db_store_timings.total_data). Note this does not necessarily mean writing to disk (may go to cache). This should roughly map to 2 slots worth of writeable data or at least be a pessimistic upper bound. Note this encompasses writes beyond the user transactions included in a block such as stake account updates and fee collection.

I worked on adding a metric for the amount of data stored by transaction processing, but it was not merged: solana-labs/solana#34718. Maybe we can bring this back, or use it to gather data more specific to transaction processing.

I also added some metrics to track the largest append vec writes. Most of these are in the 1-2MiB range, but we see spikes up to 400MiB on epoch boundary due to saving stake accounts. These spikes will come down significantly and be spread out when PER is enabled. I've also observed non-epoch-boundary spikes up to the ~20MiB range.

I don't have these numbers off hand; I'll look them up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants