
Measure accounts loading and storing cost #31411

Closed
tao-stones opened this issue Apr 28, 2023 · 18 comments
@tao-stones
Contributor

tao-stones commented Apr 28, 2023

Problem

Need to measure accounts loading and storing performance, both in microseconds (us) and in CU.

The motivation is to set transaction account read/write lock costs, which can be used as input for base fee calculation.

A few parameters to measure against: type of account (e.g. program account, data account, PDA) and size of account.

Proposed Solution

Use ledger-tool to replay the mainnet-beta ledger and collect measure_us for account loads and stores under different circumstances.
Note: loading program accounts should be more expensive than loading normal data and PDA accounts.

@tao-stones tao-stones self-assigned this Apr 28, 2023
@tao-stones tao-stones changed the title measure different accounts loading and storing performance measure different type of accounts' loading and storing performance Apr 28, 2023
@tao-stones tao-stones changed the title measure different type of accounts' loading and storing performance measure loading and storing performance for different account types Apr 28, 2023
@tao-stones tao-stones changed the title measure loading and storing performance for different account types Measure accounts loading and storing costa Oct 24, 2023
@KirillLykov
Contributor

KirillLykov commented Oct 24, 2023

Regarding the way CU per tx could be computed, I'm thinking about using an empirical approach.
In this particular case, as input we have:

  • number of accounts n,
  • size of each account s.

The observable function of choice is time per tx, call it t(s, n).

This function is different from the cost per transaction, which we denote c(s, n).
But if the idea of 48M CU per block is to fit into 400ms block construction time, we can say that c(s, n) = 48M/0.4 [CU/sec] * t(s, n), where in the [] I specify units.

We can do simulations to find triples (s_i, n_i) -> t_i and, with this data, estimate the parameters of the function t, which I think can be described by t(s, n) = a[sec]*n + b[sec/MB]*n*s.

  • a is "time cost" of write lock of an account (assuming it doesn't depend on the size),
  • b is "time cost" of writing 1MB of data.

So all we need is to find estimations for a and b.


UPD: c(s, n) = 48M/0.4/4 [CU/sec] * t(s, n) = 30 [CU/us] * t(s, n), because there are 4 threads.
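As a concrete (hypothetical) sketch of that estimation step: given simulated triples (s_i, n_i, t_i), fit a and b by ordinary least squares. The helper and the sample numbers below are made up for illustration, not part of the codebase.

```rust
/// Fit t(s, n) = a*n + b*n*s by least squares over simulated samples.
/// Each sample is (account size s in MB, number of accounts n, time t in sec).
fn fit_a_b(samples: &[(f64, f64, f64)]) -> (f64, f64) {
    // With x = n and y = n*s, accumulate sums and solve the 2x2 normal equations for (a, b).
    let (mut sxx, mut sxy, mut syy, mut sxt, mut syt) = (0.0, 0.0, 0.0, 0.0, 0.0);
    for &(s, n, t) in samples {
        let (x, y) = (n, n * s);
        sxx += x * x;
        sxy += x * y;
        syy += y * y;
        sxt += x * t;
        syt += y * t;
    }
    let det = sxx * syy - sxy * sxy;
    let a = (sxt * syy - syt * sxy) / det;
    let b = (syt * sxx - sxt * sxy) / det;
    (a, b)
}

fn main() {
    // Made-up sample points (s [MB], n, t [sec]) standing in for simulation output.
    let samples = [(1.0, 2.0, 0.0004), (10.0, 2.0, 0.0022), (10.0, 4.0, 0.0045)];
    let (a, b) = fit_a_b(&samples);
    println!("a ~ {a:.6} sec per account, b ~ {b:.6} sec per MB");
}
```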

@tao-stones
Contributor Author

But if the idea of 48M CU per block is to fit into 400ms block construction time, we can say that c(s, n) = 48M/0.4 [CU/sec] * t(s, n), where in the [] I specify units.

Assuming there are 4 parallel processes during replay, currently the CU per micro-sec is defined as

/// Cluster averaged compute unit to micro-sec conversion rate
pub const COMPUTE_UNIT_TO_US_RATIO: u64 = 30;

so the equation can be simplified to c(s, n) = 30 [CU/us] * t(s, n)
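A minimal sketch of this conversion, reusing the constant quoted above; us_to_cu is a hypothetical helper, not an existing function in the codebase.

```rust
/// Cluster averaged compute unit to micro-sec conversion rate (quoted above).
pub const COMPUTE_UNIT_TO_US_RATIO: u64 = 30;

/// Hypothetical helper: convert a measured wall-clock time in micro-seconds
/// into an equivalent CU cost, i.e. c = 30 [CU/us] * t.
fn us_to_cu(measured_us: u64) -> u64 {
    measured_us * COMPUTE_UNIT_TO_US_RATIO
}

fn main() {
    assert_eq!(us_to_cu(100), 3_000); // e.g. 100us of load time ~ 3,000 CU
}
```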

@tao-stones tao-stones changed the title Measure accounts loading and storing costa Measure accounts loading and storing cost Oct 24, 2023
@tao-stones
Contributor Author

function t which I think can be described by t(s, n) = a[sec]*n + b[sec/mb]*n*s.

  • a is "time cost" of write lock of an account (assuming it doesn't depend on the size),
  • b is "time cost" of writing 1MB of data.

A few points I am not entirely clear on:

  • is there a "time cost" of loading accounts? Especially loading cold accounts?
  • what does "time cost of write lock" mean exactly: is it the wall-clock time of acquiring a write lock on an account (which should be cheap), or does it include the waiting time until the account becomes available for write locking?
  • will there be a similar "time cost" for a read lock?

I guess I'm thinking along the lines of:
cost of read-only account = cost of loading
cost of write-lock account = cost of loading + cost of write-back (i.e. storing)
Not clear how to define the "cost of read/write lock" yet, but if we do, it would be added to the above accordingly.
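A purely illustrative sketch of that decomposition (the names and the per-component numbers are placeholders, to be filled in once measured):

```rust
/// Illustrative only: per-account cost split into load, write-back (store) and
/// lock components, following the decomposition above.
struct AccountAccessCost {
    load_cu: u64,
    store_cu: u64,
    lock_cu: u64, // read or write lock; still to be defined/measured
}

fn account_cost(c: &AccountAccessCost, writable: bool) -> u64 {
    if writable {
        // write-locked account: load + write-back (+ lock, once defined)
        c.lock_cu + c.load_cu + c.store_cu
    } else {
        // read-only account: load only (+ lock, once defined)
        c.lock_cu + c.load_cu
    }
}

fn main() {
    // Made-up numbers for illustration.
    let c = AccountAccessCost { load_cu: 40, store_cu: 60, lock_cu: 0 };
    assert_eq!(account_cost(&c, false), 40);
    assert_eq!(account_cost(&c, true), 100);
}
```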

@KirillLykov
Contributor

  • is there a "time cost" of loading accounts? Especially loading cold accounts?

To be honest, I didn't dig much into this. What I observe is that the loading time reported here is quite high (about half of the execute time). I didn't consider that there is a cache for loaded accounts; I need to check the size of the cache to say whether my accounts are hot or not.

  • what does "time cost of write lock" mean exactly: is it the wall-clock time of acquiring a write lock on an account (which should be cheap), or does it include the waiting time until the account becomes available for write locking?
  • will there be a similar "time cost" for a read lock?

I thought about an artificial scenario with minimal lock contention, so it is just acquiring the lock.

@KirillLykov
Contributor

Regarding this, I'm not sure that my mental model of account loading is accurate. I have in mind something like:

  1. for each account participating in the transaction, try locking it
  2. copy the account's data from accounts-db to some cache
  3. during execution of the tx, fetch the account from this cache if requested and copy it
  4. write to the account and write it back through the cache to accounts-db

So from (3), if we compare a program that only reads accounts with one that writes, the load phase of both should be the same, but execution will be longer for the writing program because, I think, the time to copy modified account data is part of execution.

@alessandrod could you tell me if it fits the reality?

@KirillLykov
Contributor

I guess I'm thinking along the lines of:
cost of read-only account = cost of loading
cost of write-lock account = cost of loading + cost of write-back (i.e. storing)
Not clear how to define the "cost of read/write lock" yet, but if we do, it would be added to the above accordingly.

Well, if we say that lock time is negligible, cost of write-lock account = cost of loading + cost of write-back makes sense. From the simulation point of view, the cost of a read-only account and the cost of a write-lock account might both be estimated empirically.

@tao-stones
Contributor Author

From the simulation point of view, the cost of a read-only account and the cost of a write-lock account might both be estimated empirically.

I played a bit with timing the storing of accounts. I was somewhat surprised to notice that the same transaction took very different times to store accounts between ledger replays: the first replay took about 14ms to store accounts for a transaction, while the second (re)run took only 1ms. Not sure why; @brooksprumo do you have a pointer to where I should look? Also, I thought you did some accounts loading/storing performance measurements, is there something we can use here?

@brooksprumo
Contributor

I played a bit with timing the storing of accounts. I was somewhat surprised to notice that the same transaction took very different times to store accounts between ledger replays: the first replay took about 14ms to store accounts for a transaction, while the second (re)run took only 1ms. Not sure why; @brooksprumo do you have a pointer to where I should look?

Let me take a look. The first transaction of a slot does some extra stuff to initialize various data structures. For example, the accounts write cache has a map from slot to accounts. So the first store to a new slot will add a new entry to the map for that new slot with an initially-empty set of accounts. But I don't think this alone would account for over 10 milliseconds of difference...
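A simplified illustration of that first-store-initializes-the-slot-entry behavior (stand-in types for the sketch, not the real accounts-db structures):

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration only.
type Slot = u64;
type Pubkey = [u8; 32];
type AccountData = Vec<u8>;

/// Store an account into a slot -> accounts write cache. The first store to a
/// new slot pays for creating the (initially empty) inner map; later stores to
/// the same slot just insert into the existing entry.
fn store_in_write_cache(
    cache: &mut HashMap<Slot, HashMap<Pubkey, AccountData>>,
    slot: Slot,
    pubkey: Pubkey,
    data: AccountData,
) {
    cache.entry(slot).or_default().insert(pubkey, data);
}

fn main() {
    let mut cache = HashMap::new();
    store_in_write_cache(&mut cache, 1, [0u8; 32], vec![1, 2, 3]); // first store: creates the slot entry
    store_in_write_cache(&mut cache, 1, [1u8; 32], vec![4, 5, 6]); // later store: reuses it
    assert_eq!(cache[&1].len(), 2);
}
```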

Also, I thought you did some accounts loading/storing performance measurements, is there something we can use here?

I did some histograms for loading accounts, but not for storing accounts. So I don't think that'll apply here (based on this and other threads, it seems that writes/stores are the issue).

@brooksprumo
Contributor

Let me take a look.

Nothing jumps out in the code. I'll look at metrics to see if that elucidates anything.

@tao-stones
Contributor Author

tao-stones commented Oct 27, 2023

I did some histograms for loading accounts, but not for storing accounts. So I don't think that'll apply here (based on this and other threads, it seems that writes/stores are the issue).

Data on loading is helpful too.

The overarching idea is that loading/storing/locking accounts, as part of execution, is underpriced, which leads either to fewer transactions being packed into a block, or to elevated replay time. @KirillLykov correct me if I'm wrong.

This issue aims to identify the following (a small sketch of the rate calculation follows the list):

  • micro-seconds per 10k-bytes for loading, on average
  • micro-seconds per 10k-bytes for storing, on average
  • micro-seconds per lock, both read & write
  • and whether the above metrics should differ based on account type (e.g. program account, data account, PDA)
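A minimal sketch of the intended rate calculation, assuming we have aggregate measurements to divide; the helper and the numbers are made up for illustration.

```rust
/// Hypothetical helper: turn an aggregate measurement (total micro-seconds and
/// total bytes touched) into a micro-seconds-per-10k-bytes rate.
fn us_per_10k_bytes(total_us: f64, total_bytes: f64) -> f64 {
    total_us / (total_bytes / 10_000.0)
}

fn main() {
    // Made-up example: 5,000 us spent loading 2,000,000 bytes -> 25 us per 10k bytes.
    println!("{}", us_per_10k_bytes(5_000.0, 2_000_000.0));
}
```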

@brooksprumo
Contributor

Data on loading is helpful too.

Gotcha. Maybe this PR can be helpful to start with: #33220

@tao-stones
Contributor Author

@KirillLykov to follow up on our call, my understanding of "loading account", which could be wrong, consists of two steps:

  1. load the account from the db into the read-only cache.
  2. serialize and memcpy it into the VM, so it's available to the program.

The first step happens at accounts_db.load(); Brooks' PR above reports this operation's time, and it is also folded into the metric replay-slot-end-to-end-stats.load_us. I am not clear about the second step, maybe it is part of "execution"?

As for the cost of "storing account", I assume it is the reverse process: deserialize and memcpy from the VM into the accounts' write-cache, then store to accounts-db. The metric replay-slot-end-to-end-stats.store_us captures it.

@alessandrod
Contributor

@KirillLykov to follow up on our call, my understanding of "loading account", which could be wrong, consists of two steps:

  1. load the account from the db into the read-only cache.
  2. serialize and memcpy it into the VM, so it's available to the program.

I am not clear about the second step, maybe it is part of "execution"?

As for the cost of "storing account", I assume it is the reverse process: deserialize and memcpy from the VM into the accounts' write-cache, then store to accounts-db. The metric replay-slot-end-to-end-stats.store_us captures it.

Yep, this is correct: the serialize* and deserialize* execution metrics track the time to write accounts into the VM and then parse them back, respectively.

@KirillLykov
Contributor

KirillLykov commented Nov 28, 2023

@taozhu-chicago my understanding of the metrics is the following:

Cost of load:

  1. Cost of read from AccountsDB
  • checks the accounts write cache and then the read cache, so we should observe a spike in this metric when accounts are first used. Need to find the size of the caches and whether there is a metric for cache hits.
  • metric is replay-slot-end-to-end-stats.load_us
  2. Cost of serialization
  • happens before the VM executes the program
  • includes the cost of copying account data into the buffer used by the VM, see write_data
  • metric is replay-slot-end-to-end-stats.execute_details_serialize_us
  3. Cost of execution
  • the metric for the time to execute the instruction without serialization/deserialization is replay-slot-end-to-end-stats.execute_details_execute_inner_us
  • but there are also other metrics which include this one, in order of the call stack:
    execute_accessories_process_message -> execute_accessories_process_instruction_total_us -> execute_accessories_process_instruction_executable_chain_us -> execute_details_execute_inner_us
  4. Cost of deserialization
  • happens after execution
  • includes the cost of copying account data back
  • metric is replay-slot-end-to-end-stats.execute_details_deserialize_us
  5. Cost of storing into AccountsDB
  • happens in banks.rs::commit_transaction
  • metric is replay-slot-end-to-end-stats.store_us
  • there are more detailed metrics in accounts_db_store_timings: store_accounts, update_index and others

NOTE: replay-slot-end-to-end-stats metrics are for the longest-running replay thread, while replay-slot-stats metrics are the sum over all executions.
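To tie the phases above together, a hedged sketch (not real validator code): sum the per-phase timings into one per-transaction figure and convert to CU with the 30 CU/us ratio discussed earlier. The struct, its field names (which merely mirror the metric names) and the numbers are illustrative only.

```rust
/// Per-transaction timings, one field per phase listed above (all in micro-seconds).
struct TxTimingUs {
    load_us: u64,          // replay-slot-end-to-end-stats.load_us
    serialize_us: u64,     // ...execute_details_serialize_us
    execute_inner_us: u64, // ...execute_details_execute_inner_us
    deserialize_us: u64,   // ...execute_details_deserialize_us
    store_us: u64,         // ...store_us
}

const COMPUTE_UNIT_TO_US_RATIO: u64 = 30;

/// Sum the phases into an end-to-end time and convert to an equivalent CU figure.
fn total_cu(t: &TxTimingUs) -> u64 {
    let total_us =
        t.load_us + t.serialize_us + t.execute_inner_us + t.deserialize_us + t.store_us;
    total_us * COMPUTE_UNIT_TO_US_RATIO
}

fn main() {
    // Made-up numbers for illustration.
    let t = TxTimingUs { load_us: 80, serialize_us: 18, execute_inner_us: 150, deserialize_us: 6, store_us: 100 };
    println!("{} CU", total_cu(&t));
}
```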

@tao-stones
Contributor Author

Amazing! Perhaps you can also add replay-slot-end-to-end-stats.store_us as the cost of storing?

@KirillLykov
Contributor

KirillLykov commented Nov 30, 2023

Some preliminary results for a run where I create blocks of transactions such that each transaction modifies one 10MB account. There are 256 accounts in total, modified in a round-robin manner. The unit is us everywhere; I double-checked the code and it is really always as_us().

Metric name Value
replay-slot-end-to-end.load_us 79.5
replay-slot-end-to-end.execute_us 69.64K
execute_accessories_process_message 69.59K
process_instructions_total_us 69.57K
replay-slot-end-to-end.execute_accessories_process_instructions_process_executable_chain_us 69.55K
replay-slot-end-to-end.execute_details_execute_inner_us 69.17K
replay-slot-end-to-end.execute_details_serialize_us/deserialize_us 18.23/5.9
replay-slot-end-to-end.store_us 102.7

Preliminary because I just picked a 2s window when the validator was the leader; it is not averaged over a long enough period of time (because for that I need to develop some tools).

What is weird about it:

  1. the load/store time is so small. I also expected accounts_db_store_timings.store_accounts (the time to store accounts in accounts_db) to be part of replay-slot-end-to-end.store_us, but it is ~600K, so it seems I don't quite understand what this metric is for.
  2. I would have expected load+execute+store to be around 400ms, because I thought we are bounded here by the time to execute transactions. Otherwise, why would we form blocks of only 35 txs?

@alessandrod
Contributor

alessandrod commented Nov 30, 2023

Otherwise, why would we form blocks of only 35 txs?

Because the scheduler will estimate CU costs, see high costs, and pack the block with only enough transactions so as not to exceed the max block CU cost. (EDIT: I'm assuming that the CU cost for these txs is high.)

@tao-stones
Contributor Author

Otherwise, why would we form blocks of only 35 txs?

Because the scheduler will estimate CU costs, see high costs, and pack the block with only enough transactions so as not to exceed the max block CU cost. (EDIT: I'm assuming that the CU cost for these txs is high.)

Mmm, possible but not likely. The scheduler uses actual consumed CU when packing, not the amount the TX requests. Given only 35 TXs/block, each tx would need to have total actual CU = 48,000,000/35 ≈ 1.4M CU/tx, which is the upper limit. I'll look into the ledger once it's available.
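A quick check of that arithmetic (purely illustrative):

```rust
fn main() {
    // 48M block CU limit spread over 35 transactions per block.
    let per_tx_cu = 48_000_000u64 / 35;
    println!("{per_tx_cu} CU per tx"); // ~1,371,428, i.e. roughly the 1.4M per-tx figure quoted above
}
```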

@github-actions github-actions bot added the stale label Dec 2, 2024
@github-actions github-actions bot closed this as not planned Dec 10, 2024