Problem: db size increase too fast #451
Comments
For reference:
Remove tx_index.db: currently we rely on the tx indexer to query a tx by its eth tx hash. An alternative solution is to store that index in a standalone kv db on the app side, so we don't need to retain all the tx indexes.
RocksDB uses ...
Yep, we should consider using a new kvstore just for storing the tx hash mapping. We can also disable the Tendermint indexer to improve consensus performance.
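For reference, a minimal sketch of what disabling the Tendermint tx indexer could look like in the node's config.toml; the setting should be verified against the Tendermint version in use:

```toml
# config.toml (Tendermint node config) -- illustrative sketch
[tx_index]
# "kv" is the default key-value indexer; "null" disables tx indexing entirely,
# which stops tx_index.db from growing but breaks hash-based tx queries
# unless an app-side index (as discussed above) is in place.
indexer = "null"
```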
You mean nodes could choose not to have this tx_index.db by moving this part off-chain?
Yes, by storing the eth tx hash index in another place.
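A rough sketch of what such a standalone mapping might look like, using the tm-db interface the app already depends on. The key/value layout (eth tx hash mapped to block height plus tx index) and the helper names are illustrative assumptions, not the actual ethermint implementation:

```go
package main

import (
	"encoding/binary"
	"fmt"

	dbm "github.com/tendermint/tm-db"
)

// indexEthTx stores one (eth tx hash -> block height, tx index) entry in a
// dedicated kv db, so hash lookups no longer need the Tendermint tx indexer.
func indexEthTx(db dbm.DB, ethTxHash []byte, height uint64, txIndex uint32) error {
	value := make([]byte, 12)
	binary.BigEndian.PutUint64(value[:8], height)
	binary.BigEndian.PutUint32(value[8:], txIndex)
	return db.Set(ethTxHash, value)
}

// lookupEthTx resolves an eth tx hash back to its (block height, tx index).
func lookupEthTx(db dbm.DB, ethTxHash []byte) (uint64, uint32, error) {
	value, err := db.Get(ethTxHash)
	if err != nil || value == nil {
		return 0, 0, err
	}
	return binary.BigEndian.Uint64(value[:8]), binary.BigEndian.Uint32(value[8:]), nil
}

func main() {
	db := dbm.NewMemDB()       // in-memory for the example; a node would use leveldb/rocksdb
	hash := []byte{0xab, 0xcd} // placeholder hash bytes
	if err := indexEthTx(db, hash, 2933002, 5); err != nil {
		panic(err)
	}
	h, i, _ := lookupEthTx(db, hash)
	fmt.Println(h, i)
}
```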
I will start a testing build with the custom RocksDB setup to see how much it can be improved.
There's an option in ... The minimal one for json-rpc to work should be: ...
EDIT: ...
At which block height was this DB size observed?
Looks like ... Let's wait until the testing node fully syncs up to the network and see the final result.
@tomtau mentioned we could do some statistics on the ...
BTW, this is ...
Compared to the full archive one: ...
the ...
Got the testing node synced up to the planned upgrade height,
using ...
It meets the benchmark in this article. There is no gain in the compression ratio, only gains in compression/decompression speed.
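For context, a hedged sketch of the kind of "custom RocksDB setup" being tested here, using the grocksdb bindings. The exact options are not shown in this thread, so the compression choice below (zstd) and the db path are assumptions for illustration only:

```go
package main

import (
	"log"

	"github.com/linxGnu/grocksdb"
)

func main() {
	opts := grocksdb.NewDefaultOptions()
	opts.SetCreateIfMissing(true)
	// Assumed tuning: switch the compression algorithm. As noted above, this tends
	// to change compression/decompression speed more than the on-disk ratio.
	opts.SetCompression(grocksdb.ZSTDCompression)

	db, err := grocksdb.OpenDb(opts, "./application.db") // path is illustrative
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```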
Why is state.db larger in the pruned one? (120GB vs 90GB)
Went through the evm ...
Another related thing is, in ...
It feels like ibc shouldn't store so many pairs; can you see the prefixes?
The major key patterns in the ibc store: ...
I guess some historical (i.e. older than "evidence age") states, acks, receipts... could be pruned from ibc application storage?
Per https://github.com/cosmos/ibc-go/blob/release/v2.2.x/modules/light-clients/07-tendermint/types/update.go#L137, the sequence keys don't seem to be pruned at all.
Working on it, ...
The EVM module's storage schema is much simpler: contract code and storage. The storage slots are calculated by the EVM internally, so I guess there's not much to prune there.
It's the storage slot number, computed internally by the EVM.
In the storage part, the address ...
Iterating the db: all orphans: 3,394,612,961, totalKeySize: 166G, totalValueSize: 108.6G
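A sketch of how such statistics could be collected: iterate the raw db and aggregate counts and sizes per key prefix. The goleveldb backend and the one-byte prefix grouping are assumptions for illustration (the testing node above ran on RocksDB; the same tm-db iterator interface applies there):

```go
package main

import (
	"fmt"
	"log"

	dbm "github.com/tendermint/tm-db"
)

func main() {
	// Open an existing application.db (goleveldb backend assumed here).
	db, err := dbm.NewGoLevelDB("application", "./data")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	type stat struct {
		count, keyBytes, valueBytes uint64
	}
	stats := map[string]*stat{}

	it, err := db.Iterator(nil, nil) // full range scan
	if err != nil {
		log.Fatal(err)
	}
	defer it.Close()

	for ; it.Valid(); it.Next() {
		k, v := it.Key(), it.Value()
		prefix := ""
		if len(k) > 0 {
			prefix = fmt.Sprintf("%x", k[:1]) // group by first key byte (illustrative)
		}
		s, ok := stats[prefix]
		if !ok {
			s = &stat{}
			stats[prefix] = s
		}
		s.count++
		s.keyBytes += uint64(len(k))
		s.valueBytes += uint64(len(v))
	}

	for p, s := range stats {
		fmt.Printf("prefix %s: count=%d keyBytes=%d valueBytes=%d\n", p, s.count, s.keyBytes, s.valueBytes)
	}
}
```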
From https://geth.ethereum.org/docs/dapp/tracing ... Maybe we can define a proper state pruning interval to make sure re-executing txs is not too heavy on the node?
Not sure what the keys are like, but if they are the slot keys, bucketing or ...
@JayT106 you can try syncing with a different config and consult with @CeruleanAtMonaco and @jun0tpyrc on whether that config will still be all right for dApps or exchanges that need a full or archival node-like setting.
I think the comments on the ...
meaning the app will keep the latest 362880 versions (around 21 days at a 5-second block time), and then only keep 1 version for every 100 blocks past that window. Also, the settings only affect the current DB, meaning it will not prune past versions far beyond the ...
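For reference, this matches the SDK's default pruning strategy; an equivalent explicit setting in app.toml would look roughly like the sketch below (key names as in cosmos-sdk v0.45-era configs; pruning-keep-every was removed in later SDK versions, so verify against the version in use):

```toml
# app.toml -- illustrative pruning settings matching the behaviour described above
pruning = "custom"
pruning-keep-recent = "362880"  # keep ~21 days of versions at ~5s blocks
pruning-keep-every = "100"      # additionally keep 1 version every 100 blocks
pruning-interval = "10"         # run pruning every 10 blocks
```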
I think it's the slot keys; the past states (changed or removed) relate to the contract addresses.
Does that mean a more aggressive pruning setting can drop the db size a lot?
The validator node should be able to use prune ... But a problem will be that it cannot be a reference node for the snapshot. I will ask DevOps to set up a testing node to see the results of different pruning setups.
What is the significance of a validator being a reference node for the snapshot? If we are pushing validators to their max storage efficiency, I think they don't need to be that?
If our validator doesn't need to be a reference node for the snapshot, then yes, we can prune everything. But outside validators will need to know the difference when setting up the different pruning options (perhaps they use theirs as a reference node).
@tomtau However, I agree with you that we can test it with the new SDK to see the impact.
Possibly, there are two things to verify: ...
Currently they are refactoring evmos to test on it.
Can we keep the snapshot interval identical to the pruning interval, so we can keep snapshots working while maximizing pruning?
Sounds like a good idea, but it needs to be done at the cosmos-sdk level: cosmos/cosmos-sdk#12183
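Until that lands in the SDK, the alignment can only be approximated by hand in app.toml. A hedged sketch (values are illustrative) where snapshot heights fall on versions that pruning still retains:

```toml
# app.toml -- illustrative alignment of snapshots and pruning
pruning = "custom"
pruning-keep-recent = "100"
pruning-interval = "10"

[state-sync]
# Take a snapshot every 100 blocks; with keep-recent = 100, the snapshot
# height has not yet been pruned when the snapshot is taken.
snapshot-interval = 100
snapshot-keep-recent = 2
```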
FYI, @garyor has synced a pruned node which only keeps the recent 50 versions; the db size is: ...
Noticeably, ...
If we want to be more aggressive, we can set ... I am working on migrating the storage from V1 to V2 and calculating the size.
Migrating from store V1 to V2 might have a big problem. We have 24M+ kv pairs in the latest version (2933002) of my testing evm module. It took 4 days and only migrated ~1/3 of the kv pairs (still ongoing). However, I tested a small dataset to evaluate the DB size difference between store V1 and store V2, and it looks like it helps when we ignore RocksDB's ... I will try to scale the simulation to see the different numbers. But we might need to dig into the SMT implementation to see why the migration takes so long. @adu-crypto any idea?
But using the v2 store without creating checkpoints for every version doesn't support querying historical state, right?
@JayT106 how about with go-leveldb? With that SC/SS separation, maybe go-leveldb may still be viable for RPC nodes?
That sounds right, but I don't understand why checkpoints take so much disk space; could it be an implementation issue, or is my test setup incorrect?
The SDK v0.46.x didn't implement go-leveldb support for the v2 store.
@JayT106 currently, SMT has a slowdown of more than an order of magnitude in terms of write performance.
So the main issue, I think, is the IAVL+ tree design. Take the EVM module storage at the current cronos scale: the tree height is around 25 at the recent version. Each leaf modification causes about 24 intermediate nodes to be updated; the previous versions of those intermediate nodes are kept in the database, and 24 new intermediate nodes are written for the new version of the EVM store. So the write amplification of each tree operation is proportional to the height of the tree, which is very different from the Ethereum trie implementation. Since the SMT is still not mature: should we proceed with some workaround to downscale the tree height, or should we look at the Cosmos SDK's new SMT implementation to see whether there is anything we can contribute, so we might be able to use it earlier?
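A back-of-the-envelope sketch of that write amplification, assuming a balanced binary IAVL tree; the per-node size and writes-per-block figures are made-up inputs for illustration, only the 24M leaf count comes from the thread:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	leaves := 24_000_000.0 // ~24M kv pairs in the EVM store (from the thread)
	height := math.Ceil(math.Log2(leaves))
	fmt.Printf("approx tree height: %.0f\n", height) // ~25

	// Every leaf update rewrites all nodes on its path, and the old versions
	// of those nodes remain on disk as orphans until pruned.
	writesPerLeafUpdate := height // ~24 inner nodes + 1 leaf

	// Illustrative assumptions: 500 storage writes per block, ~200 bytes per node.
	writesPerBlock := 500.0
	nodeSizeBytes := 200.0
	bytesPerBlock := writesPerBlock * writesPerLeafUpdate * nodeSizeBytes
	fmt.Printf("approx new node data per block: %.1f MB\n", bytesPerBlock/1e6)
}
```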
Yes, I think fixing the issues in the SMT implementation is the way to proceed (plus also exploring how the SMT implementation could best be leveraged in the Ethermint context; see the "custom" usage note in #451 (comment) to mimic go-ethereum). Two related notes / issues:
Several updates: ...
evmos/ethermint#1121 (comment): the tx indexer db size reduction is very good with the custom eth tx indexer.
Investigate to see if there are any low-hanging fruits to reduce the db size.