ADR-040: Storage and SMT State Commitments #8430

# ADR 040: Storage and SMT State Commitments

## Changelog

- 2020-01-15: Draft

## Status

DRAFT Not Implemented

## Abstract

Sparse Merkle Tree (SMT) is a version of a Merkle Tree with various storage and performance optimizations. This ADR defines a separation of state commitments from data storage and the SDK transition from IAVL to SMT.

## Context

Currently, Cosmos SDK uses IAVL for both state commitments and data storage.

> **Review:** I would define what state commitments are and how they differ from data storage. It can be concise.
>
> **Review:** Isn't it self-explanatory? A state commitment is a commitment to a state. I can add a link explaining general commitment schemes.

IAVL has effectively become an orphaned project within the Cosmos ecosystem, and it has proven to be an inefficient state commitment data structure.

In the current design, IAVL is used both for data storage and as a Merkle Tree for state commitments. IAVL is meant to be a standalone Merkleized key/value database; however, it uses a KV DB engine to store all tree nodes, so each node is stored as a separate record in the KV DB. This causes many inefficiencies and problems:

+ Each object lookup requires a tree traversal from the root.
+ Each edge traversal requires a DB query (nodes are not stored in memory).

  > **Review:** Are you sure about this?
  >
  > **Review:** When traversing a tree, we always do a DB query. However, subsequent queries are cached at the SDK level, not at the IAVL level. I can add that clarification.

+ Creating snapshots is [expensive](https://github.com/cosmos/cosmos-sdk/issues/7215#issuecomment-684804950). It takes about 30 seconds to export less than 100 MB of state (as of March 2020).
+ Updates in IAVL may trigger tree reorganization and possibly O(log(n)) hash recomputations, which can become a CPU bottleneck.
+ The leaf structure is pretty expensive: it contains the `(key, value)` pair and additional metadata such as height and version. The entire node is hashed, and that hash is used as the key in the underlying database ([ref](https://github.com/cosmos/iavl/blob/master/docs/node/node.md)).

  > **Review:** Can you please elaborate on why it's "expensive"?
  >
  > **Review:** It contains a lot of data that is not needed in the new structure. We don't really need the metadata in the new structure.

Moreover, the IAVL project lacks support and a maintainer, and we already see better and well-established alternatives. Instead of optimizing IAVL, we are looking into other solutions for both storage and state commitments.

## Decision

We propose to separate the concerns of state commitment (**SC**), needed for consensus, and state storage (**SS**), needed for the state machine. Finally, we replace IAVL with [LazyLedger SMT](https://github.com/lazyledger/smt). LazyLedger SMT is based on the Diem design (called Jellyfish) [*]: it uses a compute-optimized SMT that replaces subtrees containing only default values with a single node (the same approach is used by Ethereum2 as well).

### Decouple state commitment from storage

Separating storage from commitment (by the SMT) will allow us to optimize the different components according to their usage and access patterns.

The SMT will use its own storage, separate from the state machine store (although it could use the same database underneath). For every `(key, value)` pair, the SMT will store `hash(key)` in a path and `hash(key, value)` in a leaf.

> **Review:** Is it possible for us to apply these changes to the IAVL implementation, which would remove the state duplication from the implementation?
>
> **Review:** Out of scope of the refactor. It can be done after this upgrade has been completed and if someone asks for it; otherwise we would look at archiving IAVL.
>
> **Review:** Agree with @marbar3778. IAVL has other drawbacks, and there is no point in updating it.
>
> **Review:** It would be really great to understand better why we're storing
>
> **Review:** Also, in the design I put forth in #9158 and #9156, I was thinking we might store
>
> **Review:** This is what we are doing. Modules don't even know whether there is a Merkle tree, or what goes into it. Modules only use a generic KVStore interface, as they do today (with caching and key prefixing).
>
> **Review:** We want to bind a
>
> **Review:** I'm more or less expressing my desire for two methods on The reason I mention proto JSON is because for Rather than specifying this at the framework level, my solution would be for
>
> **Review:** This ADR is not about introducing storage for data that is not part of the state commitment. The reason we have 2 data stores (SS and SC) under the hood is efficiency, and it was inspired by Turbo-Geth. In other words, in this design we have only one external store, which commits and queries committed data. Under the hood, it uses 2 DBs for efficiency. Support storage (e.g. a module off-chain store) and indexers are out of scope and are not part of the committed state. We could implement an extension store that uses the state commit store (this ADR) in some way (e.g. a kind of subtree, or a polynomial commitment).
>
> **Review:** It's not clear to me why storage should deal with additional logic (e.g. what the data type is), rather than bytes. If a client wants to save data using
>
> **Review:** I will add notes about the off-chain store to the Further Discussions section.

For data access we propose 2 additional KV buckets (separate namespaces within a single DB engine, e.g. column families in RocksDB):

> **Review:** What is a KV bucket here? This may be nomenclature I am not familiar with.
>
> **Review:** Some KV databases use buckets for creating different databases under the same server / engine. PostgreSQL calls them databases (you can have multiple databases in a single PostgreSQL instance). RocksDB calls them column families.
>
> **Review:** Could you post this link with a small explainer? The current explainer doesn't explain; it just throws a sentence into the mix.

1. B1: `key → value`: the principal object storage, used by the state machine, behind the SDK `KVStore` interface. It provides direct access by key and allows prefix iteration (the KV DB backend must support it).
2. B2: `hash(key, value) → key`: an index needed to extract a value (through B2 -> B1) given only a Merkle path. Recall that the SMT stores `hash(key, value)` in its leaves.

   > **Review:** What's the need for the reverse index? I'm just wondering what the use case is. I'm imagining mostly we will have
   >
   > **Review:** If we don't have it then you will always need to know a

3. We could use more buckets to optimize app usage if needed.

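The two buckets can be modeled in a few lines of Go to show how B2 turns a leaf hash taken from a Merkle path back into a value. This is a hypothetical in-memory sketch: Go maps stand in for DB buckets, and `store`, `Set`, `Get`, `ValueByLeaf` and the SHA-256 length-prefixed `leafHash` are illustrative assumptions, not SDK APIs.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// leafHash stands in for hash(key, value). Assumption: SHA-256 over a
// length-prefixed key followed by the value.
func leafHash(key, value []byte) [32]byte {
	buf := make([]byte, 8, 8+len(key)+len(value))
	binary.BigEndian.PutUint64(buf, uint64(len(key)))
	buf = append(buf, key...)
	return sha256.Sum256(append(buf, value...))
}

// store is a hypothetical in-memory model of the two SS buckets.
type store struct {
	b1 map[string][]byte   // B1: key -> value
	b2 map[[32]byte]string // B2: hash(key, value) -> key
}

func newStore() *store {
	return &store{b1: map[string][]byte{}, b2: map[[32]byte]string{}}
}

// Set writes to B1, re-indexes B2, and returns the leaf hash that would
// be written into the SMT.
func (s *store) Set(key, value []byte) [32]byte {
	if old, ok := s.b1[string(key)]; ok {
		delete(s.b2, leafHash(key, old)) // drop the stale index entry
	}
	s.b1[string(key)] = append([]byte(nil), value...)
	h := leafHash(key, value)
	s.b2[h] = string(key)
	return h
}

// Get is the direct B1 lookup used by the state machine.
func (s *store) Get(key []byte) ([]byte, bool) {
	v, ok := s.b1[string(key)]
	return v, ok
}

// ValueByLeaf resolves a leaf hash from a Merkle path: B2 -> B1.
func (s *store) ValueByLeaf(h [32]byte) (key, value []byte, ok bool) {
	k, ok := s.b2[h]
	if !ok {
		return nil, nil, false
	}
	return []byte(k), s.b1[k], true
}

func main() {
	s := newStore()
	h := s.Set([]byte("acc/alice"), []byte("100"))
	k, v, ok := s.ValueByLeaf(h)
	fmt.Println(string(k), string(v), ok) // acc/alice 100 true
}
```

Overwriting a key removes its old B2 entry, so a leaf hash from a superseded state no longer resolves. Whether B2 should instead keep historical entries depends on the pruning policy discussed below.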
Above, we propose to use a KV DB. However, for the state machine we could use an RDBMS, which we discuss below.

### Requirements

State Storage requirements:

+ range queries
+ quick `(key, value)` access
+ creating a snapshot
+ pruning (garbage collection)

State Commitment requirements:

+ fast updates
+ short path length
+ creating a snapshot
+ pruning (garbage collection)

### LazyLedger SMT for State Commitment

A Sparse Merkle Tree is based on the idea of a complete Merkle tree of intractable size (e.g. one leaf position for each of the 2^256 possible 256-bit key hashes). The assumption is that, because the tree is so large, only a few leaves hold valid data relative to the tree size, which renders the tree sparse. Empty subtrees have a known default value, so they never need to be stored or expanded.

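The following Go sketch shows why a tree with 2^256 leaf positions stays tractable: every empty subtree collapses to one constant placeholder node, so computing the root costs at most 256 hashes per non-empty leaf. It is a minimal illustration, not LazyLedger SMT's code. The all-zero placeholder, the `sha256(left || right)` node hash, and the hypothetical `buildRoot` leaf encoding are assumptions. Production designs such as Jellyfish additionally collapse single-leaf subtrees; this sketch does not.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

// placeholder is the single node standing in for any empty subtree.
// Assumption: an all-zero digest; real SMT libraries choose their own constant.
var placeholder [32]byte

// leaf is a non-empty leaf: its position (path) and its hash(key, value).
type leaf struct {
	path [32]byte
	hash [32]byte
}

// bit returns bit i of a path, most significant first: 0 = left, 1 = right.
func bit(p [32]byte, i int) int { return int(p[i/8]>>(7-uint(i%8))) & 1 }

func hashPair(l, r [32]byte) [32]byte { return sha256.Sum256(append(l[:], r[:]...)) }

// root returns the root of the subtree at depth d holding leaves, which must
// be sorted by path, share their first d path bits, and have distinct paths.
// Empty subtrees return the placeholder immediately, so the work is at most
// len(leaves)*256 hashes rather than one per each of the 2^256 positions.
func root(leaves []leaf, d int) [32]byte {
	if len(leaves) == 0 {
		return placeholder
	}
	if d == 256 {
		return leaves[0].hash
	}
	// Sorted order puts leaves with bit d == 0 (left subtree) first.
	split := sort.Search(len(leaves), func(i int) bool { return bit(leaves[i].path, d) == 1 })
	return hashPair(root(leaves[:split], d+1), root(leaves[split:], d+1))
}

// buildRoot is a hypothetical helper: path = sha256(key),
// leaf hash = sha256(sha256(key) || sha256(value)).
func buildRoot(kv map[string]string) [32]byte {
	leaves := make([]leaf, 0, len(kv))
	for k, v := range kv {
		p := sha256.Sum256([]byte(k))
		leaves = append(leaves, leaf{path: p, hash: hashPair(p, sha256.Sum256([]byte(v)))})
	}
	sort.Slice(leaves, func(i, j int) bool { return bytes.Compare(leaves[i].path[:], leaves[j].path[:]) < 0 })
	return root(leaves, 0)
}

func main() {
	fmt.Printf("empty root: %x\n", buildRoot(nil))
	fmt.Printf("root:       %x\n", buildRoot(map[string]string{"alice": "100", "bob": "50"}))
}
```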
### Snapshots

Snapshots and fast sync are among Stargate's core features. Currently, this feature is implemented through IAVL.

> **Review:** I can't see a way to use snapshots/versions with the BoltDB API. BoltDB provides "snapshot isolation", but does not support explicit creation/usage of snapshots.
>
> **Review:** That's a good point. We should double-check that the DBs mentioned below actually provide the desired snapshotting mechanism.
>
> **Review:** Thanks for checking. I'm not completely sure. We can check with the HashiCorp team. I found this:
>
> **Review:** I'm not sure we should spend time here discussing the backend. I mean, we need to verify the proposals, but we should have a separate discussion about the "recommended" DB backend.
>
> **Review:** I'm wondering if using database snapshots to access old state roots is overkill, or even necessary, for SMT. By default, you can already access the old state roots in the LazyLedger SMT implementation, because the tree isn't garbage collected currently. Once garbage collection is added, it could be configured to only garbage collect state roots older than a certain version, which would be equivalent to snapshots, no?
>
> **Review:** I saw this before, and I'm pretty sure it's about app-level logic executing an app-level transaction (iteration over the entire KV store) to generate app-level snapshots. I think we discuss snapshots because of the "storage" part of the proposal, not the "state commitment" part. For SMT, I also don't see a reason to use DB-level snapshots. I agree that it is more than reasonable to use the capabilities of the DB to ensure we can access and manage previous versions of state.
>
> **Review:** Good point. PS: I assume that in your current implementation you don't update nodes; instead, you create new ones, right? This allows you to keep all old versions. Why do you do that, instead of updating nodes?
>
> **Review:** We have to add pruning to LL SMT anyway, otherwise the state will keep growing. Nodes are stored in a key => value store where the key is a hash of the node, and the value is the preimage of the hash (i.e. the hashes of the children of the node). Assuming the hashes are collision resistant, you can't "update" a node since the tree is immutable; you can only "create" a new tree with a new root.
>
> **Review:** But you have a path to the updated leaf, so you can use it to remove old nodes:

Many underlying DB engines support snapshotting. Hence, we propose to reuse that functionality and limit the supported DB engines to ones that support snapshots (Badger, RocksDB, BoltDB) using a _copy-on-write_ mechanism.

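As an illustration of the copy-on-write idea only, here is a hypothetical in-memory Go store (`cowStore`, `Snapshot`). Taking a snapshot is O(1) because it shares the live data, and the first write afterwards copies the data so the snapshot stays frozen. The DB engines named above implement this at page or node granularity, not by copying the whole data set as this sketch does.

```go
package main

import "fmt"

// cowStore is a hypothetical in-memory store with copy-on-write snapshots.
// Taking a snapshot is O(1): the snapshot shares the live map. The first
// write after a snapshot copies the map, so snapshots never see later writes.
type cowStore struct {
	data   map[string]string
	shared bool // true while data is also referenced by a snapshot
}

type snapshot struct{ data map[string]string }

func newCowStore() *cowStore { return &cowStore{data: map[string]string{}} }

func (c *cowStore) Snapshot() *snapshot {
	c.shared = true
	return &snapshot{data: c.data}
}

func (c *cowStore) Set(k, v string) {
	if c.shared { // copy on the first write after a snapshot
		cp := make(map[string]string, len(c.data)+1)
		for key, val := range c.data {
			cp[key] = val
		}
		c.data, c.shared = cp, false
	}
	c.data[k] = v
}

func (c *cowStore) Get(k string) (string, bool) {
	v, ok := c.data[k]
	return v, ok
}

func (s *snapshot) Get(k string) (string, bool) {
	v, ok := s.data[k]
	return v, ok
}

func main() {
	s := newCowStore()
	s.Set("height", "1")
	snap := s.Snapshot()
	s.Set("height", "2")
	old, _ := snap.Get("height")
	cur, _ := s.Get("height")
	fmt.Println(old, cur) // 1 2
}
```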
### Pruning

At a minimum, SC doesn't need to keep old versions. However, we need to be able to process transactions and roll back state updates if a transaction fails. This can be done in the following way: during transaction processing, we keep all state change requests (writes) in a `CacheWrapper` abstraction (as it's done today). Only when we commit on the root store are all changes written to the SMT.

We can use the same approach for SM Storage. However, we need to keep a few past versions (configurable by the user, e.g. 10 past versions every 100 blocks) in the form of snapshots. Ideally, we would like to shift that functionality to the DB engine itself.

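One way to read "10 past versions every 100 blocks" as a retention rule is sketched below in Go. The policy and the names `shouldKeep` and `keptVersions` are hypothetical illustrations, not something this ADR specifies: keep the latest version, plus the `keepRecent` most recent heights that are multiples of `keepEvery`; everything else is eligible for pruning.

```go
package main

import "fmt"

// shouldKeep reports whether version is retained at height latest under a
// hypothetical policy: keep the latest version, plus the keepRecent most
// recent heights that are multiples of keepEvery.
func shouldKeep(version, latest, keepEvery, keepRecent int64) bool {
	if version == latest {
		return true
	}
	if version <= 0 || version > latest || version%keepEvery != 0 {
		return false
	}
	newest := latest - latest%keepEvery // newest snapshot height
	return (newest-version)/keepEvery < keepRecent
}

// keptVersions lists the retained snapshot heights, newest first.
func keptVersions(latest, keepEvery, keepRecent int64) []int64 {
	var out []int64
	for h := latest - latest%keepEvery; h > 0 && int64(len(out)) < keepRecent; h -= keepEvery {
		out = append(out, h)
	}
	return out
}

func main() {
	fmt.Println(keptVersions(1050, 100, 10))                                      // [1000 900 800 700 600 500 400 300 200 100]
	fmt.Println(shouldKeep(900, 1050, 100, 10), shouldKeep(950, 1050, 100, 10)) // true false
}
```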
TODO: Verify which DB engines support that. I'm pretty confident this (pruning and versioning) can and should be offloaded to a DB engine.
Otherwise, the solution is to implement a sort of _mark and sweep GC_: once per defined period, a GC will start, mark old objects, and prune them. This will require encoding a version mechanism in the KV store.

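The fallback GC could look roughly like the Go sketch below, under stated assumptions. Versions are encoded by appending an 8-byte big-endian version to each key (a hypothetical layout, not an existing SDK format), and every version at or above `minVersion` must remain readable. The mark phase records, per key, the newest record at or below `minVersion`; records above it are implicitly live. The sweep deletes everything else. Deletions (tombstones) are not modeled.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeKey appends an 8-byte big-endian version to a key. In an ordered KV
// engine this keeps all versions of a key adjacent and sorted by version.
// Hypothetical layout, not an existing SDK format.
func encodeKey(key string, version uint64) string {
	var v [8]byte
	binary.BigEndian.PutUint64(v[:], version)
	return key + string(v[:])
}

func decodeKey(ek string) (string, uint64) {
	n := len(ek) - 8
	return ek[:n], binary.BigEndian.Uint64([]byte(ek[n:]))
}

// getAt returns the value of key as of version (newest record <= version).
func getAt(db map[string]string, key string, version uint64) (string, bool) {
	var best uint64
	var val string
	found := false
	for ek, v := range db {
		k, ver := decodeKey(ek)
		if k == key && ver <= version && (!found || ver > best) {
			best, val, found = ver, v, true
		}
	}
	return val, found
}

// gc is a mark-and-sweep pass keeping every version >= minVersion readable.
// It returns the number of swept records.
func gc(db map[string]string, minVersion uint64) int {
	// Mark: newest record at or below minVersion, per key.
	marked := map[string]uint64{}
	for ek := range db {
		k, ver := decodeKey(ek)
		if cur, ok := marked[k]; ver <= minVersion && (!ok || ver > cur) {
			marked[k] = ver
		}
	}
	// Sweep: delete unmarked records at or below minVersion.
	swept := 0
	for ek := range db {
		k, ver := decodeKey(ek)
		if ver <= minVersion && ver != marked[k] {
			delete(db, ek)
			swept++
		}
	}
	return swept
}

func main() {
	db := map[string]string{
		encodeKey("alice", 1): "100",
		encodeKey("alice", 5): "90",
		encodeKey("alice", 9): "80",
		encodeKey("bob", 2):   "50",
	}
	swept := gc(db, 6)
	v, _ := getAt(db, "alice", 6)
	fmt.Println(swept, len(db), v) // 1 3 90
}
```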
## Consequences

### Backwards Compatibility

This ADR doesn't introduce any SDK-level API changes.

We change the storage layout, so a storage migration and a blockchain reboot are required.

### Positive

+ Decoupling state from state commitment introduces better engineering opportunities for further optimizations and better storage patterns.
+ Performance improvements.
+ Joining the SMT-based camp, which has wider and more proven adoption than IAVL. Example projects that decided on SMT: Ethereum2, Diem (Libra), Trillian, Tezos, LazyLedger.

### Negative

+ Storage migration.
+ LL SMT doesn't support pruning; we will need to add and test that functionality.

### Neutral

+ Deprecating IAVL, which is one of the core proposals of the Cosmos Whitepaper.

## Further Discussions

### RDBMS

Use an RDBMS instead of a simple KV store for state. Using an RDBMS will require an SDK API breaking change (to the `KVStore` interface), but it will allow better data extraction and indexing solutions. Instead of saving an object as a single blob of bytes, we could save it as a record in a table in the state storage layer, and as `hash(key, protobuf(object))` in the SMT as outlined above. To verify that an object registered in the RDBMS is the same as the one committed to the SMT, one will need to load it from the RDBMS, marshal it using protobuf, hash it, and perform an SMT lookup.

## References

+ [IAVL What's Next?](https://github.com/cosmos/cosmos-sdk/issues/7100)
+ [IAVL overview](https://docs.google.com/document/d/16Z_hW2rSAmoyMENO-RlAhQjAG3mSNKsQueMnKpmcBv0/edit#heading=h.yd2th7x3o1iv) of its state at v0.15
+ [State commitments and storage report](https://paper.dropbox.com/published/State-commitments-and-storage-review--BDvA1MLwRtOx55KRihJ5xxLbBw-KeEB7eOd11pNrZvVtqUgL3h)
+ [LazyLedger SMT](https://github.com/lazyledger/smt)
+ Facebook Diem (Libra) SMT [design](https://developers.diem.com/papers/jellyfish-merkle-tree/2021-01-14.pdf)
+ [Trillian Revocation Transparency](https://github.com/google/trillian/blob/master/docs/papers/RevocationTransparency.pdf), [Trillian Verifiable Data Structures](https://github.com/google/trillian/blob/master/docs/papers/VerifiableDataStructures.pdf).

> **Review:** I'm wondering if this shouldn't in fact be two ADRs instead? One for separating storage and commitments and one about the SMT.
>
> **Review:** I was also thinking about it. But they are highly related: one cannot be done without the other. Hence, I'm proposing a general design here and leaving space for a future ADR for RDBMS, which will introduce SDK breaking changes.
>
> **Review:** Well, we could separate the two with IAVL, right? We don't need SMT for that AFAIK...
>
> **Review:** @aaronc, we could describe only SMT here, but it would be a half-baked idea without a working solution:
> Do you have something else in mind?