-
Notifications
You must be signed in to change notification settings - Fork 16
Conversation
5860324
to
be692c3
Compare
1aa1627
to
e2fba2d
Compare
@thiolliere can you please review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like explaining what each methods do in the trait is a bit redundant with the trait documentation.
I'm not sure of the scope of this doc but I feel like part of it is redundant with decl_storage and traits doc.
Also in my opinion storage doc should be more about explaining the trie and then just saying:
- value is an abstraction which stores one value at a certain path in the trie
- map stores value at some_path++hasher(key)
- double map stores avalue at some_path++ hasher1(key)++hasher2(key)
I think then it is more understandable what are the cost of methods.
But I'm not very aware about how is organized developer-hub so this is just humble suggestions.
by the way I think Shawn deep dive was very good explanation of trie https://www.shawntabrizi.com/substrate/substrate-storage-deep-dive/
current/runtime/storage.md
Outdated
|
||
### Storage Maps | ||
|
||
Storage Maps are implemented as hash maps, which is a pattern that should be familiar to most developers. In order to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storage Maps are implemented as hash maps
maybe it is only me but for me hash map is a data structure which handle collision. at least if presented this way it should be clear that collision are not handled.
But actually instead of using "hash map" term I would rather say that it is a map where key are hashed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this clarification. Please let me know if you think my solution was sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I see, I'm not sure actually what I proposed was better, like
Storage Maps are implemented as maps with hashed keys, which is a pattern that should be familiar to most developers.
wouldn't have make sense to me I think if I didn't know it already, not sure how to do better. (still for me the clearest is that a storage map insert at the path hash(encode(key))
the value encoded.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to clarify this in the docs 👍
current/runtime/storage.md
Outdated
Storage Maps are implemented as hash maps, which is a pattern that should be familiar to most developers. In order to | ||
give blockchain engineers increased control over the way in which these data structures are stored, Substrate allows | ||
developers to select the hashing algorithm that is used to generate map keys. Map data structures are ideal for managing | ||
sets of items whose elements will be accessed randomly, as opposed to iterating over them sequentially in their entirety. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to show how it is implemented in the underlying trie to make it more understandable about advantage/disadvantage.
Basically the value is put in the trie at the path==some_storage_prefix++hashed_key.
So each value is stored in an individual node, so reading/writing a value only decode/encode one raw_value which is stored in one trie-node.
also the performance of the map will depends on how big the map is because read/writing one value will recompute the path==some_prefix++hashed_key. and the hashed_key part can be more or less costy to compute. https://www.shawntabrizi.com/substrate/transparent-keys-in-substrate/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a separate document that goes over how storage is implemented. This document is meant to explain Substrate's storage interfaces and help blockchain runtime developers understand when and how to use them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this document will not explain how frame_support::storage abstraction are implemented in storage do they ? so I would explain their implementation here actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the purpose of the advance document is to cover implementation and the purpose of this document (the runtime document) is to cover the interface. @joepetrowski can you provide your input?
current/runtime/storage.md
Outdated
|
||
[Storage Maps expose an API](https://substrate.dev/rustdocs/master/frame_support/storage/trait.StorageMap.html#required-methods) | ||
that is similar to that of Storage Values. The selected methods referenced below all use a single key; use two keys for | ||
[Storage Double Maps](https://substrate.dev/rustdocs/master/frame_support/storage/trait.StorageDoubleMap.html#required-methods): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here also I think we should say that using this two keys result in this in the trie: key1, key2, value is stored at
value is stored at some_prefix++hashe1(key1)++hasher2(key2).
and so you can iterate on one key1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is information I also think would belong in the advanced document.
current/runtime/storage.md
Outdated
The Substrate storage API provides iterable map implementations. Because maps are often used to track unbounded sets of | ||
data (account balances, for example) it is especially likely to exceed block production time by iterating over maps in | ||
their entirety within the runtime. Furthermore, because maps are comprised of more layers of indirection than native | ||
lists, they are significantly more costly than lists to iterate over with respect to time. Depending on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just explain what is the "more layers of indirection than native lists", basically iterating over a map is just iterating over all keys after the prefix storage_prefix
so this trie iteration is more costy than native list because it is a trie.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I felt like that was a step too far for this document and that topics like that belong in the advanced storage documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above document just explain how the trie works not how frame_support storage abstractions are implemented using the trie, no ?
As a user in order to understand more precisely the cost of those abstraction, explaining their implementation on the trie seems useful (maybe it belong somewhere else in the doc but still I feel like saying that a storage map just store all its value after a prefix and iterating upon means iterating the trie after a prefix is quite straightforward and quickly understandable)
Because maps are often used to track unbounded sets of
data it is especially likely to exceed block production time by iterating over maps in
their entirety within the runtime
This is just a trade off, some storage map in the runtime are quite bounded actually. But we use storage map because they are more often accessed by individual value than as a whole. Thus using a storage map is more efficient because accessing one value is cheaper, but it is sometime iterated upon still because it is not too big.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to clarify this in the docs 👍
current/runtime/storage.md
Outdated
#### Methods | ||
|
||
[Iterable Storage Maps expose the following methods](https://substrate.dev/rustdocs/master/frame_support/storage/trait.IterableStorageMap.html#required-methods) | ||
in addition to the other map methods: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
their method are slightly different for map and doublemap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to clarify this in the docs 👍
current/runtime/storage.md
Outdated
##### `translate(fn)` | ||
|
||
Use the provided function to translate all elements of the map, in no particular order. To remove an element from the | ||
map, return `None` from the translation function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they also differ between map and double_map (as a note I tried to unify interface here paritytech/substrate#5335)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to clarify this in the docs 👍
0d455a6
to
cfe58eb
Compare
Thank you @thiolliere! I have tried to address your comments while balancing the goal of this document with respect to the goal of the advanced storage document. |
9808d9f
to
5301a8a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I think a practical section on best practices would be useful.
Is the goal of this PR to actually have all this content on a single page? Or are you just collecting information here? Jumping between conceptual items around storage, to macro syntax about declaration, to RPC call stuff seems like a lot for one page, and what I would consider to be quite different topics when it comes to storage. I might tune your focus into two sections with somewhat different goals:
Someone who does not really want to make the best decisions but just wants code to work should read the second doc. Someone who just wants to learn conceptually whats going on should read the first. A "real dev" would read both and be able to make the right decisions using the combined information. If you break down Substrate Storage Conceptually, then a topic like the RPC queries become a lot more natural "We are just querying a database key, no other abstractions of storage leak through the RPC". It also informs the user the advantages of a single value storage item versus a storage map, something that is probably the most important topic to take away for the average dev. They should be thinking at this level. Then when it comes to the ergonomics of the language, some users just want to know what to type to make the thing compile, and that would go into all the syntax level details I see here. As it is written, I think it is hard to ingest and get a clear message imo. |
df43692
to
f818588
Compare
@thiolliere and @shawntabrizi - I have made some structural changes that I hope will address your comments. @joepetrowski - per our conversation yesterday, I think this is now ready for a final review. |
c03bf3c
to
1fb7660
Compare
b940ea5
to
a54712a
Compare
@joepetrowski - any suggestions for enhancing the section on genesis configuration? |
current/runtime/storage.md
Outdated
### Genesis Config | ||
|
||
TODO | ||
You can define | ||
[an optional `GenesisConfig`](https://substrate.dev/rustdocs/master/frame_support/macro.decl_storage.html#genesisconfig) | ||
struct in order to initialize Storage Items in the genesis block of your blockchain. | ||
|
||
## Storage Cache | ||
// TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thiolliere what info do you think should go here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GenesisConfig is generated by decl_storage macro, it does initialize storage items for the genesis block of the blockchain.
Its logic is defined by:
config
andbuild
attribute when declaring an individual storage.- add_extra_genesis informations. which allows to add fields in the GenesisConfig struct and a build function to build storages using those fields.
(I can be more precise actually we execute first for each storage the build closure if it exist, or if no build attribute but a config attribute then we add into storage the value found in config. Then we execute the add_extra_genesis::build function.)
current/runtime/storage.md
Outdated
Depending on [the hashing algorithm](#Transparent-Hashing-Algorithms) that you select to generate a | ||
map's keys, you may be able to iterate across its keys and values. Because maps are often used to | ||
track unbounded sets of data (account balances, for example) it is especially likely to exceed block | ||
production time by iterating over maps in their entirety within the runtime. Furthermore, because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, because
accessing the elements of a map requires more pointer dereferencing than accessing the elements of a
native list, maps are significantly more costly than lists to iterate over with respect to time.
Can you explain this a bit more? I am not sure where it comes from and until now my understanding of the state was that the most important factors encoding + db access. In-memory dereferencing seems negligible to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I agree that iterating maps is expensive, I am not sure if your reason is correct.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this line used to read as follows:
Furthermore, because maps are comprised of more layers of indirection than native lists...
I agree with you that the change didn't necessarily make things more clear. I guess what I meant above was that you don't need to dereference memory pointers, per se, but you do have to dereference the pointer to the element in the map...I think I'm trying to describe the "db access" part that you mention above. I will think about alternate ways to phrase this but would definitely appreciate any suggestions 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, this now reads:
...accessing the elements of a map requires more database reads than accessing the elements of a native list...
I hope you agree the simplest approach was the best one in this case 😅
current/runtime/storage.md
Outdated
wrong. Being efficient within the runtime of a blockchain is an important first principle of | ||
Substrate and this information is designed to help you understand _all_ of Substrate's storage | ||
capabilities and use them in a way that respects the important first principles around which they | ||
were designed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice to mention here when and which maps are iterable at the end of the day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great point and I'm starting to wonder if we should even document non-transparent hashers since they are now deprecated. Should we just behave as if all hashers are transparent and all maps are iterable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asked @shawntabrizi about this offline and he agrees that we should take the simpler approach: only document that which is not deprecated 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to clarify this in the docs 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only had a skim and the text reads ver well to me. As Shawn mentioned I think this is a lot of info and should be organised well so the user can follow it easily. Great work 👍
Sorry I didn't realized earlier but it seems non-transparent hasher are deprecated so:
Thus maybe better to write that storage map are iterable and double map are also iterable, and maybe add a note that this is not the case if you use deprecated hasher |
2718bd1
to
5411956
Compare
@shawntabrizi, @thiolliere, @kianenigma - I have no more changes to make at this time based on your comments. Let me know if I addressed your concerns properly or if there are additional concerns you have. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me
| [TwoX 64 Concat](https://crates.parity.io/frame_support/struct.Twox64Concat.html) | | X | | ||
| [Identity](https://crates.parity.io/frame_support/struct.Identity.html) | | | | ||
| [Blake2 128](https://crates.parity.io/frame_support/struct.Blake2_128.html) **DEPRECATED** | X | | | ||
| [TwoX 128](https://crates.parity.io/frame_support/struct.Twox128.html) **DEPRECATED** | | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is actually 4 depracated hasher but I don't mind if we don't mention them:
twox_128 - TwoX with 128bit.
twox_256 - TwoX with with 256bit.
blake2_128 - Blake2 with 128bit.
blake2_256 - Blake2 with 256bit.
[the `sc_chain_spec::ChainSpec` trait](https://crates.parity.io/sc_chain_spec/trait.ChainSpec.html). | ||
For a complete and concrete example of using Substrate's genesis storage configuration capabilities, | ||
refer to the `decl_storage` macro in | ||
[the Society pallet](https://github.com/paritytech/substrate/blob/master/frame/society/src/lib.rs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be better to point to example pallet ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe that the example pallet does any genesis configuration.
current/runtime/storage.md
Outdated
|
||
// all storage writes go here; no throwing code below this line | ||
|
||
// all event emissions go here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
events can be interleaved with storage writes
current/runtime/storage.md
Outdated
initializing storage, all of which have entry points in the `decl_storage` macro. These mechanisms | ||
all result in the creation of a `GenesisConfig` data type that implements | ||
[the `sp_runtime::BuildModuleGenesisStorage` trait](https://crates.parity.io/sp_runtime/trait.BuildModuleGenesisStorage.html) | ||
and will be added to the storage item's module (e.g. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I read this correctly, what does the runtime GenesisConfig being added to the storage item's module means ?
Maybe we could make it more clear that there is a GenesisConfig overarching type for the runtime which is made by construct_runtime by joining the specified pallet genesis config (specified with Config
)
Co-authored-by: thiolliere <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Closes #5