Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a Basic Proving Trie for the Runtime #3881

Merged

Conversation

shawntabrizi
Copy link
Member

@shawntabrizi shawntabrizi commented Mar 28, 2024

This PR will introduce a BasicProvingTrie type, which makes it easy to construct and prove data in a base-16 merkle trie within the runtime.

Data into the merkle trie only require that they implement Encode / Decode.

A FRAME compatible TrieError was created and added to DispatchError.

Expected usage is to construct the merkle trie with all data offline, and then place only the merkle root of that trie on-chain.

Also offchain, a user is given a compact merkle proof of some data they want to prove exists on the blockchain.

Then in the runtime, you can call verify_single_value_proof or verify_proof with the root, proof, and the keys and values you want to verify exists in the merkle trie.

Closes #3880

Contributes to #5400

@shawntabrizi shawntabrizi marked this pull request as ready for review March 28, 2024 22:05
@shawntabrizi
Copy link
Member Author

For a full end to end usage in the runtime, I would like to create a test which uses this object to verify some data in a child trie.

However, I don't think the child trie currently supports creating a proof, and not sure how to get that working.

@burdges
Copy link

burdges commented Mar 29, 2024

I'd kinda expect merkle proofs should always be batched, so you prove for many keys simultaniously. I've no worked out how much more efficent this makes them of course, but we so some dumb things like compressing, when the proofs should've never contained duplicates in the first palce.

In principle, one could make this work in the runtme by using unsafe code which writes the queries into a buffer, and then some function in on_finalize checks them all.

Copy link
Member

@ggwpez ggwpez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For airdrops? Yea i guess its useful. Maybe AAS (airdrop as a service) would also help.

substrate/primitives/runtime/src/proving_trie.rs Outdated Show resolved Hide resolved
substrate/primitives/runtime/src/proving_trie.rs Outdated Show resolved Hide resolved
let mut root = Default::default();

{
let mut trie = TrieDBMutBuilderV0::new(&mut db, &mut root).build();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cheme why do we use a specific version here? Should this not be V1?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the version does not matter much for proof as V0 can be use on a V1 trie and V1 can be use on a V0, and since it is a proof we mainly care for access node (both version should have same access patterns).
In this case though, because we return the root, we need to choose the right version (proof content will be the same but root will differ).
In the case of historical, I think the value where all 32 bytes so V0 and V1 would work the same, but I kept V0 to be conservative.
For a generic toolling, the version should be pass in parameter (even if in most case there is no reason to use V0).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you want me to make this a parameter we pass in? or make it V1, or leave it untouched?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either V1 only (would not be compatible in some case) or pass it as a parameter.

@ggwpez ggwpez requested a review from cheme March 29, 2024 12:45
@cheme
Copy link
Contributor

cheme commented Mar 29, 2024

I'd kinda expect merkle proofs should always be batched, so you prove for many keys simultaniously. I've no worked out how much more efficent this makes them of course, but we so some dumb things like compressing, when the proofs should've never contained duplicates in the first palce.

In principle, one could make this work in the runtme by using unsafe code which writes the queries into a buffer, and then some function in on_finalize checks them all.

in this case all insert are done at the same time and trie keep all change in memory before calculating the root.
There is a possible micro optimization that would be to sort with a btree then calculate new root without storing the content in triedbmut (but it will be store anyway so not really better memory wise (maybe currently we clone on instert though)). I remember writing this code (basically what trie-root does but with possibly some existin content), it is faster but by a low margin (pbbly ~20%) and the code being a bit involved I did not push it.

If considering batching between different extrinsic it is not doable currently (as the mem is not shared between two extrinsic (vm design I think)), I hope it will be doable with polkavm.

@cheme
Copy link
Contributor

cheme commented Mar 29, 2024

Thinking of it, the best would be to use this inline root update calculation over ordered key value payload and store the stack of pending change (max size the max depth of the trie time max branch size and a few temp variables). This way it could be processed in multiple blocks or extrinsic.
I guess it can also be done in none order changeset but then the size of the pending change to store will be unbounded.

Edit : actually the first case is also unbounded in size since we need to also store store the proof content (between blocks/extrinsics).

Copy link
Contributor

@cheme cheme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code existing is fine, and I think extracting it can be a good move, I am just thinking it is pretty limited in its usages (small trie size, single query at this time, suboptimal merkle trie) so I would prefer if it was not part of sp-runtime but in its own crate.
Would make sense to also replace the existing code to use this one.

@@ -85,6 +85,7 @@ pub mod generic;
pub mod legacy;
mod multiaddress;
pub mod offchain;
pub mod proving_trie;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not too sure if it should be part of sp-runtime crate.
Thing is the choice of using this trie should not be a default for any runtime (I think it is not the best tree for most use case, and I would encourage more binary trie, anyway it is very use case specific).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a binary trie available in Substrate?

Does this trie work along side child tries?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the proof is to be run on child trie, there is no choice but to use this trie indeed.
(if using child trie root to prove, please be careful the chain will not be switching from trie V0 to trie V1 (should not matter for crowdloan as iirc the values are <= 32 byte), as a migration may change child trie root.
About this trie version, the simpliest would probably be to just pass the version in parameter.
When I read the PR I did not have this use case in mind but it is true that if what we are proving is from substrate state, then it could make sense to have this in runtime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I name this ProvingTrieBase16, and that would be better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not too sure anymore what was my point 🤦
Probably at first reading I was thinking it should be an external crate, then realizing it is being tightly linked with substrate state could actually be fine.
ProvingTrieBase16 may not be a good idea as long as substrate does have a single trie backend.
Maybe it could simple be a module in sp-trie, but guess I am ok with things being this way.

substrate/primitives/runtime/src/proving_trie.rs Outdated Show resolved Hide resolved
.and_then(|raw| Value::decode(&mut &*raw).ok())
}

fn create_proof(&self, key: Key) -> Option<Vec<Vec<u8>>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe better passing Vec and allow call multiple query here.
For now we can simply rename to create_single_value_access_proof I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated name and also added the multi-value API

substrate/primitives/runtime/src/proving_trie.rs Outdated Show resolved Hide resolved
substrate/primitives/runtime/src/proving_trie.rs Outdated Show resolved Hide resolved
@kianenigma
Copy link
Contributor

@Ank4n @gpestana this primitive can possibly be used to migrate large chunks of data/snapshot to parachains as well, PTAL :)

@Ank4n
Copy link
Contributor

Ank4n commented Apr 4, 2024

@Ank4n @gpestana this primitive can possibly be used to migrate large chunks of data/snapshot to parachains as well, PTAL :)

Yeah we might end up doing something very similar to this to move staked balances to the staking parachain.

@kianenigma
Copy link
Contributor

end up doing something very similar to this

I hope the outcome is similar and reusable enough that we push this PR to completion and use that. Do you think this is the case?

@Polkadot-Forum
Copy link

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/disposable-parachains-for-airdrops-and-other-ideas/5769/20

@shawntabrizi
Copy link
Member Author

@cheme @ggwpez sorry for super delay in updating this PR.

Can you take a second look and let me know what else is needed to be mergeable?


/// Create the full verification data needed to prove all `keys` and their values in the trie.
/// Returns `None` if the nodes within the current `MemoryDB` are insufficient to create a
/// proof.
Copy link
Contributor

@cheme cheme Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we do not pass the trie version in parameter and use v1 as default, it seems important to me to state it in the doc. Also true for create_single_proof
Proof are created with latest substrate trie format, runtime using non latest should not use it (eg trie_version = 0).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

prdoc/pr_3881.prdoc Outdated Show resolved Hide resolved
Co-authored-by: Oliver Tale-Yazdi <[email protected]>
@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable 2/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7245864

@shawntabrizi shawntabrizi changed the title Create a Generic Proving Trie Create a Basic Proving Trie for the Runtime Sep 3, 2024
@shawntabrizi shawntabrizi added this pull request to the merge queue Sep 4, 2024
Merged via the queue into paritytech:master with commit 1cff666 Sep 4, 2024
184 of 187 checks passed
@shawntabrizi shawntabrizi deleted the shawntabrizi-proving-trie branch September 4, 2024 15:59
@ggwpez ggwpez added T17-primitives Changes to primitives that are not covered by any other label. and removed T4-runtime_API This PR/Issue is related to runtime APIs. labels Sep 6, 2024
github-merge-queue bot pushed a commit that referenced this pull request Sep 30, 2024
This is a refactor and improvement from:
#3881

- `sp_runtime::proving_trie` now exposes a `BasicProvingTrie` for both
`base2` and `base16`.
- APIs for `base16` are more focused on single value proofs, also
aligning their APIs with the `base2` trie
- A `ProvingTrie` trait is included which wraps both the `base2` and
`base16` trie, and exposes all APIs needed for an end to end scenario.
- A `ProofToHashes` trait is exposed which can allow us to write proper
benchmarks for the merkle trie.

---------

Co-authored-by: Ankan <[email protected]>
Co-authored-by: Adrian Catangiu <[email protected]>
paritytech-ci pushed a commit that referenced this pull request Oct 1, 2024
This is a refactor and improvement from:
#3881

- `sp_runtime::proving_trie` now exposes a `BasicProvingTrie` for both
`base2` and `base16`.
- APIs for `base16` are more focused on single value proofs, also
aligning their APIs with the `base2` trie
- A `ProvingTrie` trait is included which wraps both the `base2` and
`base16` trie, and exposes all APIs needed for an end to end scenario.
- A `ProofToHashes` trait is exposed which can allow us to write proper
benchmarks for the merkle trie.

---------

Co-authored-by: Ankan <[email protected]>
Co-authored-by: Adrian Catangiu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T17-primitives Changes to primitives that are not covered by any other label.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make ProvingTrie Generic and Reusable
9 participants