Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vision: Ergonomic multi-block operations #306

Open
kianenigma opened this issue Aug 19, 2021 · 19 comments
Open

Vision: Ergonomic multi-block operations #306

kianenigma opened this issue Aug 19, 2021 · 19 comments
Labels
I5-enhancement An additional feature request. T1-FRAME This PR/Issue is related to core FRAME, the framework.

Comments

@kianenigma
Copy link
Contributor

kianenigma commented Aug 19, 2021

I have started writing multi-block code in a few occasions already. There are multiple ways to abstract it, but this is the ultimate one:

fn multi_block_function(n: T::BlockNumber) -> Weight {
    // this only executes once. 
    let pre_process = foo(); 
		
    // will generate some storage items that stores the current block number,
    // with some match statements that re-executes this code for the coming 10 blocks. 
    #[pallet::multi_block]
    for i in 0..10 {
        // this is executed 10 times, over the course of 10 blocks
        // variable `n` should be shadowed to the correct value here.
        let _ = bar(n)?;
    } 
	
    // this is exeucted once, right after the last iteration.
    let post_process = baz();
}

This is merely an idea at this stage and there are a lot of caveats, which should be discussed in depth before an attempt at implementation. One I have more concrete ideas about this I will move it to the frame vision project.

@kianenigma
Copy link
Contributor Author

@Ank4n

@xlc
Copy link
Contributor

xlc commented Sep 22, 2022

What we really want is async operations. We already have proper abstraction for it with years of experiences and I want to make sure we don't repeat the same mistakes and introduce more race condition bugs

@ggwpez
Copy link
Member

ggwpez commented Sep 23, 2022

What we really want is async operations.

What kind of operations? For example iterators on StorageMaps? So that StorageMap::iter_keys().map( would be multi-block operation.
Or rather wrapping a whole migration code block in some kind of async macro and using storage operations as yield points?

multi_block! {
    // migration code that will be split into multiple blocks.
    StorageMap::translate(t);
}

I think this is more complicated than having some kind of block number counter.
What are the race conditions do you see with this approach?

@xlc
Copy link
Contributor

xlc commented Sep 24, 2022

The main reason we want to split some operations to multiple blocks is due weight limit, not just because we want to do it due to business reason (which I think is out of scope for this issue).

Before a language natively supports async operations, we use callbacks to implement it. So it will be something like this

fn do_work(i: u32) {
  // do something with i
}

fn get_work() -> Option<u32> {
 // return next work
}

fn get_and_do_work() -> Weight {
  if let Some(work) = get_work() {
    do_work(work)
     multi_block_helper::execute_or_queue(Call::get_and_do_work)
    return weight_info::get_and_do_work(work)
  }
  return weight_info::get_work()
}

fn migration() {
   multi_block_helper::execute_or_queue(Call::get_and_do_work)
}

Later, if we somehow get async support and some weight integration, we can make it

#[weight = weight_info::do_work(i)]
fn do_work(i: u32) {
  // do something with I
}

#[weight = weight_info::get_work()]
fn get_work() -> Option<u32> {
 // return next work
}

async fn migration() {
  while let Some(work) = get_work() {
     do_work(work).await // check if remaining weight is enough, then execute, or defer it to next block
  }
   // code here will be executed after all the work is completed
}

@ggwpez
Copy link
Member

ggwpez commented Oct 12, 2022

Okay so there are a few other cases where multi-block or multi-call aka. async operations would be useful:
paritytech/substrate#12367 (comment) (uses remove_prefix)
paritytech/substrate#12310 (deletes a StorageMap which could produce a too high weight for one call/block)

So it is not limited to migrations where this problem arises…

@ruseinov
Copy link
Contributor

It seems like this issue belongs to the FRAME board.

I'd be very happy to take this on once we reach consensus on how to do it.

While I was working on this here it dawned on me how painful it is to stub all the calls touching the storage out. It would be really great to be able to lock particular storage items for writing and/or reading while the migration is running, at least as an intermediate solution.

In case of the example above - it would be great to be able to lock Nominators map for writing, while still allowing the reads. And simply return an error, something like MigrationRunning or whatever else.

It would be great if we could store the locked: LockEnum {Read, Write, ReadWrite, NotLocked} state of the storage and simply restrict access as needed.

@ggwpez
Copy link
Member

ggwpez commented Feb 13, 2023

I dont think locking storage regions can work. The API would be really ugly without infallible storage operations.

Maybe we can write the changes of a migration into a new storage transaction (aka changeset) and commit that once its done. That would split the reading part (PoV intense) into multiple blocks and put the writing part into the last block. Ideally it just swaps two sub-tree hashes and does not copy anything.
Then we also need a resolve_conflict functionality to let the developer handle merge conflicts for changes that happend while the transaction was open. Merge conflicts should be trivial to resolve in the cases that something was inserted, removed or modified unless the modification makes it un-migratable.

@bkchr
Copy link
Member

bkchr commented Feb 13, 2023

Maybe we can write the changes of a migration into a new storage transaction (aka changeset) and commit that once its done. That would split the reading part (PoV intense) into multiple blocks and put the writing part into the last block. Ideally it just swaps two sub-tree hashes and does not copy anything.

This would not work. How should that work for Parachains? The validation is always stateless. The changeset would also not be tracked as part of the state.

@xlc
Copy link
Contributor

xlc commented Feb 13, 2023

I don't think it is possible to create a unified solution that solves multi-block operations for all use cases.

Take an example for migration, there is no reason we have to write the code in such way that we have to complete migration in N blocks. Just make it a lazy migration. We have already implemented this lazy migration pattern in Substrate.

For some other use cases, Acala have implemented an idle scheduler and we currently are using this to free up storages from removed EVM contracts. Similarly, this can be used for other time insensitive multi block operations.

@kianenigma
Copy link
Contributor Author

kianenigma commented Feb 14, 2023

The original idea of this issue is being overlooked a bit, which is tailored towards providing a easy to use syntax to write multi-block operations, much like OpenMP's directives or Rust's Rayon macros. This certainly won't be generalizable. If you want a generalizable, writing it from scratch using on-idle or on-initialize (as done by Acala's scheduler) is not super hard.

Note: the origin of this issue is in the NPoS project because I hoped to implement #465 using this.

@gavofyork
Copy link
Member

FWIW Async operations would also be a gamechanger for XCM/XCMP. Right now the best we can do is have a dispatchable be called when an XCM reply arrives. Far far better is if we could so a multi-block await on the reply.

@gavofyork
Copy link
Member

I think it's realistic to be able to mostly suspend the chain while a multi-block migration completes. Not for every pallet (session/system/staking/babe/grandpa come to mind as non-suspendable), but almost all of the others should be fine to pause without risking bricking the chain.

@bkchr
Copy link
Member

bkchr commented Feb 14, 2023

I think it's realistic to be able to mostly suspend the chain while a multi-block migration completes. Not for every pallet (session/system/staking/babe/grandpa come to mind as non-suspendable), but almost all of the others should be fine to pause without risking bricking the chain.

Yes. I also have come to a similar conclusion. Proper multi block migration should involve migrating consensus related storage items in the first block that uses the new runtime and then all the other migrations can happen while the chain is paused.

@ruseinov
Copy link
Contributor

Yes. I also have come to a similar conclusion. Proper multi block migration should involve migrating consensus related storage items in the first block that uses the new runtime and then all the other migrations can happen while the chain is paused.

Sounds good. In the example that I mentioned before the situation with people being able to write to that particular storage absolutely does not work, and that's considering everything is happening on the relay chain only.
If the chain was paused (no storage i/o) - that migration would be a piece of cake to write and execute. Right now it requires lots of modifications that just block certain extrinsics depending on a storage flag. It's not ideal and very error-prone.

@bkchr
Copy link
Member

bkchr commented Feb 14, 2023

Right now it requires lots of modifications that just block certain extrinsics depending on a storage flag. It's not ideal and very error-prone.

We would block all extrinsics beside the inherents. Migrations will not run for that long time, so it is fine to not include extrinsics for a short period of time.

@ruseinov
Copy link
Contributor

Would be great to see this going forward. Happy to lend a hand where I can, sounds like an interesting problem to tackle with a lot to gain.

I'm assuming there isn't a good way to "pause" pallets now. As far as I understand that would require some sort of flag that would switch things off under the hood until a certain cached storage item gets set/unset.
Since we can't really pause the storage due to the operations on it being infallible, I guess that means some sort of a global flag that could be disabled per pallet for the pallets mentioned above that can't be fully paused.

@ggwpez
Copy link
Member

ggwpez commented Feb 16, 2023

Since we can't really pause the storage due to the operations on it being infallible, I guess that means some sort of a global flag that could be disabled per pallet for the pallets mentioned above that can't be fully paused.

paritytech/substrate#12092 could be used I think. Or maybe we need some more fundamental pausing of all extrinsics, not sure.

@bkchr
Copy link
Member

bkchr commented Feb 16, 2023

paritytech/substrate#12092 could be used I think. Or maybe we need some more fundamental pausing of all extrinsics, not sure.

No this should be exactly what we need/want.

@ruseinov
Copy link
Contributor

Since we can't really pause the storage due to the operations on it being infallible, I guess that means some sort of a global flag that could be disabled per pallet for the pallets mentioned above that can't be fully paused.

paritytech/substrate#12092 could be used I think. Or maybe we need some more fundamental pausing of all extrinsics, not sure.

Looks sufficient to me too.

@kianenigma kianenigma changed the title Ergonomic multi-block operations Vision: Ergonomic multi-block operations Mar 10, 2023
@juangirini juangirini transferred this issue from paritytech/substrate Aug 24, 2023
@the-right-joyce the-right-joyce added I5-enhancement An additional feature request. T1-FRAME This PR/Issue is related to core FRAME, the framework. and removed J0-enhancement labels Aug 25, 2023
jonathanudd pushed a commit to jonathanudd/polkadot-sdk that referenced this issue Apr 10, 2024
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.37 to 0.1.38.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](dtolnay/async-trait@0.1.37...0.1.38)

Signed-off-by: dependabot-preview[bot] <[email protected]>

Co-authored-by: dependabot-preview[bot] <27856297+dependabot-preview[bot]@users.noreply.github.com>
Co-authored-by: Svyatoslav Nikolsky <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I5-enhancement An additional feature request. T1-FRAME This PR/Issue is related to core FRAME, the framework.
Projects
Status: Backlog
Development

No branches or pull requests

8 participants