Enforce migrations to be standalone scripts #296
Would this make migrations look more or less akin to database migrations for a server backend? I think that should ideally be what we're aiming for here.
Totally.
I really like the idea of migrations not depending on the pallet code. However, I'd like to chip in with my 2 cents on the subject before we go down this particular rabbit hole.

First of all, this will increase the bloat, because we'll need to redefine the structs used for the storage. In the case of one or two structs that's kind of okay; when this becomes more complex, with some nesting levels, it's a different game.

Secondly, it's going to become harder for newcomers to write migrations efficiently, especially when we start talking about more complex scenarios, which I'm going to provide as examples below.

Last, but not least, it's kind of nice to have compile-time guarantees when writing migrations with more or less complicated structs involved: when we reuse the storage structs from pallets, we know for sure the struct resembles the storage. It's not that big of a deal, as each migration needs to be tested anyway, but it still adds to the overall migration testing/implementation burden.

Some examples of migrations that no longer work because the code has changed:
```rust
pub mod v8 {
    use super::*;
    use crate::{Config, Nominators, Pallet, Weight};
    use frame_election_provider_support::SortedListProvider;
    use frame_support::traits::Get;

    #[cfg(feature = "try-runtime")]
    pub fn pre_migrate<T: Config>() -> Result<(), &'static str> {
        frame_support::ensure!(
            StorageVersion::<T>::get() == ObsoleteReleases::V7_0_0,
            "must upgrade linearly"
        );
        crate::log!(info, "👜 staking bags-list migration passes PRE migrate checks ✅",);
        Ok(())
    }

    /// Migration to sorted `VoterList`.
    pub fn migrate<T: Config>() -> Weight {
        if StorageVersion::<T>::get() == ObsoleteReleases::V7_0_0 {
            crate::log!(info, "migrating staking to ObsoleteReleases::V8_0_0");
            // We will no longer have the bags-list pallet instance here, so we'd have to
            // reimplement this functionality, which relies on several methods and storage
            // structs that are called under the hood.
            let migrated = T::VoterList::unsafe_regenerate(
                // replicate the `Nominators` storage struct
                Nominators::<T>::iter().map(|(id, _)| id),
                Pallet::<T>::weight_of_fn(),
            );
            // still need to call this here or in `post_migrate`
            debug_assert_eq!(T::VoterList::try_state(), Ok(()));
            StorageVersion::<T>::put(ObsoleteReleases::V8_0_0);
            crate::log!(
                info,
                "👜 completed staking migration to ObsoleteReleases::V8_0_0 with {} voters migrated",
                migrated,
            );
            T::BlockWeights::get().max_block
        } else {
            T::DbWeight::get().reads(1)
        }
    }

    #[cfg(feature = "try-runtime")]
    pub fn post_migrate<T: Config>() -> Result<(), &'static str> {
        T::VoterList::try_state().map_err(|_| "VoterList is not in a sane state.")?;
        crate::log!(info, "👜 staking bags-list migration passes POST migrate checks ✅",);
        Ok(())
    }
}
```
```rust
pub mod v9 {
    use super::*;
    #[cfg(feature = "try-runtime")]
    use frame_support::codec::{Decode, Encode};
    #[cfg(feature = "try-runtime")]
    use sp_std::vec::Vec;

    /// Migration implementation that injects all validators into sorted list.
    ///
    /// This is only useful for chains that started their `VoterList` just based on nominators.
    pub struct InjectValidatorsIntoVoterList<T>(sp_std::marker::PhantomData<T>);

    impl<T: Config> OnRuntimeUpgrade for InjectValidatorsIntoVoterList<T> {
        fn on_runtime_upgrade() -> Weight {
            if StorageVersion::<T>::get() == ObsoleteReleases::V8_0_0 {
                // need to replicate the bags-list storage and some of the
                // `SortedListProvider` methods
                let prev_count = T::VoterList::count();
                let weight_of_cached = Pallet::<T>::weight_of_fn();
                // replicating the `Validators` storage struct
                for (v, _) in Validators::<T>::iter() {
                    let weight = weight_of_cached(&v);
                    // a bunch more `VoterList` action that needs to be taken into account
                    let _ = T::VoterList::on_insert(v.clone(), weight).map_err(|err| {
                        log!(warn, "failed to insert {:?} into VoterList: {:?}", v, err)
                    });
                }
                log!(
                    info,
                    "injected a total of {} new voters, prev count: {} next count: {}, updating to version 9",
                    Validators::<T>::count(),
                    prev_count,
                    T::VoterList::count(),
                );
                StorageVersion::<T>::put(ObsoleteReleases::V9_0_0);
                T::BlockWeights::get().max_block
            } else {
                log!(
                    warn,
                    "InjectValidatorsIntoVoterList being executed on the wrong storage \
                    version, expected ObsoleteReleases::V8_0_0"
                );
                T::DbWeight::get().reads(1)
            }
        }

        #[cfg(feature = "try-runtime")]
        fn pre_upgrade() -> Result<Vec<u8>, &'static str> {
            frame_support::ensure!(
                StorageVersion::<T>::get() == ObsoleteReleases::V8_0_0,
                "must upgrade linearly"
            );
            let prev_count = T::VoterList::count();
            Ok(prev_count.encode())
        }

        #[cfg(feature = "try-runtime")]
        fn post_upgrade(prev_count: Vec<u8>) -> Result<(), &'static str> {
            // `try_state` call
            let prev_count: u32 = Decode::decode(&mut prev_count.as_slice()).expect(
                "the state parameter should be something that was generated by pre_upgrade",
            );
            let post_count = T::VoterList::count();
            let validators = Validators::<T>::count();
            assert!(post_count == prev_count + validators);
            frame_support::ensure!(
                StorageVersion::<T>::get() == ObsoleteReleases::V9_0_0,
                "must upgrade"
            );
            Ok(())
        }
    }
}
```

In the case of replicating, say, the bags-list pallet's storage and methods, we'd have to replicate several pretty intricate methods along with their corresponding structs:
And this is just the tip of the iceberg. We could also take a look at a simple migration, like that one, which is already quite bloated, and it's going to be double the bloat. Long story short, I'm not sure the added complexity and time spent implementing migrations is worth being pallet-agnostic. Why? Breaking older migration code is a minor concern: it can be commented out and/or kept as a code example for posterity. Running prior migrations is not a problem either, just use release branches / WebAssembly code from older versions. It's a bit of a pain, but I don't see this happening often either. Happy to be proven wrong.
You are forgetting that Polkadot isn't the only user of Substrate. For staking this may currently be true, but most of the other pallets are being used by a lot of parachains etc. Your complaints are nevertheless correct, and we should work on this. We need to identify common patterns and then provide abstractions/macros/tools to help with that.
paritytech/substrate#13107 tries to abstract away the redundant version tracking and logging.
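The idea behind that abstraction (roughly the role `frame_support`'s `VersionedMigration` wrapper plays nowadays) is to factor out the version check, bump, and logging that both migrations above repeat by hand. A minimal standalone sketch of the pattern, with all Substrate types replaced by plain placeholders (`Storage`, `UncheckedMigration`, `Versioned` are invented names, and weights are bare `u64`s):

```rust
// Simplified stand-in for on-chain state: just the pallet's storage version.
pub struct Storage {
    pub version: u32,
}

// Stand-in for a migration's core logic, stripped of version bookkeeping.
pub trait UncheckedMigration {
    const FROM: u32; // version this migration expects on chain
    const TO: u32;   // version it leaves behind
    fn migrate(storage: &mut Storage) -> u64; // returns consumed weight
}

// The wrapper: run the inner migration only when the on-chain version
// matches `FROM`, then bump it to `TO`. This is the boilerplate every
// migration in the thread re-implements by hand.
pub struct Versioned<M>(core::marker::PhantomData<M>);

impl<M: UncheckedMigration> Versioned<M> {
    pub fn on_runtime_upgrade(storage: &mut Storage) -> u64 {
        if storage.version == M::FROM {
            let weight = M::migrate(storage);
            storage.version = M::TO;
            weight + 1 // +1 for the version write
        } else {
            1 // wrong version: only the version read was consumed
        }
    }
}
```

With this in place, a migration only implements `migrate`, and running it twice is automatically a no-op.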
Uhf… yeah, that is annoying. Having mutation functions on stored structs makes this much more complicated. I don't see how we could circumvent that besides not using that pattern in the first place, which is sub-optimal, since it is in itself a good pattern.
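To make that coupling concrete: if a migration calls a mutation method on the stored struct, any later change to that method silently changes what the old migration does when replayed. A toy illustration (all names invented, not from any real pallet):

```rust
#[derive(Debug, PartialEq)]
pub struct Pool {
    pub points: u64,
}

impl Pool {
    // Business-logic mutation the pallet uses at runtime. If a later
    // release changes this formula, every old migration that called it
    // now migrates state differently than it did when it shipped.
    pub fn slash(&mut self, amount: u64) {
        self.points = self.points.saturating_sub(amount);
    }
}

// A migration that leans on the live method: convenient, but coupled
// to whatever `slash` means at compile time.
pub fn migrate_coupled(pool: &mut Pool) {
    pool.slash(10);
}

// The standalone alternative: inline a frozen copy of the logic as it
// existed when the migration was written, at the cost of duplication.
pub fn migrate_standalone(pool: &mut Pool) {
    pool.points = pool.points.saturating_sub(10);
}
```

Today the two behave identically; the point is that only the second one is guaranteed to keep behaving this way after the pallet evolves.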
For now, if anybody wants to migrate from very old storage, they can use old release branches to gradually run whatever migrations they need; git blame will help them find the right release. I know it's not ideal, but it's definitely doable.
If I'm being honest, I don't see why the actual library code should suffer for migration convenience. If anything, it should be the other way around. Migrations are simply one-off helpers that serve their purpose once. It's a bit annoying having to potentially use several different release branches to upgrade gradually, but it can be done and it's not that hard. Maybe we should provide some sort of convenience tooling for that instead?
Now that I re-read the original issue, it's a bit clearer to me what the purpose of this is: migration automation. Still, I can't see how this could be easily achieved: decoupling migrations from pallets is tricky enough, but we also still need to specify the correct storage version of each pallet should anyone start from the current codebase. And if this is wrong and somehow slips into master, that'll potentially be a huge pain, as things would fail while trying to roll those migrations out automatically.

It seems like manual control over migrations, similar to what we have now, having to add them to individual runtimes, might be a better option. It's a bit more work, but it gives more fine-grained control over what's happening, since we need to compile a new runtime anyway in order to introduce any new code.

I've had similar experiences with SQL migrations on various projects. If the migration is complex enough, it might be a multi-step one that needs to be run and monitored manually. Even if it's not that complex, it's almost always a dangerous pattern to let it run automatically or via CI. In our case the automation would come from simply compiling a new runtime with the new migrations in place.

All of the above concerns might be good to consider in order to see how we could mitigate some, if not all, of them while still providing a reasonable level of automation.
I think we should not settle this yet and, as Basti said, try to find patterns that are common and help migration scripts live longer. One of them is this: Which means:
Using, your PR which sparked this discussion can be solved fairly easily (paritytech/substrate@2c6275e), as the only breaking change is the introduction of

That being said, this is still not perfect because, i.e., if the definition of

TL;DR: I don't have a plan here, but I am not at a spot where I can say screw the old migrations and let's move on, either. If Substrate wants to be the ultimate, user-friendly framework that it wants to be, it must solve this issue, and I am down to put more effort into it.

@ruseinov as for your comments in #296, these are more about automation, which is the end goal, but not the immediate issue. We first have to make migration scripts reusable, then think about automation, so I will not comment further on that for the time being.
That's not a bad idea nonetheless. But again, if you look at the fixes needed in paritytech/substrate@2c6275e, it is actually not that much overhead in most cases to keep the migrations sane. The idea that I am now eager to implement and show to all of you as a PoC is this: I will migrate all of the migrations of the nomination-pools pallet to be 100% independent of the runtime in some PR, and we can then see what the outcome looks like. If someone else has the time, please go ahead.
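The struct-duplication half of "100% independent of the runtime" would look roughly like this standalone sketch (the pool types and field names are illustrative, not from the real nomination-pools pallet): the migration module carries its own frozen copy of the old struct and translates it into the new shape, so later refactors of the pallet's types cannot break it.

```rust
// The pallet's *current* type, free to evolve after this migration ships.
#[derive(Debug, PartialEq)]
pub struct BondedPool {
    pub points: u64,
    pub member_count: u32,
}

pub mod v2 {
    use super::BondedPool;

    // Frozen copy of the struct as it existed at the old storage version.
    // Duplicated here on purpose: the migration must keep decoding the old
    // layout even if the pallet's own type changes again later.
    pub struct OldBondedPool {
        pub points: u64,
        pub members: Vec<u64>, // v2 stored the full member list
    }

    // Translate the old layout into the new one. In a real migration this
    // would be a `translate` over the storage map rather than an
    // in-memory call.
    pub fn migrate(old: OldBondedPool) -> BondedPool {
        BondedPool {
            points: old.points,
            member_count: old.members.len() as u32,
        }
    }
}
```

The cost is exactly the bloat discussed earlier in the thread: every migration re-declares the types it touches.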
Let us take a step back and think about what would be the most convenient and easiest way to deliver migrations to our users. I think it is the old way of bundling them with the pallet: the pallet knows its current version and the version in storage, and it runs all necessary migrations. The runtime just calls into all pallets for migrations on upgrade. Everything just works. But there are reasons why we are no longer doing it this way. Let me see if I got all of them:
I think we should find ways to address both issues instead of dumping this hugely complicated topic onto our users.
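The "old way" described above can be sketched as a simple loop: the pallet compares its compiled-in current version with the on-chain one and applies each step in order. Everything below is illustrative plain Rust, not FRAME code; the version numbers and step descriptions are made up.

```rust
pub const CURRENT_VERSION: u32 = 3;

pub struct Chain {
    pub on_chain_version: u32,
    pub log: Vec<String>,
}

// Apply every migration between the stored version and the current one,
// in order, so a chain that is several releases behind still catches up.
pub fn on_runtime_upgrade(chain: &mut Chain) {
    while chain.on_chain_version < CURRENT_VERSION {
        let from = chain.on_chain_version;
        match from {
            0 => chain.log.push("v0 -> v1: rename storage item".into()),
            1 => chain.log.push("v1 -> v2: re-encode struct".into()),
            2 => chain.log.push("v2 -> v3: inject validators".into()),
            _ => unreachable!("loop bound keeps `from` < CURRENT_VERSION"),
        }
        chain.on_chain_version = from + 1;
    }
}
```

The convenience is obvious: the runtime calls one entry point and the pallet does the rest, including being a no-op once up to date. The downsides are the ones listed above: all old migration code stays bundled with the pallet forever.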
This is not documented in the issue, but I think @kianenigma is no longer that big of a fan of this idea, as discussed during the FRAME steering meet. Or was it the Staking meet? I don't remember. Anyhow, since we don't have a unified way/language (SQL-like) to deal with this, and a lot of migrations really need to mimic the business logic of the pallets whose storage they touch, it seems like the best approach is to do what we've done before.
It seems like a good idea to clean up old migrations and let users deal with this by gradually upgrading, if they need to. But yeah, the UX of this is not amazing.
Follow-up on the discussion here: paritytech/substrate#10233 (comment). Currently, we keep some migration code in the codebase, and even the code that we do keep is pretty much useless. Given this, our support for historical migrations is basically non-existent.
I propose we enforce all migrations to be self-contained henceforth. This means that they should not depend on anything from the pallet.
If they need a struct, they must duplicate that version of it in their own local scope. If they need a particular storage item, they can mimic it using `generate_storage_alias`.

Next to this, we are moving toward banning `on_runtime_upgrade` inside the pallet (paritytech/substrate#8713). This is aligned with this issue as well. To begin with, migrations should be controlled explicitly by the top-level runtime.

Once we have all of this, I believe we can make migrations fully automated. Each migration will have a u32 id. Each pallet will have an on-chain u32 identifier, akin to the current storage version. Upon each runtime upgrade, any migrations that have been added to the repository will be executed. These migrations need to be added to some static registry.
Regardless of whether we realize the last statement or not, in order for the migration code that we keep in Substrate to have any utility, we had better start by making it independent of the pallet. Some examples of this already exist (`pallet-elections-phragmen`).
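The automation described in the last paragraphs could look roughly like this standalone sketch: a static registry of `(id, migration)` pairs per pallet, where on upgrade every migration with an id above the pallet's on-chain identifier runs and the identifier advances. All names are hypothetical, and storage is modeled as a plain `Vec<u32>`:

```rust
// A registered migration: a monotonically increasing id plus the code to run.
pub struct RegisteredMigration {
    pub id: u32,
    pub run: fn(&mut Vec<u32>),
}

// Toy migration bodies; a real one would transform pallet storage.
fn m1(state: &mut Vec<u32>) { state.push(1); }
fn m2(state: &mut Vec<u32>) { state.push(2); }
fn m3(state: &mut Vec<u32>) { state.push(3); }

// Static registry for one pallet; new migrations are appended over time
// and old entries are never edited.
pub static REGISTRY: &[RegisteredMigration] = &[
    RegisteredMigration { id: 1, run: m1 },
    RegisteredMigration { id: 2, run: m2 },
    RegisteredMigration { id: 3, run: m3 },
];

// Run everything newer than the pallet's on-chain id, advancing the id as
// each migration completes. Returns how many migrations were executed.
pub fn run_pending(on_chain_id: &mut u32, state: &mut Vec<u32>) -> u32 {
    let start = *on_chain_id;
    let mut executed = 0;
    for m in REGISTRY.iter().filter(|m| m.id > start) {
        (m.run)(state);
        *on_chain_id = m.id;
        executed += 1;
    }
    executed
}
```

This also shows the failure mode raised earlier in the thread: if the on-chain identifier is ever wrong, the wrong subset of migrations runs automatically, which is why some commenters prefer keeping the selection manual per runtime.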