-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add randomness to pallet-collator-assignment #267
Conversation
key to use: |
Master coverage: 70.58% |
And add unrelated test
Breaks zombienet tests because the collator2002 is not assigned to container chain 2002
Because a different collator may be assigned to container 2002
There was a bug because we tried to read an Option<Hash> as a Hash, and this results in the first byte of the [u8; 32] being always 1, which is the marker for "Some"
Because since #288 collator rotation is handled in on_initialize, and we can only read the relay randomness after the set_validation_data inherent, which happens after on_initialize. The solution is to read the randomness on the on_finalize hook of the previous block, store it in pallet_collator_assignment, and delete it when reading it.
Rotating every session does not leave enough time for collators to detect their assignment, and some tests result in timeout
/// Randomness from previous block. Used to shuffle collators on session change. | ||
/// Should only be set on the last block of each session and should be killed on the on_initialize of the next block. | ||
/// The default value of [0; 32] disables randomness in the pallet. | ||
#[pallet::storage] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I like this, I am also missing a way to disable it. This probably cannot serve as a kill switch, as it is set on_finalize of the last block before the session, so we would need to set it 0 on that block.
I think we can have an additional storage that serves as s kill switch, any opposition for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ignore my comment, I forgot that we can do this by putting a very large rotation period
runtime/dancebox/src/migrations.rs
Outdated
fn migrate(&self, _available_weight: Weight) -> Weight { | ||
log::info!(target: LOG_TARGET, "migrate"); | ||
|
||
Configuration::set_full_rotation_period(RuntimeOrigin::root(), 24u32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think this is going to work. set full rotation period schedules the configuration change, but wont be applied until after 2 sessions. During those two sessions the configuration wont be readible in the system, as the metadata is not the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to read the existing values in currentConfig and pendingConfigs, and modify so that it accepts the new structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some examples in paritytech/polkadot-sdk@3bbb336#diff-39d3b12d624dbf61df8f5ea6fc18a9f12611347f4d730388d6a96deafbba5a71, although I am not sure they are super clear. I think its better to just read whatever value it exists there with the previous struct format, and you just add the new parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PR is mergable provided that we mofidy the migration, good job!
runtime/dancebox/src/migrations.rs
Outdated
/// Run a standard pre-runtime test. This works the same way as in a normal runtime upgrade. | ||
#[cfg(feature = "try-runtime")] | ||
fn pre_upgrade(&self) -> Result<Vec<u8>, sp_runtime::DispatchError> { | ||
let zero_period = Configuration::config().full_rotation_period; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should indeed panic with try-runtime no? the config does not still have full_rotation_period
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming it defaults to 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case what happens is that Configuration::config()
returns the default value, I am assuming because it is a ValueQuery. The default value has full_rotation_period = 24, so this test was wrong.
Add config param to pallet_collator_assigment that triggers a full rotation of all the assigned collators.
Full rotation is only triggered every 24 sessions (configurable). For the other sessions we keep the old behavior, except that the collators and the container chain lists are shuffled, so container chains with a low para id no longer have priority.
Collators may still rotate on normal sessions in the same cases as before this PR: if another collator leaves, or if a container chain joins or leaves.
The random shuffle does not happen in session 0, and it doesn't happen in rust integration tests unless the random seed is explicitly set. Zombienet tests use real randomness from the relay and there is no way to mock that randomness, so collator assignment will change every time these tests are run. Most of the tests are unit tests because there it is easier to control the randomness.
The full_rotation_period is configurable in pallet_configuration, the default value is 24 sessions but it can be changed. A value of 0 disables collator rotation.