Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parachain-system: send core selector ump signal #5888

Merged
merged 11 commits into from
Oct 7, 2024
7 changes: 4 additions & 3 deletions cumulus/client/consensus/aura/src/collators/lookahead.rs
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,9 @@ use cumulus_client_collator::service::ServiceInterface as CollatorServiceInterfa
use cumulus_client_consensus_common::{self as consensus_common, ParachainBlockImportMarker};
use cumulus_client_consensus_proposer::ProposerInterface;
use cumulus_primitives_aura::AuraUnincludedSegmentApi;
use cumulus_primitives_core::{ClaimQueueOffset, CollectCollationInfo, PersistedValidationData};
use cumulus_primitives_core::{
ClaimQueueOffset, CollectCollationInfo, PersistedValidationData, DEFAULT_CLAIM_QUEUE_OFFSET,
};
use cumulus_relay_chain_interface::RelayChainInterface;

use polkadot_node_primitives::{PoV, SubmitCollationParams};
Expand Down Expand Up @@ -260,8 +262,7 @@ where
relay_parent,
params.para_id,
&mut params.relay_client,
// Use depth 0, to preserve behaviour.
ClaimQueueOffset(0),
ClaimQueueOffset(DEFAULT_CLAIM_QUEUE_OFFSET),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we want to use 0 to preserve old behaviour ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem with using 0 is this:

we have a DEFAULT_CLAIM_QUEUE_OFFSET of 1 (as you also suggested).

The default implementation of the SelectCore trait uses DEFAULT_CLAIM_QUEUE_OFFSET.

if we instead use here offset 0 and the runtime is updated to use offset 1, there will be a discrepancy and the collator will not advertise to the right core. we need these two places to use the same offset.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explanation makes sense to me, offset of 1 should also not break anything here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an alternative would be to set the DEFAULT_CLAIM_QUEUE_OFFSET to 0 and have some kind of LookaheadCoreSelector impl of SelectCore that uses a claim queue offset of 1.

the current state of the PR (with lookahead using offset 1) breaks the coretime-shared-core zombienet test because core sharing doesn't work between paras if one of the paras is not producing blocks. will be fixed by #5461

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I think using a default claim queue offset of 0 is a must, because if the UMP signal is not present, the runtime will assume offset 0: #5423

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and another reason to use default offset 0 is because the claim queue size is at least 1 (so an offset of 1 will not work on a network that is configured with size 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an alternative would be to set the DEFAULT_CLAIM_QUEUE_OFFSET to 0 and have some kind of LookaheadCoreSelector impl of SelectCore that uses a claim queue offset of 1.

I've implemented this now

)
.await
.get(0)
Expand Down
2 changes: 2 additions & 0 deletions cumulus/pallets/parachain-system/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -121,3 +121,5 @@ try-runtime = [
"polkadot-runtime-parachains/try-runtime",
"sp-runtime/try-runtime",
]

experimental-ump-signals = []
53 changes: 49 additions & 4 deletions cumulus/pallets/parachain-system/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -193,20 +193,46 @@ pub mod ump_constants {

/// Trait for selecting the next core to build the candidate for.
pub trait SelectCore {
fn select_core_for_child() -> (CoreSelector, ClaimQueueOffset);
/// Core selector information for the current block.
fn selected_core() -> (CoreSelector, ClaimQueueOffset);
/// Core selector information for the next block.
fn select_next_core() -> (CoreSelector, ClaimQueueOffset);
}

/// The default core selection policy.
pub struct DefaultCoreSelector<T>(PhantomData<T>);

impl<T: frame_system::Config> SelectCore for DefaultCoreSelector<T> {
fn select_core_for_child() -> (CoreSelector, ClaimQueueOffset) {
fn selected_core() -> (CoreSelector, ClaimQueueOffset) {
let core_selector: U256 = frame_system::Pallet::<T>::block_number().into();

(CoreSelector(core_selector.byte(0)), ClaimQueueOffset(DEFAULT_CLAIM_QUEUE_OFFSET))
}

fn select_next_core() -> (CoreSelector, ClaimQueueOffset) {
let core_selector: U256 = (frame_system::Pallet::<T>::block_number() + One::one()).into();

(CoreSelector(core_selector.byte(0)), ClaimQueueOffset(DEFAULT_CLAIM_QUEUE_OFFSET))
}
}

/// Core selection policy that builds on claim queue offset 1.
pub struct LookaheadCoreSelector<T>(PhantomData<T>);

impl<T: frame_system::Config> SelectCore for LookaheadCoreSelector<T> {
fn selected_core() -> (CoreSelector, ClaimQueueOffset) {
let core_selector: U256 = frame_system::Pallet::<T>::block_number().into();

(CoreSelector(core_selector.byte(0)), ClaimQueueOffset(1))
}

fn select_next_core() -> (CoreSelector, ClaimQueueOffset) {
let core_selector: U256 = (frame_system::Pallet::<T>::block_number() + One::one()).into();

(CoreSelector(core_selector.byte(0)), ClaimQueueOffset(1))
}
}

#[frame_support::pallet]
pub mod pallet {
use super::*;
Expand Down Expand Up @@ -365,6 +391,11 @@ pub mod pallet {
UpwardMessages::<T>::put(&up[..num as usize]);
*up = up.split_off(num as usize);

// Send the core selector UMP signal. This is experimental until relay chain
// validators are upgraded to handle ump signals.
#[cfg(feature = "experimental-ump-signals")]
Self::send_ump_signal();

// If the total size of the pending messages is less than the threshold,
// we decrease the fee factor, since the queue is less congested.
// This makes delivery of new messages cheaper.
Expand Down Expand Up @@ -1397,9 +1428,9 @@ impl<T: Config> Pallet<T> {
}
}

/// Returns the core selector.
/// Returns the core selector for the next block.
pub fn core_selector() -> (CoreSelector, ClaimQueueOffset) {
T::SelectCore::select_core_for_child()
T::SelectCore::select_next_core()
}

/// Set a custom head data that should be returned as result of `validate_block`.
Expand All @@ -1418,6 +1449,20 @@ impl<T: Config> Pallet<T> {
CustomValidationHeadData::<T>::put(head_data);
}

/// Send the ump signals
#[cfg(feature = "experimental-ump-signals")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if it would be a good idea to hide the `select_core_for_child runtime API behind this feature ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it helps us. The parachain can implement the API without any problem. I introduced the compile-time feature because otherwise the relay chain would have rejected these candidates

fn send_ump_signal() {
use cumulus_primitives_core::relay_chain::vstaging::{UMPSignal, UMP_SEPARATOR};

UpwardMessages::<T>::mutate(|up| {
up.push(UMP_SEPARATOR);

// Send the core selector signal.
let core_selector = T::SelectCore::selected_core();
up.push(UMPSignal::SelectCore(core_selector.0, core_selector.1).encode());
});
}

/// Open HRMP channel for using it in benchmarks or tests.
///
/// The caller assumes that the pallet will accept regular outbound message to the sibling
Expand Down
2 changes: 1 addition & 1 deletion cumulus/primitives/core/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -335,7 +335,7 @@ pub mod rpsr_digest {

/// The default claim queue offset to be used if it's not configured/accessible in the parachain
/// runtime
pub const DEFAULT_CLAIM_QUEUE_OFFSET: u8 = 1;
pub const DEFAULT_CLAIM_QUEUE_OFFSET: u8 = 0;

/// Information about a collation.
///
Expand Down
16 changes: 16 additions & 0 deletions prdoc/pr_5888.prdoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
title: 'parachain-system: send core selector ump signal'

doc:
- audience: Runtime Dev
description: |
Send the core selector ump signal in cumulus. Guarded by a compile time feature called `experimental-ump-signals`
until nodes are upgraded to a version that includes https://github.com/paritytech/polkadot-sdk/pull/5423 for
gracefully handling ump signals.

crates:
- name: cumulus-client-consensus-aura
bump: minor
- name: cumulus-pallet-parachain-system
bump: major
- name: cumulus-primitives-core
bump: minor