Fix algorithmic complexity of on-demand scheduler with regards to number of cores. #3190
Conversation
Does not yet typecheck.
Reintroduces the on_initialize hook to update the spot traffic even when there are no on-demand orders being placed, allowing for a price decrease. Adds Ord tests for QueueIndex and Reverse.
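For illustration, a minimal standalone sketch of the idea behind such a hook, assuming a simple multiplicative adjustment: the function name, formula, and constants below are invented for the example (and the runtime would use fixed-point arithmetic rather than f64), so this is not the pallet's actual API:

```rust
// Sketch only: recompute the spot traffic multiplier once per block, so the
// price can also fall while no new orders arrive. Formula and constants are
// illustrative, not the pallet's.
fn update_spot_traffic(traffic: f64, queue_len: usize, target_queue_len: usize) -> f64 {
    // Push the multiplier up when the queue is above target, let it decay otherwise.
    let utilisation = queue_len as f64 / target_queue_len.max(1) as f64;
    let adjusted = traffic * (1.0 + 0.05 * (utilisation - 1.0));
    // Never fall below the baseline multiplier of 1.0.
    adjusted.max(1.0)
}

fn main() {
    let mut traffic = 1.2;
    // With an empty queue the multiplier decays block by block, which is what
    // calling this from on_initialize (rather than only on order placement) buys.
    for block in 0..5 {
        traffic = update_spot_traffic(traffic, 0, 100);
        println!("block {block}: traffic = {traffic:.4}");
    }
}
```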
Nice work, thanks @antonva !
@antonva Command:
bot bench polkadot-pallet --runtime=westend --pallet=runtime_parachains::assigner_on_demand
@antonva https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5561742 was started for your command.
Co-authored-by: ordian <[email protected]>
…=westend --target_dir=polkadot --pallet=runtime_parachains::assigner_on_demand
Approving modulo nits, but I'm not deeply familiar with the underlying logic of the on_demand assigner
/// The actual queue is implemented via multiple priority queues. One for each core, for entries
/// which currently have a core affinity, and one free queue, with entries without any affinity yet.
///
/// The design aims to have most queue accesses be O(1) or O(log(N)). Absolute worst case is O(N).
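As a rough illustration of the layout this doc comment describes, here is an in-memory sketch with made-up type names (the real pallet keeps one heap per core plus the free heap in runtime storage and uses its own index and ordering types):

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Position in the overall order queue; a smaller index is served first.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct QueueIndex(u32);

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct CoreIndex(u32);

/// One enqueued on-demand order; the derived `Ord` sorts by `idx` first.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct EnqueuedOrder {
    idx: QueueIndex,
    para: u32,
}

/// Sketch of the queue layout: one min-heap per core (entries that already
/// have affinity to that core) plus one free heap for entries without any
/// affinity. `Reverse` turns Rust's max-heap into a min-heap, so the smallest
/// `QueueIndex` is popped first.
#[derive(Default)]
struct OnDemandQueue {
    affinity: HashMap<CoreIndex, BinaryHeap<Reverse<EnqueuedOrder>>>,
    free: BinaryHeap<Reverse<EnqueuedOrder>>,
}

impl OnDemandQueue {
    /// O(log n) push into the free queue.
    fn push_free(&mut self, order: EnqueuedOrder) {
        self.free.push(Reverse(order));
    }

    /// Pop the next order for `core`: whichever of the core's own heap or the
    /// free heap holds the earlier queue position. O(log n) per pop, and only
    /// this core's heap plus the free heap are touched.
    fn pop_for_core(&mut self, core: CoreIndex) -> Option<EnqueuedOrder> {
        let core_heap = self.affinity.entry(core).or_default();
        // Decide which heap to pop from without holding borrows across the pop.
        let take_core = match (core_heap.peek(), self.free.peek()) {
            (Some(a), Some(f)) => a.0 <= f.0,
            (Some(_), None) => true,
            (None, Some(_)) => false,
            (None, None) => return None,
        };
        if take_core {
            core_heap.pop().map(|r| r.0)
        } else {
            self.free.pop().map(|r| r.0)
        }
    }
}

fn main() {
    let mut q = OnDemandQueue::default();
    q.push_free(EnqueuedOrder { idx: QueueIndex(0), para: 2000 });
    q.push_free(EnqueuedOrder { idx: QueueIndex(1), para: 2001 });
    // Neither order has affinity yet, so core 0 simply gets the earliest one.
    assert_eq!(q.pop_for_core(CoreIndex(0)).map(|o| o.para), Some(2000));
}
```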
a note on complexity: when you have a map like AffinityEntries from core index to BinaryHeap, the read and write to it would still be O(len(BinaryHeap)). Having a BinaryHeap instead of a Vec only saves on in-memory operations, but I assume these will be dominated by the actual db writes/reads + allocation + encoding/decoding + hashing, unless you have a lot of in-memory ops per read/write.
You mean, because the data still needs to be fetched? Yes indeed, I considered this, but decided not to make it part of the complexity analysis, because fetching the data is unavoidable and not really part of the algorithm. I was also hoping that the batch data fetching is fast. In any case, we are also usually fetching less data than before. I was tempted to avoid accessing the free list in those cases where we don't need data from there, by keeping track of its status, but decided the complexity is not worth it.
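To make the trade-off concrete, here is a toy model of a single read-modify-write of one per-core heap, with the storage round trip simulated by copying the full entry list (the names and the Vec-based "storage" are inventions for the example; the runtime of course uses SCALE-encoded storage):

```rust
use std::collections::BinaryHeap;

// The "decode" step: rebuilding the heap from storage touches every entry, O(len).
fn read_from_storage(storage: &[u32]) -> BinaryHeap<u32> {
    storage.iter().copied().collect()
}

// The "encode" step: writing the heap back also touches every entry, O(len).
fn write_to_storage(heap: &BinaryHeap<u32>) -> Vec<u32> {
    heap.iter().copied().collect()
}

fn main() {
    let storage: Vec<u32> = (0..1_000).collect();
    let mut heap = read_from_storage(&storage); // O(len): decode
    heap.push(42);                              // O(log len): the only step the heap speeds up
    let _updated = write_to_storage(&heap);     // O(len): encode
    // Per storage access the heap only cheapens the in-memory middle step; the
    // surrounding decode/encode still scale with the number of entries.
}
```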
#[cfg(not(test))]
debug_assert_ne!(
    affinity, None,
    "Decreased affinity for a para that has not been served on a core?"
);
First time seeing a debug assert for a non-test environment. Is the intention to run this assert on testnets and benchmarks only?
I guess it breaks some edge case tests?
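For readers unfamiliar with the pattern: debug_assert_ne! is only active when debug assertions are enabled (it compiles to nothing in ordinary release builds), and the extra #[cfg(not(test))] removes it from the crate's own unit tests, where edge cases may legitimately violate the invariant. A standalone sketch of the behaviour, using a hypothetical helper that is not the pallet's code:

```rust
/// Hypothetical helper: decrease an affinity count, asserting that the entry
/// exists except in unit tests, which may deliberately pass `None`.
fn decrease_affinity(affinity: Option<u32>) -> Option<u32> {
    #[cfg(not(test))]
    debug_assert_ne!(
        affinity, None,
        "Decreased affinity for a para that has not been served on a core?"
    );
    affinity.map(|a| a.saturating_sub(1))
}

fn main() {
    assert_eq!(decrease_affinity(Some(2)), Some(1));
    // `decrease_affinity(None)` would trip the assertion in a debug (non-test)
    // build, is skipped under `cargo test`, and compiles to nothing in release.
}
```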
The CI pipeline was cancelled due to the failure of one of the required jobs.
…ber of cores. (paritytech#3190)

We witnessed really poor performance on Rococo, where we ended up with 50 on-demand cores. This was due to the fact that for each core the full queue was processed. With this change, full queue processing will happen far less often (most of the time complexity is O(1) or O(log(n))) and if it happens, then only for one core (in expectation).

Also, the spot price is now updated before each order to ensure economic back pressure.

TODO:
- [x] Implement
- [x] Basic tests
- [x] Add more tests (see todos)
- [x] Run benchmark to confirm better performance, first results suggest > 100x faster.
- [x] Write migrations
- [x] Bump scale-info version and remove patch in Cargo.toml
- [x] Write PR docs: on-demand performance improved, more on-demand cores are no longer problematic. If need be, the max queue size can also be increased again. (Maybe not to 10k)

Optional: Performance can be improved even more if we called `pop_assignment_for_core()` before calling `report_processed()` (avoids needless affinity drops). The effect gets smaller the larger the claim queue, and I would only go for it if it does not add complexity to the scheduler.

---------

Co-authored-by: eskimor <[email protected]>
Co-authored-by: antonva <[email protected]>
Co-authored-by: command-bot <>
Co-authored-by: Anton Vilhelm Ásgeirsson <[email protected]>
Co-authored-by: ordian <[email protected]>
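As a toy illustration of why that call order matters, under the assumption that affinity is a simple per-para counter that is dropped once it reaches zero (only the two function names come from the description above; everything else is invented for the example):

```rust
use std::collections::HashMap;

// Toy model, not the pallet's code: affinity is a per-para counter; when it
// reaches zero the affinity is dropped and the para's queued orders would have
// to move back to the free queue.
fn report_processed(affinity: &mut HashMap<u32, u32>, para: u32) {
    if let Some(count) = affinity.get_mut(&para) {
        *count -= 1;
        if *count == 0 {
            // Affinity dropped: this is the churn the "Optional" item wants to avoid.
            affinity.remove(&para);
        }
    }
}

fn pop_assignment_for_core(affinity: &mut HashMap<u32, u32>, para: u32) {
    // Handing out a new assignment (re-)establishes affinity.
    *affinity.entry(para).or_insert(0) += 1;
}

fn main() {
    // Current order: report first, then pop. The counter touches zero, so the
    // affinity is dropped and immediately re-created.
    let mut current = HashMap::from([(2000u32, 1u32)]);
    report_processed(&mut current, 2000);
    pop_assignment_for_core(&mut current, 2000);

    // Suggested order: pop first, then report. The counter never reaches zero,
    // so the affinity (and the per-core queue entry) survives untouched.
    let mut suggested = HashMap::from([(2000u32, 1u32)]);
    pop_assignment_for_core(&mut suggested, 2000);
    report_processed(&mut suggested, 2000);

    // Both end in the same state; only the amount of churn differs.
    assert_eq!(current.get(&2000), Some(&1));
    assert_eq!(suggested.get(&2000), Some(&1));
}
```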