Fine-tune unified scheduler loops by select_biased #1437

Merged
merged 1 commit into anza-xyz:master on May 22, 2024

Conversation

@ryoqun ryoqun commented May 20, 2024

Problem

The unified scheduler only pretends to use biased selection for optimum performance; it does not actually do so yet.

Summary of Changes

Now that select_biased! is available (#1434), let's use it.
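
For illustration, here is a minimal sketch of the idea behind biased selection, written against plain crossbeam_channel receivers. The channel names, the message enum, and the recv_biased helper are hypothetical and exist only for this sketch; the actual select_biased! macro from #1434 may be implemented differently. The point is simply that the preferred channel is always polled first, instead of letting a fair select! pick a ready arm at random:

```rust
use crossbeam_channel::{unbounded, Receiver, TryRecvError};

// Hypothetical message kinds, only for this sketch.
enum Message {
    Finished(u64), // a handler thread reports a completed task
    NewTask(u64),  // a new task arrives for scheduling
}

// Biased selection: always drain the preferred (finished-task) channel first,
// and only fall back to the other channel when it is empty. A fair select!
// would instead shuffle the ready arms, costing extra work on the hot path.
fn recv_biased(finished: &Receiver<u64>, new_task: &Receiver<u64>) -> Option<Message> {
    match finished.try_recv() {
        Ok(id) => return Some(Message::Finished(id)),
        Err(TryRecvError::Empty) => {}
        Err(TryRecvError::Disconnected) => return None,
    }
    match new_task.try_recv() {
        Ok(id) => Some(Message::NewTask(id)),
        Err(_) => None,
    }
}

fn main() {
    let (finished_tx, finished_rx) = unbounded();
    let (new_task_tx, new_task_rx) = unbounded();
    finished_tx.send(1).unwrap();
    new_task_tx.send(2).unwrap();
    // Both channels are ready, but the finished-task channel always wins.
    assert!(matches!(
        recv_biased(&finished_rx, &new_task_rx),
        Some(Message::Finished(1))
    ));
    // Only once the preferred channel is drained is the fallback channel used.
    if let Some(Message::NewTask(id)) = recv_biased(&finished_rx, &new_task_rx) {
        println!("now scheduling task {id}");
    }
}
```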

Perf numbers

About a 12%-13% improvement can be attained under ideal conditions, with a caveat:

before:
ledger processed in 1 second, 370 ms
ledger processed in 1 second, 323 ms
ledger processed in 1 second, 399 ms
after:
ledger processed in 1 second, 200 ms
ledger processed in 1 second, 202 ms
ledger processed in 1 second, 243 ms

Here's the caveat: unlike previous optimization PRs (#627, #1037, #1192 (comment), #1197 (comment), and #1250 (comment) <= this one is not merged yet), the perf gain is honestly small under present-day real-world conditions; I was forced to compromise here. ;) The benchmark above was conducted with synthesized settings combining busy looping and a no-op handler, like this:

diff --git a/unified-scheduler-pool/src/lib.rs b/unified-scheduler-pool/src/lib.rs
index 348b6bb038..b7cfdb0be1 100644
--- a/unified-scheduler-pool/src/lib.rs
+++ b/unified-scheduler-pool/src/lib.rs
@@ -236,23 +236,6 @@ impl TaskHandler for DefaultTaskHandler {
         index: usize,
         handler_context: &HandlerContext,
     ) {
-        // scheduler must properly prevent conflicting tx executions. thus, task handler isn't
-        // responsible for locking.
-        let batch = bank.prepare_unlocked_batch_from_single_tx(transaction);
-        let batch_with_indexes = TransactionBatchWithIndexes {
-            batch,
-            transaction_indexes: vec![index],
-        };
-
-        *result = execute_batch(
-            &batch_with_indexes,
-            bank,
-            handler_context.transaction_status_sender.as_ref(),
-            handler_context.replay_vote_sender.as_ref(),
-            timings,
-            handler_context.log_messages_bytes_limit,
-            &handler_context.prioritization_fee_cache,
-        );
     }
 }
 
@@ -802,6 +785,9 @@ impl<S: SpawnableScheduler<TH>, TH: TaskHandler> ThreadManager<S, TH> {
                                 state_machine.deschedule_task(&executed_task.task);
                                 Self::accumulate_result_with_timings(&mut result_with_timings, executed_task);
                             },
+                            default => {
+                                continue;
+                            },
                         };
 
                         is_finished = session_ending && state_machine.has_no_active_task();
@@ -843,6 +829,9 @@ impl<S: SpawnableScheduler<TH>, TH: TaskHandler> ThreadManager<S, TH> {
                             continue;
                         }
                     },
+                    default => {
+                        continue;
+                    },
                 };
                 let mut task = ExecutedTask::new_boxed(task);
                 Self::execute_task_with_handler(

This explains the larger time reduction compared to previous results.
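
As a side note, here is a minimal, self-contained sketch (not the scheduler's actual loop, just an assumed crossbeam_channel-style setup) of what the added default arms do: they turn the otherwise blocking select! into a non-blocking one, so the loop busy-spins instead of parking the thread while waiting for messages:

```rust
use crossbeam_channel::{select, unbounded};

fn main() {
    let (tx, rx) = unbounded::<u64>();
    std::thread::spawn(move || tx.send(42).unwrap());

    loop {
        select! {
            recv(rx) -> msg => {
                println!("got {:?}", msg);
                break;
            },
            // Without this arm, select! blocks until some channel is ready.
            // With it, the arm fires whenever no channel is ready, so the
            // loop keeps spinning on the CPU (busy looping).
            default => {
                continue;
            },
        }
    }
}
```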

However, I believe the above result still stands on its own and this PR can be justified, because:

  1. the result aligns with theoretical understanding.
  2. the actual locking pattern is realistic, because the same usual dataset is used as in previous benchmarks (note that scheduling, i.e. locking & dep-graph-ing, is now the dominant processing).
  3. the code change is quite small (the diff is +3/-3 lines, aside from removing an outdated comment), so there is no additional complexity or risk.
  4. the synthesized benchmark is needed to show the improvement only because current mainnet-beta blocks are too sparse and execute_batch is too slow; both will be addressed in the future.

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.7%. Comparing base (94af1aa) to head (b353804).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1437   +/-   ##
=======================================
  Coverage    82.7%    82.7%           
=======================================
  Files         872      872           
  Lines      370361   370354    -7     
=======================================
+ Hits       306478   306538   +60     
+ Misses      63883    63816   -67     

@ryoqun ryoqun marked this pull request as ready for review May 21, 2024 07:07
@ryoqun ryoqun requested a review from apfitzge May 21, 2024 07:07
@ryoqun ryoqun changed the title [wip] Fine-tune unified scheduler loops by select_biased Fine-tune unified scheduler loops by select_biased May 21, 2024

@apfitzge apfitzge left a comment

lgtm - I think adding biased selection makes sense for prioritizing the unblocking of completed tasks.

@ryoqun ryoqun merged commit e227d25 into anza-xyz:master May 22, 2024
40 checks passed