-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define InstalledScheduler::wait_for_termination() #33922
Conversation
6e83529
to
565e7ce
Compare
565e7ce
to
db50d2c
Compare
@@ -4185,7 +4186,11 @@ impl Bank { | |||
/// Register a new recent blockhash in the bank's recent blockhash queue. Called when a bank | |||
/// reaches its max tick height. Can be called by tests to get new blockhashes for transaction | |||
/// processing without advancing to a new bank slot. | |||
pub fn register_recent_blockhash(&self, blockhash: &Hash) { | |||
fn register_recent_blockhash(&self, blockhash: &Hash, scheduler: &InstalledSchedulerRwLock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub
isn't needed anymore as of #33880..
also, this internal fn should ever be called by register_tick
. even more so after this new scheduler
arg. in other words, blocking must be done if needed with proper scheduler
plumbing.
.as_mut() | ||
.and_then(|scheduler| scheduler.wait_for_termination(&reason)); | ||
if !reason.is_paused() { | ||
drop(scheduler.take().expect("scheduler after waiting")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this meaningless drop()
and .take()
will be replaced with SchedulerPool::return_to_pool()
by a later pr.
// This is needed because recent_blockhash updates necessitate synchronizations for | ||
// consistent tx check_age handling. | ||
BankWithScheduler::wait_for_paused_scheduler(self, scheduler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is indeed very subtle yet crucial. there's tests for this: test_scheduler_schedule_execution_recent_blockhash_edge_case_{with,without}_race
in #33070
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm trying to just summarize the reason this is necessary.
Essentially, in the loop in blcokstore_processor.rs, scheduling no longer blocks our transaction processing. This means we will end up scheduling until we hit the final tick, and which point we register the new bank hash.
We need to make sure the blockhash_queue of our bank is not updated until after all the scheduled transactions have completed, because if a tx w/ recent blockhash equal to the current blockhash was scheduled we'd get inconsistent results based on when it was executed not scheduled.
I think that is correct, but please let me know if it is not.
If that is the case, why did we choose this approach instead of checking blockhash at schedule time instead?
afaik blockhash queue is only updated at the end of a block, i.e. when we should not be scheduling any more txs for that slot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially, in the loop in blcokstore_processor.rs, scheduling no longer blocks our transaction processing. This means we will end up scheduling until we hit the final tick, and which point we register the new bank hash.
We need to make sure the blockhash_queue of our bank is not updated until after all the scheduled transactions have completed, because if a tx w/ recent blockhash equal to the current blockhash was scheduled we'd get inconsistent results based on when it was executed not scheduled.
that's correct understanding.
If that is the case, why did we choose this approach instead of checking blockhash at schedule time instead?
afaik blockhash queue is only updated at the end of a block, i.e. when we should not be scheduling any more txs for that slot.
there's two reasons:
- impl simplify and risk reduction (i.e. extracting
Bank::check_age()
is too dangerous without some rigourious code audit, which i want to avoid (esp considering nonces). Certainly, this is nice optimization opportunity. maybe future work). - the recent blockhashes are exposed to runtime as a sysvar (
SysvarRecentB1ockHashes11111111111111111111
) in theory. In theory here means this isn't the case due to the existence ofSysvarCache
, which is updated only at the start of slot. However, I don't want to rely on some peculiar behavior of the cache not to be fragile. Again, this is fixable, but this will be of (distant) future work).
Alternatively, we can route the recentblockhash update via the scheduler like this:
enum SchedulerTask {
ExecuteTransaction(txes...),
UpdateRecentBlockhash(hash),
}
so that this particular blocking call can be consolidated into replaystage's wait_for_completed_scheduler()
. but again, this is out-of-scope for now due to the need of auditing any potential affected access timings to the bank's recent blockhashes and its implications.
Codecov Report
@@ Coverage Diff @@
## master #33922 +/- ##
=========================================
- Coverage 81.9% 81.9% -0.1%
=========================================
Files 809 809
Lines 218060 218241 +181
=========================================
+ Hits 178630 178769 +139
- Misses 39430 39472 +42 |
pub type ResultWithTimings = (Result<()>, ExecuteTimings); | ||
|
||
#[derive(Debug, PartialEq, Eq, Clone, Copy)] | ||
pub enum WaitReason { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also comment on this from the call-side perspective? It's unclear to me, when I call wait_for_terminate
with one of these reasons...is that me telling the scheduler to terminate with that reason, or is it just waiting for the reason to happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did my best: 507c502
ff1a20a
to
507c502
Compare
@apfitzge thanks for pointing out right questions. hope my newly added in-source comments make your confusion vanish... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy with the changes. Tests are very clear and easy to follow with the mocked scheduler behavior.
Problem
The upcoming unified scheduler is async by nature. so, the replay stage must eventually be blocked on block boundaries in some way for tx execution completion. however, the minimal
InstalledSchduler
(introduced at #33875) doesn't provide any such way.Summary of Changes
Implement it and wire it exhaustively into all the needed relevant blocking points.
(fyi, planned next prs: 1. introduce
InstallSchedulerPool
2. introduceSchedulerPool
as its actual impl)extracted from: #33070