-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove recursive read locking of program cache #1616
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved to fix the immediate bug, however TransactionBatchProcessor::get_environments_for_epoch()
containing a hidden read lock remains a footgun.
The idea was to contain the exposure of program cache within SVM bounds. So that the user of SVM are agnostic of program cache operations. We can refactor it in a different way to address the footgun. |
Good point. We could shift signature of this function to force the caller to obtain the lock: /// Returns the current environments depending on the given epoch
pub fn get_environments_for_epoch(program_cache: &ProgramCache, epoch: Epoch) -> ProgramRuntimeEnvironments {
program_cache
.get_environments_for_epoch(epoch)
}
My proposal would seemingly be in disagreement with this goal tho ... I would lean towards safety over convenience tho |
We can change it to try lock, and return the option to the caller. |
That seems reasonable to me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! I tested the patch on the node that was reliably dead-locking, and this patch avoid the deadlock.
Also, a little more detail from our debug thread about why #1582 introduced the deadlock
As of Rust v1.62.0, writers are preferred in reader/writer contention scenarios to avoid starvation. See rust-lang/rust#95801
but modified to prefer writers and spin before sleeping
...
It always prefers writers, as we decided rust-lang/rust#93740 (comment).
Problem
The transaction processor is read locking program cache recursively.
Summary of Changes
Use the existing read locked program cache instance, instead of locking it again.
Fixes #