-
Notifications
You must be signed in to change notification settings - Fork 107
fix(concurrency): stopping the program if one of the threads panics #2089
fix(concurrency): stopping the program if one of the threads panics #2089
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion
crates/blockifier/src/blockifier/transaction_executor.rs
line 236 at r1 (raw file):
// Change the defult way of handling panics to write to stderr. // this ensure that the panic massge and location will be printed to stderr. // without adding flags to the cargo command.
The upside of this method is that I can print both the location and the error message to stderr without adding a flag.
The downside is that every panic that occurs after this will have this behavior. I think I can solve this if I save the original hook and set it back after all the threads finish their execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions
crates/blockifier/src/blockifier/transaction_executor.rs
line 250 at r1 (raw file):
for _ in 0..self.config.concurrency_config.n_workers { let worker_executor = Arc::clone(&worker_executor); s.spawn(move || {
If there are future memory leaks, it might be a place to look at because I am not sure we release the memory of the running threads
when aborting the process.
1ad44db
to
42e719e
Compare
42e719e
to
b555dda
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 2 files reviewed, 3 unresolved discussions (waiting on @avi-starkware, @noaov1, and @Yoni-Starkware)
crates/blockifier/src/concurrency/worker_logic.rs
line 120 at r3 (raw file):
fn execute(&self, tx_index: TxIndex) { self.execute_tx(tx_index);
Please ignore. It is here to check the Python test. I will remove this when I am done.
7a3ffd6
to
234757c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @avi-starkware, @meship-starkware, and @noaov1)
crates/blockifier/src/concurrency/worker_logic.rs
line 89 at r5 (raw file):
if self.scheduler.done() { return; };
What are we gaining here?
Code quote:
if self.scheduler.done() {
return;
};
234757c
to
b1b86eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 1 unresolved discussion (waiting on @avi-starkware, @noaov1, and @Yoni-Starkware)
crates/blockifier/src/concurrency/worker_logic.rs
line 89 at r5 (raw file):
Previously, Yoni-Starkware (Yoni) wrote…
What are we gaining here?
No new threads will take tasks after the program is crushed.
Previously, meship-starkware (Meshi Peled) wrote…
This is not needed. |
b1b86eb
to
8206ffa
Compare
Suggestion: fn done(&self) -> bool {
self.done_marker.load(Ordering::Acquire)
}
pub fn halt(&self) {
self.done_marker.store(true, Ordering::Release); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 2 unresolved discussions (waiting on @avi-starkware, @noaov1, and @Yoni-Starkware)
crates/blockifier/src/concurrency/worker_logic.rs
line 89 at r5 (raw file):
Previously, avi-starkware wrote…
This is not needed.
If the done marker is set to true, then all workers will stop when they next ask for a new task.
Entering the commit phase is also not a problem, becausetry_commit
will returnNone
ifdone_marker
is true.
Done.
8206ffa
to
01f4ae5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 4 files reviewed, 2 unresolved discussions (waiting on @avi-starkware, @noaov1, and @Yoni-Starkware)
crates/blockifier/src/concurrency/scheduler.rs
line 239 at r5 (raw file):
Previously, avi-starkware wrote…
done
is an implementation detail of the scheduler. Not part of its interface. please revert to private method.
The methodhalt
should be used to prematurely stop the scheduler.
Please rename the methodhalt
, move it belowget_n_committed_transactions
(all pub methods should be at the top), and use it in the methodhalt_scheduler
of the transaction committer.
Done.
I think we need an abort guard. Additionally, I think we don't need the variable Suggestion: let abort_guard = AbortGuard;
let result = catch_unwind(AssertUnwindSafe(|| {
worker_executor.run();
}));
if let Err(err) = result {
println!("Worker executor thread panicked. {err:?}: {err}");
worker_executor.scheduler.halt();
panic::resume_unwind(err);
abort_guard.release(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r5, 1 of 2 files at r6, 2 of 2 files at r7, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @meship-starkware and @noaov1)
7b74bac
to
4d21f6d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r8, 1 of 1 files at r9, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @meship-starkware and @noaov1)
crates/blockifier/src/concurrency/utils.rs
line 12 at r9 (raw file):
impl Drop for AbortGuard { fn drop(&mut self) { eprintln!("detected unexpected panic; aborting");
Suggestion:
Thread panicked unexpectedly; aborting");
crates/blockifier/src/concurrency/worker_logic.rs
line 120 at r3 (raw file):
Previously, meship-starkware (Meshi Peled) wrote…
Please ignore. It is here to check the Python test. I will remove this when I am done.
Please temporarily add
// FIXME: Remove the semicolon before merging
So we won't forget this change. (The semicolon by itself is easy to miss.)
Sorry. Please make it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @meship-starkware and @noaov1)
crates/blockifier/src/blockifier/transaction_executor.rs
line 244 at r9 (raw file):
s.spawn(move || { // to make sure that if one of the threads panicked all threads would stop, // and the main thread would panic.
Suggestion:
// If a panic is not handled or the handling logic itself panics,
// then we abort the program.
crates/blockifier/src/blockifier/transaction_executor.rs
line 249 at r9 (raw file):
worker_executor.run(); })); if let Err(err) = result {
Suggestion:
if let Err(err) = catch_unwind(AssertUnwindSafe(|| {
worker_executor.run();
})) {
crates/blockifier/src/blockifier/transaction_executor.rs
line 251 at r9 (raw file):
if let Err(err) = result { // this make sure that the program will abort if a panic accured // while halting the scheduler.
I think this comment is out of place.
Suggestion:
if let Err(err) = result {
crates/blockifier/src/concurrency/utils.rs
line 7 at r9 (raw file):
// This stract is used to abort the program if a // panic ocurred in a place where it cannot be handled.
Suggestion:
// This struct is used to abort the program if a
// panic occurred in a place where it could not be handled.
edfafec
to
81c1892
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 5 files reviewed, 5 unresolved discussions (waiting on @avi-starkware and @noaov1)
crates/blockifier/src/blockifier/transaction_executor.rs
line 247 at r7 (raw file):
Previously, avi-starkware wrote…
I think we need an abort guard.
In case the methodhalt
panics, we won't reach theresume_unwind
, so the process needs to be terminated.
To be sure, try to put apanic!()
inside the methodhalt
and check if the process ends.Additionally, I think we don't need the variable
result
. Just doif let Err(err) = catch_unwind(...
. What do you think?
Done.
crates/blockifier/src/blockifier/transaction_executor.rs
line 249 at r9 (raw file):
worker_executor.run(); })); if let Err(err) = result {
Done.
crates/blockifier/src/blockifier/transaction_executor.rs
line 251 at r9 (raw file):
Previously, avi-starkware wrote…
I think this comment is out of place.
Is this Ok?
crates/blockifier/src/concurrency/utils.rs
line 7 at r9 (raw file):
// This stract is used to abort the program if a // panic ocurred in a place where it cannot be handled.
Done.
81c1892
to
7c36bf8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 2 of 5 files reviewed, 5 unresolved discussions (waiting on @avi-starkware and @noaov1)
crates/blockifier/src/blockifier/transaction_executor.rs
line 244 at r9 (raw file):
s.spawn(move || { // to make sure that if one of the threads panicked all threads would stop, // and the main thread would panic.
Done.
Suggestion: // In this case, no panic message will be logged. Add the cargo flag
// --nocapture to log the panic message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r10, 1 of 2 files at r11, 1 of 1 files at r12, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @meship-starkware and @noaov1)
7c36bf8
to
41492f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 4 of 5 files reviewed, 1 unresolved discussion (waiting on @avi-starkware and @noaov1)
crates/blockifier/src/blockifier/transaction_executor.rs
line 253 at r12 (raw file):
// If the program panics here, the abort guard will exit the program. // In this case, no panic message will be logged. Adding the cargo flag // --nocapture will log the panic messages.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r13, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @noaov1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @noaov1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r5, 1 of 2 files at r7, 1 of 3 files at r10, 1 of 1 files at r13, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @meship-starkware and @noaov1)
crates/blockifier/src/concurrency/utils.rs
line 13 at r13 (raw file):
fn drop(&mut self) { eprintln!("detected unexpected panic; aborting"); ::std::process::abort();
?
Suggestion:
std::process::abort();
124aa69
41492f7
to
124aa69
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r14, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @noaov1)
This change is