-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix tpu tvu bank race #2707
fix tpu tvu bank race #2707
Conversation
Can you add a test for this case? |
Codecov Report
@@ Coverage Diff @@
## master #2707 +/- ##
========================================
+ Coverage 77.7% 78.4% +0.7%
========================================
Files 114 114
Lines 18864 18962 +98
========================================
+ Hits 14663 14873 +210
+ Misses 4201 4089 -112 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't land this without a test. I removed roughly this from my de-bootstraping PR because no test told me to keep it.
@@ -319,6 +320,10 @@ impl Fullnode { | |||
tick_height, | |||
); | |||
|
|||
while self.bank.tick_height() < tick_height { | |||
sleep(Duration::from_millis(10)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's gotta be a smarter way to do this rather than a semi-random sleep too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but that only further justifies a test. A smarter way can be postponed if we add a good test now.
@carllin so this patch rolls into the work I’m doing with stressing fullnodes with super low ticks per slot. Want me to take from here? |
@mvines, I'm currently working on a test for this case, so I'll just push that once its ready? |
ecdd291
to
67b15f3
Compare
…bs#2707) ff cleanup: remove enable_gossip_duplicate_proof_ingestion, it's active everywhere.
…bs#2707) ff cleanup: remove enable_gossip_duplicate_proof_ingestion, it's active everywhere.
Problem
The TVU bank could potentially be behind the TPU bank at the time of a leader -> validator transition. In the case of a leader -> leader transition, the TPU bank is then copied from the trailing TVU bank, and broadcast is started up with a last id that is not up to date (causing verification failures) and banking_stage threads are started up with an old bank state.
Summary of Changes
Wait for the TVU bank to catch up to current tick height before performing rotation.
Fixes #