Add design for the leader validator loop #2650
It's not clear to me if this chapter is attempting to describe the existing behavior or the desired behavior. If the latter, how does it compare to the former?
This is a proposal for the desired behavior. I am not sure what the current implementation does; it's hard to grok.
@garious, my understanding of the current code is that there is no main control loop: the leader and validators run concurrently, but try not to operate on the same slot at the same time.
f260ca7 to f020c02
@rob-solana, I don’t have a strong opinion on TVU+TPU concurrency. The loop can just start the TPU asynchronously, but doing so means some weird interactions with voting and PoH reset in the TVU.
@rob-solana, @aeyakovenko, I'm not seeing value in the distinction between PoH Generator and PoH Recorder. Seems like we can use only the PoH Recorder. There doesn't need to be this concept of waiting for a generator to finish. Instead, the TPU can use its Recorder as the timer. Once it's at a certain height, it can start using the same PoH to record entries.
OK, does that mean we're also creating a bank_fork for the TPU at the time that the recorder is constructed?
We reset the recorder after voting, so yes, it makes sense to me that we'd fork at that point too. We might want to fork off that parent if, for example, we reject our own TPU's fork later down the line.
@garious, then the TPU and TVU can’t run concurrently. At least, the voting part of the TVU can’t reset the TPU’s PoH, or that vote cancels the block. My concern is that a faster ASIC could get this node to cancel its own block.
Seems reasonable that a TVU would reject the TPU's block if a faster ASIC got its block to the TVU (and validated!) before the TPU finished its block. Feels a little awkward, but not wrong.
@garious, it would be great to somehow highlight the TVU vs TPU option as something we need more simulation with. We can’t enforce the behavior at the protocol layer, and I have no idea which is better for the network, or for the individual node.
@garious, @rob-solana. You would need to kill the TPU fork if the TVU votes for a different fork, or use another thread to generate PoH for the TPU. There might also be races with an older fork completing while the TPU is still building its own fork.
This is quite a bit different than the original design and could use a whole new review. @mvines, @sagar-solana, @carllin, looking in your general direction.
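For reviewers catching up, here is a rough sketch of the "Recorder as timer" idea discussed above, plus the reset-after-vote behavior. Everything in it is an assumption made for illustration: the `PohRecorder` type, its fields, and its methods are simplified stand-ins, not the types in the codebase, and real hashing is left out entirely.

```rust
/// Error returned when the TPU tries to record before its scheduled height.
#[derive(Debug, PartialEq, Eq)]
pub enum RecordError {
    TooEarly,
}

/// Hypothetical, simplified PoH recorder (illustration only, no hashing).
#[derive(Debug)]
pub struct PohRecorder {
    tick_height: u64,
    /// Tick height at which this node's leader slot starts, if it has one.
    leader_start_height: Option<u64>,
    /// Entries recorded by the TPU as (tick height, payload).
    entries: Vec<(u64, String)>,
}

impl PohRecorder {
    pub fn new(tick_height: u64, leader_start_height: Option<u64>) -> Self {
        Self { tick_height, leader_start_height, entries: Vec::new() }
    }

    /// Advance the clock by one tick. The recorder doubles as the timer.
    pub fn tick(&mut self) {
        self.tick_height += 1;
    }

    /// True once the recorder has reached the scheduled leader height.
    pub fn can_record(&self) -> bool {
        self.leader_start_height.map_or(false, |h| self.tick_height >= h)
    }

    /// Record an entry; rejected until the scheduled height is reached.
    pub fn record(&mut self, payload: &str) -> Result<(), RecordError> {
        if !self.can_record() {
            return Err(RecordError::TooEarly);
        }
        self.entries.push((self.tick_height, payload.to_string()));
        Ok(())
    }

    /// Reset after a vote. Any in-progress TPU entries are dropped, which
    /// models "that vote cancels the block". Returns the number dropped.
    pub fn reset(&mut self, tick_height: u64, leader_start_height: Option<u64>) -> usize {
        let dropped = self.entries.len();
        self.entries.clear();
        self.tick_height = tick_height;
        self.leader_start_height = leader_start_height;
        dropped
    }
}
```

In this sketch a reset unconditionally discards the TPU's entries, which is exactly the concern raised about a faster ASIC getting the node to cancel its own block.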
Does the TVU run concurrently with the TPU? What happens to the PoH recorder that the TPU is using if the TVU votes while the TPU is running?
[Saw the update: reset is blocked until the TPU is done.]
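One way to picture "reset is blocked until the TPU is done" is a small gate that defers the TVU's reset request while the TPU is running. This is a minimal sketch under that assumption; `ResetGate` and its methods are hypothetical names, not anything taken from the proposal or the code.

```rust
/// Hypothetical gate that defers PoH resets while the TPU is building a block.
#[derive(Debug, Default)]
pub struct ResetGate {
    tpu_running: bool,
    /// Height of the most recent reset requested while the TPU was running.
    pending_reset: Option<u64>,
    poh_height: u64,
}

impl ResetGate {
    /// Mark the TPU as running; resets are deferred from now on.
    pub fn start_tpu(&mut self) {
        self.tpu_running = true;
    }

    /// Called after the TVU votes. Returns true if the reset was applied
    /// immediately, false if it was deferred because the TPU is running.
    pub fn request_reset(&mut self, height: u64) -> bool {
        if self.tpu_running {
            // Keep only the latest request; older ones are superseded.
            self.pending_reset = Some(height);
            false
        } else {
            self.poh_height = height;
            true
        }
    }

    /// Called when the TPU finishes its block. Applies any deferred reset
    /// and returns the height it was reset to, if there was one.
    pub fn finish_tpu(&mut self) -> Option<u64> {
        self.tpu_running = false;
        let pending = self.pending_reset.take();
        if let Some(height) = pending {
            self.poh_height = height;
        }
        pending
    }

    pub fn poh_height(&self) -> u64 {
        self.poh_height
    }
}
```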
...where "live active chain" was changed to "active fork".
e1df06b to a8456d9
@aeyakovenko, can you review? Since I've been pushing commits to your fork, GitHub won't let me add you as a reviewer.
@garious lgtm
@aeyakovenko, no need to mention me or use the LGTM acronym. Just marking the PR as approved is sufficient.
@garious, I can’t approve my own PR
* bank: add current_epoch_staked_nodes()
  Add current_epoch_staked_nodes(), which returns the staked nodes for the current epoch. Remove Bank::staked_nodes(), which used to return the bank's view of staked nodes, updated as of the last vote processed. The updated call sites don't need fully up-to-date stake info, and with this change we can stop updating staked node info for every vote on every bank. Instead, we now compute it once per epoch.
* bank: current_epoch_stakes: explain why self.epoch + 1
Problem
Lack of design for a clear leader/validator fullnode loop.
Summary of Changes
This is a proposal for a different PoH interface to leaders and validators that allows the fullnode to switch modes as the PoH reaches the scheduled slot.
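As a rough illustration of the mode switch described above (not the proposed implementation), the sketch below derives the node's mode from the PoH tick height and a leader schedule. `TICKS_PER_SLOT`, the round-robin `schedule` slice, and the integer node IDs are all assumptions made for this example.

```rust
/// The two modes a fullnode alternates between.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Leader,
    Validator,
}

/// Hypothetical slot length in ticks, chosen only for this example.
pub const TICKS_PER_SLOT: u64 = 8;

/// Mode for this node at a given tick height. The schedule is assumed to be
/// a round-robin list of node IDs, one per slot.
pub fn mode_at(tick_height: u64, my_id: u32, schedule: &[u32]) -> Mode {
    if schedule.is_empty() {
        return Mode::Validator;
    }
    let slot = tick_height / TICKS_PER_SLOT;
    let leader = schedule[(slot % schedule.len() as u64) as usize];
    if leader == my_id {
        Mode::Leader
    } else {
        Mode::Validator
    }
}

/// Walk the tick heights 0..=max_tick and list each point where the mode
/// changes, starting with the initial mode at tick 0.
pub fn mode_transitions(my_id: u32, schedule: &[u32], max_tick: u64) -> Vec<(u64, Mode)> {
    let mut transitions = Vec::new();
    let mut current: Option<Mode> = None;
    for tick in 0..=max_tick {
        let mode = mode_at(tick, my_id, schedule);
        if current != Some(mode) {
            transitions.push((tick, mode));
            current = Some(mode);
        }
    }
    transitions
}
```

With a two-node schedule `[1, 2]` and this node as ID 2, the node validates during slot 0, leads during slot 1, and returns to validating at the start of slot 2.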
Fixes #