Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add design for the leader validator loop #2650

Merged
merged 21 commits into from
Feb 13, 2019

Conversation

aeyakovenko
Copy link
Member

Problem

Lack of design for a clear leader/validator fullnode loop.

Summary of Changes

This is a proposal for a different PoH interface to leaders and validators that allow the fullnode to switch modes as the PoH reaches the scheduled slot.

Fixes #

Copy link
Contributor

@garious garious left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me if this chapter is attempting to describe the existing behavior or the desired behavior. If the latter, how does it compare to the former?

book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
@aeyakovenko
Copy link
Member Author

aeyakovenko commented Feb 4, 2019

It's not clear to me if this chapter is attempting to describe the existing behavior or the desired behavior. If the latter, how does it compare to the former?

This is a proposal for the desired behavior. I am not sure what the current implementation does; it's hard to grok.

@aeyakovenko
Copy link
Member Author

@garious or my understanding of the current code is that, there is no main control loop, and the leader and validators run concurrently, but try to not operate on the same slot concurrently.

@aeyakovenko aeyakovenko changed the title proposal for the leader validator loop Propose a design for the leader validator loop Feb 4, 2019
@garious garious self-assigned this Feb 5, 2019
@garious garious force-pushed the leader_validator_loop branch from f260ca7 to f020c02 Compare February 6, 2019 23:22
@aeyakovenko
Copy link
Member Author

@rob-solana, I don’t have a strong opinion on TVU+TPU concurrency. The loop can just start the tpu asynchronously. But doing so means some weird interactions with voting and PoH reset in the TVU

@garious garious changed the title Propose a design for the leader validator loop Add design for the leader validator loop Feb 7, 2019
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
@garious
Copy link
Contributor

garious commented Feb 7, 2019

@rob-solana, @aeyakovenko, I'm not seeing value in the distinction between PoH Generator and PoH Recorder. Seems like we can use only the PoH Recorder. There doesn't need to be this concept of waiting for a generator to finish. Instead, the TPU can use its Recorder as the timer. Once it's at a certain height, it can start using the same PoH to record entries.

@rob-solana
Copy link
Contributor

@rob-solana, @aeyakovenko, I'm not seeing value in the distinction between PoH Generator and PoH Recorder. Seems like we can use only the PoH Recorder. There doesn't need to be this concept of waiting for a generator to finish. Instead, the TPU can use its Recorder as the timer. Once it's at a certain height, it can start using the same PoH to record entries.

ok, means we're also creating a bank_fork for the TPU at the time that the recorder is constructed?

@garious
Copy link
Contributor

garious commented Feb 7, 2019

We reset the recorder after voting, so yes, makes sense to me that we'd also fork at that point too. We might want to fork off that parent if, for example, we reject our own TPU's fork later down the line.

@aeyakovenko
Copy link
Member Author

@garious then the TPU and TVU can’t run concurrently. At least the voting part of the TVU can’t reset the tpu’s PoH, or that vote cancels the block.

My concern is that a faster asic could get this node to cancel its own block.

@garious
Copy link
Contributor

garious commented Feb 8, 2019

Seems reasonable that a TVU would reject the TPUs block if a faster ASIC got its block to the TVU (and validated!) before the TPU finished its block. Feels a little awkward, but not wrong.

@aeyakovenko
Copy link
Member Author

@garious, it would be great to somehow highlight the TVU vs TPU option as something we need more simulation with. We can’t enforce the behavior at the protocol layer, and I have no idea which is better for the network, or for the individual node.

@aeyakovenko
Copy link
Member Author

aeyakovenko commented Feb 8, 2019

@garious, @rob-solana. You would need to kill the TPU fork if the TVU votes for a different fork, or use another thread to generate PoH for the TPU.

There might also be races with an older fork completing while the TPU is still building its own fork.

@garious
Copy link
Contributor

garious commented Feb 9, 2019

This is quite a bit different than the original design and could use a whole new review. @mvines, @sagar-solana, @carllin, looking in your general direction.

Copy link
Member Author

@aeyakovenko aeyakovenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the tvu run concurrently with the tpu? What happens to the PoH recorder that the tpu is using if the tvu votes while the tpu is running?

[saw the update, reset is blocked until tpu is done]

@garious garious force-pushed the leader_validator_loop branch from e1df06b to a8456d9 Compare February 9, 2019 03:21
@garious
Copy link
Contributor

garious commented Feb 13, 2019

@aeyakovenko, can you review? Since I've been pushing commits to your fork, GitHub won't let me add you as a reviewer.

@garious
Copy link
Contributor

garious commented Feb 13, 2019

@mvines, any review comments? considering you've been knee deep in this in #2735.

@aeyakovenko
Copy link
Member Author

@garious lgtm

@garious
Copy link
Contributor

garious commented Feb 13, 2019

@aeyakovenko, no need to mention me or use the LGTM acronym. Just marking the PR as approved is sufficient.

@aeyakovenko
Copy link
Member Author

@garious, I can’t approve my own pr

book/src/leader-validator-transition.md Show resolved Hide resolved
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
book/src/leader-validator-transition.md Outdated Show resolved Hide resolved
@garious garious merged commit aec44e3 into solana-labs:master Feb 13, 2019
brooksprumo pushed a commit to brooksprumo/solana that referenced this pull request Aug 28, 2024
* bank: add current_epoch_staked_nodes()

Add current_epoch_staked_nodes() which returns the staked nodes for the
current epoch. Remove Bank::staked_nodes() which used to return the
bank's view of staked nodes updated to the last vote processed.

The updated call sites don't really need super up to date stake info,
and with this change we can stop updating staked node info for every
vote on every bank. Instead we now compute it once per epoch.

* bank: current_epoch_stakes: explain why self.epoch + 1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants