Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

[DEPRECATED] split into smaller tasks: Need a proposal for rollback and rotation #2162

Closed
20 tasks
aeyakovenko opened this issue Dec 14, 2018 · 18 comments
Closed
20 tasks

Comments

@aeyakovenko
Copy link
Member

aeyakovenko commented Dec 14, 2018

Problem

Need a proposal of changed to the code for rollback + rotation

Proposed Solution

Block, Slots, etc.. are confusing terms for us because a leader transmits entries over a larger tick range than just the leaders slot. This proposes the following definitions for the terms.

  • slot: a range in ticks in which a leader can transmit entries with data. Leader can transmit virtual ticks that are at a lower tick then the start of the slot. A vote can only occur at the end of a slot. A leader could be scheduled for consecutive slots.
  • block: a group of entries that span in tick height for a full slot
  • blob: udp packet containing any number of full entries (current definition).

A leader transmits blocks for their slot. The blocks may overlap previous slots. The last block is the only block that can contain data. For example, the following are all valid transmissions for a single leader:

  • [[block 0/out of 2, slot 50], [block 1/out of 2, slot 50], [block 2 / out of 2, slot 50]]

Block 0/2 and block 1/2 are entries that only contain virtual ticks and span the previous 2 slots in tick height. Block 2/2 contains data.

  • [block 0/0]

Block 0/0 only contains data for this leader, and it's PoH connects directly to the previous leaders slot data transmission. The leader knows exactly how many slots it is skipping, so it can transmit the expected number of blocks.

To keep it simple, a blob would specify the range of ticks the leader is transmitting for as well as the blob number for the slot and how many erasure blobs are expected. Block number can be computed from the tick height and the range. A blob should include all the data necessary to place it into the db ledger, including whatever erasure bits we need. Placing a blob into the ledger should be completely context free. Because the blob headers are signed by the current leader, we don't really need to worry about malicious data. The validator will overrun the leaders slot with its PoH and start on the next slot if the data in the slot fails to produce a valid block of data.

Separate Ledger transmission/receiving from Ledger Processing

#2163 #2277
A validator should be able to receive ledger from the network without processing transactions. Each leader slot only has 1 possible leader that can produce blocks for that slot. Upon receiving a blob the ledger DB can quickly decide if its valid or not (signed by the leader), and where it is placed in the slot. This data structure can aggregate blobs until it sees a full complete votable block and then signal the bank. Optimizing this for optimistic verification can be done later.

  • db ledger is indexed by slot, each slot is an array of blocks made up of blobs. slot 1 -> [block 0, block 1....] The last block is the only one with data. All the blocks leading up to the last block contain Ticks. Because of exponential decrease in likelyhood of rollback we shouldn't expect more than 32 blocks. Leaders can be scheduled for consecutive slots, the pipeline is agnostic to this.
  • blobs contains enough info to derive the slot, block, tick height and erasure bit.
  • network layer just dumps blobs into the right spot in the db ledger, add enough redundant data to blobs until adding a blob to the ledger is completely context free.
  • db ledger sends a signal whenever a last block with data in a slot is ready to replay stage, poh verification happens here
  • validator is concurrently generating a PoH from its own last vote. This height indicates the current network slot and the validator should only repair/retransmit data for that slot.
Flat Checkpoint Store

#2499 #2497
A single table that contains all the checkpoints and is easy to follow. At any checkpoint, the updated accounts in that checkpoint overlay the previous checkpoint.

  • bank is organized as HashMap<LastId, (LastId, Accounts, LastIds)>. The LastId key is the block which this checkpoint is for. This lastid is the id of the block for a specific data transmission by a leader, and not a virtual last id. The tripplet contains the LastId of the previous block that this checkpoint is derived from.
  • LastIds can be a bloom filter initialized with the Previous LastId. Designed to drop 1 in 1b. (Stretch goal)
  • gossip can be used to get approximate network height, and blobs that are within the gossiped height can be placed into the ledger.
Repair

#2442
If the majority of the network fails to observe the data from the leader before the slot expires, it is unlikely to ever be accepted.

  • Exponential backoff for erasure code percentage. Every leader should double the erasure code % for every partially observed slot.
  • nodes that are in the minority are basically just trying to catch up to the height that is advertised in gossip. So they should be repairing blobs between their last checkpoint up to the gossip height.
Vote Data Consistency

Once a vote is signed and transmitted, it cannot be "undone". If the data is lost locally the local validator is at risk of being slashed. The vote stack is the sequence of votes that have been made by the validator.

  • Replay stage decides to vote. The vote is checked into DB for the slot. Upon boot, the validator can examine the bank account state and reconcile it against the DB. Any votes missing from the ledger can be retransmitted. @carllin
  • Vote program maintains state of the vote stack. See fork selection. This means this program is forked in every checkpoint. Even so, the stacks in each checkpoint will remain consistent, because each subsequent fork occupies a different height and would force the same vote to expire. Before voting, the replay stage iterates through the vote stack instances in all the checkpoints and makes sure that the vote doesn't violate any of the vote stacks. Latest one should be the only one that is necessary to check though but we should write some tests to make sure that is true.
Fork Selection

#2289
See fork selection. The replay stage asks db ledger for full complete blocks and processes all the entries at once in parallel. In doing so, it can produce multiple checkpoints, pick the one that maximizes network reward and vote on that one. This operation can keep track of HashMap<Block, BankHash>, as a mapping of blocks to bank hashes that identify a checkpoint in the bank.

  • replay stage iterates through its checkpoints and asks the ledger for any new blocks for that checkpoint and creates a new checkpoint
  • replay stage picks the "best" checkpoint out of the new checkpoints (see fork-selection.md) and votes
  • vote is gossiped, and sent to the next leader
  • voted bank is checkpoint
  • vote triggers restart of PoH recorder. The recorder is reset to generate from the voted LastId
  • expired checkpionts are unrolled, or dequeued (GC of the bank occurs here)
Switch to Leader

PoH recorder continuously runs and signals the validator when it reached the height at which this validator should become a leader. At that point, the PoH stream that has been produced is derived from the last voted block by this validator. That voted block has a checkpoint that was created in Fork Selection ^^^^. That is the checkpoint that should be used by the leader.

  • when this validator is the next leader, it transmits all the generated virtual ticks from the last voted block as its blocks 0,1...
  • leader looks at the crds for any votes that are valid for the above checkpoint that haven't been registered yet.

tag: @rob-solana @carllin @mvines @garious

@carllin
Copy link
Contributor

carllin commented Dec 17, 2018

Question on how we should prioritize repairs. Let's say we have all the blocks for slot 1, something like slot 1: [block 0/0]. We then receive a blob for slot 6, block 0/3. It follows that:

  1. We know based on mapping the tick height to slots, that this means the leader for slot 6 is transmitting virtual ticks for blocks 4 and 5. This also means this leader for slot 6 observed real data for slot 3. Thus do we then prioritize repairs for slot 3? If so, do we start by asking for everything from slot 3, block 0, blob index 0, until we receive the last tick for slot 3, at which point we mark slot 3 as complete?
    If we then receive slot 3 block 0/0, we now know that there must be real data for slot 2 as well. Do we then prioritize repairing slot 2 instead (this essentially forms a chain of slots where we have to keep track of a “head” of this chain that needs to be repaired)?

  2. We then receive slot 7, block 0/4. This means that slot 7 is directly chaining to the slot 2 b/c he’s transmitting virtual ticks for slots 3-6. This is also a separate fork from what we’re observing in 1) b/c they do not chain. How do we pick which of these two forks to prioritize repairing (maybe whichever one can chain back to slot 1 first)? Is this fork just a separate head, one of many chains we keep around in the db_ledger?

@carllin
Copy link
Contributor

carllin commented Dec 17, 2018

It would also be good to flesh out how the guard zones fit into this. Does each validators local poh service generate the rest of the ticks in the guard zone for a block after the last tick before the guard zone is received?

@aeyakovenko aeyakovenko changed the title path to rollback + rotation Need a proposal for rollback and rotation Dec 17, 2018
@rob-solana
Copy link
Contributor

rob-solana commented Dec 17, 2018

A validator should be able to receive ledger from the network without processing transactions. Each leader slot only has 1 possible leader that can produce blocks for that slot. Upon receiving a blob the ledger DB can quickly decide if its valid or not (signed by the leader), and where it is placed in the slot.

Implies that we have a leader schedule that extends to the slot, if in the future. If not, the blob is dropped, stored until such time as we can chain it? Checked against gossip for current network tick height?

@rob-solana
Copy link
Contributor

rob-solana commented Dec 17, 2018

This data structure can aggregate blobs until it sees a full complete votable block and then signal the bank.

Use of the word block here to mean a place where a vote can occur, also where a fork can occur? Need a new word for a place where a vote can be placed.

@aeyakovenko
Copy link
Member Author

aeyakovenko commented Dec 17, 2018

It would also be good to flesh out how the guard zones fit into this. Does each validators local poh service generate the rest of the ticks in the guard zone for a block after the last tick before the guard zone is received?

@carllin I think we need to experiment with repair, but the current slot that the validator is at with its own PoH+/- some range is probably what should be actively repaired for.

see Repair above

@aeyakovenko
Copy link
Member Author

Use of the word block here to mean a place where a vote can occur, also where a fork can occur? Need a new word for a place where a vote can be placed.

@rob-solana a vote can only occur at the end of a slot. A leader could be scheduled for consecutive slots.

@aeyakovenko
Copy link
Member Author

aeyakovenko commented Dec 17, 2018

Implies that we have a leader schedule that extends to the slot, if in the future. If not, the blob is dropped, stored until such time as we can chain it? Checked against gossip for current network tick height?

@rob-solana that works!

@rob-solana
Copy link
Contributor

Use of the word block here to mean a place where a vote can occur, also where a fork can occur? Need a new word for a place where a vote can be placed.

@rob-solana a vote can only occur at the end of a slot. A leader could be scheduled for consecutive slots.

ok: slot is vote-able. will update the terminology to reflect. I think that block is still an ungood word, because we use it for so many other things, even in this proposal.

@aeyakovenko
Copy link
Member Author

ok: slot is vote-able. will update the terminology to reflect. I think that block is still an ungood word, because we use it for so many other things, even in this proposal.

@rob-solana Yea, because a slot overlaps previous slots. It is a recursively defined structure. So the portion of a slot that is just the virtual PoH that overlaps a slot is what I am calling a block in this proposal.

@garious
Copy link
Contributor

garious commented Dec 17, 2018

@aeyakovenko, can you move this issue into a PR against the book? Pretty awful that these guys are having to use the quote mechanism to provide line-level feedback and that there has been 54 edits to the issue text.

@garious
Copy link
Contributor

garious commented Dec 17, 2018

@aeyakovenko, please review https://solana-labs.github.io/solana/terminology.html. If you want to redefine block, I'd appreciate a lengthy justification. The justification for the current definition is here: https://solana-labs.github.io/solana/synchronization.html

@aeyakovenko
Copy link
Member Author

@garious a short justification is that we don't have a term for data transmitted by the leader in its slot that overlaps in PoH previous slots.... So our terminology fails to capture the problem.

@aeyakovenko
Copy link
Member Author

@garious this issue is cross component. Kind of hard to add it to the book. I would love some suggestions on where to place each part.

@garious
Copy link
Contributor

garious commented Dec 18, 2018

@aeyakovenko, I can think through integration just before the PR is merged. Anywhere under proposals is fine for now. The important thing is that this discussion move to a WIP PR.

@garious
Copy link
Contributor

garious commented Jan 17, 2019

@aeyakovenko, are people still using this issue? I can't navigate it at all. Can you move the remaining work to this project: https://github.com/solana-labs/solana/projects/11?

@aeyakovenko
Copy link
Member Author

@garious sounds good. I think a part is covered by entry tree

@aeyakovenko
Copy link
Member Author

@aeyakovenko break this up into a project!

@aeyakovenko
Copy link
Member Author

@aeyakovenko aeyakovenko changed the title Need a proposal for rollback and rotation split into smaller tasks: Need a proposal for rollback and rotation Jan 25, 2019
@aeyakovenko aeyakovenko changed the title split into smaller tasks: Need a proposal for rollback and rotation [DEPRECATED] split into smaller tasks: Need a proposal for rollback and rotation Jan 25, 2019
AshwinSekar pushed a commit to AshwinSekar/solana that referenced this issue Jul 17, 2024
* ci: use correct repository name for the restriction

* ci: don't publish docs temporarily

* ci: remove crowdin

* relace <br> with <br />

* ci: bumping version
ruuda pushed a commit to ChorusOne/solana that referenced this issue Jul 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants