
Proposal to Handle Duplicate Blocks By Storing Multiple Versions of a Slot #7622

Closed (wanted to merge 9 commits)

Conversation

@carllin commented Dec 24, 2019

Problem

No design exists for slashing duplicate leaders or for storing/handling multiple versions of a slot.

Summary of Changes

Add such a design

Fixes #

@garious commented Dec 24, 2019

I'm so lost by all the slashing half-proposals. There's slashing.md already in the book, some open proposal PRs, and a draft PR with code implementing some form of slashing. Can you please work with @aeyakovenko to close obsolete PRs and, if it makes sense, replace the existing book content in this PR?

> include the blockhash in the repair response. This means we need to include
> some repair "cookie" in the request + response that maps some number of bits
> to a particular `blockhash`. This breaks down if there are more versions of
> this slot than can be tracked by the number of bits allocated to the cookie.
@carllin (Contributor Author)

Worried about this; also, we need to shrink the shred size by the size of this cookie. @pgarg66

Contributor

it'd be worthwhile to say why "repair responses need to be tied to a particular blockhash", anti-spam?

what if repair responses were xor'd with the request cookie or the blockhash?

@carllin (Contributor Author)

Partially anti-spam (although people can still send bogus shreds/bogus cookies, hence @aeyakovenko suggested the merkle tree to figure out whether it was the leader or the repair sender who sent the bogus shred), but also to figure out which of those shreds chain to which previous shreds when there are multiple versions of the same slot.
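To make the cookie idea concrete, here's a minimal sketch of such a table, with hypothetical names (`CookieTable`, `RepairCookie`); it also shows the failure mode discussed above, where the cookie space can't track all versions:

```rust
use std::collections::HashMap;

type Hash = [u8; 32];
type RepairCookie = u16; // the few bits carried in each repair request/response

#[derive(Default)]
struct CookieTable {
    next: RepairCookie,
    by_cookie: HashMap<RepairCookie, Hash>,
    by_hash: HashMap<Hash, RepairCookie>,
}

impl CookieTable {
    /// Assign (or reuse) a cookie for a blockhash. Returns None once the
    /// cookie space is exhausted -- the breakdown case when a slot has more
    /// versions than the cookie bits can track.
    fn assign(&mut self, blockhash: Hash) -> Option<RepairCookie> {
        if let Some(&c) = self.by_hash.get(&blockhash) {
            return Some(c);
        }
        if self.by_cookie.len() >= usize::from(RepairCookie::MAX) {
            return None;
        }
        let c = self.next;
        self.next = self.next.wrapping_add(1);
        self.by_cookie.insert(c, blockhash);
        self.by_hash.insert(blockhash, c);
        Some(c)
    }

    /// Map a cookie in a repair response back to the full blockhash.
    fn resolve(&self, cookie: RepairCookie) -> Option<&Hash> {
        self.by_cookie.get(&cookie)
    }
}
```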

Contributor

put the rationale in the document, please?

@carllin force-pushed the DuplicateBlock branch 2 times, most recently from c9e1ed9 to d6fa659 on December 24, 2019 02:57
> (another column family?). This is important in order to respond to repairs
> which will now specify a `blockhash` in addition to a `slot` and `index`.
> Thus if the node crashes before this mapping can be stored, then on restart
> the validator needs to recompute the blockhash in `blocktree_processor`.
@carllin (Contributor Author)

One downside of this approach is that if we have two versions of a slot, each with 100 shreds, where the first 99 are the same and only the last differs, we still have to store all 200 shreds.
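For illustration, a hypothetical per-version key layout (RocksDB-style ordered bytes) that makes the duplication explicit; the names and field widths are assumptions, not Solana's actual blocktree schema:

```rust
// Because the key includes the version's blockhash, two versions that share
// 99 of 100 shreds still store the shared shreds twice -- the duplication
// downside noted above.
fn shred_key(slot: u64, blockhash: &[u8; 32], index: u32) -> Vec<u8> {
    let mut key = Vec::with_capacity(8 + 32 + 4);
    key.extend_from_slice(&slot.to_be_bytes()); // big-endian keeps slots ordered
    key.extend_from_slice(blockhash);           // version discriminator
    key.extend_from_slice(&index.to_be_bytes()); // shred index within the version
    key
}
```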

> 2) These shreds for slot `B` are stored under the version `Hash::default()`
> (optimistically assume this child is the only version). If another conflicting
> version of `B` is detected before this version is completed, we drop all the
> shreds for slot `B`.
@carllin (Contributor Author)

Need to reason through correctness here. If we have multiple versions of a slot C chaining to multiple versions of a slot B, chaining to multiple versions of a slot A, will the procedure we have outlined thus far handle it?

Contributor

should do, because we can only chain forward one entry at a time

@carllin (Contributor Author)

The tricky part is whether the below is correct:

  1. Dropping any unfinished slots on detecting another version
  2. Waiting on repair to perform chaining on slots with multiple versions based on blockhash

I think it should work as well :D
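A minimal sketch of the two-step procedure above (optimistic `Hash::default()` insertion, drop-on-conflict, re-key once the real blockhash is known), using hypothetical types:

```rust
use std::collections::HashMap;

type Slot = u64;
type Hash = [u8; 32];
const DEFAULT_HASH: Hash = [0u8; 32]; // stand-in for Hash::default()

#[derive(Default)]
struct VersionedShreds {
    // shreds keyed by (slot, version blockhash)
    shreds: HashMap<(Slot, Hash), Vec<Vec<u8>>>,
}

impl VersionedShreds {
    /// Optimistically accumulate shreds under the default version.
    fn insert_optimistic(&mut self, slot: Slot, shred: Vec<u8>) {
        self.shreds.entry((slot, DEFAULT_HASH)).or_default().push(shred);
    }

    /// Step 1: a conflicting version was detected before the optimistic
    /// version completed, so drop the incomplete default-keyed shreds.
    fn on_conflict(&mut self, slot: Slot) {
        self.shreds.remove(&(slot, DEFAULT_HASH));
    }

    /// The slot completed first: re-key the default version to its computed
    /// blockhash so later repairs by blockhash (step 2) can find it.
    fn finalize(&mut self, slot: Slot, blockhash: Hash) {
        if let Some(v) = self.shreds.remove(&(slot, DEFAULT_HASH)) {
            self.shreds.insert((slot, blockhash), v);
        }
    }
}
```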

@carllin commented Dec 24, 2019

@garious, yup for sure! For clarity, this proposal replaces this one in particular: #6362. It does not cover lockout-triggered slashing.

Sorry for the confusion!


> 1) Wait for all the shreds for the first entry `E_B` of slot `B` to arrive
> (Implementation can make sure first shred `S_B` always contains only a tick
> to avoid waiting for multiple shreds).
@rob-solana (Contributor), Dec 29, 2019

this option would require that the leader delay ingest for 25% of the slot...

@rob-solana (Contributor), Dec 29, 2019

we can add a "first shred in slot" and have that shred carry the parent blockhash and the first tick hash value

Contributor

another option: give up 4-8 bytes of shred space to carry that many bits of parent blockhash. this would solve the repair cookie issues, too.

@carllin (Contributor Author)

@rob-solana, that first option works, marking and having the first shred carry the parent blockhash. Why does it need the first tick hash though?

Contributor

I was proposing alternatives. I think the 4-8 bytes of parent blockhash would be sufficient, ignore others.
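A sketch of that truncated-parent-hash alternative, assuming an illustrative header layout rather than Solana's actual shred format:

```rust
// Hypothetical shred header carrying 8 bytes of the parent's blockhash.
// The truncated hash lets a receiver bucket shreds by parent version without
// a separate repair cookie, at the cost of 8 bytes of shred payload.
#[repr(C)]
struct ShredHeaderSketch {
    slot: u64,
    index: u32,
    parent_hash_bits: [u8; 8], // first 8 bytes of the parent's blockhash
}

fn parent_bits(parent_blockhash: &[u8; 32]) -> [u8; 8] {
    let mut bits = [0u8; 8];
    bits.copy_from_slice(&parent_blockhash[..8]);
    bits
}
```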

Contributor

> @sagar-solana, @rob-solana, @aeyakovenko, @ryoqun The next area of concern is the status cache and how it's packaged into snapshots.

This sounds like a real mess.

> 1. The root in blocktree, and thus the root included in the snapshot, will now have to carry which version of that slot was rooted (specified by blockhash)
> 2. Need to update the status cache to handle multiple versions of the same slot, meaning status cache will have to support a key by (slot, blockhash)
> 3. Before we detect a conflicting version of a slot/when the leader is building the block in banking stage, the identifier for a slot in the status cache will be Hash::default(). (Recall the blockhash is known for any version of the slot after a conflict is detected because we repair those versions of the slot by blockhash)

Only during catchup or repair will the blockhash be known a priori, so in the normal case the status cache will have to re-index at the end of each block, too?

Status cache is organized by fork (i.e. is chained in memory via bank construction), right? If it were, would there be a need to key by blockhash?

@carllin (Contributor Author)

yeah it's a huge mess 😛

Status cache is a global store shared across all the banks. At a high level, it maps from signature -> Vec<(Slot, Status)>. When we look up a signature, we pass a vector of ancestors and check if any of those ancestors contain that signature.

Because now we can have many different versions of a single ancestor, we will have to index by blockhash.

Contributor

ok, now I remember...

instead of using slots as ancestors in status cache, what if banks assigned themselves a random number and we used that as ancestry in status cache?

signature -> HashMap<BankID, Status>?
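A minimal sketch of that BankID-keyed status cache, with hypothetical types (`BankId`, `Status`); the ancestor-set lookup mirrors the ancestor-vector lookup described above, but two versions of the same slot get distinct bank ids and so can never collide:

```rust
use std::collections::{HashMap, HashSet};

type Signature = [u8; 64];
type BankId = u64; // random per-bank id, per the suggestion above
type Status = Result<(), ()>; // stand-in for a transaction status

struct StatusCacheSketch {
    statuses: HashMap<Signature, HashMap<BankId, Status>>,
}

impl StatusCacheSketch {
    fn insert(&mut self, sig: Signature, bank: BankId, status: Status) {
        self.statuses.entry(sig).or_default().insert(bank, status);
    }

    /// Look up a signature against a bank's ancestry (the bank's own id plus
    /// its ancestors' ids), keyed by BankId instead of Slot.
    fn get(&self, sig: &Signature, ancestors: &HashSet<BankId>) -> Option<&Status> {
        self.statuses
            .get(sig)?
            .iter()
            .find(|(bank, _)| ancestors.contains(*bank))
            .map(|(_, status)| status)
    }
}
```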

> is set to `Hash::default()` until the blockhash is computed at the end of
> the slot. Then the leader will have to store a mapping from
> `Hash::default()` to the actual blockhash in a separate area of storage
> (another column family?). This is important in order to respond to repairs
@carllin (Contributor Author)

@rob-solana this part kind of sucks, any better ideas that don't involve re-indexing at the end of the slot once the slot has been finished?

Contributor

blocktree re-implementation where every slot is a directory? when we know the blockhash we do a rename()?

would a double-lookup from slot+blockhash to slot+randomnumber break ledger portability?
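A minimal sketch of the directory-per-slot rename() idea above, assuming a hypothetical layout of `<ledger>/<slot>/<version>`; the ledger-portability question it raises is left open:

```rust
use std::fs;
use std::path::Path;

// Shreds for an in-progress slot live under a "default" directory that is
// renamed to the real blockhash once it is known, avoiding a re-index of
// every shred at the end of the slot.
fn finalize_slot_dir(ledger: &Path, slot: u64, blockhash_hex: &str) -> std::io::Result<()> {
    let from = ledger.join(format!("{}/default", slot));
    let to = ledger.join(format!("{}/{}", slot, blockhash_hex));
    fs::rename(from, to) // atomic on the same filesystem
}
```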

@carllin commented Dec 30, 2019

@sagar-solana, @rob-solana, @aeyakovenko, @ryoqun. Some remaining design questions:

As suggested by @aeyakovenko, in an attempt to have the majority of the network drop slots that have multiple versions, after we detect a conflicting version of a slot, we drop the slot if it hasn't yet been voted on. I think this will be done as follows (a rough sketch in code follows the list):

  1. The blocktree insertion thread (maybe some separate conflict detection thread) marks the slot as "conflicting".
  2. The thread from 1) sends a signal to ReplayStage over a channel to say this slot is "conflicting".
  3. The window insertion thread refuses to accept any shreds for this slot unless that shred carries a specific
    cookie for a specific blockhash (i.e., we sent a repair request for this version of this slot).
  4. ReplayStage drops this slot from the progress list if it hasn't yet voted for this slot.
  5. Because of 3), ReplayStage will then not replay this slot unless repair requests are made for it.
    Repair requests are only made if another child slot chained back to this slot.
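A minimal sketch of steps 2 and 4, assuming hypothetical names (`BlocktreeSignal`, `replay_stage_tick`); the real ReplayStage plumbing differs:

```rust
use std::collections::{HashMap, HashSet};
use std::sync::mpsc::{Receiver, Sender};

type Slot = u64;

// Hypothetical signal sent from the detection thread to ReplayStage.
enum BlocktreeSignal {
    Conflicting(Slot),
}

// Step 2: the insertion/detection thread notifies ReplayStage.
fn signal_conflict(sender: &Sender<BlocktreeSignal>, slot: Slot) {
    let _ = sender.send(BlocktreeSignal::Conflicting(slot));
}

// Step 4: ReplayStage drains the channel and drops conflicting slots it
// hasn't voted on from its progress map.
fn replay_stage_tick(
    receiver: &Receiver<BlocktreeSignal>,
    progress: &mut HashMap<Slot, ()>, // () stands in for per-slot progress state
    voted: &HashSet<Slot>,
) {
    while let Ok(BlocktreeSignal::Conflicting(slot)) = receiver.try_recv() {
        if !voted.contains(&slot) {
            progress.remove(&slot);
        }
    }
}
```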

@carllin commented Dec 30, 2019

@sagar-solana, @rob-solana, @aeyakovenko, @ryoqun The next area of concern is the status cache and how it's packaged into snapshots.

  1. The root in blocktree, and thus the root included in the snapshot will now have to carry which version of that slot was rooted (specified by blockhash)

  2. Need to update the status cache to handle multiple versions of the same slot, meaning status cache will have to support a key by (slot, blockhash)

  3. Before we detect a conflicting version of a slot/when the leader is building the block in banking stage, the identifier for a slot in the status cache will be Hash::default(). (Recall the blockhash is known for any version of the slot after a conflict is detected because we repair those versions of the slot by blockhash)

A concern here is that Hash::default() may then mean different things to different validators when the status cache is packaged into a snapshot, but I think this should be fine because the snapshot root is identified by a blockhash, so any future slots a validator plays after booting from the snapshot must chain from that root, and that root has a fixed set of ancestors (version of each slot).

  1. Do we need to clear the status cache of the Hash::default() version of this slot if we are dropping it? I think it may be ok to keep it around.

  2. On replay when multiple versions of a slot are being repaired then replayed, we need to expose the blockhash
    of each version of that slot to the replay logic so that the status cache can be updated properly. Those blockhashes
    are only known by blocktree so they'll have to be propagated.

@rob-solana (Contributor)

> @sagar-solana, @rob-solana, @aeyakovenko, @ryoqun. Some remaining design questions:
>
> As suggested by @aeyakovenko, in an attempt to have the majority of the network drop slots that have multiple versions, after we detect a conflicting version of a slot, we drop the slot if it hasn't yet been voted on.

Is this feasible in general? Can replay stage reliably determine whether the rest of the network has accepted some version of a slot?

@carllin commented Dec 31, 2019

@rob-solana definitely not guaranteed. It's a best-effort mechanism: the hope is that the majority detects the duplicates and drops all versions of that block so that another fork is selected. But there's definitely a race between detecting the duplicates and voting on the block. The bad case is if the heaviest fork includes some version of the duplicate slot, in which case we will have to rely on the repair mechanism in the proposal to allow validators on other forks to catch up.

@rob-solana (Contributor)

> @rob-solana definitely not guaranteed. It's a best-effort mechanism: the hope is that the majority detects the duplicates and drops all versions of that block so that another fork is selected. But there's definitely a race between detecting the duplicates and voting on the block. The bad case is if the heaviest fork includes some version of the duplicate slot, in which case we will have to rely on the repair mechanism in the proposal to allow validators on other forks to catch up.

Ok, so distinguishing between Repair and Turbine is critical. Do we have that capability today?

@carllin commented Dec 31, 2019

@rob-solana they come in on different ports, which can be pretty easily gamed. But after a validator drops the slot S, it only repairs if it sees a child slot chain back to this slot S. When a child slot is detected that chains back to this slot:

  1. Validator waits for the first shred in this slot which contains the parent's blockhash

  2. Makes an entry in blocktree for the parent blockhash

  3. Makes repairs for the parent blockhash with some "cookie"

  4. Window service will accept shreds containing the "cookie" mapping to this blockhash.

Everything here, from the cookie to the contents of the shred, can be pretty easily manipulated, so we might need to make repair more secure (i.e. make sure we only get responses from people we requested things from).
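A minimal sketch of that cookie-gated repair flow, assuming hypothetical names (`RepairState`, `rand_cookie`); the step numbers refer to the list above:

```rust
use std::collections::HashMap;

type Slot = u64;
type Hash = [u8; 32];
type RepairCookie = u64; // nonce tagged onto the request

struct RepairState {
    // cookie -> (slot, blockhash) we asked for; used to filter responses
    outstanding: HashMap<RepairCookie, (Slot, Hash)>,
}

impl RepairState {
    /// Steps 1-3: a child's first shred carried its parent's blockhash, so
    /// request that specific version and remember the cookie.
    fn request_version(&mut self, slot: Slot, blockhash: Hash) -> RepairCookie {
        let cookie = rand_cookie();
        self.outstanding.insert(cookie, (slot, blockhash));
        // ... send a repair request for (slot, blockhash, cookie) ...
        cookie
    }

    /// Step 4: the window service accepts a shred only if its cookie matches
    /// an outstanding request for that slot's blockhash.
    fn accept(&self, cookie: RepairCookie, slot: Slot, blockhash: &Hash) -> bool {
        self.outstanding.get(&cookie) == Some(&(slot, *blockhash))
    }
}

fn rand_cookie() -> RepairCookie {
    // placeholder randomness for the sketch; a real implementation would use
    // a CSPRNG, per the comment above about making repair harder to spoof
    std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .unwrap()
        .subsec_nanos() as u64
}
```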

@garious commented Jan 10, 2020

@carllin, a few surface-level issues with this PR:

@carllin changed the title from "Duplicate block" to "Proposal to Handle Duplicate Blocks By Storing Multiple Versions of a Slot" on Jan 10, 2020
@carllin commented Jan 10, 2020

@garious thanks, will update!

Both this and #7652 are suggestions for ongoing design processes for consensus v2, so we might have to tolerate both of them being open for a while longer until the research/design phase for this is finalized, sorry!

stale bot commented Jan 17, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

The stale bot added the "stale" label ([bot only] added to stale content; results in auto-close after a week) on Jan 17, 2020.
stale bot commented Jan 25, 2020

This stale pull request has been automatically closed. Thank you for your contributions.

The stale bot closed this pull request on Jan 25, 2020.