
random beacon cache changes #1278

Merged: 12 commits merged into master from kevjue/random_beacon_fix on Jan 7, 2021

Conversation

@kevjue (Contributor) commented Dec 18, 2020

Description

The changes in this PR make all validators have their randomness cache set before starting mining. It does this in two ways:

  1. Whenever a block is synced or finalized via the consensus engine, the node checks whether it authored that block, and if so caches the block's parent hash (which is needed to recover the block's randomness in the future).
  2. Before a node starts syncing, it checks whether its last randomness commitment (saved on the random smart contract) has a cached entry. If not, it does a reverse search of the blockchain to find the block that produced the last commitment and then caches that block's parent hash (see the sketch below).
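
For illustration, a minimal sketch of that restore path in Go. rawdb.WriteRandomCommitmentCache is the accessor discussed later in this PR; ReadRandomCommitmentCache, computeCommitment, and the exact signatures are assumptions made for the sketch, not the PR's actual API:

// Sketch only: ensure the cache entry for the validator's last on-contract
// commitment exists before mining starts.
// Assumed imports: errors, core, core/rawdb, common, ethdb.
func restoreCommitmentCache(bc *core.BlockChain, db ethdb.Database, lastCommitment common.Hash,
	computeCommitment func(parentHash common.Hash) common.Hash) error {
	if (lastCommitment == common.Hash{}) {
		return nil // no commitment on the random contract yet
	}
	if cached := rawdb.ReadRandomCommitmentCache(db, lastCommitment); (cached != common.Hash{}) {
		return nil // cache entry already saved
	}
	// Reverse search: walk the chain backwards until we find the block whose
	// parent hash produces the last commitment, then cache that parent hash.
	for header := bc.CurrentHeader(); header != nil && header.Number.Uint64() > 0; header = bc.GetHeader(header.ParentHash, header.Number.Uint64()-1) {
		if computeCommitment(header.ParentHash) == lastCommitment {
			rawdb.WriteRandomCommitmentCache(db, lastCommitment, header.ParentHash)
			return nil
		}
	}
	return errors.New("block for the last randomness commitment not found")
}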

Other changes

Refactored some of the randomness logic. Specifically:

  1. Moved the randomness generation logic from package contract_comm/random to consensus/istanbul.
  2. Moved the specification of the randomness cache's leveldb location from package contract_comm/random to consensus/istanbul.
  3. Moved the logic for saving to and retrieving from the randomness cache from contract_comm/random to core/rawdb/accessors_chain.go.
  4. Moved the randomness seed logic from miner/worker.go to the consensus/istanbul package.

Tested

  1. e2e tests (note that the "when a validator's data directory is deleted" test case in the e2e sync test verifies that restoring the randomness cache works)
  2. unit tests

Related issues

  • Fixes #[issue number here]

Backwards compatibility

Yes

@trianglesphere (Contributor) left a comment:

I think that this fixes the backwards iteration issue because we bail out in commit new work, but this kinda looks like a validator will not propose a block until it rolls back its head block and re-syncs, or it's swapped out for a replica that was able to do that rollback and save the last randomness.

My understanding is that the validator needs to sync the randomness with the block state, but is there a way to do a backwards iteration and save on startup? Would that break something with atomic state?

miner/worker.go (outdated, resolved)
core/blockchain.go (resolved)
miner/worker.go (resolved)
@trianglesphere (Contributor):

Separately, this breaks all of the e2e tests, with the following as the last output (it seems to time out):

   Deploying 'Random'
   ------------------
   > transaction hash:    0xc62a41b1569a16224df46bf749d66ccc0794f47c0239797e96f8528f7865a716
   > Blocks: 0            Seconds: 0
   > contract address:    0xdDe36eF5b705E05fF72851E93Ca5Cc6f593A8461
   > block number:        300
   > block timestamp:     1608568122
   > account:             0x47e172F6CfB6c7D01C1574fa3E2Be7CC73269D95
   > balance:             3010022.399200419
   > gas used:            2729453
   > gas price:           100 gwei
   > value sent:          0 ETH
   > total cost:          0.2729453 ETH

  Setting initial Random implementation on proxy

- Saving migration to chain.

Error: Transaction was not mined within 750 seconds, please make sure your transaction was properly sent. Be aware that it might still be mined!
    at PromiEvent (/home/circleci/repos/celo-monorepo/node_modules/truffle/build/webpack:/packages/contract/lib/promievent.js:9:1)
    at TruffleContract.setCompleted (/home/circleci/repos/celo-monorepo/node_modules/truffle/build/webpack:/packages/contract/lib/execute.js:169:1)
    at Migration._deploy (/home/circleci/repos/celo-monorepo/node_modules/truffle/build/webpack:/packages/migrate/migration.js:93:1)
    at process._tickCallback (internal/process/next_tick.js:68:7)

@kevjue (author) commented Dec 21, 2020:

I think that this fixes the backwards iteration issue because we bail out in commit new work, but this kinda looks like a validator will not propose a block until it rolls back its head block and re-syncs, or it's swapped out for a replica that was able to do that rollback and save the last randomness.

My understanding is that the validator needs to sync the randomness with the block state, but is there a way to do a backwards iteration and save on startup? Would that break something with atomic state?

Totally agree with your points. I've been adding to this PR so that it will do the backwards iteration (if the randomness cache is not already saved) right before starting mining.

@kevjue kevjue marked this pull request as draft December 22, 2020 00:06
@kevjue kevjue marked this pull request as ready for review December 24, 2020 01:44
miner/miner.go Outdated
for {
	blockHeader := bc.GetHeaderByHash(blockHashIter)

	// We got to the genisis block, so this goroutine didn't find the latest
Contributor:

typo "genesis"

miner/miner.go Outdated
return true
}

// goroutine communication channels and waitgroup
Contributor:

This second part of the function looks weird...

It reads like:

firstBlock = race(
  getMostRecentFromCoinbaseInHistory(),
  getFirstBlockFromCoinbaseInFuture()
)
generateAndSaveCommitment(firstBlock)

Now, because of the race, this doesn't look deterministic, hence weird.

@kevjue (author):

How would you suggest changing this?

Both getters in the race are necessary.

@kevjue (author):

nvm, I see more of your comments below.

Contributor:

Do we need to wait for new blocks at this point? During sync start, mining gets stopped with "Mining aborted due to sync", but as the node reads in the new blocks it doesn't have, it will check whether it should save the randomness. Once it's done syncing, or fails to sync, it runs this function prior to starting mining again.

Contributor:

I have the same doubt as @trianglesphere. I understood that when this function is called we are "already synced", since that's the reason for the update() function (but I remember it's a tricky one, since it only triggers once).

And it would seem that even if that's not the case, the code in blockchain would still handle it, since it reacts to new blocks.

@kevjue (author) commented Jan 4, 2021:

This handles the case where the partition's aggregate head block (the partition being made up of the node and its peers when the sync downloader.DoneEvent is emitted) is behind a block that the node authored (let's call this the "future authored block"). Note that this is probably a very rare scenario.

When this scenario happens, two things can happen:

  1. The node's reverse iteration completes before the "future authored block" is synced. The node will then start istanbul. Even if this partition's aggregate head block is behind other validators, that is already handled by the istanbul engine.
  2. The "future authored block" is synced before the reverse iteration completes. In this case, syncing that block will stop the reverse iteration, since it has become irrelevant, and then istanbul will be started.

So waiting for new blocks is basically to handle case 2. Having said all that, that case will probably happen very infrequently, and the added benefit (not having to wait for the reverse iteration to complete) seems marginal. So it seems like it's not worth the added complexity. WDYT?

Contributor:

So thinking about what we store: it's a lookup of commitment -> block hash, where the block hash is mangled to create the commitment. I think we have multiple of these commitments in the eth db. When looking up the last commitment made, we check a smart contract, so I think order does not matter here, only that we do have the last commitment saved in the leveldb.

It's true that we want to bail out of the reverse iteration as an optimization if we find a "future authored block", but shouldn't the future block record its commitment when we process it?
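
For concreteness, a sketch of what such commitment -> block hash accessors in core/rawdb/accessors_chain.go could look like; the key prefix and signatures are assumptions, not the PR's actual code:

// randomCommitmentPrefix is an illustrative key prefix for the cache.
var randomCommitmentPrefix = []byte("random-commitment-")

// WriteRandomCommitmentCache stores a commitment -> parent block hash entry.
func WriteRandomCommitmentCache(db ethdb.KeyValueWriter, commitment common.Hash, parentHash common.Hash) {
	if err := db.Put(append(randomCommitmentPrefix, commitment.Bytes()...), parentHash.Bytes()); err != nil {
		log.Crit("Failed to store random commitment cache entry", "err", err)
	}
}

// ReadRandomCommitmentCache returns the cached block hash for a commitment,
// or the zero hash if no entry exists.
func ReadRandomCommitmentCache(db ethdb.KeyValueReader, commitment common.Hash) common.Hash {
	data, _ := db.Get(append(randomCommitmentPrefix, commitment.Bytes()...))
	if len(data) == 0 {
		return common.Hash{}
	}
	return common.BytesToHash(data)
}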

@kevjue (author) commented Jan 4, 2021:

@trianglesphere - Ya, you're correct. It's unnecessarily recalculating the cache entry when it gets the "future authored block".

WDYT of the proposal to remove that new-block handling logic altogether? It adds a lot of complex logic (which means a higher chance of bugs) for minor benefit.

Contributor:

That sounds good to me. Seems like it's rare enough and doesn't make anything incorrect.

Contributor:

+1

miner/miner.go Outdated
blockHashIter := currentHeader.Hash()

wg.Add(1)
go func() {
Contributor:

I would extract these two goroutines into their own functions, getMostRecentBlockFromAuthor(errgroup) and waitForNextBlockFromAuthor(errgroup), and use an errgroup to handle coordination (https://godoc.org/golang.org/x/sync/errgroup); see the sketch below.

Also, try moving everything to the random.go file, so as to confine all code related to randomness to a single place. We could even make it not part of the miner object, and just pass everything as parameters to a standalone function.
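
A rough sketch of the suggested errgroup coordination. Both helpers are hypothetical and are assumed to return nil promptly once their context is cancelled; assumed imports are context, errors, and golang.org/x/sync/errgroup:

// findCommitmentBlock races a backwards scan of history against waiting for
// a future block authored by this validator; whichever side finds a block
// first cancels the other.
func findCommitmentBlock(ctx context.Context, bc *core.BlockChain, author common.Address) (*types.Header, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	g, ctx := errgroup.WithContext(ctx)

	found := make(chan *types.Header, 2) // buffered so neither sender can block

	g.Go(func() error {
		if h := getMostRecentBlockFromAuthor(ctx, bc, author); h != nil { // hypothetical helper
			found <- h
			cancel() // the future-block wait is now irrelevant
		}
		return nil
	})
	g.Go(func() error {
		if h := waitForNextBlockFromAuthor(ctx, bc, author); h != nil { // hypothetical helper
			found <- h
			cancel() // the history scan is now irrelevant
		}
		return nil
	})

	if err := g.Wait(); err != nil {
		return nil, err
	}
	select {
	case h := <-found:
		return h, nil
	default:
		return nil, errors.New("no block from this author was found")
	}
}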

@kevjue (author):

I actually was considering putting some of this logic within istanbul, but decided against it. The reason is that I believe istanbul should only be responsible for defining the randomness beacon protocol (e.g. when/how the randomness is generated and verified).

As for the randomness cache, its only user is the miner package, since the cache is not an essential part of the randomness protocol but more of an optimization that the miner can use. With that in mind, restoration of the cache seems to fit within the miner package. I'm happy to create a new random.go file within the miner package, though.

miner/miner.go Outdated

// commitmentCacheSaved will check to see if the last commitment cache entry is saved.
// If not, it will search for it and save it.
func (miner *Miner) commitmentCacheSaved(istEngine consensus.Istanbul) bool {
Contributor:

The function name suggests it's just a query, with no side effects or writes. I would change the name to make that more apparent.

@@ -1376,6 +1392,11 @@ func (bc *BlockChain) writeBlockWithState(block *types.Block, receipts []*types.
rawdb.WriteBlock(blockBatch, block)
rawdb.WriteReceipts(blockBatch, block.Hash(), block.NumberU64(), receipts)
rawdb.WritePreimages(blockBatch, state.Preimages())
if (randomCommitment != common.Hash{}) {
Contributor:

I'm hesitant to have this logic inside blockchain. Would it make sense to have a blockchain listener that computes/writes the commitment for every block?

In fact, we could have a more cohesive type with all random-related functions:

  • Listen to new blocks to generate/write the commitment
  • Reconstruct the commitment cache
  • Get the current commitment / generate the current commitment

@kevjue (author):

What's your hesitancy about putting this in blockchain?

The changes follow much of the same design that blockchain already implements. E.g. totalDifficulty is also a "cache", in that it's possible to reconstruct it from the past chain, but it's computed at block-insert time and saved in its own leveldb location (see the snippet below).
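
For reference, a rough rendering of that existing precedent, along the lines of upstream go-ethereum's writeBlockWithState (details simplified):

// totalDifficulty is reconstructible from past headers, but blockchain
// computes it at insert time and persists it under its own key, in the
// same batch as the rest of the block data.
ptd := bc.GetTd(block.ParentHash(), block.NumberU64()-1)
externTd := new(big.Int).Add(block.Difficulty(), ptd)
rawdb.WriteTd(blockBatch, block.Hash(), block.NumberU64(), externTd)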

Contributor:

+1 for keeping this in blockchain. I think keeping this code block here is much easier to understand than having a dedicated listener.

Contributor:

My hesitation is about coupling... or, using other maxims: Separation of Concerns / the Single Responsibility Principle.

Now blockchain "knows" about the randomness cache (it's coupled to it). Same with miner.

And then the "responsibility" of keeping the cache sane is split across a few places. There's the logic of the cache (read/write), there's the check & recovery logic in miner, and there's the generation logic on each new block in blockchain.

Now, whenever a future maintainer makes a change to those modules, they need to have the same logical context and awareness about this. And that's tricky, since it's just one of the many responsibilities these modules have.

That's why my suggestion is to have "something" that represents the randomness cache and is responsible for keeping it sane, recovering it, and providing access to it. That "something" can even be an interface, so it's easy to mock and test things in isolation (a sketch follows below).

@trianglesphere @kevjue what do you think?
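
For example, that "something" could be a small interface along these lines (all names illustrative, not part of this PR):

// RandomnessCache bundles the responsibilities discussed above behind one
// mockable interface: keeping the cache sane, recovering it, and reading it.
type RandomnessCache interface {
	// Commitment returns the cached parent block hash for a commitment,
	// or the zero hash if no entry exists.
	Commitment(commitment common.Hash) common.Hash

	// OnNewBlock computes and persists the cache entry for a block this
	// validator authored, as part of block insertion.
	OnNewBlock(header *types.Header) error

	// Recover rebuilds the entry for the last on-contract commitment by
	// scanning the chain backwards.
	Recover(commitment common.Hash) error
}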

@kevjue (author):

WDYT of keeping the randomness cache within the blockchain object (basically it would be a "primitive" of the block) and moving the restoration logic from miner into blockchain? (miner would still invoke the restoration logic when needed.)

As for the actual insertion into the cache (at least in the normal case, as opposed to the restoration case), I still think it should be done within the block insertion function (basically how it's implemented now), because it's inserted as part of a leveldb batch along with all of the other parts of the block; inserting all block-related data atomically is a high-level goal that we wanted to attain.

Contributor:

I'm fine with bundling lines approx 1360 to 1370 into some manager object/primitive, but I agree with Kevin that rawdb.WriteRandomCommitmentCache(...) should stay here.

Contributor:

I like it!

miner/miner.go Outdated
return true
}

// goroutine communication channels and waitgroup
Contributor:

I think here I would have a new function, something like "regenerateCache"... and a good log message stating that we are doing so (and that it might take a while). For example:
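
Something like this hypothetical shape, reusing the RecoverRandomnessCache function added elsewhere in this PR (starting the scan from the current head, and the exact wording, are assumptions of the sketch):

// regenerateCache rebuilds the missing commitment cache entry; it logs at
// INFO level up front because the reverse scan can take a while.
func (miner *Miner) regenerateCache(bc *core.BlockChain, lastCommitment common.Hash) error {
	log.Info("Randomness commitment cache entry missing, regenerating; this may take a while",
		"commitment", lastCommitment)
	return bc.RecoverRandomnessCache(lastCommitment, bc.CurrentHeader().Hash())
}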

@trianglesphere (Contributor):

I deployed 64494cc9ce6a1717de45b1a61d618d1f17de394c to baklava, and the fix got the validator to start signing again.

@@ -2336,3 +2337,45 @@ func (bc *BlockChain) SubscribeLogsEvent(ch chan<- []*types.Log) event.Subscript
func (bc *BlockChain) SubscribeBlockProcessingEvent(ch chan<- bool) event.Subscription {
return bc.scope.Track(bc.blockProcFeed.Subscribe(ch))
}

func (bc *BlockChain) RecoverRandomnessCache(commitment common.Hash, commitmentBlockHash common.Hash) error {
if istEngine, isIstanbul := bc.engine.(consensus.Istanbul); isIstanbul {
Contributor:

very personal nit :P

I prefer exiting fast, and avoiding nesting here:

istEngine, isIstanbul := bc.engine.(consensus.Istanbul)
if !isIstanbul {
	// Not an istanbul engine, so there is nothing to recover.
	return nil
}
...

for {
	blockHeader := bc.GetHeaderByHash(blockHashIter)

	// We got to the genisis block, so this goroutine didn't find the latest
Contributor:

typo: genesis

Contributor:

Also, this is not a goroutine anymore (the comment is stale).

@@ -2336,3 +2337,45 @@ func (bc *BlockChain) SubscribeLogsEvent(ch chan<- []*types.Log) event.Subscript
func (bc *BlockChain) SubscribeBlockProcessingEvent(ch chan<- bool) event.Subscription {
return bc.scope.Track(bc.blockProcFeed.Subscribe(ch))
}

func (bc *BlockChain) RecoverRandomnessCache(commitment common.Hash, commitmentBlockHash common.Hash) error {
Contributor:

Needs a go doc comment.
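
For instance, based on the behavior described in this PR (suggested wording only):

// RecoverRandomnessCache searches the chain backwards, starting from
// commitmentBlockHash, for the block whose parent hash produced the given
// commitment, and saves the recovered entry to the randomness commitment
// cache.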

func (bc *BlockChain) RecoverRandomnessCache(commitment common.Hash, commitmentBlockHash common.Hash) error {
if istEngine, isIstanbul := bc.engine.(consensus.Istanbul); isIstanbul {

blockHashIter := commitmentBlockHash
Contributor:

I would add a log message here. This is quite a rare thing and can take a decent amount of time, so an INFO msg would be nice, to inform the user of what's happening. For example:
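
For example, using the codebase's standard log package (message wording and keys are placeholders):

// This path is rare and the reverse iteration can take a while, so log at
// INFO level before starting.
log.Info("Recovering the randomness commitment cache, this may take a while",
	"commitment", commitment, "startBlock", commitmentBlockHash)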

@kevjue kevjue merged commit 8ec69a8 into master Jan 7, 2021
@kevjue kevjue deleted the kevjue/random_beacon_fix branch January 7, 2021 19:46
trianglesphere pushed a commit that referenced this pull request Jan 8, 2021
* update the randomness commitment cache when inserting blocks into the blockchain

* added comments

* reverse iteration search for the commitment cache entry

* unsubscribe from the newblock subscription

* fixed bug and added comments

* handled randomness beacon case when smart contracts are being migrated

* fixed bug

* fixed a bug

* fixed a bug and added comments to non intuitive code

* addressed from PR comments

* addressed PR comments

Co-authored-by: Kevin Jue <[email protected]>