Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Allow truncation of pubkey cache on creation #1686

Closed
wants to merge 1 commit into from

Conversation

michaelsproul
Copy link
Member

Issue Addressed

Closes #1680

Proposed Changes

This PR fixes a race condition in beacon node start-up whereby the pubkey cache could be created by the beacon chain builder before the PersistedBeaconChain was stored to disk. When the node restarted, it would find the persisted chain missing, and attempt to start from scratch, creating a new pubkey cache in the process. This call to ValidatorPubkeyCache::new would fail if the file already existed (which it did). I changed the behaviour so that pubkey cache initialization now doesn't care whether there's a file already in existence (it's only a cache after all). Instead it will truncate and recreate the file in the race scenario described.

@michaelsproul michaelsproul added ready-for-review The code is ready for review low-hanging-fruit Easy to resolve, get it before someone else does! labels Sep 29, 2020
@michaelsproul michaelsproul added work-in-progress PR is a work-in-progress and removed ready-for-review The code is ready for review labels Sep 29, 2020
@michaelsproul
Copy link
Member Author

Marking this WIP while we work out a fix for this case raised by @paulhauner:

  1. User inits client, including cache and persisted head
  2. Cache gets mistakenly deleted
  3. This patch silently recreates the cache without complete information, leading to errors down the line (?)

@michaelsproul
Copy link
Member Author

michaelsproul commented Sep 30, 2020

@paulhauner I think the case mentioned above is actually OK, because the beacon chain won't even try to re-initialise the cache if the head exists and it's missing. I tested deleting the pubkey cache and this error occurs:

Sep 30 14:29:50.954 WARN Ethereum 2.0 is pre-release. This software is experimental.
Sep 30 14:29:50.954 INFO Lighthouse started                      version: Lighthouse/v0.2.12-c31cbb627
Sep 30 14:29:50.955 INFO Configured for testnet                  name: medalla
Sep 30 14:29:50.956 INFO Data directory initialised              datadir: /home/michael/.lighthouse/medalla
Sep 30 14:29:51.048 INFO Starting beacon chain                   method: resume, service: beacon
Sep 30 14:29:51.098 CRIT Failed to start beacon node             reason: Unable to open persisted pubkey cache: ValidatorPubkeyCacheFileError("Io(Os { code: 2, kind: NotFound, message: \"No such file or directory\" })")
Sep 30 14:29:51.098 INFO Internal shutdown received              reason: Failed to start beacon node
Sep 30 14:29:51.098 INFO Shutting down..

My change is only relevant in the case where ValidatorPubkeyCache::new is called, which is here:

let validator_pubkey_cache = self.validator_pubkey_cache.map(Ok).unwrap_or_else(|| {
ValidatorPubkeyCache::new(&canonical_head.beacon_state, pubkey_cache_path)
.map_err(|e| format!("Unable to init validator pubkey cache: {:?}", e))
})?;

And it only happens if self.validator_pubkey_cache is None, which can't happen in the resume_from_db case, because we trip the error on line 316:

let pubkey_cache = ValidatorPubkeyCache::load_from_file(pubkey_cache_path)
.map_err(|e| format!("Unable to open persisted pubkey cache: {:?}", e))?;
self.genesis_block_root = Some(chain.genesis_block_root);
self.head_tracker = Some(
HeadTracker::from_ssz_container(&chain.ssz_head_tracker)
.map_err(|e| format!("Failed to decode head tracker for database: {:?}", e))?,
);
self.validator_pubkey_cache = Some(pubkey_cache);

@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Sep 30, 2020
Copy link
Member

@paulhauner paulhauner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulhauner I think the case mentioned above is actually OK

Nice, sorry for the distraction.

Merge it!

@michaelsproul
Copy link
Member Author

No problemo~

bors r+

bors bot pushed a commit that referenced this pull request Sep 30, 2020
## Issue Addressed

Closes #1680

## Proposed Changes

This PR fixes a race condition in beacon node start-up whereby the pubkey cache could be created by the beacon chain builder before the `PersistedBeaconChain` was stored to disk. When the node restarted, it would find the persisted chain missing, and attempt to start from scratch, creating a new pubkey cache in the process. This call to `ValidatorPubkeyCache::new` would fail if the file already existed (which it did). I changed the behaviour so that pubkey cache initialization now doesn't care whether there's a file already in existence (it's only a cache after all). Instead it will truncate and recreate the file in the race scenario described.
@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Sep 30, 2020
@bors bors bot changed the title Allow truncation of pubkey cache on creation [Merged by Bors] - Allow truncation of pubkey cache on creation Sep 30, 2020
@bors bors bot closed this Sep 30, 2020
@michaelsproul michaelsproul deleted the issue-1680 branch September 30, 2020 06:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
low-hanging-fruit Easy to resolve, get it before someone else does! ready-for-merge This PR is ready to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

If beacon is restarted after genesis state has been generated but before genesis event, it will fail.
2 participants