Remove pubkey cache file #1787

Closed
michaelsproul opened this issue Oct 19, 2020 · 6 comments

@michaelsproul
Member

Description

The pubkey cache file is unnecessary when we have a perfectly good database available. I think removing it and rolling it into the database would reduce complexity and the likelihood of errors. Its separate handling has been the source of bugs, e.g. #1680.

This also links into the locking timeout issue #1096.

@michaelsproul
Member Author

From Discord:

CRIT Failed to start beacon node             reason: Unable to open persisted pubkey cache: ValidatorPubkeyCacheFileError("InconsistentIndex { expected: Some(79843), found: 0 }")

Reportedly occurred after OOM.

@paulhauner paulhauner added A1 and removed A1 labels Nov 9, 2020
@paulhauner
Member

> The pubkey cache file is unnecessary when we have a perfectly good database available.

I'm not sure it's trivial to rebuild the cache from any database. We don't just need all the keys in the head state; we need to find the state with the largest `state.validators.len()` and add all the validators from there, since we assume that the pubkey cache includes every single validator we've ever seen.
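
A minimal sketch of the rebuild described above, assuming every stored state can be iterated (which may itself be expensive in practice). `StateSummary` and `rebuild_pubkeys` are hypothetical names for illustration, not Lighthouse types. The sketch relies on the assumption stated above: the state with the most validators contains every validator seen so far.

```rust
/// Hypothetical stand-in for a beacon state: only the validator pubkeys matter here.
#[derive(Debug, Clone, PartialEq)]
pub struct StateSummary {
    pub slot: u64,
    /// Compressed pubkeys, in validator-index order.
    pub validator_pubkeys: Vec<[u8; 48]>,
}

/// Rebuild the pubkey list from the state with the most validators,
/// rather than from the head state. Returns an empty list if there are no states.
pub fn rebuild_pubkeys<'a, I>(states: I) -> Vec<[u8; 48]>
where
    I: IntoIterator<Item = &'a StateSummary>,
{
    states
        .into_iter()
        .max_by_key(|state| state.validator_pubkeys.len())
        .map(|state| state.validator_pubkeys.clone())
        .unwrap_or_default()
}
```

Note that the head state is not special here: an older state with a longer validator list would win over a head state with a shorter one.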

@paulhauner paulhauner added the A1 label Nov 9, 2020
@michaelsproul
Member Author

I just mean that we store a table in the DB that is exactly the same as the current pubkey cache, so we get DB writes rather than hand-rolled ones.

A user on Discord reported corruption after a power outage:

Hi I followed  Esat's guide.  I had all 3 services connected and synced.  But,  after simulated power loss on my server the Lighthouse Beacon service won't start: 

Unable to open persisted pubkey cache: ValidatorPubkeyCacheFileError("Ssz(InvalidByteLength { len: 32, expected: 56 })")
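
Both error shapes reported in this thread are consistent with a file of fixed-width records being cut short or zero-filled by a crash. The sketch below assumes each record is an 8-byte little-endian index followed by a 48-byte compressed BLS pubkey (8 + 48 = 56, which would match `expected: 56`). That layout, and the names `CacheFileError`, `decode_records`, and `encode_record`, are assumptions for illustration and not the actual Lighthouse file-handling code.

```rust
const INDEX_LEN: usize = 8;
const PUBKEY_LEN: usize = 48;
const RECORD_LEN: usize = INDEX_LEN + PUBKEY_LEN; // 56 bytes per record (assumed layout)

#[derive(Debug, PartialEq)]
pub enum CacheFileError {
    InvalidByteLength { len: usize, expected: usize },
    InconsistentIndex { expected: Option<usize>, found: usize },
}

/// Encode one record as index (u64, little-endian) followed by the pubkey bytes.
pub fn encode_record(index: usize, pubkey: &[u8; PUBKEY_LEN]) -> Vec<u8> {
    let mut out = (index as u64).to_le_bytes().to_vec();
    out.extend_from_slice(pubkey);
    out
}

/// Decode every record, checking that indices count up from zero without gaps.
pub fn decode_records(bytes: &[u8]) -> Result<Vec<[u8; PUBKEY_LEN]>, CacheFileError> {
    let mut pubkeys = Vec::new();
    for chunk in bytes.chunks(RECORD_LEN) {
        // A partially written final record shows up as a short chunk.
        if chunk.len() != RECORD_LEN {
            return Err(CacheFileError::InvalidByteLength {
                len: chunk.len(),
                expected: RECORD_LEN,
            });
        }
        let mut index_bytes = [0u8; INDEX_LEN];
        index_bytes.copy_from_slice(&chunk[..INDEX_LEN]);
        let found = u64::from_le_bytes(index_bytes) as usize;
        // A zero-filled record decodes to index 0, which breaks the sequence.
        let expected = pubkeys.len();
        if found != expected {
            return Err(CacheFileError::InconsistentIndex {
                expected: Some(expected),
                found,
            });
        }
        let mut pubkey = [0u8; PUBKEY_LEN];
        pubkey.copy_from_slice(&chunk[INDEX_LEN..]);
        pubkeys.push(pubkey);
    }
    Ok(pubkeys)
}
```

Under these assumptions, a file truncated 32 bytes into its last record fails with `InvalidByteLength { len: 32, expected: 56 }`, and a file whose tail was zero-filled fails with `InconsistentIndex { .., found: 0 }`. A database with atomic writes would avoid both partial states.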

@apalade

apalade commented Dec 20, 2020

I am experiencing the same problem after a power outage.

Dec 20 15:24:26 pala lighthouse[50671]: Dec 20 15:24:26.109 CRIT Failed to start beacon node reason: Unable to open persisted pubkey cache: ValidatorPubkeyCacheFileError("InconsistentIndex { expected: Some(48577), found: 0 }")

Questions:

  1. the only way out of this right now is to resync the database?
  2. would it be safe to use a pubkey_cache.ssz file from an older backup instead of a full resync?

@michaelsproul
Member Author

I've sent you the file on Discord, but just in case you check GH first:

> 1. the only way out of this right now is to resync the database?

Yeah :\

> 2. would it be safe to use a pubkey_cache.ssz file from an older backup instead of a full resync?

An earlier backup might work if you get lucky, but a backup from another node is more reliable.

@michaelsproul
Member Author

Another instance on Discord today after power failure:

CRIT Failed to start beacon node             reason: Unable to open persisted pubkey cache: ValidatorPubkeyCacheFileError("InconsistentIndex { expected: Some(100388), found: 0 }")

@michaelsproul michaelsproul self-assigned this Feb 26, 2021
bors bot pushed a commit that referenced this issue Mar 4, 2021
## Issue Addressed

Closes #1787

## Proposed Changes

* Abstract the `ValidatorPubkeyCache` over a "backing" which is either a file (legacy), or the database.
* Implement a migration from schema v2 to schema v3, whereby the contents of the cache file are copied to the DB, and then the file is deleted. The next release to include this change must be a minor version bump, and we will need to warn users of the inability to downgrade (this is our first DB schema change since mainnet genesis).
* Move the schema migration code from the `store` crate into the `beacon_chain` crate so that it can access the datadir and the `ValidatorPubkeyCache`, etc. It gets injected back into the `store` via a closure (similar to what we do in fork choice).
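
The migration and closure injection described in the bullets above could look roughly like the following sketch. It is not the code from the commit: `Database`, `migrate_v2_to_v3`, and `open_database` are hypothetical names, the in-memory `BTreeMap` stands in for the real store, and the 56-byte record layout (8-byte index plus 48-byte pubkey) is an assumption.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

const INDEX_LEN: usize = 8;
const RECORD_LEN: usize = 56; // assumed: 8-byte index + 48-byte pubkey

/// Hypothetical stand-in for the node's key-value database.
#[derive(Default)]
pub struct Database {
    pub schema_version: u64,
    /// Validator index -> compressed pubkey bytes.
    pub pubkeys: BTreeMap<u64, Vec<u8>>,
}

/// Copy the legacy cache file into the database, bump the schema version,
/// then delete the file. A real store would commit the copy and the version
/// bump atomically before removing the file.
pub fn migrate_v2_to_v3(db: &mut Database, cache_file: &Path) -> io::Result<()> {
    if db.schema_version >= 3 {
        return Ok(()); // already migrated
    }
    if cache_file.exists() {
        let bytes = fs::read(cache_file)?;
        for (index, record) in bytes.chunks(RECORD_LEN).enumerate() {
            if record.len() != RECORD_LEN {
                // Refuse to migrate a truncated file rather than store garbage.
                return Err(io::Error::new(io::ErrorKind::InvalidData, "truncated record"));
            }
            db.pubkeys.insert(index as u64, record[INDEX_LEN..].to_vec());
        }
    }
    db.schema_version = 3;
    if cache_file.exists() {
        fs::remove_file(cache_file)?;
    }
    Ok(())
}

/// The store does not know about the datadir or the cache; the caller
/// injects the migration as a closure that captures whatever it needs.
pub fn open_database<F>(mut db: Database, migrate: F) -> io::Result<Database>
where
    F: FnOnce(&mut Database) -> io::Result<()>,
{
    migrate(&mut db)?;
    Ok(db)
}
```

A caller would pass something like `|db| migrate_v2_to_v3(db, &cache_path)` to `open_database`, which keeps the store code free of any dependency on the cache file's location.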
michaelsproul added a commit that referenced this issue Mar 10, 2021