leader panic on restart (LastIdNotFound) #1171
Comments
cc #1164
I have a "short" (87k entries) ledger that fails to verify with the error LastIdNotFound. I've used ledger-tool to dig around in it. The ledger format is uncorrupted: the index and the data file agree, but the bank is unhappy.
The last_id that's "not found" (let's call it `3WPtc`) … The entry that fails to verify is not the last entry with `3WPtc` listed as last_id; there is just one more. I thought there might be some significance to the "off by 2" this represents, but in another (larger) ledger with the same issue, the number of entries that would fail to verify is much larger (586 entries). Immediately following the last entry to use `3WPtc`, there are 2,872 empty entries. Similarly, in the larger log, there are also lots of empty entries after the last use of the "bad" last_id. #1217 to track.
Tidbit: There is a block of 14,006 entries (comprising 168,690 transactions) that use |
When a full node is running, register_last_entry() is called from the record stage, but there may be transactions in flight between the banking stage and the record stage that were verified against a last_id the record stage is about to push out of last_ids. When a bank is being initialized from a ledger, register_last_id() is called synchronously, so a transaction that passed verification on the live node can fail with LastIdNotFound on replay.
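To make the ordering problem concrete, here is a minimal sketch of a bounded last_ids window. It is a toy model, not the actual bank code: the `Bank` struct, `check_tx`, `live_then_replay`, and the `MAX_LAST_IDS` capacity of 3 are all assumptions made for illustration. Only the `register_last_id` and `LastIdNotFound` names come from the discussion above.

```rust
use std::collections::VecDeque;

// Hypothetical capacity; the real bound on last_ids is not stated in this issue.
const MAX_LAST_IDS: usize = 3;

#[derive(Debug, PartialEq)]
enum BankError {
    LastIdNotFound,
}

// Toy stand-in for the bank's window of recent entry ids.
struct Bank {
    last_ids: VecDeque<u64>,
}

impl Bank {
    fn new(genesis_id: u64) -> Self {
        let mut last_ids = VecDeque::new();
        last_ids.push_back(genesis_id);
        Bank { last_ids }
    }

    // Push a new entry id, evicting the oldest one once the window is full.
    fn register_last_id(&mut self, id: u64) {
        if self.last_ids.len() >= MAX_LAST_IDS {
            self.last_ids.pop_front();
        }
        self.last_ids.push_back(id);
    }

    // A transaction is accepted only if its last_id is still in the window.
    fn check_tx(&self, tx_last_id: u64) -> Result<(), BankError> {
        if self.last_ids.contains(&tx_last_id) {
            Ok(())
        } else {
            Err(BankError::LastIdNotFound)
        }
    }
}

// Returns (result on the live node, result during ledger replay)
// for the same transaction, which names id 0 as its last_id.
fn live_then_replay() -> (Result<(), BankError>, Result<(), BankError>) {
    // Live node: the transaction is checked while id 0 is still in the window.
    // Ids 1, 2 and 3 are then registered while it is in flight, so it ends up
    // in the ledger after those three entries.
    let live = Bank::new(0);
    let live_result = live.check_tx(0);

    // Replay: ids are registered synchronously in ledger order, so id 0 has
    // already been evicted when the transaction is checked.
    let mut replay = Bank::new(0);
    for id in 1..=3 {
        replay.register_last_id(id);
    }
    let replay_result = replay.check_tx(0);

    (live_result, replay_result)
}
```

In this model `live_then_replay()` yields `Ok(())` for the live check and `Err(BankError::LastIdNotFound)` for the replay, which is the same disagreement described above: the ledger records an ordering that the live node never actually verified against.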
rewrite entry_next_hash in terms of Poh; simplify and unify transaction hashing (no embedded nulls); register_last_entry from banking stage, fixes solana-labs#1171
still an issue at ca96237. STR, using multinode-demo: setup …
Never mind, red herring. The issue was with write_stage(), which was reversing entry vectors before writing them.
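As a rough illustration of why reversed entry vectors leave a ledger that cannot be verified, the sketch below builds a toy hash chain in which each entry id depends on the previous id. This is not Solana's Poh or its entry format: `Entry`, `next_id`, `make_entries`, and `verify` are hypothetical names, and `DefaultHasher` stands in for the real hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Debug)]
struct Entry {
    id: u64,      // hash of (previous id, payload)
    payload: u64, // stand-in for the entry's transactions
}

// Derive an entry id from the previous id and this entry's payload.
fn next_id(prev_id: u64, payload: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    prev_id.hash(&mut hasher);
    payload.hash(&mut hasher);
    hasher.finish()
}

// Build a chain of entries starting from start_id.
fn make_entries(start_id: u64, payloads: &[u64]) -> Vec<Entry> {
    let mut prev = start_id;
    payloads
        .iter()
        .map(|&payload| {
            let id = next_id(prev, payload);
            prev = id;
            Entry { id, payload }
        })
        .collect()
}

// Check that every entry's id follows from the id before it.
fn verify(start_id: u64, entries: &[Entry]) -> bool {
    let mut prev = start_id;
    for entry in entries {
        if next_id(prev, entry.payload) != entry.id {
            return false;
        }
        prev = entry.id;
    }
    true
}
```

Calling `verify(0, &make_entries(0, &[1, 2, 3]))` succeeds, while the same vector after `reverse()` fails at its first entry, because that entry's id was derived from a predecessor that now comes after it. Each individual entry is still intact, which fits the earlier observation that the ledger format itself looked uncorrupted.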
https://metrics.solana.com:3000/d/testnet/testnet-hud?orgId=2&var-testnet=testnet-master&from=1536591929833&to=1536599129833
Indications are that the leader OOM'd, then panicked trying to read the ledger that was left after the crash.
Note the OOM event at 9:26 and subsequent panics every ~10min thereafter while reading the ledger.