Tools that help recover corrupted ledger store file #26813

Closed
yhchiang-sol opened this issue Jul 27, 2022 · 4 comments

@yhchiang-sol
Contributor

Problem

Solana ledger store uses rocksdb as its underlying storage. In some rare cases such as hardware failure,
rocksdb might report data corruption or a block checksum mismatch on one of its sst files. Below is one
example error log from #9009:

[2020-03-22T10:22:48.766837593Z ERROR solana_core::window_service] thread 
Some("solana-window-insert") error BlockstoreError(RocksDb(Error { 
message: "Corruption: block checksum mismatch: expected 3583270445, got 3398136873  
in /mnt/vol1/ledger/rocksdb/165855.sst offset 25107936 size 3758" }))

When this happens, the validator will not be able to continue even if all other sst files are still readable and healthy.
Currently, a clean restart might be the only way to recover.

Proposed Solution

A set of tools that provide a way to recover the corrupted file would be a better solution than a clean restart
as it allows the validator to recover without losing its local data.

The key idea of recovering the corrupted sst file is to first obtain the column family information and key range
of the corrupted file, assuming its metadata blocks are still healthy. Then, based on the column family name and
the key range, we can copy the data within that range from a healthy validator and replace the corrupted file.
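For reference, a minimal sketch of that first step, assuming the rust-rocksdb crate (which exposes rocksdb_livefiles() as DB::live_files()) and that the damaged store can still be opened read-only; the path is a placeholder and this is not the #26790 implementation:

```rust
// Minimal sketch: list every live sst file with its column family and key range.
// Assumes the rust-rocksdb crate and that the corrupted store still opens read-only.
use rocksdb::{Options, DB};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder path; point this at <ledger>/rocksdb.
    let path = "/mnt/vol1/ledger/rocksdb";

    // Discover the existing column families so the read-only open succeeds.
    let cf_names = DB::list_cf(&Options::default(), path)?;
    let db = DB::open_cf_for_read_only(&Options::default(), path, &cf_names, false)?;

    // live_files() wraps rocksdb_livefiles(): one entry per live sst file,
    // including its column family name and the smallest/largest key recorded
    // in the file's metadata block.
    for f in db.live_files()? {
        println!(
            "file={} cf={} level={} entries={} start_key={:?} end_key={:?}",
            f.name, f.column_family_name, f.level, f.num_entries, f.start_key, f.end_key
        );
    }
    Ok(())
}
```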

Here are possible solutions:

ledger-tool based solution

  • Obtain the column family and key range information of the corrupted sst file via the rocksdb_livefiles() API (sketched above). #26790 (Add ledger tool command print-file-metadata) includes a sketch implementation.
  • Convert the key range to a slot range (a sketch of this conversion follows the list).
  • On a healthy validator, use the copy command of the ledger-tool to copy the data from the above key range.
  • Do a full compaction on the above output ledger store. After this, each column family will contain only one sst file.
  • Replace the corrupted sst file using the sst file in the corresponding column family from the above output ledger store.
  • TO DISCUSS: is it okay to directly copy data to an empty ledger store?
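A minimal sketch of the key-to-slot conversion from the second bullet, under the assumption that the affected column's keys begin with the slot encoded as a big-endian u64 (true for most blockstore columns; columns keyed differently would need their own decoding):

```rust
// Sketch: map an sst file's key range to a slot range.
// Assumption: the column's key starts with the slot as a big-endian u64.
fn key_to_slot(key: &[u8]) -> Option<u64> {
    let bytes: [u8; 8] = key.get(..8)?.try_into().ok()?;
    Some(u64::from_be_bytes(bytes))
}

fn key_range_to_slot_range(start_key: &[u8], end_key: &[u8]) -> Option<(u64, u64)> {
    Some((key_to_slot(start_key)?, key_to_slot(end_key)?))
}
```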

lower level solution

  • Obtain the column family and key range information of the corrupted sst file via rocksdb_livefiles just like the previous solution.
  • On a healthy validator, create a new rocksdb database using the column family options of the above column family.
  • Copy all the data within the key range in the target column family from the healthy validator to the newly created rocksdb instance.
  • Do a full compaction of the output rocksdb so that it results in a single sst file (see the copy-and-compact sketch after this list).
  • Replace the corrupted sst file using the sst file in the corresponding column family from the above output ledger store.
  • TO DISCUSS: this solution might run faster, but the user might need to pick a healthy validator on the same fork in order to keep the data consistent.
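A rough sketch of the copy-and-compact steps above, assuming the rust-rocksdb crate; cf_name, start_key, and end_key are assumed to come from the live_files() step, and a real tool would reuse the blockstore's column family options rather than the defaults used here:

```rust
// Rough sketch of the lower-level copy-and-compact flow, assuming rust-rocksdb.
use rocksdb::{Direction, IteratorMode, Options, DB};

fn copy_range_and_compact(
    src_path: &str,   // healthy validator's <ledger>/rocksdb
    dst_path: &str,   // scratch db that will hold the replacement data
    cf_name: &str,
    start_key: &[u8],
    end_key: &[u8],
) -> Result<(), Box<dyn std::error::Error>> {
    // Open the healthy blockstore read-only.
    let src_cfs = DB::list_cf(&Options::default(), src_path)?;
    let src = DB::open_cf_for_read_only(&Options::default(), src_path, &src_cfs, false)?;
    let src_cf = src.cf_handle(cf_name).ok_or("missing column family")?;

    // Create a scratch db containing just the affected column family.
    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.create_missing_column_families(true);
    let dst = DB::open_cf(&opts, dst_path, [cf_name])?;
    let dst_cf = dst.cf_handle(cf_name).ok_or("missing column family")?;

    // Copy every entry in [start_key, end_key].
    for item in src.iterator_cf(src_cf, IteratorMode::From(start_key, Direction::Forward)) {
        let (key, value) = item?;
        if key.as_ref() > end_key {
            break;
        }
        dst.put_cf(dst_cf, &key, &value)?;
    }

    // Full compaction so the column family ends up with as few sst files as
    // possible (ideally one), which can then replace the corrupted file.
    dst.compact_range_cf(dst_cf, None::<&[u8]>, None::<&[u8]>);
    Ok(())
}
```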

Another solution might be introducing a new RPC call to obtain data within the slot range, but I feel RPC calls are designed for solving real-time tasks and are less suitable for offline recovery tools.

@yhchiang-sol
Contributor Author

To simply reduce the downtime while keeping the validator in a consistent state, another option is to purge all data up to the ending slot of the corrupted file's slot range:

  • Obtain the column family and slot range information of the corrupted sst file via rocksdb_livefiles just like the previous solution.
  • From the above slot range, purge all data that is no later than the ending slot of the corrupted file. This can be done by implementing a new ledger-tool command that essentially calls delete_files_in_range(), where the range is 0 to the ending slot of the corrupted file (see the sketch after this list). The implementation would be similar to #26651 (Delete files older than the lowest_cleanup_slot in LedgerCleanupService::cleanup_ledger).
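A minimal sketch of such a command's core, assuming rust-rocksdb's delete_file_in_range_cf binding (which wraps DeleteFilesInRange) and the big-endian slot key prefix from the earlier sketch; columns not keyed by slot would need separate handling:

```rust
// Minimal sketch of the purge command's core. Whole sst files fully contained
// in the range are unlinked, which should include the corrupted one; data in
// files straddling the range boundary is left in place.
use rocksdb::{Options, DB};

fn purge_files_through_slot(
    ledger_rocksdb: &str,
    ending_slot: u64,
) -> Result<(), Box<dyn std::error::Error>> {
    let cf_names = DB::list_cf(&Options::default(), ledger_rocksdb)?;
    let db = DB::open_cf(&Options::default(), ledger_rocksdb, &cf_names)?;

    // Keys strictly below this bound cover slots 0..=ending_slot.
    let upper = (ending_slot + 1).to_be_bytes();

    for name in &cf_names {
        let cf = db.cf_handle(name).ok_or("missing column family")?;
        db.delete_file_in_range_cf(cf, [0u8; 8], upper)?;
    }
    Ok(())
}
```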

@steviez
Contributor

steviez commented Aug 31, 2022

The key idea of recovering the corrupted sst file is to first obtain the column family information and key range
of the corrupted file if its metadata blocks are still healthy

  • Supposing the metadata blocks are not healthy, do we have any recourse aside from wiping ledger? I assume manually deleting the SST file would be risky.
  • If the metadata blocks are still alright, we're able to recover most of the SST file right? That is, we can tell which blocks are corrupted and only need to discard those while keeping the healthy ones? Or, is the entire SST file lost? I haven't looked at SST file format in depth in a little while so might need to refresh myself

I saw rocksdb has some repair functions, and it looks like at least one of them is hooked up in the Rust wrapper too:
https://github.com/facebook/rocksdb/blob/e7525a1fffd0def3cc4c804e0c6070f7dae0d06a/include/rocksdb/db.h#L1818-L1840
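For what it's worth, a minimal sketch of calling that repair path, assuming the rust-rocksdb DB::repair binding; RepairDB is best-effort and may discard data it cannot salvage, so it would have to be run on a copy of the ledger:

```rust
// Minimal sketch, assuming the rust-rocksdb DB::repair binding (which wraps
// rocksdb_repair_db / RepairDB). RepairDB rebuilds the db from whatever sst
// files it can still read; unrecoverable data may be dropped.
use rocksdb::{Options, DB};

fn try_repair(ledger_rocksdb: &str) -> Result<(), rocksdb::Error> {
    DB::repair(&Options::default(), ledger_rocksdb)
}
```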

@steviez
Contributor

steviez commented Aug 31, 2022

  • On a healthy validator, use the copy command of the ledger-tool to copy the data from the above key range.
  • Do a full compaction on the above output ledger store. After this, each column family will contain only one sst file.
  • Replace the corrupted sst file using the sst file in the corresponding column family from the above output ledger store.

Both approaches you mentioned seem to involve a large amount of manual work. Moreover, they require an operator to readily have access to a good version of that slot, either through direct access to additional node(s) or through the community. The community in general is really helpful, but again, manual intervention isn't great.

Assuming we can reliably determine the range of a corrupted block, an idea that comes to mind would be to wipe that range from the ledger altogether (i.e. via purge to erase across all column families; a sketch follows the bullets below).

  • If the corrupted range is older than the most recent snapshot, we'll continue on anyways
    • This doesn't cover RPC / warehouse nodes that want deep history
  • If the corrupted range is newer than the most recent snapshot, repair requests could fill those slots, same as for validators that were offline and missed turbine blast
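A rough sketch of that purge, assuming rust-rocksdb's delete_range_cf and the big-endian slot key prefix from the earlier sketches; a production tool would more likely go through the blockstore's existing purge code, which knows each column's actual key layout:

```rust
// Rough sketch of purging a slot range across all column families.
// Assumption: keys start with the slot as a big-endian u64, which does not
// hold for every blockstore column.
use rocksdb::{Options, DB};

fn wipe_slot_range(
    ledger_rocksdb: &str,
    start_slot: u64,
    end_slot: u64,
) -> Result<(), Box<dyn std::error::Error>> {
    let cf_names = DB::list_cf(&Options::default(), ledger_rocksdb)?;
    let db = DB::open_cf(&Options::default(), ledger_rocksdb, &cf_names)?;

    let from = start_slot.to_be_bytes();
    let to = (end_slot + 1).to_be_bytes(); // delete_range_cf's upper bound is exclusive

    for name in &cf_names {
        let cf = db.cf_handle(name).ok_or("missing column family")?;
        db.delete_range_cf(cf, from, to)?;
    }
    Ok(())
}
```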

@steviez
Contributor

steviez commented Aug 31, 2022

One more note, our recovery process might vary depending on the CF. For example, if we have the shreds, we should be able to reconstruct the metadata fields (SlotMeta, Index, etc.) on the same machine by re-inserting them. There could also be some column families we don't care about (I think ProgramCosts would be fine to wipe) as well as some that could be more problematic for RPC use cases that maintain long history (i.e. TransactionStatus).

@github-actions github-actions bot added the stale label on Sep 4, 2023
@github-actions github-actions bot closed this as not planned on Sep 11, 2023