Switch on-disk database from LevelDB #483

michaelsproul · 2019-08-02T02:01:49Z

Description

Currently we use LevelDB as our on-disk store, however, the Rust bindings for it are very minimally maintained and missing a few useful features like range queries, i.e. "give me all the values between key X and X + N". See: https://github.com/skade/leveldb

In contrast, the RocksDB bindings are much more actively maintained (updates within the last month), and support more features (including range queries). RocksDB is also meant to be a successor to LevelDB, and is used by Parity: https://github.com/paritytech/parity-ethereum/tree/master/parity/db

Alternatively, we could switch to something other than RocksDB. Suggestions welcome!

paulhauner · 2019-08-02T02:10:21Z

I remember @AgeManning having some issues with RocksDB and I think I remember @kirk-baird saying that someone at Parity said they wished they used LevelDB instead of Rocks.

My understanding is that Rocks is a bit faster, however it is more susceptible to corruption. This understanding is pretty loose though, I might try and ask around :)

jtomanik · 2019-09-03T14:00:48Z

Have you guys considered LMDB?

Monero team is using it successfully as well as Mozilla. Here are some considerations behind their choice:
https://monero.stackexchange.com/questions/702/why-did-monero-choose-lmdb-over-alternative-database-types/784

https://mozilla.github.io/firefox-browser-architecture/text/0015-rkv.html

https://mozilla.github.io/firefox-browser-architecture/text/0017-lmdb-vs-leveldb.html

Apart from that Mozilla maintains Rust bindings for LMDB https://github.com/mozilla/rkv

michaelsproul · 2019-09-05T06:55:22Z

Thanks for the tip @jtomanik, we hadn't! LMDB does look like a strong contender, particularly with bindings maintained by Mozilla.

There's a lot going on with Lighthouse at the moment, but in a few weeks we should have time to do some benchmarks and try some alternatives.

michaelsproul · 2020-09-22T00:06:50Z

I think we should move forward with this before mainnet. LevelDB has multiple drawbacks:

Doesn't support read-write transactions (only write batches)
~~Doesn't support range queries~~ Actually, it does
Doesn't run on Windows
Not actively maintained any more

I've been using LMDB in the slasher (#1567) and have been quite happy with its performance and features. It still might be worth giving RocksDB a try too.

Interesting to note is that TurboGeth uses LMDB, presumably for good reason. We may even want to take inspiration from some of their optimisations 👀 https://github.com/ledgerwatch/turbo-geth/blob/master/docs/programmers_guide/db_walkthrough.MD

AgeManning · 2020-09-22T01:23:50Z

I can't recall the issues I've had with rocks in the past. I believe a lot of them came from compiling on various linux distros and OS's.

## Issue Addressed Closes #800 Closes #1713 ## Proposed Changes Implement the temporary state storage algorithm described in #800. Specifically: * Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values. * Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully. * Add a garbage collection process to delete leftover temporary states on start-up. * Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784) ## Additional Info There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant. ### Race 1: Permanent state marked temporary EDIT: this has been fixed by the addition of a lock around the relevant critical section There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events: 1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`. 2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag. 3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction. 4. a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain! Alternatively (4b) happens... b) Thread 1 finishes, and re-deletes the temporary flag. In this case the failure is transient, state `s` will disappear temporarily, but will come back once thread 1 finishes running. I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know This once again begs the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn). ### Race 2: Temporary state returned from `get_state` I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data). This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.

michaelsproul added the RFC Request for comment label Aug 2, 2019

michaelsproul mentioned this issue Aug 2, 2019

Optimise state storage with Chunked Ranges™ #484

Closed

michaelsproul changed the title ~~Switch on-disk database from LevelDB to RocksDB~~ Switch on-disk database from LevelDB Sep 5, 2019

ghost added A2 database labels Sep 9, 2020

michaelsproul mentioned this issue Sep 14, 2020

Head tracker has unsafe API wrt concurrency #1557

Closed

michaelsproul mentioned this issue Sep 21, 2020

Native Windows Support #1237

Closed

4 tasks

michaelsproul mentioned this issue Oct 21, 2020

[Merged by Bors] - Implement database temp states to reduce memory usage #1798

Closed

paulhauner added A2 and removed A2 labels Nov 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch on-disk database from LevelDB #483

Switch on-disk database from LevelDB #483

michaelsproul commented Aug 2, 2019

paulhauner commented Aug 2, 2019

jtomanik commented Sep 3, 2019

michaelsproul commented Sep 5, 2019

michaelsproul commented Sep 22, 2020 •

edited

Loading

AgeManning commented Sep 22, 2020

Switch on-disk database from LevelDB #483

Switch on-disk database from LevelDB #483

Comments

michaelsproul commented Aug 2, 2019

Description

paulhauner commented Aug 2, 2019

jtomanik commented Sep 3, 2019

michaelsproul commented Sep 5, 2019

michaelsproul commented Sep 22, 2020 • edited Loading

AgeManning commented Sep 22, 2020

michaelsproul commented Sep 22, 2020 •

edited

Loading