Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch on-disk database from LevelDB #483

Open
michaelsproul opened this issue Aug 2, 2019 · 5 comments
Open

Switch on-disk database from LevelDB #483

michaelsproul opened this issue Aug 2, 2019 · 5 comments
Labels
A2 database RFC Request for comment

Comments

@michaelsproul
Copy link
Member

Description

Currently we use LevelDB as our on-disk store, however, the Rust bindings for it are very minimally maintained and missing a few useful features like range queries, i.e. "give me all the values between key X and X + N". See: https://github.com/skade/leveldb

In contrast, the RocksDB bindings are much more actively maintained (updates within the last month), and support more features (including range queries). RocksDB is also meant to be a successor to LevelDB, and is used by Parity: https://github.com/paritytech/parity-ethereum/tree/master/parity/db

Alternatively, we could switch to something other than RocksDB. Suggestions welcome!

@michaelsproul michaelsproul added the RFC Request for comment label Aug 2, 2019
@paulhauner
Copy link
Member

I remember @AgeManning having some issues with RocksDB and I think I remember @kirk-baird saying that someone at Parity said they wished they used LevelDB instead of Rocks.

My understanding is that Rocks is a bit faster, however it is more susceptible to corruption. This understanding is pretty loose though, I might try and ask around :)

@jtomanik
Copy link

jtomanik commented Sep 3, 2019

Have you guys considered LMDB?

Monero team is using it successfully as well as Mozilla. Here are some considerations behind their choice:
https://monero.stackexchange.com/questions/702/why-did-monero-choose-lmdb-over-alternative-database-types/784

https://mozilla.github.io/firefox-browser-architecture/text/0015-rkv.html

https://mozilla.github.io/firefox-browser-architecture/text/0017-lmdb-vs-leveldb.html

Apart from that Mozilla maintains Rust bindings for LMDB https://github.com/mozilla/rkv

@michaelsproul
Copy link
Member Author

Thanks for the tip @jtomanik, we hadn't! LMDB does look like a strong contender, particularly with bindings maintained by Mozilla.

There's a lot going on with Lighthouse at the moment, but in a few weeks we should have time to do some benchmarks and try some alternatives.

@michaelsproul michaelsproul changed the title Switch on-disk database from LevelDB to RocksDB Switch on-disk database from LevelDB Sep 5, 2019
@ghost ghost added A2 database labels Sep 9, 2020
@michaelsproul
Copy link
Member Author

michaelsproul commented Sep 22, 2020

I think we should move forward with this before mainnet. LevelDB has multiple drawbacks:

  • Doesn't support read-write transactions (only write batches)
  • Doesn't support range queries Actually, it does
  • Doesn't run on Windows
  • Not actively maintained any more

I've been using LMDB in the slasher (#1567) and have been quite happy with its performance and features. It still might be worth giving RocksDB a try too.

Interesting to note is that TurboGeth uses LMDB, presumably for good reason. We may even want to take inspiration from some of their optimisations 👀 https://github.com/ledgerwatch/turbo-geth/blob/master/docs/programmers_guide/db_walkthrough.MD

@AgeManning
Copy link
Member

I can't recall the issues I've had with rocks in the past. I believe a lot of them came from compiling on various linux distros and OS's.

bors bot pushed a commit that referenced this issue Oct 23, 2020
## Issue Addressed

Closes #800
Closes #1713

## Proposed Changes

Implement the temporary state storage algorithm described in #800. Specifically:

* Add `DBColumn::BeaconStateTemporary`, for storing 0-length temporary marker values.
* Store intermediate states immediately as they are created, marked temporary. Delete the temporary flag if the block is processed successfully.
* Add a garbage collection process to delete leftover temporary states on start-up.
* Bump the database schema version to 2 so that a DB with temporary states can't accidentally be used with older versions of the software. The auto-migration is a no-op, but puts in place some infra that we can use for future migrations (e.g. #1784)

## Additional Info

There are two known race conditions, one potentially causing permanent faults (hopefully rare), and the other insignificant.

### Race 1: Permanent state marked temporary

EDIT: this has been fixed by the addition of a lock around the relevant critical section

There are 2 threads that are trying to store 2 different blocks that share some intermediate states (e.g. they both skip some slots from the current head). Consider this sequence of events:

1. Thread 1 checks if state `s` already exists, and seeing that it doesn't, prepares an atomic commit of `(s, s_temporary_flag)`.
2. Thread 2 does the same, but also gets as far as committing the state txn, finishing the processing of its block, and _deleting_ the temporary flag.
3. Thread 1 is (finally) scheduled again, and marks `s` as temporary with its transaction.
4.
    a) The process is killed, or thread 1's block fails verification and the temp flag is not deleted. This is a permanent failure! Any attempt to load state `s` will fail... hope it isn't on the main chain! Alternatively (4b) happens...
    b) Thread 1 finishes, and re-deletes the temporary flag. In this case the failure is transient, state `s` will disappear temporarily, but will come back once thread 1 finishes running.

I _hope_ that steps 1-3 only happen very rarely, and 4a even more rarely. It's hard to know

This once again begs the question of why we're using LevelDB (#483), when it clearly doesn't care about atomicity! A ham-fisted fix would be to wrap the hot and cold DBs in locks, which would bring us closer to how other DBs handle read-write transactions. E.g. [LMDB only allows one R/W transaction at a time](https://docs.rs/lmdb/0.8.0/lmdb/struct.Environment.html#method.begin_rw_txn).

### Race 2: Temporary state returned from `get_state`

I don't think this race really matters, but in `load_hot_state`, if another thread stores a state between when we call `load_state_temporary_flag` and when we call `load_hot_state_summary`, then we could end up returning that state even though it's only a temporary state. I can't think of any case where this would be relevant, and I suspect if it did come up, it would be safe/recoverable (having data is safer than _not_ having data).

This could be fixed by using a LevelDB read snapshot, but that would require substantial changes to how we read all our values, so I don't think it's worth it right now.
@paulhauner paulhauner added A2 and removed A2 labels Nov 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A2 database RFC Request for comment
Projects
None yet
Development

No branches or pull requests

4 participants