This repository has been archived by the owner on Oct 28, 2021. It is now read-only.

RocksDB performs significantly worse than LevelDB in tests #5340

Closed
halfalicious opened this issue Nov 5, 2018 · 15 comments

@halfalicious (Contributor)

RocksDB support is being added in this PR: #4844

When running the Aleth tests, RocksDB currently performs significantly worse than LevelDB. For example, a full test run using RocksDB (and TestEth compiled with the RelWithDebInfo configuration) takes approximately 70% longer than when using LevelDB (timing performed via TestEth's --exectimelog argument). From the PR:

> @gumb0 / @chfast: I've successfully sync'd a few thousand Ropsten blocks on both Windows and Linux, so I think these changes are ready for review. One note: the tests took approximately 70% longer to run when using RocksDB, so it looks like RocksDB has worse performance, at least for the read/write patterns used in the tests. I didn't quiet my machine before running the tests, so these aren't scientific performance comparisons, but I didn't have anything intensive running in the background and I didn't use the machine while the tests were executing, so I think the results are reasonably accurate. Regarding hard numbers, I ran 2 iterations of the tests using LevelDB (results: 982 and 1019) and 2 iterations using RocksDB (results: 1738 and 1769).

I took a quick look at the interface that Aleth uses to interact with RocksDB and noticed a few small differences compared to the LevelDB interface:

  • RocksDBWriteBatch::insert: the status is checked after m_writeBatch.put(); no status check is performed in LevelDBWriteBatch::insert.
  • RocksDBWriteBatch::kill: the status is checked after m_writeBatch.delete(); no status check is performed in LevelDBWriteBatch::kill.
  • RocksDB::exists: the status is checked via IsNotFound() before the call to checkStatus(); this check is not performed in LevelDB::exists.

However, these differences aren't the cause of the performance gap: I removed the extra checks from the RocksDB code, re-ran the tests, and the perf delta remained.

Note that for some tests, the perf delta exceeds 100%. For example:

  • BlockchainTests/bcStateTests:
    - LevelDB: 21.4428
    - RocksDB: 44.3357
    - Delta: 106%
  • BlockchainTests/bcBlockGasLimitTest:
    - LevelDB: 8.98981
    - RocksDB: 18.1722
    - Delta: 102%

Output from test runs:

I've done some brief searching online and haven't found anything indicating that RocksDB is expected to perform significantly worse than LevelDB. As such, this performance gap is unexpected and should be investigated.

@halfalicious halfalicious self-assigned this Nov 5, 2018
@halfalicious halfalicious added the database Database-related work label Nov 5, 2018
@chfast (Member) commented Nov 6, 2018

Hey.
First thing to try would be to build with the LITE option: https://github.com/facebook/rocksdb/blob/master/ROCKSDB_LITE.md
I don't think we need any fancy server features.

The ROCKSDB_LITE define should be applied in the Hunter config, like here: https://github.com/ethereum/aleth/blob/master/cmake/Hunter/config.cmake#L4

Secondly, Parity has found good config options for RocksDB. You can either find them in Parity's code or ask them on some chat channel. Options: https://github.com/facebook/rocksdb/wiki/Setup-Options-and-Basic-Tuning.

@chfast (Member) commented Nov 6, 2018

I've found this PR: openethereum/parity-ethereum#7348.

@halfalicious (Contributor)

> First thing to try would be to build with the LITE option. […] Secondly, Parity has found good config options for RocksDB.

We chatted about this offline, but just to call it out here for posterity: ROCKSDB_LITE doesn't work; I hit the following error when running the blockchain state tests:

C:\Users\nilse\Documents\Code\aleth_ref\build\test\RelWithDebInfo>testeth -t BlockchainTests/bcStateTests -- --exectimelog --db rocksdb
Running tests using path: "C:\Users\nilse\Documents\Code\aleth\test\jsontests"
Running 1 test case...
Test Case "bcStateTests":
unknown location(0): fatal error: in "BlockchainTests/bcStateTests": class std::length_error: vector<T> too long

I also tried setting simple database config options based on what Parity has in the PR you shared. Here's the relevant code file (linked from the PR, since the file appears to have been removed from the repo): https://github.com/paritytech/parity-ethereum/blob/27564e6672285642f02e7a287f91b8de096d50dd/util/kvdb-rocksdb/src/lib.rs

Here are the database options I set:

  • options.use_fsync = false;
  • options.keep_log_file_num = 1;
  • options.bytes_per_sync = 1048576;
  • options.db_write_buffer_size = 128 * 1024 * 1024 / 2;
  • options.IncreaseParallelism(std::max(1, (int)std::thread::hardware_concurrency() / 2));
  • options.create_if_missing = true;
  • options.max_open_files = 512;

Setting these options had no impact on performance when running the bcStateTests test. I also tried setting the verify_checksums=false read option and the disable_wal=true write option, but those actually decreased performance (they increased the average runtime of the bcStateTests test by approximately 10%).
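For reference, a sketch of how the options listed above could be applied via the standard RocksDB C++ API when opening the database (the openTunedDb wrapper and path handling are illustrative, not the actual Aleth wiring):

```cpp
#include <rocksdb/db.h>
#include <rocksdb/options.h>

#include <algorithm>
#include <string>
#include <thread>

// Applies the Parity-derived settings from the list above, then opens the DB.
rocksdb::DB* openTunedDb(std::string const& path)
{
    rocksdb::Options options;
    options.use_fsync = false;
    options.keep_log_file_num = 1;
    options.bytes_per_sync = 1048576;                      // 1 MiB
    options.db_write_buffer_size = 128 * 1024 * 1024 / 2;  // 64 MiB
    options.IncreaseParallelism(
        std::max(1, static_cast<int>(std::thread::hardware_concurrency()) / 2));
    options.create_if_missing = true;
    options.max_open_files = 512;

    rocksdb::DB* db = nullptr;
    rocksdb::Status const status = rocksdb::DB::Open(options, path, &db);
    return status.ok() ? db : nullptr;
}
```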

Next I'm going to try playing with the more complicated options e.g. BlockBasedTableOptions, compaction profiles, and column options.

@gumb0 (Member) commented Nov 12, 2018

On Parity's DB tuning there's also this doc: https://gist.github.com/andresilva/a71c251995ef97806715441074a10f57#file-rocksdb-tuning-org

@halfalicious (Contributor) commented Nov 14, 2018

> On Parity's DB tuning there's also this doc: https://gist.github.com/andresilva/a71c251995ef97806715441074a10f57#file-rocksdb-tuning-org

Thanks @gumb0! I set those options, but the BlockchainTests/bcStateTests test hits the following error when run with RocksDB:

C:\Users\nilse\Documents\Code\aleth\build\test\RelWithDebInfo>testeth -t BlockchainTests/bcStateTests -- --exectimelog --db rocksdb
Running tests using path: "C:\Users\nilse\Documents\Code\aleth\test\jsontests"
Running 1 test case...
Test Case "bcStateTests":
C:\Users\nilse\Documents\Code\aleth\libethereum\State.cpp(99): fatal error: in "class dev::OverlayDB __cdecl dev::eth::State::openDB(const class boost::filesystem::path &,const class dev::FixedHash<32> &,enum dev::WithExisting)": class boost::exception_detail::clone_impl<struct dev::eth::DatabaseAlreadyOpen>: C:\Users\nilse\Documents\Code\aleth\libethereum\State.cpp(99): Throw in function class dev::OverlayDB __cdecl dev::eth::State::openDB(const class boost::filesystem::path &,const class dev::FixedHash<32> &,enum dev::WithExisting)
Dynamic exception type: class boost::exception_detail::clone_impl<struct dev::eth::DatabaseAlreadyOpen>

C:\Users\nilse\Documents\Code\aleth\test\tools\jsontests\BlockChainTests.cpp(152): last checkpoint

*** 1 failure is detected (5 failures are expected) in the test module "Master Test Suite"
Total Time:                                       : 0

I'll do some debugging to see if I can figure out what the issue is... I don't think the exception type is correct; DatabaseAlreadyOpen is thrown in a fallback case (on line 99 in State.cpp):

try
{
    std::unique_ptr<db::DatabaseFace> db = db::DBFactory::create(path / fs::path("state"));
    clog(VerbosityTrace, "statedb") << "Opened state DB.";
    return OverlayDB(std::move(db));
}
catch (boost::exception const& ex)
{
    cwarn << boost::diagnostic_information(ex) << '\n';
    if (!db::isDiskDatabase())
        throw;
    else if (fs::space(path / fs::path("state")).available < 1024)
    {
        cwarn << "Not enough available space found on hard drive. Please free some up and then re-run. Bailing.";
        BOOST_THROW_EXCEPTION(NotEnoughAvailableSpace());
    }
    else
    {
        cwarn <<
            "Database " <<
            (path / fs::path("state")) <<
            "already open. You appear to have another instance of ethereum running. Bailing.";
        BOOST_THROW_EXCEPTION(DatabaseAlreadyOpen());
    }
}

@gumb0 (Member) commented Nov 14, 2018

Maybe you can see the message from line 85 if you increase the verbosity of the testeth run.

@halfalicious (Contributor) commented Nov 15, 2018

> Maybe you can see the message from line 85 if you increase the verbosity of the testeth run.

Thanks, setting --verbosity to 3 did the trick and I figured it out.

Unfortunately, RocksDB is still performing more than 2x slower than LevelDB (for BlockchainTests/bcStateTests)... I'm going to take a closer look at the meaning of each tuning option to see if I can come up with better values.

@halfalicious (Contributor)

Since digging into the tuning options is going to take a bit of time, I first tried setting the exact same defaults that LevelDB uses. This involved setting:

  • write_buffer_size to 4MB (which is significantly less than RocksDB's default of 64MB)
  • paranoid_checks: false
  • max_open_files: 256 (RocksDB keeps all files open by default)
  • (read option) verify_checksums: false

The results were pretty much the same as with our current settings (which are just max_open_files: 256). Here are the timing results (average of 3 runs of BlockchainTests/bcStateTests using a testeth Release build):

  • MemoryDB: 1.73124s
  • LevelDB: 18.8866s
  • RocksDB (LevelDB defaults): 39.514s
  • RocksDB (current defaults): 39.956s

Next I'm going to throw together some microbenchmarks (e.g. insert 1000 blocks into the database directly and read back the same 1000 blocks) to determine what sorts of perf deltas we're seeing for reads vs writes, since this will help guide my optimizations.

@halfalicious (Contributor) commented Nov 20, 2018

I created a simple test which inserts 1000 blocks (each containing 1 tx) directly into the database and then reads the blocks back. I timed the insertions and reads separately (using std::chrono), and I'm seeing RocksDB take approx. 50% longer for the reads and approx. 2.5x longer for the writes.

@gumb0 (Member) commented Nov 20, 2018

@halfalicious Could you share this test? I'd like to try it on my machines.

@chfast (Member) commented Nov 20, 2018

I cannot help with this, but if you want to switch to RocksDB on Windows to be able to upgrade to VS 2017, I'm OK with that.

@halfalicious (Contributor) commented Nov 21, 2018

> @halfalicious Could you share this test? I'd like to try it on my machines.

Sure, I've rewritten it so that the test runs 11 iterations of inserting 1000 blocks and then reading back those 1000 blocks, and computes the median of the write/read results. I'm seeing much smaller deltas when running this new test (on my Lenovo X1 Carbon laptop with an SSD) built with the Release config: approximately the same read latency (~2ms), with RocksDB showing approx. 50% worse write latency (~30ms vs ~28ms for LevelDB). These numbers are still very low and therefore susceptible to skew, so I'm going to experiment with increasing the number of blocks and/or the number of txs in each block to see what the deltas look like with larger latencies.

Here's the commit: 3a0b326

Here's how you run the test / an example of the output:

C:\Users\nilse\Documents\Code\aleth\build\test\Release>testeth -t DBPerfTests -- --db rocksdb
Running tests using path: "C:\Users\nilse\Documents\Code\aleth\test\jsontests"
Running 1 test case...
Test Case "insertAndReadBlocks":
Generating 1000 test blocks...
.
.
Block generation complete!
Running 11 iterations...

Running iteration 1/11
Inserting 1000 blocks...
Insertion time: 32.3847
Reading 1000 blocks...
Read time: 2.0665

Running iteration 2/11
Inserting 1000 blocks...
Insertion time: 22.5895
Reading 1000 blocks...
Read time: 1.7972

Running iteration 3/11
Inserting 1000 blocks...
Insertion time: 20.9593
Reading 1000 blocks...
Read time: 1.5993

Running iteration 4/11
Inserting 1000 blocks...
Insertion time: 15.0339
Reading 1000 blocks...
Read time: 1.042

Running iteration 5/11
Inserting 1000 blocks...
Insertion time: 16.4825
Reading 1000 blocks...
Read time: 1.3751

Running iteration 6/11
Inserting 1000 blocks...
Insertion time: 17.2898
Reading 1000 blocks...
Read time: 2.2798

Running iteration 7/11
Inserting 1000 blocks...
Insertion time: 15.3379
Reading 1000 blocks...
Read time: 1.0748

Running iteration 8/11
Inserting 1000 blocks...
Insertion time: 15.444
Reading 1000 blocks...
Read time: 0.91

Running iteration 9/11
Inserting 1000 blocks...
Insertion time: 12.9021
Reading 1000 blocks...
Read time: 1.7367

Running iteration 10/11
Inserting 1000 blocks...
Insertion time: 12.7869
Reading 1000 blocks...
Read time: 1.7078

Running iteration 11/11
Inserting 1000 blocks...
Insertion time: 12.9262
Reading 1000 blocks...
Read time: 0.9567

Insertion median: 15.444 ms
Read median: 1.5993 ms

*** No errors detected

@gumb0 (Member) commented Nov 29, 2018

I'm getting similar results on both my Windows and Linux machines, or even worse ones: insertion is several times slower for RocksDB, while LevelDB performs very comparably to MemoryDB.

@chfast (Member) commented Dec 6, 2018

  1. We can start asking questions on GitHub, like in "WriteBatchWithIndex seems slow" (facebook/rocksdb#608).
  2. Compare with https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks#test-3-random-write.
  3. All of this is done in a single thread, right?
  4. I also found this paper: https://brage.bibsys.no/xmlui/bitstream/handle/11250/2506148/19718_FULLTEXT.pdf?sequence=1

@halfalicious (Contributor)

Closing since we've removed RocksDB support
