RocksDB Benchmark

This is an implementation experiment to determine how RocksDB performs under the type of workload it would see if it replaced the current storage engine, primary index, and secondary indexes in ArangoDB.

This is experimental code.

Benchmarking

The default executable we provide is rocksdb-benchmark. This will run a suite of pre-defined but configurable workloads. Each workload will have its work divided among a number of worker threads specified by the CLI parameter threadCount. All work will take place in the RocksDB database specified by the CLI parameter folder. The workloads are as follows.

  1. Insert: Insert a number of documents of the form
{
 _key: <string>
 value: <uint64>
 name: <string>
 timestamp: <UTC-date>
}

where the value field is a randomly chosen integer, the name field is the string representation of another randomly chosen integer, and the timestamp field is the timestamp at which it was inserted. The total number of documents inserted is configurable via the CLI parameter keyCount, and each integer in the interval [1, keyCount] will be represented as the _key value of a document (specifically the string representation thereof). The documents will be inserted in random order.

  2. LookupSingleRandom: Perform a number of document lookups controlled by the CLI parameter lookupCount. Each lookup will choose a random _key value in the space defined by Insert. This is a good way to test the random-access performance of RocksDB. This workload can be disabled by setting lookupCount to 0.

  3. LookupSingleHotset: Perform a number of document lookups controlled by the CLI parameter lookupCount. Each lookup will choose a random _key value from a restricted keyspace. This keyspace is defined by the CLI parameter hotsetCount, and the resulting keyspace is the interval [1, hotsetCount]. This is a good way to test how RocksDB behaves with uneven access patterns. This workload can be disabled by setting either of lookupCount or hotsetCount to 0.

  4. LookupRange: Perform a number of range queries, utilizing the secondary index, to retrieve all documents within contiguous timestamp intervals. Specifically, the range of timestamp values is partitioned evenly between all worker threads, roughly as follows.

x = (max - min) / threadCount;
{[min, min+x], [min+x+1, min+2x], ..., [max-x+1, max]}

A given thread will then further subdivide its assigned interval [a,b] to generate queries as follows.

n = (keyCount / threadCount);
y = (b - a) / (n / 1000);
{[a, a+y], [a+y+1, a+2y], ..., [b-y+1, b]}

Thus, the entire keyspace is covered by the set of intervals generated by all threads, and each individual query is expected to return roughly 1000 documents (given a fairly uniform distribution of timestamp values within the range). To prevent any single thread from doing too much work, we provide another CLI parameter, rangeLimit. Each thread will execute queries until either all of its queries are done or the total number of documents returned so far exceeds rangeLimit. This is particularly useful in the case where threadCount == 1 and it is not desired to query the entire database. This workload can be disabled by setting rangeLimit to 0. A sketch of the per-thread subdivision appears after this list.

  5. InsertBatch: Inserts an additional keyCount documents of the same form as those inserted by Insert. In order not to throw off the results of any of the lookup workloads, this workload is run last. The key difference between Insert and InsertBatch is that the latter will insert a number of documents in a single transaction, controlled by the CLI parameter transactionSize, committing them all together; the former uses a new transaction for each document.
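
To make the LookupRange subdivision concrete, here is a minimal C++ sketch of how a single worker thread might derive its query intervals from the formulas above. The function name and signature are illustrative, not taken from the benchmark's source.

#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Subdivide a thread's assigned timestamp interval [a, b] into
// sub-intervals expected to match roughly 1000 documents each.
std::vector<std::pair<uint64_t, uint64_t>> makeQueryIntervals(
    uint64_t a, uint64_t b, uint64_t keyCount, uint64_t threadCount) {
  uint64_t n = keyCount / threadCount;                 // documents per thread
  uint64_t queries = std::max<uint64_t>(n / 1000, 1);  // ~1000 docs per query
  uint64_t y = (b - a) / queries;                      // sub-interval width
  std::vector<std::pair<uint64_t, uint64_t>> intervals;
  for (uint64_t lo = a; lo <= b; lo += y + 1) {
    intervals.emplace_back(lo, std::min(lo + y, b));
  }
  return intervals;
}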

Note that each workload executes a single type of query. Latencies on these queries are recorded using a q-digest such that we can report quantile statistics. Furthermore, the workload's master thread will periodically poll the resident memory, virtual memory, and disk space usage of the process and report quantile statistics on these as well.
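
As an illustration of the latency recording, the following sketch times a single query with std::chrono and feeds the sample to a digest. The QDigest type here is a stand-in for the q-digest library's interface, not its actual API.

#include <chrono>
#include <cstdint>

// Stand-in for the q-digest used by the benchmark; only the shape of
// the interface matters for this sketch.
struct QDigest {
  void add(uint64_t micros) { /* record one latency sample */ }
};

// Time one query and record its latency in microseconds.
template <typename Query>
void timedQuery(Query&& query, QDigest& digest) {
  auto start = std::chrono::steady_clock::now();
  query();  // one lookup or range query
  auto elapsed = std::chrono::steady_clock::now() - start;
  digest.add(static_cast<uint64_t>(
      std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count()));
}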

The proper usage is as follows.

rocksdb-benchmark [threadCount] [keyCount] [lookupCount] [hotsetCount] [transactionSize] [rangeLimit] [folder]

Results are output in JSON. For convenience, we have provided a helper script which will analyze all files in the results folder with the .json extension and output a Markdown document. Suggested usage is as follows.

nodejs scripts/parseResults.js > results/README.md

Internal Representation and Implementation

Since we are aiming to test the performance characteristics of RocksDB under an ArangoDB workload, we store values using the VelocyPack format. Storing all data in a single RocksDB instance requires that we partition the keyspace appropriately. We currently use three types of entries: slugs, documents, and index values. We use the placeholders <db-id>, <coll-id>, <idx-id>, and <rev-id> to denote database, collection, index, and revision IDs respectively (all 64-bit integers), and <slug> to denote a 32-bit integer slug.
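
For reference, a document of the form used by the Insert workload could be assembled with VelocyPack's Builder API roughly as follows. This is a sketch; the helper name and the values passed in are illustrative rather than taken from the benchmark's source.

#include <cstdint>
#include <string>

#include <velocypack/Builder.h>
#include <velocypack/Value.h>

using namespace arangodb::velocypack;

// Build one benchmark document; the caller supplies placeholder values.
Builder makeDocument(std::string const& key, uint64_t value,
                     std::string const& name, int64_t utcMillis) {
  Builder b;
  b.openObject();
  b.add("_key", Value(key));
  b.add("value", Value(value));
  b.add("name", Value(name));
  b.add("timestamp", Value(utcMillis, ValueType::UTCDate));  // UTC-date
  b.close();
  return b;
}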

On start-up, a database and collection are created and a primary and secondary index are generated, all with random IDs. Each of these structures then has a slug generated to serve as a compact identifier. Revision IDs are generated using a hybrid logical clock.

Slug entries take one of two forms depending on the type: S<db-id><coll-id> -> <slug> or S<db-id><coll-id><idx-id> -> <slug>.

Document entries take the form d<slug><rev-id> -> {vpack-value}, where

  • d is the ASCII byte for the letter 'd', and
  • {vpack-value} is the VelocyPack representation of the document.

Index entries take the form i<slug>{vpack-values}<key><rev-id> -> b, where

  • i is the ASCII byte for the letter 'i',
  • {vpack-values} is the VelocyPack representation of the indexed values, and
  • b is a tombstone byte which is zero if the value is present and non-zero if it has been deleted.

Entries in the primary index have no indexed values; they exist solely to map keys to revision IDs. Entries in the secondary index are indexed by timestamp.
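
To make the layout concrete, the following sketch assembles a document key and an index key as raw byte strings. The big-endian helpers and function names are assumptions made for illustration; the benchmark's actual encoding (and its custom comparator, discussed below) may differ.

#include <cstdint>
#include <string>

// Append integers in big-endian byte order so that byte-wise comparison
// matches numeric order (an assumption for this sketch).
static void appendUInt64(std::string& out, uint64_t v) {
  for (int shift = 56; shift >= 0; shift -= 8) {
    out.push_back(static_cast<char>((v >> shift) & 0xff));
  }
}

static void appendUInt32(std::string& out, uint32_t v) {
  for (int shift = 24; shift >= 0; shift -= 8) {
    out.push_back(static_cast<char>((v >> shift) & 0xff));
  }
}

// d<slug><rev-id>: key for a document entry.
std::string documentKey(uint32_t slug, uint64_t revId) {
  std::string key("d");
  appendUInt32(key, slug);
  appendUInt64(key, revId);
  return key;
}

// i<slug>{vpack-values}<key><rev-id>: key for an index entry; the
// VelocyPack-encoded indexed values are passed in pre-serialized.
std::string indexKey(uint32_t slug, std::string const& vpackValues,
                     std::string const& primaryKey, uint64_t revId) {
  std::string key("i");
  appendUInt32(key, slug);
  key += vpackValues;
  key += primaryKey;
  appendUInt64(key, revId);
  return key;
}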

All database operations happen transactionally, using pessimistic transactions. Most internal RocksDB settings are left at the defaults. The significant exceptions include a custom comparator (VelocyPack values are not lexicographically ordered) and prefix extractor.
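
A minimal sketch of the pessimistic-transaction pattern, using RocksDB's TransactionDB API with a placeholder path and payload:

#include <cassert>

#include <rocksdb/utilities/transaction.h>
#include <rocksdb/utilities/transaction_db.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // The benchmark also installs a custom comparator and prefix
  // extractor here; both are omitted from this sketch.
  rocksdb::TransactionDBOptions txnDbOptions;
  rocksdb::TransactionDB* db = nullptr;
  rocksdb::Status s = rocksdb::TransactionDB::Open(
      options, txnDbOptions, "/tmp/rocksdb-benchmark", &db);
  assert(s.ok());

  // Each operation runs inside a pessimistic transaction.
  rocksdb::WriteOptions writeOptions;
  rocksdb::Transaction* txn = db->BeginTransaction(writeOptions);
  txn->Put("example-key", "example-value");
  s = txn->Commit();
  assert(s.ok());

  delete txn;
  delete db;
  return 0;
}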

Building

We use a few third-party submodules which must be initialized. Furthermore, three of these submodules must be built separately, before our standard CMake step, and one of them (velocypack) requires a small change to its build process. Something like the following should suffice.

git clone {REPOSITORY} {SRC_DIR}
cd {SRC_DIR}
git submodule update --init
cd 3rdParty/rocksdb
make shared_lib
cd ../velocypack
sed -i 's/add_library(velocypack STATIC/add_library(velocypack SHARED/g' CMakeLists.txt
mkdir build
cd build
cmake ..
make
cd ../../q-digest
mkdir build
cd build
cmake ..
make
cd ../../..
mkdir build
cd build
cmake ..
make
