Add a benchmarking wrapper script for BlobDB #9015
Closed
28 commits, all by ltamasi:

- c3ed10f This should do the trick
- b97af38 Reduce target_file_size_base and max_bytes_for_level_base only if blo…
- bc0f8db Add run_blob_bench.sh
- a2c4ed1 Delete output_dir before starting
- e9d0b2e Put DB under /data, increase DB and memtable size, test more/bigger v…
- 783026b Update duration for read and read/write tests
- c3e18c0 Enable blob GC iff blob files are enabled (just in case)
- 8acb50c Update comment
- 1c9126e Make the GC cutoff for bulk loading configurable
- c5ea5df Add the GC cutoff for bulk loading to output_dir
- 59678a0 Go back to no GC during bulkload's compaction, try different cutoffs …
- e2d0656 Put back Merge operators since they are now supported with BlobDB
- c9c27bf Revert some parameter tuning that was done to take advantage of the t…
- 90530ac Clean up/refactor run_blob_bench.sh, move some settings over to it fr…
- 8d85ca3 Update comment and make it possible to configure some more parameters…
- cb8869e Whitespace fix
- 1335ba9 Update head comment, add help
- 0cbf447 Adjust default for target file size base, small cleanup
- 95fbc76 Dump benchmark setup
- 31a05c6 Ditch report.txt and report2.txt in favor of report.tsv (which is gen…
- bc78904 Unset DURATION for the write-only phase
- 8c6b943 Make OUTPUT_DIR mandatory
- 14bf81d Add support for JOB_ID
- 37f4d38 No need to create output directory, benchmark.sh will take care of it
- c5dc17d Add comment
- 0288c91 Add blob_garbage_collection_force_threshold
- 7f3f54d Address some lints
- 8eac99d More lint
New file, 195 lines (`@@ -0,0 +1,195 @@`):

```bash
#!/usr/bin/env bash
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
#
# BlobDB benchmark script
#
# REQUIRES: benchmark.sh is in the tools subdirectory
#
# After the execution of this script, log files are available in $output_dir.
# report.tsv provides high level statistics.
#
# Should be run from the parent of the tools directory. The command line is:
# [$env_vars] tools/run_blob_bench.sh
#
# This runs the following sequence of BlobDB performance tests:
# phase 1) write-only - bulkload+compact, overwrite+waitforcompaction
# phase 2) read-write - readwhilewriting, fwdrangewhilewriting
# phase 3) read-only - readrandom, fwdrange
#

# Exit Codes
EXIT_INVALID_ARGS=1

# Size constants
K=1024
M=$((1024 * K))
G=$((1024 * M))
T=$((1024 * G))
```
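The size constants are successive powers of 1024, so `T` is 2^40 bytes, and the default `DB_SIZE` of `1 * T` depends on bash's 64-bit integer arithmetic holding that value. A quick check, assuming bash on a platform with 64-bit shell integers, that recomputes the constants and prints their exact byte counts:

```bash
#!/usr/bin/env bash
# Recompute the script's size constants and print them as exact byte counts.
K=1024
M=$((1024 * K))
G=$((1024 * M))
T=$((1024 * G))

# Each constant is 1024 times the previous one: 2^10, 2^20, 2^30, 2^40.
printf 'K=%d M=%d G=%d T=%d\n' "$K" "$M" "$G" "$T"

# Sanity check: T must equal 2^40 (shift binds tighter than ==).
if (( T == 1 << 40 )); then
  echo "T is 2^40 bytes"
fi
```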
```bash
function display_usage() {
  echo "usage: run_blob_bench.sh [--help]"
  echo ""
  echo "Runs the following sequence of BlobDB benchmark tests using tools/benchmark.sh:"
  echo -e "\tPhase 1: write-only tests: bulkload+compact, overwrite+waitforcompaction"
  echo -e "\tPhase 2: read-write tests: readwhilewriting, fwdrangewhilewriting"
  echo -e "\tPhase 3: read-only tests: readrandom, fwdrange"
  echo ""
  echo "Environment Variables:"
  echo -e "\tJOB_ID\t\t\t\tIdentifier for the benchmark job, will appear in the results (default: empty)"
  echo -e "\tDB_DIR\t\t\t\tPath for the RocksDB data directory (mandatory)"
  echo -e "\tWAL_DIR\t\t\t\tPath for the RocksDB WAL directory (mandatory)"
  echo -e "\tOUTPUT_DIR\t\t\tPath for the benchmark results (mandatory)"
  echo -e "\tNUM_THREADS\t\t\tNumber of threads (default: 16)"
  echo -e "\tCOMPRESSION_TYPE\t\tCompression type for the SST files (default: lz4)"
  echo -e "\tDB_SIZE\t\t\t\tRaw (uncompressed) database size (default: 1 TB)"
  echo -e "\tVALUE_SIZE\t\t\tValue size (default: 1 KB)"
  echo -e "\tNUM_KEYS\t\t\tNumber of keys (default: raw database size divided by value size)"
  echo -e "\tDURATION\t\t\tIndividual duration for read-write/read-only tests in seconds (default: 1800)"
  echo -e "\tWRITE_BUFFER_SIZE\t\tWrite buffer (memtable) size (default: 1 GB)"
  echo -e "\tENABLE_BLOB_FILES\t\tEnable blob files (default: 1)"
  echo -e "\tMIN_BLOB_SIZE\t\t\tSize threshold for storing values in blob files (default: 0)"
  echo -e "\tBLOB_FILE_SIZE\t\t\tBlob file size (default: same as write buffer size)"
  echo -e "\tBLOB_COMPRESSION_TYPE\t\tCompression type for the blob files (default: lz4)"
  echo -e "\tENABLE_BLOB_GC\t\t\tEnable blob garbage collection (default: 1)"
  echo -e "\tBLOB_GC_AGE_CUTOFF\t\tBlob garbage collection age cutoff (default: 0.25)"
  echo -e "\tBLOB_GC_FORCE_THRESHOLD\t\tThreshold for forcing garbage collection of the oldest blob files (default: 1.0)"
  echo -e "\tTARGET_FILE_SIZE_BASE\t\tTarget SST file size for compactions (default: write buffer size, scaled down if blob files are enabled)"
  echo -e "\tMAX_BYTES_FOR_LEVEL_BASE\tMaximum size for the base level (default: 8 * target SST file size)"
}

if [ $# -ge 1 ]; then
  display_usage

  if [ "$1" == "--help" ]; then
    exit
  else
    exit $EXIT_INVALID_ARGS
  fi
fi

# shellcheck disable=SC2153
if [ -z "$DB_DIR" ]; then
  echo "DB_DIR is not defined"
  exit $EXIT_INVALID_ARGS
fi

# shellcheck disable=SC2153
if [ -z "$WAL_DIR" ]; then
  echo "WAL_DIR is not defined"
  exit $EXIT_INVALID_ARGS
fi

# shellcheck disable=SC2153
if [ -z "$OUTPUT_DIR" ]; then
  echo "OUTPUT_DIR is not defined"
  exit $EXIT_INVALID_ARGS
fi
```
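The three mandatory-variable checks repeat one pattern. As a hypothetical refactor (the helper name `require_env` is ours, not part of the PR), bash's indirect expansion `${!name}` lets a single function validate any variable given its name, while keeping the original message format and `EXIT_INVALID_ARGS` exit code:

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit with the script's invalid-argument code if the
# variable whose *name* is passed in is unset or empty.
EXIT_INVALID_ARGS=1

function require_env() {
  local name=$1
  # ${!name} expands to the value of the variable called $name (indirect expansion).
  if [ -z "${!name}" ]; then
    echo "$name is not defined"
    exit $EXIT_INVALID_ARGS
  fi
}
```

With it, the three checks would collapse to `for var in DB_DIR WAL_DIR OUTPUT_DIR; do require_env "$var"; done`.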
```bash
# shellcheck disable=SC2153
job_id=$JOB_ID

db_dir=$DB_DIR
wal_dir=$WAL_DIR
output_dir=$OUTPUT_DIR

num_threads=${NUM_THREADS:-16}

compression_type=${COMPRESSION_TYPE:-lz4}

db_size=${DB_SIZE:-$((1 * T))}
value_size=${VALUE_SIZE:-$((1 * K))}
num_keys=${NUM_KEYS:-$((db_size / value_size))}

duration=${DURATION:-1800}

write_buffer_size=${WRITE_BUFFER_SIZE:-$((1 * G))}

enable_blob_files=${ENABLE_BLOB_FILES:-1}
min_blob_size=${MIN_BLOB_SIZE:-0}
blob_file_size=${BLOB_FILE_SIZE:-$write_buffer_size}
blob_compression_type=${BLOB_COMPRESSION_TYPE:-lz4}
enable_blob_garbage_collection=${ENABLE_BLOB_GC:-1}
blob_garbage_collection_age_cutoff=${BLOB_GC_AGE_CUTOFF:-0.25}
blob_garbage_collection_force_threshold=${BLOB_GC_FORCE_THRESHOLD:-1.0}

if [ "$enable_blob_files" == "1" ]; then
  target_file_size_base=${TARGET_FILE_SIZE_BASE:-$((32 * write_buffer_size / value_size))}
else
  target_file_size_base=${TARGET_FILE_SIZE_BASE:-$write_buffer_size}
fi

max_bytes_for_level_base=${MAX_BYTES_FOR_LEVEL_BASE:-$((8 * target_file_size_base))}
```
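With every default in place (1 TiB raw size, 1 KiB values, 1 GiB write buffer), the derived settings can be worked out by repeating the script's own formulas. In this sketch the `derive` wrapper is hypothetical, added only so the blob and non-blob cases can be evaluated side by side; the rationale in the comment for scaling down the SST target is our reading, not something the PR states:

```bash
#!/usr/bin/env bash
# Re-evaluate the script's derived defaults for a given ENABLE_BLOB_FILES value.
K=1024
M=$((1024 * K))
G=$((1024 * M))
T=$((1024 * G))

function derive() {
  local enable_blob_files=$1
  local db_size=$((1 * T))
  local value_size=$((1 * K))
  local write_buffer_size=$((1 * G))
  local num_keys=$((db_size / value_size))
  local target_file_size_base
  if [ "$enable_blob_files" == "1" ]; then
    # Presumably scaled down because SST files then hold mostly keys and blob
    # references; the target shrinks by a factor of value_size / 32.
    target_file_size_base=$((32 * write_buffer_size / value_size))
  else
    target_file_size_base=$write_buffer_size
  fi
  local max_bytes_for_level_base=$((8 * target_file_size_base))
  # Print: number of keys, target SST file size, base level size (bytes).
  echo "$num_keys $target_file_size_base $max_bytes_for_level_base"
}

derive 1   # blob files enabled (the default)
derive 0   # blob files disabled
```

That is 2^30 (about 1.07 billion) keys in both cases, with 32 MiB target SST files and a 256 MiB base level when blob files are enabled, versus 1 GiB and 8 GiB when they are not.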
```bash
echo "======================== Benchmark setup ========================"
echo -e "Job ID:\t\t\t\t\t$job_id"
echo -e "Data directory:\t\t\t\t$db_dir"
echo -e "WAL directory:\t\t\t\t$wal_dir"
echo -e "Output directory:\t\t\t$output_dir"
echo -e "Number of threads:\t\t\t$num_threads"
echo -e "Compression type for SST files:\t\t$compression_type"
echo -e "Raw database size:\t\t\t$db_size"
echo -e "Value size:\t\t\t\t$value_size"
echo -e "Number of keys:\t\t\t\t$num_keys"
echo -e "Duration of read-write/read-only tests:\t$duration"
echo -e "Write buffer size:\t\t\t$write_buffer_size"
echo -e "Blob files enabled:\t\t\t$enable_blob_files"
echo -e "Blob size threshold:\t\t\t$min_blob_size"
echo -e "Blob file size:\t\t\t\t$blob_file_size"
echo -e "Compression type for blob files:\t$blob_compression_type"
echo -e "Blob GC enabled:\t\t\t$enable_blob_garbage_collection"
echo -e "Blob GC age cutoff:\t\t\t$blob_garbage_collection_age_cutoff"
echo -e "Blob GC force threshold:\t\t$blob_garbage_collection_force_threshold"
echo -e "Target SST file size:\t\t\t$target_file_size_base"
echo -e "Maximum size of base level:\t\t$max_bytes_for_level_base"
echo "================================================================="

rm -rf "$db_dir"
rm -rf "$wal_dir"
rm -rf "$output_dir"

ENV_VARS="\
JOB_ID=$job_id \
DB_DIR=$db_dir \
WAL_DIR=$wal_dir \
OUTPUT_DIR=$output_dir \
NUM_THREADS=$num_threads \
COMPRESSION_TYPE=$compression_type \
VALUE_SIZE=$value_size \
NUM_KEYS=$num_keys"

ENV_VARS_D="$ENV_VARS DURATION=$duration"

PARAMS="\
--enable_blob_files=$enable_blob_files \
--min_blob_size=$min_blob_size \
--blob_file_size=$blob_file_size \
--blob_compression_type=$blob_compression_type \
--write_buffer_size=$write_buffer_size \
--target_file_size_base=$target_file_size_base \
--max_bytes_for_level_base=$max_bytes_for_level_base"

PARAMS_GC="$PARAMS \
--enable_blob_garbage_collection=$enable_blob_garbage_collection \
--blob_garbage_collection_age_cutoff=$blob_garbage_collection_age_cutoff \
--blob_garbage_collection_force_threshold=$blob_garbage_collection_force_threshold"
```
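The benchmark invocations pass all settings through a single `ENV_VARS` string using `env -S`, which splits the string into separate `NAME=value` assignments before running the command, and the write-only phase adds `env -u DURATION`, which removes `DURATION` from the child's environment. A minimal demonstration, assuming a GNU coreutils `env` recent enough to support `-S` (added in 8.30); the variable values are made up. Because of the splitting, a value containing spaces (for example a directory path) would be broken apart unless quoted inside the string.

```bash
#!/usr/bin/env bash
# Show how env -S and env -u shape the environment seen by a child process.
export DURATION=1800   # pretend DURATION is set in the calling shell

ENV_VARS="\
JOB_ID=demo \
NUM_THREADS=4"

# Write-only phase style: DURATION removed, ENV_VARS split into assignments.
env -u DURATION -S "$ENV_VARS" bash -c 'echo "${DURATION:-unset} $JOB_ID $NUM_THREADS"'

# Read phase style: DURATION passed explicitly as part of the split string.
env -S "$ENV_VARS DURATION=600" bash -c 'echo "$DURATION $JOB_ID $NUM_THREADS"'
```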
```bash
# bulk load (using fillrandom) + compact
env -u DURATION -S "$ENV_VARS" ./tools/benchmark.sh bulkload "$PARAMS"

# overwrite + waitforcompaction
env -u DURATION -S "$ENV_VARS" ./tools/benchmark.sh overwrite "$PARAMS_GC"

# readwhilewriting
env -S "$ENV_VARS_D" ./tools/benchmark.sh readwhilewriting "$PARAMS_GC"

# fwdrangewhilewriting
env -S "$ENV_VARS_D" ./tools/benchmark.sh fwdrangewhilewriting "$PARAMS_GC"

# readrandom
env -S "$ENV_VARS_D" ./tools/benchmark.sh readrandom "$PARAMS_GC"

# fwdrange
env -S "$ENV_VARS_D" ./tools/benchmark.sh fwdrange "$PARAMS_GC"

# save logs to output directory
cp "$db_dir"/LOG* "$output_dir/"
```
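The six `benchmark.sh` calls differ only in the test name, whether `DURATION` is unset or passed along, and whether the GC options are included. The sketch below is a hypothetical dry-run wrapper (the `run_phase` function and its mode names are ours, not part of the PR, and the settings strings are shortened placeholders) that prints the command each phase would run instead of executing it, which may help check the parameter plumbing without a RocksDB build:

```bash
#!/usr/bin/env bash
# Hypothetical dry run: print the benchmark.sh command line for each phase.
ENV_VARS="JOB_ID=demo NUM_THREADS=4"
ENV_VARS_D="$ENV_VARS DURATION=1800"
PARAMS="--enable_blob_files=1"
PARAMS_GC="$PARAMS --enable_blob_garbage_collection=1"

function run_phase() {
  local env_mode=$1 test_name=$2 params=$3
  if [ "$env_mode" == "no_duration" ]; then
    echo "env -u DURATION -S \"$ENV_VARS\" ./tools/benchmark.sh $test_name \"$params\""
  else
    echo "env -S \"$ENV_VARS_D\" ./tools/benchmark.sh $test_name \"$params\""
  fi
}

# Phase 1: write-only (no DURATION; GC options omitted for the bulk load)
run_phase no_duration bulkload "$PARAMS"
run_phase no_duration overwrite "$PARAMS_GC"
# Phases 2 and 3: read-write and read-only (DURATION set)
for test_name in readwhilewriting fwdrangewhilewriting readrandom fwdrange; do
  run_phase with_duration "$test_name" "$PARAMS_GC"
done
```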
Review comment:

Not a shell expert. Would it be possible to use `getopts` for argument parsing here? Maybe it's not worth the effort for this script.

Reply:

Yeah, I think that would be a bit of overkill, considering we only have a single optional command line argument...
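For comparison, a `getopts`-based version might look like the sketch below. One caveat supports the reply: bash's `getopts` builtin only parses single-character options, so out of the box it would handle `-h` rather than the script's `--help`. The `parse_args` function and the shortened `display_usage` are hypothetical; the exit statuses mirror the script's (0 for help, `EXIT_INVALID_ARGS` otherwise).

```bash
#!/usr/bin/env bash
# Hypothetical getopts-based parsing; supports only the short option -h.
EXIT_INVALID_ARGS=1

function display_usage() {
  echo "usage: run_blob_bench.sh [-h]"
}

function parse_args() {
  local opt OPTIND=1
  # Leading ':' in the optstring enables silent error reporting.
  while getopts ":h" opt; do
    case "$opt" in
      h) display_usage; exit 0 ;;
      *) display_usage; exit $EXIT_INVALID_ARGS ;;
    esac
  done
  shift $((OPTIND - 1))
  # Like the original script, reject any positional arguments.
  if [ $# -ge 1 ]; then
    display_usage
    exit $EXIT_INVALID_ARGS
  fi
}
```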