-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add insert hints for each writebatch #5728
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code look good to me.
Is there any regression for random insert? Let's also try db_bench updaterandom benchmark and share the result here.
Please run make format
once.
include/rocksdb/options.h
Outdated
// If true, this writebatch will use its own insert hints in concurrent write | ||
// | ||
// Default: false | ||
bool hint_per_batch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which of memtable_insert_with_hint_prefix_extractor
and hint_per_batch
should take precedence? It seems to me it should be hint_per_batch
because its a per batch option which is more specific. And we should document the behavior in the inline comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we by default enable it if memtable_insert_with_hint_prefix_extractor
is set?
Extra option always makes it harder to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two options are not compatible with each other. memtable_insert_with_hint_prefix_extractor
preserve the hint across different write batches, and hint_per_batch
has nothing to do with prefixes.
But I'm wondering whether hint_per_batch
can always enable. Like for each write batch we keep the splice for the first key, then detect if its a sequential insert. If so, reuse the hint, otherwise discard the hint and start over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I only enable hint_per_batch
when concurrent_memtable_writes
is set to true, because in non-concurrent write either memtable_insert_with_hint_prefix_extractor
or seq_splice_
will be used, which i think will make no much difference in performance compared to hint_per_batch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some discussion we think having an extra hint_per_batch
write option is more flexible and better fit our needs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siying any thought about this? Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How many more difficult features do you have? :) The general idea looks good tome.
db/write_batch.cc
Outdated
@@ -1222,6 +1223,10 @@ class MemTableInserter : public WriteBatch::Handler { | |||
DupDetector duplicate_detector_; | |||
bool dup_dectector_on_; | |||
|
|||
bool hint_per_batch_; | |||
// Hints for this batch | |||
std::unordered_map<MemTable*, void*> hint_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We ever caused regression on Windows when introducing post map. The reason is that on Windows, those STL containers invoke a malloc even without inserting anything. That's why there is a very odd logic GetPostMap(). Just see the line blow. I'm not sure whether things have improved on Windows. To be safe, we can do exactly the same as mem_post_info_map_.
db/write_batch.cc
Outdated
@@ -1399,7 +1405,8 @@ class MemTableInserter : public WriteBatch::Handler { | |||
if (!moptions->inplace_update_support) { | |||
bool mem_res = | |||
mem->Add(sequence_, value_type, key, value, | |||
concurrent_memtable_writes_, get_post_process_info(mem)); | |||
concurrent_memtable_writes_, get_post_process_info(mem), | |||
hint_per_batch_?&hint_[mem]:nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting.
db/memtable.cc
Outdated
@@ -505,7 +506,8 @@ bool MemTable::Add(SequenceNumber s, ValueType type, | |||
return res; | |||
} | |||
} else { | |||
bool res = table->InsertKey(handle); | |||
bool res = (hint == nullptr)?table->InsertKey(handle): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting. I believe spaces are needed before and after "?", as well as ":".
include/rocksdb/options.h
Outdated
@@ -1338,6 +1338,11 @@ struct WriteOptions { | |||
// Default: false | |||
bool low_pri; | |||
|
|||
// If true, this writebatch will use its own insert hints in concurrent write |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be a better idea to improve the comments.
include/rocksdb/options.h
Outdated
// If true, this writebatch will use its own insert hints in concurrent write | ||
// | ||
// Default: false | ||
bool hint_per_batch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we by default enable it if memtable_insert_with_hint_prefix_extractor
is set?
Extra option always makes it harder to maintain.
This is not from me. Its from @zhangjinpeng1987 and @Jing118 :) |
Here is the bench result for fillrandom (qps):
master:
with hint_per_batch set to true :
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add test to db_memtable_test and inlineskiplist_test?
memtable/inlineskiplist.h
Outdated
assert(hint != nullptr); | ||
Splice* splice = reinterpret_cast<Splice*>(*hint); | ||
if (splice == nullptr) { | ||
splice = AllocateSplice(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AllocateSplice
will use memtable's arena to allocate space. For write batch hint we should allocate the splice on stack.
include/rocksdb/options.h
Outdated
// If true, this writebatch will use its own insert hints in concurrent write | ||
// | ||
// Default: false | ||
bool hint_per_batch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some discussion we think having an extra hint_per_batch
write option is more flexible and better fit our needs.
memtable/inlineskiplist.h
Outdated
assert(hint != nullptr); | ||
Splice* splice = reinterpret_cast<Splice*>(*hint); | ||
if (splice == nullptr) { | ||
size_t array_size = sizeof(Node*) * (kMaxHeight_ + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extract the logic to a AllocateSpliceOnHeap
method?
@yiwu-arbug is the PR ready? |
@siying yes, this one is ready for review again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It mostly looks good to me.
@@ -1487,7 +1513,8 @@ class MemTableInserter : public WriteBatch::Handler { | |||
MemTable* mem = cf_mems_->GetMemTable(); | |||
bool mem_res = | |||
mem->Add(sequence_, delete_type, key, value, | |||
concurrent_memtable_writes_, get_post_process_info(mem)); | |||
concurrent_memtable_writes_, get_post_process_info(mem), | |||
hint_per_batch_ ? &GetHintMap()[mem] : nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm not knowledgable enough to C++. If hint map doesn't have "mem", does the value inserted guarantees to be nullptr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the value will be value-initialized, so pointer will be initialized to nullptr.
include/rocksdb/options.h
Outdated
// option will be ignored. | ||
// | ||
// Default: false | ||
bool hint_per_batch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should call it memtable_insert_hint_per_batch
or something like that. Hint is too general for WriteOptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change it to memtable_insert_hint_per_batch
.
@Jing118 has updated the pull request. Re-import the pull request |
@Jing118 has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed something last time. One more comment and we are good to go!
@@ -120,6 +120,20 @@ class MemTableRep { | |||
return true; | |||
} | |||
|
|||
// Same as ::InsertWithHint, but allow concurrnet write |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to document the ownership of hint. In my understanding, the caller will own the object hint, correct?
Or, if that is the case, an even better idea is to change the argument to std::unique_ptr<void>*
, so that is ownership is clear.
// Same as ::InsertWithHintConcurrently | ||
// Returns false if MemTableRepFactory::CanHandleDuplicatedKey() is true and | ||
// the <key, seq> already exists. | ||
virtual bool InsertKeyWithHintConcurrently(KeyHandle handle, void** hint) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as here.
@Jing118 has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also noticed that HISTORY.md is not updated to introduce the feature. I know that because of the time difference the round-trip is long, so I don't want to hold the PR from being committed, but please send out another pull request to update it.
// hint later. | ||
// | ||
// Currently only skip-list based memtable implement the interface. Other | ||
// implementations will fallback to InsertConcurrently() by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to hold the feature from being merged because of this code style comment, but I do hope you go ahead and try whether std::unique_ptr<void>*
works in this case, so that the contract of the function is much cleaner.
@siying you can probably update HISTORY.md of this PR from the link: I think @Jing118 can look into using |
@yiwu-arbug it will be better if the authors update HISTORY.md. They will describe it in a more authentic way, and the code blaming is slightly easier. |
The above link will make change to this PR directly. I mean, in some case reviewer can make changes to the PR directly to avoid additional round trip. For this PR though, let's wait to see if @Jing118 can address the remaining comment. |
This pull request has been merged in 1a928c2. |
Summary: Add insert hints for each writebatch so that they can be used in concurrent write, and add write option to enable it. Bench result (qps): `./db_bench --benchmarks=fillseq -allow_concurrent_memtable_write=true -num=4000000 -batch-size=1 -threads=1 -db=/data3/ylj/tmp -write_buffer_size=536870912 -num_column_families=4` master: | batch size \ thread num | 1 | 2 | 4 | 8 | | ----------------------- | ------- | ------- | ------- | ------- | | 1 | 387883 | 220790 | 308294 | 490998 | | 10 | 1397208 | 978911 | 1275684 | 1733395 | | 100 | 2045414 | 1589927 | 1798782 | 2681039 | | 1000 | 2228038 | 1698252 | 1839877 | 2863490 | fillseq with writebatch hint: | batch size \ thread num | 1 | 2 | 4 | 8 | | ----------------------- | ------- | ------- | ------- | ------- | | 1 | 286005 | 223570 | 300024 | 466981 | | 10 | 970374 | 813308 | 1399299 | 1753588 | | 100 | 1962768 | 1983023 | 2676577 | 3086426 | | 1000 | 2195853 | 2676782 | 3231048 | 3638143 | Pull Request resolved: facebook#5728 Differential Revision: D17297240 fbshipit-source-id: b053590a6d77871f1ef2f911a7bd013b3899b26c
Summary: Add insert hints for each writebatch so that they can be used in concurrent write, and add write option to enable it. Bench result (qps): `./db_bench --benchmarks=fillseq -allow_concurrent_memtable_write=true -num=4000000 -batch-size=1 -threads=1 -db=/data3/ylj/tmp -write_buffer_size=536870912 -num_column_families=4` master: | batch size \ thread num | 1 | 2 | 4 | 8 | | ----------------------- | ------- | ------- | ------- | ------- | | 1 | 387883 | 220790 | 308294 | 490998 | | 10 | 1397208 | 978911 | 1275684 | 1733395 | | 100 | 2045414 | 1589927 | 1798782 | 2681039 | | 1000 | 2228038 | 1698252 | 1839877 | 2863490 | fillseq with writebatch hint: | batch size \ thread num | 1 | 2 | 4 | 8 | | ----------------------- | ------- | ------- | ------- | ------- | | 1 | 286005 | 223570 | 300024 | 466981 | | 10 | 970374 | 813308 | 1399299 | 1753588 | | 100 | 1962768 | 1983023 | 2676577 | 3086426 | | 1000 | 2195853 | 2676782 | 3231048 | 3638143 | Pull Request resolved: facebook#5728 Differential Revision: D17297240 fbshipit-source-id: b053590a6d77871f1ef2f911a7bd013b3899b26c
Add insert hints for each writebatch so that they can be used in concurrent write, and add write option to enable it.
Bench result (qps):
./db_bench --benchmarks=fillseq -allow_concurrent_memtable_write=true -num=4000000 -batch-size=1 -threads=1 -db=/data3/ylj/tmp -write_buffer_size=536870912 -num_column_families=4
master:
fillseq with writebatch hint: