Add insert hints for each writebatch #5728

Closed
wants to merge 15 commits into from

Conversation


@Jing118 Jing118 commented Aug 22, 2019

Add insert hints for each write batch so that they can be used in concurrent writes, and add a write option to enable them.

Bench result (qps):

./db_bench --benchmarks=fillseq -allow_concurrent_memtable_write=true -num=4000000 -batch-size=1 -threads=1 -db=/data3/ylj/tmp -write_buffer_size=536870912 -num_column_families=4

master:

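| batch size \ thread num | 1       | 2       | 4       | 8       |
| ----------------------- | ------- | ------- | ------- | ------- |
| 1                       | 387883  | 220790  | 308294  | 490998  |
| 10                      | 1397208 | 978911  | 1275684 | 1733395 |
| 100                     | 2045414 | 1589927 | 1798782 | 2681039 |
| 1000                    | 2228038 | 1698252 | 1839877 | 2863490 |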

fillseq with writebatch hint:

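| batch size \ thread num | 1       | 2       | 4       | 8       |
| ----------------------- | ------- | ------- | ------- | ------- |
| 1                       | 286005  | 223570  | 300024  | 466981  |
| 10                      | 970374  | 813308  | 1399299 | 1753588 |
| 100                     | 1962768 | 1983023 | 2676577 | 3086426 |
| 1000                    | 2195853 | 2676782 | 3231048 | 3638143 |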

@yiwu-arbug left a comment

Code looks good to me.

Is there any regression for random insert? Let's also try the db_bench updaterandom benchmark and share the result here.

Please run make format once.

// If true, this writebatch will use its own insert hints in concurrent write
//
// Default: false
bool hint_per_batch;

Which of memtable_insert_with_hint_prefix_extractor and hint_per_batch should take precedence? It seems to me it should be hint_per_batch, because it's a per-batch option, which is more specific. We should also document the behavior in the inline comment.


Should we enable it by default if memtable_insert_with_hint_prefix_extractor is set?
An extra option always makes the code harder to maintain.


The two options are not compatible with each other. memtable_insert_with_hint_prefix_extractor preserves the hint across different write batches, and hint_per_batch has nothing to do with prefixes.

But I'm wondering whether hint_per_batch can always be enabled. For example, for each write batch we keep the splice for the first key, then detect whether it's a sequential insert. If so, reuse the hint; otherwise discard the hint and start over.
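The "reuse the hint while inserts stay sequential, otherwise start over" idea can be illustrated with a toy analogy. This is only a sketch: `std::set` stands in for the memtable and its hinted `insert` stands in for the skip-list splice, and `InsertBatchWithHint` is a hypothetical name, not a RocksDB function.

```cpp
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// Toy analogy only: std::set stands in for the memtable, and its hinted insert
// stands in for the skip-list splice. InsertBatchWithHint is a hypothetical name.
// Returns the number of inserts that reused the previous position as a hint.
inline int InsertBatchWithHint(std::set<int>& table,
                               const std::vector<int>& batch) {
  int hinted = 0;
  bool have_prev = false;
  int prev_key = 0;
  std::set<int>::iterator prev_pos = table.end();
  for (int key : batch) {
    if (have_prev && key > prev_key) {
      // Ascending run: the key belongs after the previous insert position,
      // so pass the position following it as the hint.
      prev_pos = table.insert(std::next(prev_pos), key);
      ++hinted;
    } else {
      // First key, or the run was broken: discard the hint and search again.
      prev_pos = table.insert(key).first;
    }
    prev_key = key;
    have_prev = true;
  }
  return hinted;
}
```

A wrong hint does not affect correctness here, only the cost of the insert, which is the property that makes an "always on" hint plausible for sequential batches.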

Contributor Author

For now I only enable hint_per_batch when concurrent_memtable_writes is set to true, because in non-concurrent writes either memtable_insert_with_hint_prefix_extractor or seq_splice_ will be used, which I think makes little difference in performance compared to hint_per_batch.


After some discussion, we think having an extra hint_per_batch write option is more flexible and better fits our needs.

@yiwu-arbug left a comment

@siying any thought about this? Thanks.

@siying left a comment

How many more difficult features do you have? :) The general idea looks good to me.

@@ -1222,6 +1223,10 @@ class MemTableInserter : public WriteBatch::Handler {
DupDetector duplicate_detector_;
bool dup_dectector_on_;

bool hint_per_batch_;
// Hints for this batch
std::unordered_map<MemTable*, void*> hint_;

We once caused a regression on Windows when introducing the post map. The reason is that on Windows, those STL containers invoke malloc even when nothing is inserted. That's why there is the very odd logic in GetPostMap(). Just see the line below. I'm not sure whether things have improved on Windows. To be safe, we can do exactly the same as for mem_post_info_map_.
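One way to avoid paying for a container that is never used is to construct it lazily in raw storage. The following is a minimal sketch of that pattern under stated assumptions: `LazyHintMap` and its members are hypothetical names for illustration, and this is not the actual GetPostMap() code.

```cpp
#include <cassert>
#include <new>
#include <unordered_map>

// Hypothetical wrapper: the map is only constructed (and so can only allocate)
// on the first call to Get(). Until then it is just uninitialized storage.
class LazyHintMap {
 public:
  using Map = std::unordered_map<const void*, void*>;

  LazyHintMap() = default;
  LazyHintMap(const LazyHintMap&) = delete;
  LazyHintMap& operator=(const LazyHintMap&) = delete;

  ~LazyHintMap() {
    if (map_ != nullptr) {
      map_->~Map();  // Destroy only if the map was ever constructed.
    }
  }

  bool created() const { return map_ != nullptr; }

  Map& Get() {
    if (map_ == nullptr) {
      map_ = new (storage_) Map();  // Placement new into the raw storage.
    }
    return *map_;
  }

 private:
  alignas(Map) unsigned char storage_[sizeof(Map)];
  Map* map_ = nullptr;
};
```

A writer that never touches a hint never calls `Get()`, so no container constructor runs on that path.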

@@ -1399,7 +1405,8 @@ class MemTableInserter : public WriteBatch::Handler {
  if (!moptions->inplace_update_support) {
  bool mem_res =
  mem->Add(sequence_, value_type, key, value,
- concurrent_memtable_writes_, get_post_process_info(mem));
+ concurrent_memtable_writes_, get_post_process_info(mem),
+ hint_per_batch_?&hint_[mem]:nullptr);

Formatting.

db/memtable.cc Outdated
@@ -505,7 +506,8 @@ bool MemTable::Add(SequenceNumber s, ValueType type,
  return res;
  }
  } else {
- bool res = table->InsertKey(handle);
+ bool res = (hint == nullptr)?table->InsertKey(handle):

Formatting. I believe spaces are needed before and after "?", as well as ":".

@@ -1338,6 +1338,11 @@ struct WriteOptions {
// Default: false
bool low_pri;

// If true, this writebatch will use its own insert hints in concurrent write

It would be better to improve the comment.


@yiwu-arbug

> How many more difficult features do you have? :) The general idea looks good to me.

This is not from me. It's from @zhangjinpeng1987 and @Jing118 :)

@Jing118
Contributor Author

Jing118 commented Aug 24, 2019

Here is the bench result for fillrandom (qps):

./db_bench --benchmarks=fillrandom -allow_concurrent_memtable_write=true -num=4000000 -batch-size=1 -threads=1 -db=/data3/ylj/tmp -write_buffer_size=536870912 -num_column_families=4

master:

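| batch size \ thread num | 1       | 2       | 4       | 8       |
| ----------------------- | ------- | ------- | ------- | ------- |
| 1                       | 214724  | 208672  | 272630  | 408532  |
| 10                      | 397511  | 333317  | 547891  | 938795  |
| 100                     | 439467  | 346279  | 581659  | 1037327 |
| 1000                    | 473877  | 378207  | 648497  | 1152165 |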

with hint_per_batch set to true:

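| batch size \ thread num | 1       | 2       | 4       | 8       |
| ----------------------- | ------- | ------- | ------- | ------- |
| 1                       | 196489  | 210724  | 264376  | 383852  |
| 10                      | 358335  | 262496  | 495732  | 917832  |
| 100                     | 410783  | 334563  | 561143  | 964826  |
| 1000                    | 446459  | 402048  | 625963  | 1088148 |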

@yiwu-arbug left a comment

Add tests to db_memtable_test and inlineskiplist_test?

assert(hint != nullptr);
Splice* splice = reinterpret_cast<Splice*>(*hint);
if (splice == nullptr) {
splice = AllocateSplice();

AllocateSplice will use the memtable's arena to allocate space. For the write batch hint we should allocate the splice on stack.


assert(hint != nullptr);
Splice* splice = reinterpret_cast<Splice*>(*hint);
if (splice == nullptr) {
size_t array_size = sizeof(Node*) * (kMaxHeight_ + 1);

Extract the logic into an AllocateSpliceOnHeap method?
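A rough sketch of what such a helper could look like, assuming a splice that holds two pointer arrays of `max_height + 1` entries each. The `Splice` field names (`height_`, `prev_`, `next_`), the free function form, and `FreeSpliceOnHeap` are assumptions for illustration; only `Splice`, `Node*` and the array-size expression come from the snippet under review.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Node;  // Opaque skip-list node; only pointers to it are stored here.

// Assumed layout for illustration: a header followed by two pointer arrays.
struct Splice {
  int height_ = 0;
  Node** prev_ = nullptr;
  Node** next_ = nullptr;
};

// Allocates the header and both arrays in one heap block, independent of any
// arena, so the caller controls the lifetime of the hint.
inline Splice* AllocateSpliceOnHeap(int max_height) {
  const std::size_t array_size = sizeof(Node*) * (max_height + 1);
  char* raw = new char[sizeof(Splice) + 2 * array_size];
  Splice* splice = new (raw) Splice();
  splice->prev_ = reinterpret_cast<Node**>(raw + sizeof(Splice));
  splice->next_ = reinterpret_cast<Node**>(raw + sizeof(Splice) + array_size);
  return splice;
}

// Releases a splice obtained from AllocateSpliceOnHeap.
inline void FreeSpliceOnHeap(Splice* splice) {
  delete[] reinterpret_cast<char*>(splice);
}
```

Keeping allocation and release in a matched pair of helpers also makes it easier to state who owns the hint.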

@siying

siying commented Sep 5, 2019

@yiwu-arbug is the PR ready?

@yiwu-arbug

@siying yes, this one is ready for review again.

@siying siying self-assigned this Sep 10, 2019
@facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@siying left a comment

It mostly looks good to me.

@@ -1487,7 +1513,8 @@ class MemTableInserter : public WriteBatch::Handler {
  MemTable* mem = cf_mems_->GetMemTable();
  bool mem_res =
  mem->Add(sequence_, delete_type, key, value,
- concurrent_memtable_writes_, get_post_process_info(mem));
+ concurrent_memtable_writes_, get_post_process_info(mem),
+ hint_per_batch_ ? &GetHintMap()[mem] : nullptr);

I guess I'm not knowledgeable enough about C++. If the hint map doesn't have "mem", is the inserted value guaranteed to be nullptr?

Contributor Author

Yes, the value will be value-initialized, so the pointer will be initialized to nullptr.
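A small standalone check of that behavior: `operator[]` on `std::unordered_map` inserts a value-initialized mapped value when the key is missing, which for a pointer type means `nullptr`. `MemTableStub` and `HintSlotFor` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <unordered_map>

struct MemTableStub {};  // Hypothetical stand-in for a memtable.

using HintMap = std::unordered_map<MemTableStub*, void*>;

// Returns the address of the hint slot for `mem`, creating the slot if needed.
// A newly created slot holds nullptr because operator[] value-initializes it.
inline void** HintSlotFor(HintMap& hints, MemTableStub* mem) {
  return &hints[mem];
}
```

References to elements of an `unordered_map` stay valid across rehashing, so the returned slot address remains usable while other memtables are added to the map.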

// option will be ignored.
//
// Default: false
bool hint_per_batch;

I think we should call it memtable_insert_hint_per_batch or something like that. Hint is too general for WriteOptions.

Contributor Author

I will change it to memtable_insert_hint_per_batch.

@facebook-github-bot

@Jing118 has updated the pull request. Re-import the pull request

@facebook-github-bot

@Jing118 has updated the pull request. Re-import the pull request

@facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@siying left a comment

Sorry I missed something last time. One more comment and we are good to go!

@@ -120,6 +120,20 @@ class MemTableRep {
return true;
}

// Same as ::InsertWithHint, but allow concurrnet write

We need to document the ownership of hint. In my understanding, the caller will own the hint object, correct?

Or, if that is the case, an even better idea is to change the argument to std::unique_ptr<void>*, so that its ownership is clear.
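For illustration, here is one way caller ownership of a type-erased hint could be expressed. This is a sketch under assumptions, not the interface from this PR: `OwnedHint`, `ToySplice`, `DeleteToySplice` and `InsertWithOwnedHint` are hypothetical names. It pairs the `void` pointer with a custom deleter, since `std::default_delete` cannot destroy an object through `void*`.

```cpp
#include <cassert>
#include <memory>

// Type-erased owning pointer: the deleter travels with the pointer.
using OwnedHint = std::unique_ptr<void, void (*)(void*)>;

struct ToySplice {  // Hypothetical stand-in for a skip-list splice.
  int last_key = 0;
};

inline void DeleteToySplice(void* p) { delete static_cast<ToySplice*>(p); }

// Hypothetical insert: creates the hint on first use and updates it afterwards.
// The caller owns *hint; it is freed when the OwnedHint goes out of scope.
inline void InsertWithOwnedHint(int key, OwnedHint* hint) {
  assert(hint != nullptr);
  if (*hint == nullptr) {
    *hint = OwnedHint(new ToySplice(), &DeleteToySplice);
  }
  static_cast<ToySplice*>(hint->get())->last_key = key;
}
```

A caller would start with an empty `OwnedHint hint(nullptr, &DeleteToySplice);` and pass `&hint` to each insert in the batch.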

// Same as ::InsertWithHintConcurrently
// Returns false if MemTableRepFactory::CanHandleDuplicatedKey() is true and
// the <key, seq> already exists.
virtual bool InsertKeyWithHintConcurrently(KeyHandle handle, void** hint) {

Same here.

@facebook-github-bot

@Jing118 has updated the pull request. Re-import the pull request

@facebook-github-bot left a comment

@siying has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@siying left a comment

I also noticed that HISTORY.md is not updated to introduce the feature. I know that because of the time difference the round-trip is long, so I don't want to hold the PR from being committed, but please send out another pull request to update it.

// hint later.
//
// Currently only skip-list based memtable implement the interface. Other
// implementations will fallback to InsertConcurrently() by default.

I don't want to hold the feature from being merged because of this code style comment, but I do hope you go ahead and try whether std::unique_ptr<void>* works in this case, so that the contract of the function is much cleaner.

@yiwu-arbug

yiwu-arbug commented Sep 12, 2019

@siying you can probably update HISTORY.md of this PR from the link:
https://github.com/Jing118/rocksdb/edit/writebatch-hint3/HISTORY.md?pr=%2Ffacebook%2Frocksdb%2Fpull%2F5728

I think @Jing118 can look into using unique_ptr before merging.

@siying

siying commented Sep 12, 2019

@yiwu-arbug it will be better if the authors update HISTORY.md. They will describe it in a more authentic way, and the code blaming is slightly easier.

@yiwu-arbug

yiwu-arbug commented Sep 12, 2019

> @yiwu-arbug it will be better if the authors update HISTORY.md. They will describe it in a more authentic way, and the code blaming is slightly easier.

The above link will make changes to this PR directly. I mean, in some cases a reviewer can make changes to the PR directly to avoid an additional round trip. For this PR, though, let's wait to see if @Jing118 can address the remaining comment.

@facebook-github-bot

This pull request has been merged in 1a928c2.

merryChris pushed a commit to merryChris/rocksdb that referenced this pull request Nov 18, 2019
Summary:
Add insert hints for each writebatch so that they can be used in concurrent write, and add write option to enable it.

Bench result (qps):

`./db_bench --benchmarks=fillseq -allow_concurrent_memtable_write=true -num=4000000 -batch-size=1 -threads=1 -db=/data3/ylj/tmp -write_buffer_size=536870912 -num_column_families=4`

master:

| batch size \ thread num | 1       | 2       | 4       | 8       |
| ----------------------- | ------- | ------- | ------- | ------- |
| 1                       | 387883  | 220790  | 308294  | 490998  |
| 10                      | 1397208 | 978911  | 1275684 | 1733395 |
| 100                     | 2045414 | 1589927 | 1798782 | 2681039 |
| 1000                    | 2228038 | 1698252 | 1839877 | 2863490 |

fillseq with writebatch hint:

| batch size \ thread num | 1       | 2       | 4       | 8       |
| ----------------------- | ------- | ------- | ------- | ------- |
| 1                       | 286005  | 223570  | 300024  | 466981  |
| 10                      | 970374  | 813308  | 1399299 | 1753588 |
| 100                     | 1962768 | 1983023 | 2676577 | 3086426 |
| 1000                    | 2195853 | 2676782 | 3231048 | 3638143 |

Pull Request resolved: facebook#5728

Differential Revision: D17297240

fbshipit-source-id: b053590a6d77871f1ef2f911a7bd013b3899b26c
tabokie pushed a commit to tabokie/rocksdb that referenced this pull request Dec 8, 2021