feat: limit long time rocksdb iteration operation #500
Conversation
src/server/pegasus_server_impl.cpp
Outdated
```diff
@@ -755,11 +784,21 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req

 std::unique_ptr<rocksdb::Iterator> it;
 bool complete = false;
+uint64_t iteration_time = dsn_now_ns();
```
Rename this to `iteration_start_time`. `iteration_time` suggests a time duration rather than a timestamp.
`iteration_time` is a duration, not `iteration_start_time`; it is updated when checking whether the time limit is exceeded.
> For sortkey_count, we only add time threshold, and if time exceed limit, server will return -1 to client.

`sortkey_count` doesn't consume fewer resources than multiget or scan during iteration. Why does it have fewer restrictions?
src/server/pegasus_server_impl.cpp
Outdated
```diff
@@ -675,12 +697,22 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req
     return;
 }

-int32_t max_kv_count = request.max_kv_count > 0 ? request.max_kv_count : INT_MAX;
+uint32_t max_kv_count = request.max_kv_count > 0 ? request.max_kv_count : INT_MAX;
+uint32_t max_iteration_count = std::min(max_kv_count, _rocksdb_max_iteration_count);
```
After this PR, we should update the client docs: the returned kv count/size may be less than the user requested.
Sure, I will update the related documents.
```diff
@@ -869,7 +916,7 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req
 if (r == 1) {
     count++;
     auto &kv = reverse_kvs.back();
-    size += kv.key.length() + kv.value.length();
+    limiter->add_size(kv.key.length() + kv.value.length());
```
So why doesn't the size limitation take expired rows into account? I suggest that whatever `r` is, the limiter should add the row's size:

```cpp
limiter->add_count();
limiter->add_size(it->key().length() + it->value().length());
if (!limiter->time_check()) {
    break;
}
```

Then `add_count` and `add_size` can be merged into one function, like:

```cpp
void add_row()
{
    add_size();
    add_count();
}
```
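As a rough illustration of this merged interface, here is a minimal, self-contained sketch of such a limiter, assuming an `add_row` that takes the row's byte size directly. Names and signatures here are hypothetical, not the actual pegasus limiter:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical sketch of a range-read limiter with the merged add_row()
// interface from the review discussion. Illustrative only; the real
// pegasus limiter differs.
class range_read_limiter
{
public:
    // max_duration_ns == 0 disables the time limit in this sketch.
    range_read_limiter(uint32_t max_count, uint64_t max_size, uint64_t max_duration_ns)
        : _max_count(max_count),
          _max_size(max_size),
          _max_duration_ns(max_duration_ns),
          _start(std::chrono::steady_clock::now())
    {
    }

    // Account for one scanned row, live or expired: count and size together.
    void add_row(uint64_t row_bytes)
    {
        ++_count;
        _size += row_bytes;
    }

    // False once the elapsed wall time exceeds the configured limit.
    bool time_check() const
    {
        if (_max_duration_ns == 0) {
            return true;
        }
        auto elapsed_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                              std::chrono::steady_clock::now() - _start)
                              .count();
        return static_cast<uint64_t>(elapsed_ns) <= _max_duration_ns;
    }

    // True while both the row-count and byte-size budgets still have room.
    bool valid() const { return _count < _max_count && _size < _max_size; }

    uint32_t count() const { return _count; }
    uint64_t size() const { return _size; }

private:
    uint32_t _max_count;
    uint64_t _max_size;
    uint64_t _max_duration_ns;
    std::chrono::steady_clock::time_point _start;
    uint32_t _count = 0;
    uint64_t _size = 0;
};
```

With this shape, the scan loop only needs one accounting call per row plus the periodic `time_check()`.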
In the current implementation, we calculate size from the `blob` structure, not the `Slice` structure provided by the iterator. As a result, if we want to add the size, we cannot use `it->key()` and `it->value()`. See the function `append_key_value_for_multi_get`: if the value is expired, it returns directly and does not parse the sort key and value from `Slice` into `blob`. I think it is not necessary to parse expired values just to calculate an exact iteration size.
> if value is expired, return directly, will not parse sortkey and value from Slice to blob.

But reading such a row still costs disk resources at the very least. What if a scan reads only 100 live rows but 100000 expired rows?
We discussed it offline; expired rows will also be added to `iteration_count`.
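To make the agreed behavior concrete, here is a hypothetical sketch (types and names are illustrative, not pegasus code) in which expired rows still advance the iteration count, but only live rows are decoded and contribute to the result size:

```cpp
#include <cstdint>
#include <vector>

// Illustrative row: 'expired' would come from the TTL check on the raw value.
struct raw_row
{
    bool expired;
    uint64_t row_bytes; // key length + value length
};

struct scan_stats
{
    uint32_t iteration_count = 0; // expired rows included
    uint64_t result_size = 0;     // live rows only
};

// Hypothetical scan: every scanned row costs a disk read, so all rows count
// toward the iteration limit, but expired rows are never parsed or sized.
scan_stats scan_with_limit(const std::vector<raw_row> &rows, uint32_t max_iteration_count)
{
    scan_stats stats;
    for (const raw_row &row : rows) {
        ++stats.iteration_count;
        if (!row.expired) {
            // Only live rows are parsed from Slice to blob and sized.
            stats.result_size += row.row_bytes;
        }
        if (stats.iteration_count >= max_iteration_count) {
            break;
        }
    }
    return stats;
}
```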
src/server/pegasus_server_impl.cpp
Outdated
```diff
@@ -971,7 +1027,7 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req
 // extract value
 if (status.ok()) {
     // check if exceed limit
-    if (count >= max_kv_count || size >= max_kv_size) {
+    if (iteration_count > max_iteration_count || size > max_iteration_size) {
```
In the current implementation, if you print an abnormal log because of the size of `rocksdb::MultiGet`, the log may not show what you expect:

```cpp
dwarn_replica(
    "rocksdb abnormal multi_get from {}: hash_key = {}, "
    "start_sort_key = {} ({}), stop_sort_key = {} ({}), "
    "sort_key_filter_type = {}, sort_key_filter_pattern = {}, "
    "max_kv_count = {}, max_kv_size = {}, reverse = {}, "
    "result_count = {}, result_size = {}, iterate_count = {}, "
```

As you can see, the abnormal log will print `start_sort_key="", stop_sort_key=""`. It doesn't distinguish whether the request is the range-read version of MultiGet or the point-read version (i.e. `!request.sortkeys.empty()`).
So I suggest keeping it as it was: don't print the abnormal log for point-read MultiGet. If you want to, you can provide a dedicated abnormal log for point-read:

```cpp
dwarn_replica(
    "rocksdb abnormal point-read multi_get from {}: hash_key = {}, "
    "max_kv_count = {}, max_kv_size = {}, "
    "result_count = {}, result_size = {}, iterate_count = {}, "
```
I'm sorry, but I don't follow your suggestion. Firstly, the abnormal log will not always print `start_sort_key="", stop_sort_key=""`; the client is allowed to set the start and stop keys. Secondly, we didn't distinguish range-read from point-read in the old implementation; this abnormal log was not printed when the limit was exceeded, but the multiget slow-query log was.
We discussed it offline. I updated the code: `iteration_count` is not updated if the sortkey array is not empty.
src/server/pegasus_server_impl.cpp
Outdated
```diff
@@ -856,7 +900,10 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req
     }
 }

-iterate_count++;
+limiter->add_count();
+if (!limiter->time_check()) {
```
You can include `time_check()` in `valid()`.

I guess the reason you separate the two calls is that `time_check` depends on `add_count`: in the first iteration the condition `_iteration_count % _module_num == 0` always holds, so it checks the time duration futilely.

If you want `dsn_now` to actually be called only once every `_module_num` iterations, it only needs a slight change:

```cpp
bool time_check()
{
    // _iteration_count % _module_num ==> (_iteration_count + 1) % _module_num
    if (_max_duration_time > 0 && (_iteration_count + 1) % _module_num == 0 &&
        dsn_now_ns() - _iteration_start_time_ns > _max_duration_time) {
        _exceed_limit = true;
        _iteration_duration_time_ns = dsn_now_ns() - _iteration_start_time_ns;
        return false;
    }
    return true;
}
```

So only after `module_num - 1` iterations will there be the first call of `dsn_now`.

```cpp
bool valid()
{
    if (_iteration_count >= _max_count) {
        return false;
    }
    if (_max_size > 0 && _iteration_size >= _max_size) {
        return false;
    }
    return time_check();
}
```
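For illustration, here is a deterministic, self-contained variant of this clock-sampling idea, with the real clock replaced by counter-backed stubs so the read frequency can be verified. The names mirror the snippets in this thread, but the class itself is hypothetical, not pegasus code:

```cpp
#include <cstdint>

// Sketch of folding time_check() into valid() while sampling the clock only
// once per _module_num iterations. fake_now_ns / clock_reads stand in for
// dsn_now_ns() so the behavior is testable.
class sampled_limiter
{
public:
    sampled_limiter(uint32_t max_count, uint32_t module_num, uint64_t max_duration_ns)
        : _max_count(max_count), _module_num(module_num), _max_duration_ns(max_duration_ns)
    {
    }

    void add_count() { ++_iteration_count; }

    bool time_check()
    {
        // (_iteration_count + 1) % _module_num: skip the futile check on the
        // very first row, so the clock is first read after module_num - 1 rows.
        if (_max_duration_ns > 0 && (_iteration_count + 1) % _module_num == 0) {
            ++clock_reads; // stands in for one dsn_now_ns() call
            if (fake_now_ns - _start_ns > _max_duration_ns) {
                return false;
            }
        }
        return true;
    }

    bool valid()
    {
        if (_iteration_count >= _max_count) {
            return false;
        }
        return time_check();
    }

    // Test hooks standing in for the real monotonic clock.
    uint64_t fake_now_ns = 0;
    uint32_t clock_reads = 0;

private:
    uint32_t _max_count;
    uint32_t _module_num;
    uint64_t _max_duration_ns;
    uint64_t _start_ns = 0;
    uint32_t _iteration_count = 0;
};
```

With `module_num = 5`, the clock is read only when the iteration count reaches 4, 9, 14, 19, … rather than on every row.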
Your conjecture is correct, and it is part of the reason I separated adding the iteration count: `iteration_count` is not always incremented on every iteration.
I misunderstood your suggestion before; the code is updated.
…sponse during ingestion (#500)
What problem does this PR solve?

As issue #486 shows, this pull request aims to limit long-running rocksdb iteration.

In multiget and scan operations, pegasus may scan rocksdb through `rocksdb::Iterator`, and the iteration count is defined by client options that are entirely controlled by the user.

For multiget, the client can set `max_kv_count` and `max_kv_size` in the options; if the count or size exceeds the limit, the server stops iterating and returns the kv-pairs already scanned, with error `kIncomplete`, to the client. If the client sets `max_kv_count` or `max_kv_size` to -1, the server won't limit the count or size at all, which is dangerous. Besides, `max_kv_count` doesn't include the expired data count. As a result, we update the iteration checks for multiget as follows:

- Change `max_kv_count` to `max_iteration_count`, which also counts expire_count and filter_count.
- `max_iteration_count` = min(`max_kv_count` [from client], `rocksdb_iteration_count` [from config])
- `max_iteration_data_size` = min(`max_kv_size` [from client], `multi_get_iteration_size` [from config])

For scan, the client can set `batch_size` in the options; the server stops iterating if the count is exceeded and returns the kv-pairs to the client, and this count also doesn't include expire_count. We add iteration checks for scan as follows:

- Change `batch_size` to `max_iteration_count`, which also counts expire_count and filter_count.
- `max_iteration_count` = min(`batch_size` [from client], `rocksdb_iteration_count` [from config])

For sortkey_count, we only add a time threshold; if the time exceeds the limit, the server returns -1 to the client.
Config update
Check List
Tests
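The limit derivation described in this PR (server-side config capping client-supplied options) can be sketched as follows. Function and parameter names here are illustrative; the actual config option names in pegasus may differ:

```cpp
#include <algorithm>
#include <cstdint>

struct multi_get_limits
{
    uint32_t max_iteration_count;
    uint64_t max_iteration_size;
};

// Hypothetical sketch: a client value <= 0 used to mean "unlimited"; after
// this PR, the server config caps the effective limit in every case.
multi_get_limits effective_limits(int32_t client_max_kv_count,
                                  int32_t client_max_kv_size,
                                  uint32_t config_max_iteration_count,
                                  uint64_t config_max_iteration_size)
{
    uint32_t count = client_max_kv_count > 0
                         ? static_cast<uint32_t>(client_max_kv_count)
                         : UINT32_MAX;
    uint64_t size = client_max_kv_size > 0
                        ? static_cast<uint64_t>(client_max_kv_size)
                        : UINT64_MAX;
    return {std::min(count, config_max_iteration_count),
            std::min(size, config_max_iteration_size)};
}
```

So even a client requesting "unlimited" (-1) is bounded by the server-side iteration limits.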