Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: limit long time rocksdb iteration operation #500

Merged
merged 24 commits into from
Apr 8, 2020

Conversation

hycdong
Copy link
Contributor

@hycdong hycdong commented Mar 20, 2020

What problem does this PR solve?

As issue #486 shows, this pull request aims to limit long time rocksdb iteration.

In multiget, scan operator, pegasus might scan rocksdb by rocksdb::Iterator, the iteration count for is defined by the client option which is totally controlled by user.

For multiget, client can define max_kv_count and max_kv_size in option, if count or size exceed limit, server will stop iteration and return kv-pairs have already been scanned and error kIncomplete to client. If client set max_kv_count or max_kv_size as -1, it means that server won't limit count or size, which will be dangerous. Besides, max_kv_count doesn't include expire data count. As result, we update iteration check for multiget below:

  1. update max_kv_count to max_iteration_count, count expire_count and filter_count.
  2. max_iteration_count = min(max_kv_count[from client], rocksdb_iteration_count[from config])
  3. max_iteration_data_size = min(max_kv_size[from client], multi_get_iteration_size[from config])
  4. add time threshold for iteration, if time exceed limit, server will also stop iteration and return kv-pairs have already been scanned and error kIncomplete to client. If time threshold = 0 means no time limit. Besides, this option is defined in config file, but can also re-config from app_env.

For scan, client can define batch_size in option, server will stop iteration if count exceed and return kv-pairs to client, and this count is also not included expire_count. We add iteration check for scan below:

  1. update batch_size to max_iteration_count, count expire_count and filter_count.
  2. max_iteration_count = min(batch_size[from client], rocksdb_iteration_count[from config])
  3. add time threshold for iteration, if time exceed limit, server will also stop iteration and return kv-pairs have already been scanned and error kIncomplete to client. Time threshold meaning is as same as mulitget operator.

For sortkey_count, we only add time threshold, and if time exceed limit, server will return -1 to client.

Config update

[pegasus.server]
# cluster level restriction {3000, 30MB, 1000, 30s}
rocksdb_multi_get_max_iteration_count = 3000
rocksdb_multi_get_max_iteration_size = 31457280
rocksdb_max_iteration_count = 1000
rocksdb_iteration_threshold_time_ms = 30000

Check List

Tests

  • Unit test
  • Integration test
  • Manual test

src/server/pegasus_server_impl.cpp Outdated Show resolved Hide resolved
src/base/pegasus_const.cpp Outdated Show resolved Hide resolved
src/server/pegasus_server_impl.cpp Outdated Show resolved Hide resolved
@@ -755,11 +784,21 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req

std::unique_ptr<rocksdb::Iterator> it;
bool complete = false;
uint64_t iteration_time = dsn_now_ns();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iteration_start_time.

iteration_time confuses with the time duration rather than the timestamp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iteration_time is duration time, not iteration_start_time, it will be updated when checking time exceed limit.

src/server/config.ini Outdated Show resolved Hide resolved
src/server/pegasus_server_impl.cpp Outdated Show resolved Hide resolved
src/server/pegasus_server_impl.cpp Outdated Show resolved Hide resolved
src/server/pegasus_server_impl.cpp Outdated Show resolved Hide resolved
src/server/pegasus_server_impl.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@neverchanje neverchanje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sortkey_count, we only add time threshold, and if time exceed limit, server will return -1 to client.

sortkey_count doesn't consume less resource than multiget or scan during iterations. Why it has less restriction?

src/server/pegasus_server_impl.cpp Outdated Show resolved Hide resolved
@neverchanje

This comment has been minimized.

@hycdong

This comment has been minimized.

@hycdong

This comment has been minimized.

@@ -675,12 +697,22 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req
return;
}

int32_t max_kv_count = request.max_kv_count > 0 ? request.max_kv_count : INT_MAX;
uint32_t max_kv_count = request.max_kv_count > 0 ? request.max_kv_count : INT_MAX;
uint32_t max_iteration_count = std::min(max_kv_count, _rocksdb_max_iteration_count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the PR, we should update client doc, the returned kv count/size maybe less than user requested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will update related document~

src/server/pegasus_server_impl.cpp Outdated Show resolved Hide resolved
src/server/config.ini Outdated Show resolved Hide resolved
acelyc111
acelyc111 previously approved these changes Mar 26, 2020
acelyc111
acelyc111 previously approved these changes Apr 1, 2020
@neverchanje neverchanje added the type/config-change Added or modified configuration that should be noted on release note of new version. label Apr 2, 2020
src/server/pegasus_server_impl.cpp Outdated Show resolved Hide resolved
src/server/pegasus_server_impl.cpp Outdated Show resolved Hide resolved
src/server/pegasus_server_impl.cpp Outdated Show resolved Hide resolved
@@ -869,7 +916,7 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req
if (r == 1) {
count++;
auto &kv = reverse_kvs.back();
size += kv.key.length() + kv.value.length();
limiter->add_size(kv.key.length() + kv.value.length());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So why size-limitation doesn't take expired rows into account?
I suggest whichever r is, limiter should add size of the row.

                limiter->add_count();
                limiter->add_size(it->key().length() + it->value().length())
                if (!limiter->time_check()) {
                    break;
                }

Then add_count & add_size can merge into one function, like:

void add_row() {
   add_size();
   add_count();
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In current implementation, we calculate size by blob structure, not Slice structure provided by iterator. As a result, if we would like to add size, we can not use it->key() and it->value. You can see function append_key_value_for_multi_get, in this function, if value is expired, return directly, will not parse sortkey and value from Slice to blob. I think it is not necessary to parse expired value for calculate exact iteration_size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if value is expired, return directly, will not parse sortkey and value from Slice to blob.

But reading such a row still costs at least disk resources. What if a scan reads only 100 rows but with 100000 expired rows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discuss it offline, expired will also add iteration_count.

src/server/pegasus_server_impl.cpp Outdated Show resolved Hide resolved
@@ -971,7 +1027,7 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req
// extract value
if (status.ok()) {
// check if exceed limit
if (count >= max_kv_count || size >= max_kv_size) {
if (iteration_count > max_iteration_count || size > max_iteration_size) {
Copy link
Contributor

@neverchanje neverchanje Apr 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current implementation, if you print an abnormal log because of the size of rocksdb::MultiGet, the log may show as that you don't expect:

        dwarn_replica(
            "rocksdb abnormal multi_get from {}: hash_key = {}, "
            "start_sort_key = {} ({}), stop_sort_key = {} ({}), "
            "sort_key_filter_type = {}, sort_key_filter_pattern = {}, "
            "max_kv_count = {}, max_kv_size = {}, reverse = {}, "
            "result_count = {}, result_size = {}, iterate_count = {}, "

As you can see, the abnormal log will print start_sort_key="", stop_sort_key="". It doesn't distinguish whether the request is the range-read version of MultiGet, or the point-read version (ie. !request.sortkeys.empty())).

So I suggest keeping it as was, don't print abnormal log for point-read MultiGet. If you want to, you can provide a dedicated abnormal log for point-read:

        dwarn_replica(
            "rocksdb abnormal point-read multi_get from {}: hash_key = {}, "
            "max_kv_count = {}, max_kv_size = {}, "
            "result_count = {}, result_size = {}, iterate_count = {}, "

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, but I just don't know what is your suggestion. Firstly, abnormal log will not always print start_sort_key="", stop_sort_key="", client is allowed to set start and stop key. Secondly, we don't distinguish range read and point-read in old implementation, this abnormal log is not printed if exceeding limit, but multiget slow query log print.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discuss it offline. I update code, don't update iteration_count if sortkey array is not empty.

@@ -856,7 +900,10 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req
}
}

iterate_count++;
limiter->add_count();
if (!limiter->time_check()) {
Copy link
Contributor

@neverchanje neverchanje Apr 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can include time_check() in valid().

I guess why you separate the two calls is that time_check depends on add_count. Because in the first iteration, the condition always meets _iteration_count % _module_num == 0, so it checks time duration futilely.

If you want every "module_num" times, the dsn_now actually calls, it only needs a slight change:

    bool time_check()
    {
        // _iteration_count % _module_num ==> (_iteration_count+1) % _module_num
        if (_max_duration_time > 0 &&  (_iteration_count+1) % _module_num == 0 &&
            dsn_now_ns() - _iteration_start_time_ns > _max_duration_time) {
            _exceed_limit = true;
            _iteration_duration_time_ns = dsn_now_ns() - _iteration_start_time_ns;
            return false;
        }
        return true;
    }

So that only after (module_num-1) will there be the first call of dsn_now.

    bool valid()
    {
        if (_iteration_count >= _max_count) {
            return false;
        }
        if (_max_size > 0 && _iteration_size >= _max_size) {
            return false;
        }
        return time_check();
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your conjecture is correct and just is part of the reason that I used to separate adding iteration count.
Iteration_count will not always be auto-increment during iteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstand your suggestion before, the code is updated.

acelyc111
acelyc111 previously approved these changes Apr 5, 2020
@acelyc111 acelyc111 merged commit b7fe976 into apache:master Apr 8, 2020
@neverchanje neverchanje mentioned this pull request Apr 10, 2020
@hycdong hycdong deleted the abnormal_iteration branch June 3, 2020 07:00
acelyc111 pushed a commit that referenced this pull request Jun 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1.12.3 type/config-change Added or modified configuration that should be noted on release note of new version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants