feat: limit long time rocksdb iteration operation #500
Conversation
src/server/pegasus_server_impl.cpp
Outdated
```diff
@@ -755,11 +784,21 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req

 std::unique_ptr<rocksdb::Iterator> it;
 bool complete = false;
+uint64_t iteration_time = dsn_now_ns();
```
Rename this to `iteration_start_time`. `iteration_time` suggests a time duration rather than a timestamp.
`iteration_time` is a duration, not `iteration_start_time`; it is updated when checking whether the time limit is exceeded.
> For sortkey_count, we only add time threshold, and if time exceed limit, server will return -1 to client.

`sortkey_count` doesn't consume fewer resources than multiget or scan during iteration. Why does it have fewer restrictions?
src/server/pegasus_server_impl.cpp
Outdated
```diff
@@ -675,12 +697,22 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req
     return;
 }

-int32_t max_kv_count = request.max_kv_count > 0 ? request.max_kv_count : INT_MAX;
+uint32_t max_kv_count = request.max_kv_count > 0 ? request.max_kv_count : INT_MAX;
+uint32_t max_iteration_count = std::min(max_kv_count, _rocksdb_max_iteration_count);
```
After this PR, we should update the client docs: the returned kv count/size may be less than the user requested.
Sure, I will update the related documents.
```diff
@@ -869,7 +916,7 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req
 if (r == 1) {
     count++;
     auto &kv = reverse_kvs.back();
-    size += kv.key.length() + kv.value.length();
+    limiter->add_size(kv.key.length() + kv.value.length());
```
So why doesn't the size limitation take expired rows into account? I suggest that whatever `r` is, the limiter should add the row's size:

```cpp
limiter->add_count();
limiter->add_size(it->key().length() + it->value().length());
if (!limiter->time_check()) {
    break;
}
```

Then `add_count` and `add_size` can be merged into one function, like:

```cpp
void add_row()
{
    add_size();
    add_count();
}
```
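As a rough illustration of this merged interface, here is a minimal, self-contained sketch of such a limiter, assuming an `add_row` that takes the row's byte size directly. Names and signatures here are hypothetical, not the actual pegasus limiter:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical sketch of a range-read limiter with the merged add_row()
// interface from the review discussion. Illustrative only; the real
// pegasus limiter differs.
class range_read_limiter
{
public:
    // max_duration_ns == 0 disables the time limit in this sketch.
    range_read_limiter(uint32_t max_count, uint64_t max_size, uint64_t max_duration_ns)
        : _max_count(max_count),
          _max_size(max_size),
          _max_duration_ns(max_duration_ns),
          _start(std::chrono::steady_clock::now())
    {
    }

    // Account for one scanned row, live or expired: count and size together.
    void add_row(uint64_t row_bytes)
    {
        ++_count;
        _size += row_bytes;
    }

    // False once the elapsed wall time exceeds the configured limit.
    bool time_check() const
    {
        if (_max_duration_ns == 0) {
            return true;
        }
        auto elapsed_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                              std::chrono::steady_clock::now() - _start)
                              .count();
        return static_cast<uint64_t>(elapsed_ns) <= _max_duration_ns;
    }

    // True while both the row-count and byte-size budgets still have room.
    bool valid() const { return _count < _max_count && _size < _max_size; }

    uint32_t count() const { return _count; }
    uint64_t size() const { return _size; }

private:
    uint32_t _max_count;
    uint64_t _max_size;
    uint64_t _max_duration_ns;
    std::chrono::steady_clock::time_point _start;
    uint32_t _count = 0;
    uint64_t _size = 0;
};
```

With this shape, the scan loop only needs one accounting call per row plus the periodic `time_check()`.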
In the current implementation, we calculate size from the `blob` structure, not the `Slice` structure provided by the iterator. As a result, if we want to add the size, we cannot use `it->key()` and `it->value()`. See the function `append_key_value_for_multi_get`: if the value is expired, it returns directly and does not parse the sort key and value from `Slice` into `blob`. I think it is not necessary to parse expired values just to calculate an exact iteration size.
> if value is expired, return directly, will not parse sortkey and value from Slice to blob.

But reading such a row still costs disk resources at the very least. What if a scan reads only 100 live rows but 100000 expired rows?
We discussed it offline; expired rows will also be added to `iteration_count`.
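To make the agreed behavior concrete, here is a hypothetical sketch (types and names are illustrative, not pegasus code) in which expired rows still advance the iteration count, but only live rows are decoded and contribute to the result size:

```cpp
#include <cstdint>
#include <vector>

// Illustrative row: 'expired' would come from the TTL check on the raw value.
struct raw_row
{
    bool expired;
    uint64_t row_bytes; // key length + value length
};

struct scan_stats
{
    uint32_t iteration_count = 0; // expired rows included
    uint64_t result_size = 0;     // live rows only
};

// Hypothetical scan: every scanned row costs a disk read, so all rows count
// toward the iteration limit, but expired rows are never parsed or sized.
scan_stats scan_with_limit(const std::vector<raw_row> &rows, uint32_t max_iteration_count)
{
    scan_stats stats;
    for (const raw_row &row : rows) {
        ++stats.iteration_count;
        if (!row.expired) {
            // Only live rows are parsed from Slice to blob and sized.
            stats.result_size += row.row_bytes;
        }
        if (stats.iteration_count >= max_iteration_count) {
            break;
        }
    }
    return stats;
}
```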
src/server/pegasus_server_impl.cpp
Outdated
```diff
@@ -971,7 +1027,7 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req
 // extract value
 if (status.ok()) {
     // check if exceed limit
-    if (count >= max_kv_count || size >= max_kv_size) {
+    if (iteration_count > max_iteration_count || size > max_iteration_size) {
```
In the current implementation, if you print an abnormal log because of the size of `rocksdb::MultiGet`, the log may not show what you expect:

```cpp
dwarn_replica(
    "rocksdb abnormal multi_get from {}: hash_key = {}, "
    "start_sort_key = {} ({}), stop_sort_key = {} ({}), "
    "sort_key_filter_type = {}, sort_key_filter_pattern = {}, "
    "max_kv_count = {}, max_kv_size = {}, reverse = {}, "
    "result_count = {}, result_size = {}, iterate_count = {}, "
```

As you can see, the abnormal log will print `start_sort_key="", stop_sort_key=""`. It doesn't distinguish whether the request is the range-read version of MultiGet or the point-read version (i.e. `!request.sortkeys.empty()`).
So I suggest keeping it as it was: don't print the abnormal log for point-read MultiGet. If you want to, you can provide a dedicated abnormal log for point-read:

```cpp
dwarn_replica(
    "rocksdb abnormal point-read multi_get from {}: hash_key = {}, "
    "max_kv_count = {}, max_kv_size = {}, "
    "result_count = {}, result_size = {}, iterate_count = {}, "
```
I'm sorry, but I don't follow your suggestion. Firstly, the abnormal log will not always print `start_sort_key="", stop_sort_key=""`; the client is allowed to set the start and stop keys. Secondly, we didn't distinguish range-read from point-read in the old implementation; this abnormal log was not printed when the limit was exceeded, but the multiget slow-query log was.
We discussed it offline. I updated the code: `iteration_count` is not updated if the sortkey array is not empty.
src/server/pegasus_server_impl.cpp
Outdated
```diff
@@ -856,7 +900,10 @@ void pegasus_server_impl::on_multi_get(const ::dsn::apps::multi_get_request &req
     }
 }

-iterate_count++;
+limiter->add_count();
+if (!limiter->time_check()) {
```
You can include `time_check()` in `valid()`.

I guess the reason you separate the two calls is that `time_check` depends on `add_count`: in the first iteration the condition `_iteration_count % _module_num == 0` always holds, so it checks the time duration futilely.

If you want `dsn_now` to actually be called only once every `_module_num` iterations, it only needs a slight change:

```cpp
bool time_check()
{
    // _iteration_count % _module_num ==> (_iteration_count + 1) % _module_num
    if (_max_duration_time > 0 && (_iteration_count + 1) % _module_num == 0 &&
        dsn_now_ns() - _iteration_start_time_ns > _max_duration_time) {
        _exceed_limit = true;
        _iteration_duration_time_ns = dsn_now_ns() - _iteration_start_time_ns;
        return false;
    }
    return true;
}
```

So only after `module_num - 1` iterations will there be the first call of `dsn_now`.

```cpp
bool valid()
{
    if (_iteration_count >= _max_count) {
        return false;
    }
    if (_max_size > 0 && _iteration_size >= _max_size) {
        return false;
    }
    return time_check();
}
```
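For illustration, here is a deterministic, self-contained variant of this clock-sampling idea, with the real clock replaced by counter-backed stubs so the read frequency can be verified. The names mirror the snippets in this thread, but the class itself is hypothetical, not pegasus code:

```cpp
#include <cstdint>

// Sketch of folding time_check() into valid() while sampling the clock only
// once per _module_num iterations. fake_now_ns / clock_reads stand in for
// dsn_now_ns() so the behavior is testable.
class sampled_limiter
{
public:
    sampled_limiter(uint32_t max_count, uint32_t module_num, uint64_t max_duration_ns)
        : _max_count(max_count), _module_num(module_num), _max_duration_ns(max_duration_ns)
    {
    }

    void add_count() { ++_iteration_count; }

    bool time_check()
    {
        // (_iteration_count + 1) % _module_num: skip the futile check on the
        // very first row, so the clock is first read after module_num - 1 rows.
        if (_max_duration_ns > 0 && (_iteration_count + 1) % _module_num == 0) {
            ++clock_reads; // stands in for one dsn_now_ns() call
            if (fake_now_ns - _start_ns > _max_duration_ns) {
                return false;
            }
        }
        return true;
    }

    bool valid()
    {
        if (_iteration_count >= _max_count) {
            return false;
        }
        return time_check();
    }

    // Test hooks standing in for the real monotonic clock.
    uint64_t fake_now_ns = 0;
    uint32_t clock_reads = 0;

private:
    uint32_t _max_count;
    uint32_t _module_num;
    uint64_t _max_duration_ns;
    uint64_t _start_ns = 0;
    uint32_t _iteration_count = 0;
};
```

With `module_num = 5`, the clock is read only when the iteration count reaches 4, 9, 14, 19, … rather than on every row.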
Your conjecture is correct, and it is part of the reason I separated adding the iteration count: `iteration_count` is not always incremented on every iteration.
I misunderstood your suggestion before; the code is updated.
…sponse during ingestion (#500)
What problem does this PR solve?

As issue #486 shows, this pull request aims to limit long-running rocksdb iteration.

In multiget and scan operations, pegasus may scan rocksdb through `rocksdb::Iterator`, and the iteration count is defined by client options that are entirely controlled by the user.

For multiget, the client can set `max_kv_count` and `max_kv_size` in the options; if the count or size exceeds the limit, the server stops iterating and returns the kv-pairs already scanned, with error `kIncomplete`, to the client. If the client sets `max_kv_count` or `max_kv_size` to -1, the server won't limit the count or size at all, which is dangerous. Besides, `max_kv_count` doesn't include the expired data count. As a result, we update the iteration checks for multiget as follows:

- Change `max_kv_count` to `max_iteration_count`, which also counts expire_count and filter_count.
- `max_iteration_count` = min(`max_kv_count` [from client], `rocksdb_iteration_count` [from config])
- `max_iteration_data_size` = min(`max_kv_size` [from client], `multi_get_iteration_size` [from config])

For scan, the client can set `batch_size` in the options; the server stops iterating if the count is exceeded and returns the kv-pairs to the client, and this count also doesn't include expire_count. We add iteration checks for scan as follows:

- Change `batch_size` to `max_iteration_count`, which also counts expire_count and filter_count.
- `max_iteration_count` = min(`batch_size` [from client], `rocksdb_iteration_count` [from config])

For sortkey_count, we only add a time threshold; if the time exceeds the limit, the server returns -1 to the client.
Config update
Check List
Tests
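The limit derivation described in this PR (server-side config capping client-supplied options) can be sketched as follows. Function and parameter names here are illustrative; the actual config option names in pegasus may differ:

```cpp
#include <algorithm>
#include <cstdint>

struct multi_get_limits
{
    uint32_t max_iteration_count;
    uint64_t max_iteration_size;
};

// Hypothetical sketch: a client value <= 0 used to mean "unlimited"; after
// this PR, the server config caps the effective limit in every case.
multi_get_limits effective_limits(int32_t client_max_kv_count,
                                  int32_t client_max_kv_size,
                                  uint32_t config_max_iteration_count,
                                  uint64_t config_max_iteration_size)
{
    uint32_t count = client_max_kv_count > 0
                         ? static_cast<uint32_t>(client_max_kv_count)
                         : UINT32_MAX;
    uint64_t size = client_max_kv_size > 0
                        ? static_cast<uint64_t>(client_max_kv_size)
                        : UINT64_MAX;
    return {std::min(count, config_max_iteration_count),
            std::min(size, config_max_iteration_size)};
}
```

So even a client requesting "unlimited" (-1) is bounded by the server-side iteration limits.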