Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
86638: admission,kvserver: introduce an elastic cpu limiter r=irfansharif a=irfansharif

Today when admission control admits a request, it is able to run indefinitely consuming arbitrary CPU. For long-running (~1s of CPU work per request) "elastic" (not latency sensitive) work like backups, this can have detrimental effects on foreground latencies – once such work is admitted, it can take up available CPU cores until completion, which prevents foreground work from running. The scheme below aims to change this behavior; there are two components in play:

- A token bucket that hands out slices of CPU time where the total amount handed out is determined by a 'target utilization' – the max % of CPU it's aiming to use (on a 8vCPU machine, if targeting 50% CPU, it can hand out .50 * 8 = 4 seconds of CPU time per second).
- A feedback controller that adjusts the CPU % used by the token bucket periodically by measuring scheduling latency[1]. If over the limit (1ms at p99, chosen experimentally), the % is reduced; if under the limit and we're seeing substantial utilization, the % is increased.

Elastic work acquires CPU tokens representing some predetermined slice of CPU time, blocking until these tokens become available. We found that 100ms of tokens works well enough experimentally. A larger value, say 250ms, would translate to less preemption and fewer RPCs. What's important is that it isn't "too much", like 2s of CPU time, since that would let a single request hog a core potentially for 2s and allow for a large build up of a runnable goroutines (serving foreground traffic) on that core, affecting scheduling/foreground latencies.

The work preempts itself once the slice is used up (as a form of cooperative scheduling). Once preempted, the request returns to the caller with a resumption key. This scheme is effective in clamping down on scheduling latency that's due an excessive amount of elastic work. We have proof from direct trace captures and instrumentation that reducing scheduling latencies directly translates to reduced foreground latencies. They're primarily felt when straddling goroutines, typically around RPC boundaries (request/response handling goroutines); the effects multiplicative for statements that issue multiple requests.

The controller uses fixed deltas for adjustments, adjusting down a bit more aggressively than adjusting up. This is due to the nature of the work being paced — we care more about quickly introducing a ceiling rather than staying near it (though experimentally we’re able to stay near it just fine). It adjusts upwards only when seeing a reasonably high % of utilization with the allotted CPU quota (assuming it’s under the p99 target). The adjustments are small to reduce {over,under}shoot and controller instability at the cost of being somewhat dampened. We use a smoothed form of the p99 latency captures to add stability to the controller input, which consequently affects the controller output. We use a relatively low frequency when sampling scheduler latencies; since the p99 is computed off of histogram data, we saw a lot more jaggedness when taking p99s off of a smaller set of scheduler events (every 50ms for ex.) compared to computing p99s over a larger set of scheduler events (every 2500ms). This, with the small deltas used for adjustments, can make for a dampened response, but assuming a stable-ish foreground CPU load against a node, it works fine. The controller output is limited to a well-defined range that can be tuned through cluster settings.

---

Miscellaneous code details: To evaluate the overhead of checking against ElasticCPUHandle.OverLimit in a tight loop within MVCCExportToSST, we used the following. Underneath the hood the handle does simple estimation of per-iteration running time to avoid calling grunning.Time() frequently; not doing so caused a 5% slowdown in the same benchmark.

      $ dev bench pkg/storage \
        --filter BenchmarkMVCCExportToSST/useElasticCPUHandle --count 10 \
        --timeout 20m -v --stream-output --ignore-cache 2>&1 | tee bench.txt

      $ for flavor in useElasticCPUHandle=true useElasticCPUHandle=false
        do
          grep -E "${flavor}[^0-9]+" bench.txt | sed -E "s/${flavor}+/X/" > $flavor.txt
        done

      # goos: linux
      # goarch: amd64
      # cpu: Intel(R) Xeon(R) CPU @ 2.20GHz

      $ benchstat useElasticCPUHandle\={false,true}.txt
        name                old time/op  new time/op  delta
        MVCCExportToSST/X   2.54s ± 2%   2.53s ± 2%   ~     (p=0.549 n=10+9)

The tests for SchedulerLatencyListener show graphically how the elastic CPU controller behaves in response to various terms in the control loop (delta, multiplicative factor, smoothing constant, etc) -- see snippet below for an example. 

      # With more lag (first half of the graph), we're more likely to
      # observe a large difference between the set-point we need to hit
      # and the utilization we currently have, making for larger
      # scheduling latency fluctuations (i.e. an ineffective controller).
      plot width=70 height=20
      ----
      ----
       1069 ┤    ╭╮                   ╭╮╭╮
       1060 ┤    ││                   ││││
       1052 ┤    ││                   ││││╭╮  ╭╮
       1044 ┤    ││  ╭╮               ││││││  ││
       1035 ┤  ╭╮││  ││    ╭╮   ╭╮    ││││││  ││
       1027 ┤  │││╰╮ ││    ││   ││    ││││││ ╭╯│         ╭╮         ╭╮   ╭╮
       1019 ┤  │││ │ ││  ╭╮││   ││╭╮  ││││││ │ │         ││         ││   ││
       1010 ┤  │││ │ ││  │╰╯│   ││││  ││││││ │ │         ││     ╭──╮│╰─╮ ││  ╭╮╭─
       1002 ┼────────────────────────────────────────────────────────────────────
        993 ┤╰╮│││ │ │││││   ││││  │  │││││╰╮│ ╰╮╭╯││╰─╯╰╯╰╮╭╮ │       ╰─╯╰╯  ╰╯
        985 ┤ ││╰╯ │ │╰╯╰╯   ││││  │  │││││ ││  ╰╯ ╰╯      ╰╯╰─╯
        977 ┤ ││   │ │       ││││  │  │││││ ││
        968 ┤ ││   │╭╯       ││││  ╰╮ │││││ ││
        960 ┤ ││   ││        ││││   │ │││││ ╰╯
        951 ┤ ││   ││        ││││   │ │││││
        943 ┤ ││   ││        ││││   │╭╯││││
        935 ┤ ││   ││        ││╰╯   ││ ╰╯││
        926 ┤ ││   ││        ││     ││   ╰╯
        918 ┤ ╰╯   ╰╯        ││     ╰╯
        910 ┤                ││
        901 ┤                ╰╯
                                 p99 scheduler latencies (μs)

       21.7 ┤                                                            ╭╮
       20.6 ┤                                                     ╭───╮ ╭╯╰───╮
       19.5 ┼─────────╮   ╭──╮  ╭╮          ╭────╮    ╭────╮╭╮╭╮╭─╯   ╰─╯     ╰╮╭
       18.4 ┤         │   │  │╭─╯╰╮  ╭───╮╮ │╭─╮ ╰────│    ╰╯╰╯╰╯              ╰╯
       17.3 ┤         ╰──╭╯╮╭╰────╮╭─╯╯  ││╭╭╯ │     ╭╯
       16.2 ┤            │ ╰╯     ╰╯╭╯   ╰╮╭╯  │     │
       15.2 ┤           ╭╯         ╰╯     ╰╯   │    ╭╯
       14.1 ┤           │                      │    │
       13.0 ┤           │                      │    │
       11.9 ┤          ╭╯                      │   ╭╯
       10.8 ┤          │                       │   │
        9.7 ┤          │                       │  ╭╯
        8.7 ┤         ╭╯                       │  │
        7.6 ┤         │                        │  │
        6.5 ┤        ╭╯                        │ ╭╯
        5.4 ┤        │                         │ │
        4.3 ┤        │                         │╭╯
        3.2 ┤       ╭╯                         ││
        2.2 ┤       │                          ││
        1.1 ┤       │                          ╰╯
        0.0 ┼───────╯
                             elastic cpu utilization and limit (%)

[1]: Specifically the time between a goroutine being ready to run and when it's scheduled to do so by the Go scheduler.

Release note: None
Release justification: Non-production code

87561: backupccl: backup/restore with user-defined functions r=chengxiong-ruan a=chengxiong-ruan

Fixes cockroachdb#84087
With user-defined functions introduced, backup/restore needs to work with the new function descriptors and schema descriptors containing function signatures. This commit adds logic into current backup/restore infrastructure to make sure function descriptors are properly backed up and restored.

Release note (enterprise change): backup/restore now can backup and restore user-defined function descriptors at database and cluster level.
Release justification: necessar fix to backup/restore to make sure it works with user-defined functions.

88010: bazel: add explanatory top-level comment to .bazelrc r=healthy-pod a=rickystewart

People sometimes find themselves getting confused about the different `bazelrc` files. Hopefully this comment will help avoid this.

Release note: None

Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Chengxiong Ruan <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
  • Loading branch information
4 people committed Sep 15, 2022
4 parents ab82555 + 815bf3b + 1e2f6da + 3ccfc6b commit 55c6568
Show file tree
Hide file tree
Showing 69 changed files with 4,994 additions and 229 deletions.
11 changes: 11 additions & 0 deletions .bazelrc
Original file line number Diff line number Diff line change
@@ -1,3 +1,14 @@
# HEY! DON'T edit this file unless you really want to change a configuration for
# everyone building cockroach ever.
# This file is checked into tree and is not auto-generated.
# If you are following directions from `dev doctor`, you probably want to put
# your configurations in ~/.bazelrc or .bazelrc.user instead.
# Configurations in ~/.bazelrc apply to all Bazel builds across all projects on
# your machine. Configurations in .bazelrc.user apply only to builds in this
# workspace. Take a closer look to see which one `dev doctor` is talking about.
# Note that .bazelrc.user should be in your checkout (next to this file), not
# your home directory.

# Define a set up flag aliases, so people can use `--cross` instead of the
# longer `//build/toolchains:cross_flag`.
build --flag_alias=crdb_test=//build/toolchains:crdb_test_flag
Expand Down
10 changes: 10 additions & 0 deletions DEPS.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -3855,6 +3855,16 @@ def go_deps():
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/grpc-ecosystem/grpc-gateway/com_github_grpc_ecosystem_grpc_gateway-v1.16.0.zip",
],
)
go_repository(
name = "com_github_guptarohit_asciigraph",
build_file_proto_mode = "disable_global",
importpath = "github.com/guptarohit/asciigraph",
sha256 = "c2b81da57a50425d313a684efd13d9741c4e9df4c3cca92dea34d562d34271a1",
strip_prefix = "github.com/guptarohit/[email protected]",
urls = [
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/guptarohit/asciigraph/com_github_guptarohit_asciigraph-v0.5.5.zip",
],
)
go_repository(
name = "com_github_hailocab_go_hostpool",
build_file_proto_mode = "disable_global",
Expand Down
1 change: 1 addition & 0 deletions build/bazelutil/distdir_files.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -425,6 +425,7 @@ DISTDIR_FILES = {
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/grpc-ecosystem/go-grpc-middleware/com_github_grpc_ecosystem_go_grpc_middleware-v1.2.0.zip": "29612c2745026cd4dcd312797fe62f88c125c89d2914513cbc37efcf0bb4da54",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/grpc-ecosystem/go-grpc-prometheus/com_github_grpc_ecosystem_go_grpc_prometheus-v1.2.0.zip": "124dfc63aa52611a2882417e685c0452d4d99d64c13836a6a6747675e911fc17",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/grpc-ecosystem/grpc-gateway/com_github_grpc_ecosystem_grpc_gateway-v1.16.0.zip": "377b03aef288b34ed894449d3ddba40d525dd7fb55de6e79045cdf499e7fe565",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/guptarohit/asciigraph/com_github_guptarohit_asciigraph-v0.5.5.zip": "c2b81da57a50425d313a684efd13d9741c4e9df4c3cca92dea34d562d34271a1",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/hailocab/go-hostpool/com_github_hailocab_go_hostpool-v0.0.0-20160125115350-e80d13ce29ed.zip": "faf2b985681cda77ab928976b620b790585e364b6aff351483227d474db85e9a",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/hashicorp/consul/api/com_github_hashicorp_consul_api-v1.10.1.zip": "a84081dcb2361b540bb787871abedc0f9569c09637f5b5c40e973500a4402a82",
"https://storage.googleapis.com/cockroach-godeps/gomod/github.com/hashicorp/consul/sdk/com_github_hashicorp_consul_sdk-v0.8.0.zip": "cf29fff6c000ee67eda1b8cacec9648d06944e3cdbb80e2e22dc0165708974c6",
Expand Down
1 change: 1 addition & 0 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,7 @@ require (
github.com/gorilla/mux v1.8.0
github.com/goware/modvendor v0.5.0
github.com/grpc-ecosystem/grpc-gateway v1.16.0
github.com/guptarohit/asciigraph v0.5.5
github.com/irfansharif/recorder v0.0.0-20211218081646-a21b46510fd6
github.com/jackc/pgconn v1.12.1
github.com/jackc/pgproto3/v2 v2.3.0
Expand Down
2 changes: 2 additions & 0 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -1196,6 +1196,8 @@ github.com/grpc-ecosystem/grpc-gateway v1.13.0/go.mod h1:8XEsbTttt/W+VvjtQhLACqC
github.com/grpc-ecosystem/grpc-gateway v1.14.4/go.mod h1:6CwZWGDSPRJidgKAtJVvND6soZe6fT7iteq8wDPdhb0=
github.com/grpc-ecosystem/grpc-gateway v1.16.0 h1:gmcG1KaJ57LophUzW0Hy8NmPhnMZb4M0+kPpLofRdBo=
github.com/grpc-ecosystem/grpc-gateway v1.16.0/go.mod h1:BDjrQk3hbvj6Nolgz8mAMFbcEtjT1g+wF4CSlocrBnw=
github.com/guptarohit/asciigraph v0.5.5 h1:ccFnUF8xYIOUPPY3tmdvRyHqmn1MYI9iv1pLKX+/ZkQ=
github.com/guptarohit/asciigraph v0.5.5/go.mod h1:dYl5wwK4gNsnFf9Zp+l06rFiDZ5YtXM6x7SRWZ3KGag=
github.com/hailocab/go-hostpool v0.0.0-20160125115350-e80d13ce29ed/go.mod h1:tMWxXQ9wFIaZeTI9F+hmhFiGpFmhOHzyShyFUhRm0H4=
github.com/hashicorp/consul/api v1.1.0/go.mod h1:VmuI/Lkw1nC05EYQWNKwWGbkg+FbDBtguAZLlVdkD9Q=
github.com/hashicorp/consul/api v1.3.0/go.mod h1:MmDNSzIMUjNpY/mQ398R4bk2FnqQLoPndWW5VkKPlCE=
Expand Down
4 changes: 4 additions & 0 deletions pkg/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -574,6 +574,7 @@ ALL_TESTS = [
"//pkg/util/randutil:randutil_test",
"//pkg/util/retry:retry_test",
"//pkg/util/ring:ring_test",
"//pkg/util/schedulerlatency:schedulerlatency_test",
"//pkg/util/sdnotify:sdnotify_test",
"//pkg/util/search:search_test",
"//pkg/util/shuffle:shuffle_test",
Expand Down Expand Up @@ -1983,6 +1984,8 @@ GO_TARGETS = [
"//pkg/util/retry:retry_test",
"//pkg/util/ring:ring",
"//pkg/util/ring:ring_test",
"//pkg/util/schedulerlatency:schedulerlatency",
"//pkg/util/schedulerlatency:schedulerlatency_test",
"//pkg/util/sdnotify:sdnotify",
"//pkg/util/sdnotify:sdnotify_test",
"//pkg/util/search:search",
Expand Down Expand Up @@ -2938,6 +2941,7 @@ GET_X_DATA_TARGETS = [
"//pkg/util/randutil:get_x_data",
"//pkg/util/retry:get_x_data",
"//pkg/util/ring:get_x_data",
"//pkg/util/schedulerlatency:get_x_data",
"//pkg/util/sdnotify:get_x_data",
"//pkg/util/search:get_x_data",
"//pkg/util/shuffle:get_x_data",
Expand Down
1 change: 1 addition & 0 deletions pkg/ccl/backupccl/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,7 @@ go_library(
"//pkg/sql/catalog/descidgen",
"//pkg/sql/catalog/descpb",
"//pkg/sql/catalog/descs",
"//pkg/sql/catalog/funcdesc",
"//pkg/sql/catalog/ingesting",
"//pkg/sql/catalog/multiregion",
"//pkg/sql/catalog/nstree",
Expand Down
4 changes: 4 additions & 0 deletions pkg/ccl/backupccl/backup_planning.go
Original file line number Diff line number Diff line change
Expand Up @@ -467,6 +467,10 @@ func checkPrivilegesForBackup(
if err := p.CheckPrivilege(ctx, desc, privilege.USAGE); err != nil {
return errors.WithHint(err, notice)
}
case catalog.FunctionDescriptor:
if err := p.CheckPrivilege(ctx, desc, privilege.EXECUTE); err != nil {
return errors.WithHint(err, notice)
}
}
}

Expand Down
2 changes: 1 addition & 1 deletion pkg/ccl/backupccl/backup_processor.go
Original file line number Diff line number Diff line change
Expand Up @@ -394,7 +394,7 @@ func runBackupProcessor(
// after creating a single SST.
header.TargetBytes = 1
admissionHeader := roachpb.AdmissionHeader{
// Export requests are currently assigned NormalPri.
// Export requests are currently assigned BulkNormalPri.
//
// TODO(dt): Consider linking this to/from the UserPriority field.
Priority: int32(admissionpb.BulkNormalPri),
Expand Down
Loading

0 comments on commit 55c6568

Please sign in to comment.