Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: fine-grained cpu attribution #82625

Closed
3 of 5 tasks
irfansharif opened this issue Jun 8, 2022 · 5 comments
Closed
3 of 5 tasks

*: fine-grained cpu attribution #82625

irfansharif opened this issue Jun 8, 2022 · 5 comments
Labels
A-kv-distribution Relating to rebalancing and leasing. A-kv-observability C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) sync-me-7 sync-me-8 T-kv KV Team

Comments

@irfansharif
Copy link
Contributor

irfansharif commented Jun 8, 2022

This is the tracking issue for #82356. We expect the work here to break down into the following deliverables.

Must have:

  • Write a program to build Go archives for various OS/arch combinations CRDB officially supports/makes use of internally (linux arm64+amd64, darwin arm64+amd64, freebsd amd64, windows amd64). A prototype using hand-rolled scripts (but without CGO support) is here.
    • The official steps, in code form, are are here.
    • We probably want a docker image with the right cross-compilers to run this program under, these archives need to be built with CGO support.
    • Being able to generate these archives in CI as downloadable artifacts would be a nice bonus. Ditto if we're able to run the Go tests alongside.
  • When bumping Go versions (major or minor), updating our steps to use a manually generated archive from above which also applies the patchset proposed in rfcs: fine-grained cpu attribution #82356 (kept in a cockroachdb/go repo, or checked into cockroachdb/cockroach as a patch file).
  • Introduce a library in cockroachdb/cockroach to access per-goroutine running time. This needs to be gated behind a build tag to error/zero out when not built using the patched runtime.

Nice to have:

  • Support a per-store CPU usage breakdown by (i) ranges and (ii) tenants, powered by per-goroutine running time. We'd make this accessible through vtables and include results in debug zips, serving as a CPU-only version of today's hot-ranges report. A prototype of this is available here.
  • Surfacing per-statement cluster-wide CPU usage as part of EXPLAIN ANALYZE. This would likely require integrating the library above into the Stopper and recording as tracing events total running CPU time for a given request (propagated all the way up to the gateway). Tracked in sql: record CPU seconds per statement #87213.

Jira issue: CRDB-16580

@irfansharif irfansharif added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-distribution Relating to rebalancing and leasing. A-kv-observability labels Jun 8, 2022
@irfansharif irfansharif changed the title *: fine-grained cpu attribution for v22.2 *: fine-grained cpu attribution Jun 8, 2022
craig bot pushed a commit that referenced this issue Jun 8, 2022
82013: storage: add MVCC point synthesizing iterator r=jbowens a=erikgrinaker

This patch adds `pointSynthesizingIter`, an MVCC iterator which wraps an
arbitrary MVCC iterator and synthesizes point keys for range keys at
their start key and where they overlap point keys. It can optionally
synthesize around the SeekGE seek key too, which is useful for point
operations like `MVCCGet` where we may want to return a synthetic
tombstone for an MVCC range tombstone if there is no existing point key.

This will primarily be used to handle MVCC range tombstones in MVCC
scans and gets, as well as during MVCC conflict checks, which allows
much of this logic to remain unchanged and simplified (in particular,
`pebbleMVCCScanner` will not need any changes).

However, this patch does not make use of the iterator yet, since both it
and Pebble will need further performance optimizations for use in hot
paths. For now, correctness is sufficient, and only basic attempts at
performance optimization have been made.

Touches #70412.

Release note: None

82547: opt: fix inverted key column type in testcat r=mgartner a=mgartner

Previously, inverted key columns in the test catalog were incorrectly
given the type of their source column. Now they have the correct type,
`types.EncodedKey`.

Release note: None

82635: bazel: unlist unsupported os+arch combos r=irfansharif a=irfansharif

We don't support these for CRDB builds. Even windows is experimental,
but lets keep that around anyway since we publish binaries for it.

+cc #82625.

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
@irfansharif
Copy link
Contributor Author

Connecting some dots. From an internal doc around admission control, which would benefit from:

CPU timers: A measurement of total cpu time used by a goroutine (analogous to a
thread cpu timer), would allow us to compensate tenants who experience high
contention or high IO time, and so spend more time blocked after admission than
others. This is covered in what we are asking for in
https://github.com/golang/go/issues/41554

I was also curious about how the Go runtime is able to pace its GC to use only 30% CPU, and came across https://github.com/golang/proposal/blob/master/design/44167-gc-pacer-redesign.md#a-note-about-cpu-utilization (this whole document is fantastic). This was in the context of trying to understand how we could possibly do similar per-node CPU control for tenants and/or ranges, or get more accurate resource use signals. Turns out Go internally just reads off of nanotime(), using it to compute running time for individual goroutines doing GC assist work, without trying to be accurate about OS threads being descheduled. +cc golang/go#41554 (comment).

https://github.com/irfansharif/go/blob/2eb8b6eec65d3d214c07067db474bbb93de3443a/src/runtime/mgcmark.go#L592-L601

irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 11, 2022
Package grunning is a library that's able to retrieve on-CPU running
time for individual goroutines. It relies on using a patched Go and
provides a primitive for fine-grained CPU attribution and control
through a single API:

    package grunning

    // Time returns the time spent by the current goroutine in the
    // running state.
    func Time() time.Duration

The motivating RFC is over at cockroachdb#82356. Informs cockroachdb#82625.

We build CRDB using use the patched Go runtime for all officially
supported platforms when built using Bazel (cockroachdb#84867). Engineers commonly
building CRDB also use happen to use two platforms we don't use a
patched Go for:
- FreeBSD (we don't have cross-compilers setup), and
- M1/M2 Macs (we don't have a code-signing pipeline, yet).
We use '(darwin && arm64) || freebsd || !bazel' as the build tag to
exclude such platforms. See cockroachdb#84867 for more details.

This package tests various properties we should expect over the running time
value. It does not make assertions given the CI environments we run these
under (CPU-starved, lot of OS thread pre-emption, dissimilar to healthy
CRDB deployments). This is also why they're skipped under stress. Still,
these tests are useful to understand the properties we expect running
time to have:

    === RUN   TestEquivalentGoroutines
        thread=03 expected≈10.00% got= 9.98% of on-cpu time
        thread=06 expected≈10.00% got=10.00% of on-cpu time
        thread=02 expected≈10.00% got=10.01% of on-cpu time
        thread=10 expected≈10.00% got=10.01% of on-cpu time
        thread=07 expected≈10.00% got= 9.99% of on-cpu time
        thread=04 expected≈10.00% got= 9.99% of on-cpu time
        thread=09 expected≈10.00% got=10.00% of on-cpu time
        thread=01 expected≈10.00% got= 9.99% of on-cpu time
        thread=08 expected≈10.00% got=10.02% of on-cpu time
        thread=05 expected≈10.00% got=10.02% of on-cpu time
    --- PASS: TestEquivalentGoroutines (0.56s)

    === RUN   TestProportionalGoroutines
        thread=01 got  1.82% of on-cpu time: expected≈ 1.00x got=1.00x
        thread=02 got  3.64% of on-cpu time: expected≈ 2.00x got=2.00x
        thread=03 got  5.47% of on-cpu time: expected≈ 3.00x got=3.00x
        thread=04 got  7.28% of on-cpu time: expected≈ 4.00x got=4.00x
        thread=05 got  9.09% of on-cpu time: expected≈ 5.00x got=4.99x
        thread=06 got 10.91% of on-cpu time: expected≈ 6.00x got=5.99x
        thread=07 got 12.73% of on-cpu time: expected≈ 7.00x got=6.99x
        thread=08 got 14.54% of on-cpu time: expected≈ 8.00x got=7.99x
        thread=09 got 16.36% of on-cpu time: expected≈ 9.00x got=8.99x
        thread=10 got 18.16% of on-cpu time: expected≈10.00x got=9.97x
    --- PASS: TestProportionalGoroutines (1.72s)

    === RUN   TestPingPongHog
        pinger/ponger expected≈1.00x got=0.96x
    --- PASS: TestPingPongHog (0.91s)

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Aug 12, 2022
Package grunning is a library that's able to retrieve on-CPU running
time for individual goroutines. It relies on using a patched Go and
provides a primitive for fine-grained CPU attribution and control
through a single API:

    package grunning

    // Time returns the time spent by the current goroutine in the
    // running state.
    func Time() time.Duration

The motivating RFC is over at cockroachdb#82356. Informs cockroachdb#82625.

We build CRDB using use the patched Go runtime for all officially
supported platforms when built using Bazel (cockroachdb#84867). Engineers commonly
building CRDB also use happen to use two platforms we don't use a
patched Go for:
- FreeBSD (we don't have cross-compilers setup), and
- M1/M2 Macs (we don't have a code-signing pipeline, yet).
We use '(darwin && arm64) || freebsd || !bazel' as the build tag to
exclude such platforms. See cockroachdb#84867 for more details.

This package tests various properties we should expect over the running time
value. It does not make assertions given the CI environments we run these
under (CPU-starved, lot of OS thread pre-emption, dissimilar to healthy
CRDB deployments). This is also why they're skipped under stress. Still,
these tests are useful to understand the properties we expect running
time to have:

    === RUN   TestEquivalentGoroutines
        thread=03 expected≈10.00% got= 9.98% of on-cpu time
        thread=06 expected≈10.00% got=10.00% of on-cpu time
        thread=02 expected≈10.00% got=10.01% of on-cpu time
        thread=10 expected≈10.00% got=10.01% of on-cpu time
        thread=07 expected≈10.00% got= 9.99% of on-cpu time
        thread=04 expected≈10.00% got= 9.99% of on-cpu time
        thread=09 expected≈10.00% got=10.00% of on-cpu time
        thread=01 expected≈10.00% got= 9.99% of on-cpu time
        thread=08 expected≈10.00% got=10.02% of on-cpu time
        thread=05 expected≈10.00% got=10.02% of on-cpu time
    --- PASS: TestEquivalentGoroutines (0.56s)

    === RUN   TestProportionalGoroutines
        thread=01 got  1.82% of on-cpu time: expected≈ 1.00x got=1.00x
        thread=02 got  3.64% of on-cpu time: expected≈ 2.00x got=2.00x
        thread=03 got  5.47% of on-cpu time: expected≈ 3.00x got=3.00x
        thread=04 got  7.28% of on-cpu time: expected≈ 4.00x got=4.00x
        thread=05 got  9.09% of on-cpu time: expected≈ 5.00x got=4.99x
        thread=06 got 10.91% of on-cpu time: expected≈ 6.00x got=5.99x
        thread=07 got 12.73% of on-cpu time: expected≈ 7.00x got=6.99x
        thread=08 got 14.54% of on-cpu time: expected≈ 8.00x got=7.99x
        thread=09 got 16.36% of on-cpu time: expected≈ 9.00x got=8.99x
        thread=10 got 18.16% of on-cpu time: expected≈10.00x got=9.97x
    --- PASS: TestProportionalGoroutines (1.72s)

    === RUN   TestPingPongHog
        pinger/ponger expected≈1.00x got=0.96x
    --- PASS: TestPingPongHog (0.91s)

Release note: None
craig bot pushed a commit that referenced this issue Aug 12, 2022
85850: importer: support {} format for arrays in CSV r=ecwall a=rafiss

fixes #84631

Release note (sql change): Arrays can now be imported in a CSV file
using the {} format, similar to COPY FROM. Importing array expressions
(e.g. ARRAY[1, 2, 3]) is still supported as well.

85854: clusterversion,storage: remove 22.1 PebbleFormat version gates r=jbowens a=celiala

This commit removes the following 22.1 version gates:

- PebbleFormatBlockPropertyCollector
- PebbleFormatSplitUserKeysMarked

Cleanup was done following guidance from [21.2 cleanup](#74270 (comment)):

> For the most part, if the gates were just simple if !version.IsActive { return x } or something, I just removed the block, and even if it was a little more complicated, like args = [x]; if version { args = append(args, y) }; foo(args) I still tried to mostly inline it such that it looked natural (i.e. remove that append and make it args = [x, y]).

> However for just a couple more complicated cases that were referring to <21.2 versions that needed to be replaced when those were deleted, I added a placeholder clusterversion.TODOPre21_2 alias for 21.2. Replacing those calls with this alias shouldn't change their behavior -- it was already always true, since the code today should never run in a <21.2 cluster -- but means we can delete those older versions in the meantime and then the owners of these bits can decide how to update them.

Partially addresses #80663

Release note: none

85909: kvserver: instrument RaftTransport workers with pprof labels r=tbg,erikgrinaker a=pavelkalinnikov

The unused arguments in the method signature were used to identify goroutines
in traces. This no longer works after Go 1.17 started passing arguments via
registers.

This commit adds pprof labels when starting these goroutines, to have a cleaner
code, more readable traces, and to work around the new Go convention.

Release note: None

85977: grunning: add library for precise on-CPU time measurement r=irfansharif a=irfansharif

Package grunning is a library that's able to retrieve on-CPU running
time for individual goroutines. It relies on using a patched Go and
provides a primitive for fine-grained CPU attribution and control
through a single API:

    package grunning

    // Time returns the time spent by the current goroutine in the
    // running state.
    func Time() time.Duration

The motivating RFC is over at #82356. Informs #82625.
We build CRDB using use the patched Go runtime for all officially
supported platforms when built using Bazel (#84867). Engineers commonly
building CRDB also use happen to use two platforms we don't use a
patched Go for:
- FreeBSD (we don't have cross-compilers setup), and
- M1/M2 Macs (we don't have a code-signing pipeline, yet).
We use '(darwin && arm64) || freebsd || !bazel' as the build tag to
exclude such platforms. See #84867 for more details.

This package tests various properties we should expect over the running time
value. It does not make assertions given the CI environments we run these
under (CPU-starved, lot of OS thread pre-emption, dissimilar to healthy
CRDB deployments). This is also why they're skipped under stress. Still,
these tests are useful to understand the properties we expect running
time to have:

    === RUN   TestEquivalentGoroutines
        thread=03 expected≈10.00% got= 9.98% of on-cpu time
        thread=06 expected≈10.00% got=10.00% of on-cpu time
        thread=02 expected≈10.00% got=10.01% of on-cpu time
        thread=10 expected≈10.00% got=10.01% of on-cpu time
        thread=07 expected≈10.00% got= 9.99% of on-cpu time
        thread=04 expected≈10.00% got= 9.99% of on-cpu time
        thread=09 expected≈10.00% got=10.00% of on-cpu time
        thread=01 expected≈10.00% got= 9.99% of on-cpu time
        thread=08 expected≈10.00% got=10.02% of on-cpu time
        thread=05 expected≈10.00% got=10.02% of on-cpu time
    --- PASS: TestEquivalentGoroutines (0.56s)

    === RUN   TestProportionalGoroutines
        thread=01 got  1.82% of on-cpu time: expected≈ 1.00x got=1.00x
        thread=02 got  3.64% of on-cpu time: expected≈ 2.00x got=2.00x
        thread=03 got  5.47% of on-cpu time: expected≈ 3.00x got=3.00x
        thread=04 got  7.28% of on-cpu time: expected≈ 4.00x got=4.00x
        thread=05 got  9.09% of on-cpu time: expected≈ 5.00x got=4.99x
        thread=06 got 10.91% of on-cpu time: expected≈ 6.00x got=5.99x
        thread=07 got 12.73% of on-cpu time: expected≈ 7.00x got=6.99x
        thread=08 got 14.54% of on-cpu time: expected≈ 8.00x got=7.99x
        thread=09 got 16.36% of on-cpu time: expected≈ 9.00x got=8.99x
        thread=10 got 18.16% of on-cpu time: expected≈10.00x got=9.97x
    --- PASS: TestProportionalGoroutines (1.72s)

    === RUN   TestPingPongHog
        pinger/ponger expected≈1.00x got=0.96x
    --- PASS: TestPingPongHog (0.91s)

Release note: None

85987: sql: fix aggregation of statistics r=maryliag a=maryliag

Previously, because we were using a join, we were double
counting statistics when we had the same fingerprint in
memory and persisted.
This commit adds a `DISTINCT` so we only count them once.

Fixes #85958

Release note: None

86008: sql: logic test to inform maximum builtin function oid change r=chengxiong-ruan a=chengxiong-ruan

This commit adds a logic test to let engineers who added a
new builtin function know that the new builtin function is
constructed earlier than some existing builtin functions at
init time.

Release note: None

Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Celia La <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Chengxiong Ruan <[email protected]>
@irfansharif
Copy link
Contributor Author

The necessary work is done.

@joshimhoff
Copy link
Collaborator

This is super cool. I didn't know we actually implemented this until now. @irfansharif, where do we use this capability today?

@irfansharif
Copy link
Contributor Author

irfansharif commented Oct 26, 2022

Admission control for now, #86638, coming to a 22.2 CC cluster near you. It's pretty sick, check out #89709 (comment) for ex. In 23.1 I hope we start using it in the allocator: #83490.

@joshimhoff
Copy link
Collaborator

So great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-distribution Relating to rebalancing and leasing. A-kv-observability C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) sync-me-7 sync-me-8 T-kv KV Team
Projects
None yet
Development

No branches or pull requests

4 participants