
perf: avoid small performance regressions #106661

Open
tbg opened this issue Jul 12, 2023 · 10 comments · Fixed by #127228
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) o-perf-efficiency Related to performance efficiency T-testeng TestEng Team

Comments

@tbg
Member

tbg commented Jul 12, 2023

Is your feature request related to a problem? Please describe.

We currently rely on changepoint detection and manual inspection on https://roachperf.crdb.dev/ to catch performance regressions. We are usually only able to claw back large regressions, which are bisected to a commit in a labor-intensive process (which we've been automating a bit, but still).

But looking at many graphs (like kv95) we see that we are continually backsliding. It's easy to see why: allocations are expensive, Go makes it hard to avoid accidental allocations, there is no forcing function in CI, et cetera.

Describe the solution you'd like

I'd like some way to avoid this continuous backsliding. My expectation is that the best way to achieve this is by preventing at least some classes of regressions in CI. For example, by comparing op/s on a small set of representative benchmarks against the parent commit, or by controlling for allocations (though regressions can also be introduced by just adding code!).
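As a rough illustration of the parent-commit comparison idea (a hypothetical helper, not existing CockroachDB tooling), a threshold check on a "higher is better" metric like op/s might look like:

```go
package main

import "fmt"

// regressed reports whether the new measurement is worse than the old one
// by more than threshold (e.g. 0.05 for 5%). For "higher is better" metrics
// like ops/sec, a drop counts as a regression.
func regressed(oldOpsPerSec, newOpsPerSec, threshold float64) bool {
	if oldOpsPerSec == 0 {
		return false
	}
	drop := (oldOpsPerSec - newOpsPerSec) / oldOpsPerSec
	return drop > threshold
}

func main() {
	// Parent commit did 10,000 ops/sec, PR commit does 9,300: a 7% drop.
	fmt.Println(regressed(10000, 9300, 0.05)) // drop = 0.07, exceeds 5%
	fmt.Println(regressed(10000, 9700, 0.05)) // drop = 0.03, within 5%
}
```

The real check would run the benchmark several times on each commit and compare aggregates rather than single samples, but the gating decision reduces to something like this.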

Describe alternatives you've considered

Additional context

Jira issue: CRDB-29662

@tbg tbg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-testeng TestEng Team labels Jul 12, 2023
@blathers-crl

blathers-crl bot commented Jul 12, 2023

cc @cockroachdb/test-eng

@nvanbenschoten
Member

I've been thinking through how something like this could work recently as well. One suggestion is that we may get better mileage for fine-grained regression detection from larger microbenchmarks than from roachtests. A good microbenchmark to look at is the BenchmarkKV suite in pkg/sql/tests. The suite has SQL and raw KV interface variants. We would prevent a class of regressions by running these benchmarks in CI and asserting on whether any or all of ns/op, B/op, and allocs/op regress.

As Tobi mentioned, we would need a way to signal to CI that a certain regression is expected and permitted for a given PR.

@srosenberg
Member

@herkolategan FYI. If the above checks out, maybe we could surface results of BenchmarkKV more prominently, as well as fail PRs which exhibit large regressions.

@sambhav-jain-16 sambhav-jain-16 self-assigned this Jul 2, 2024
sambhav-jain-16 added a commit to sambhav-jain-16/cockroach that referenced this issue Jul 8, 2024
Epic: none

Release note: None

Previously, cleaning benchmark output was only exposed internally in roachprod-microbench, but we now need it as part of a CI job, so a new utility command, "clean", has been added to roachprod-microbench.
See: cockroachdb#106661
craig bot pushed a commit that referenced this issue Jul 8, 2024
126810: roachprod-microbench: add clean command r=herkolategan a=sambhav-jain-16

Epic: none

Release note: None

This change adds a new utility command to `roachprod-microbench` as part of fixing #106661.

Co-authored-by: Sambhav Jain <[email protected]>
sambhav-jain-16 added a commit to sambhav-jain-16/cockroach that referenced this issue Jul 12, 2024
Epic: none

Release note: None

As part of fixing cockroachdb#106661, we need to run microbenchmarks in a CI job and compare them against those of the latest master branch commit.

For this purpose we want to re-use the `roachprod-microbench` compare command. However, the current implementation of the compare command doesn't do threshold comparison; this PR adds that capability via a flag.

It is a follow-up change to cockroachdb#126810, which exposed `roachprod-microbench`'s capability of cleaning the benchmark output.
sambhav-jain-16 added a commit to sambhav-jain-16/cockroach that referenced this issue Jul 16, 2024
Fixes: cockroachdb#106661
Release note: None
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-29662

This change adds a new CI step which runs the `BenchmarkKV` microbenchmark for the current commit and compares the result with the base commit's results, which are stored in a cloud bucket.

This is needed so that minor perf regressions are caught in the PR itself. The current threshold is set to 5%.

There are roughly 3 steps:
1. Run the benchmarks.
2. On a PR branch, download the base commit's bench results from cloud storage and compare; fail here if there's a regression.
3. On a release/master branch, push the results to cloud storage keyed by the current commit SHA.

In cases where a regression is expected, developers can add [PERF_REGRESSION_EXPECTED] to the PR title, which skips the comparison step (PR branches only).
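The three-step flow above can be sketched as a small decision function; the struct fields and messages here are hypothetical, purely to illustrate the branching, not the actual CI script:

```go
package main

import "fmt"

// ciContext captures the inputs the CI step branches on. All names are
// hypothetical, for illustration only.
type ciContext struct {
	isPR          bool
	skipRequested bool // e.g. "[PERF_REGRESSION_EXPECTED]" in the PR title
}

func run(ctx ciContext) string {
	// Step 1: run the benchmarks (stubbed here).
	if ctx.isPR {
		if ctx.skipRequested {
			return "comparison skipped: regression explicitly acknowledged"
		}
		// Step 2: fetch base results from cloud storage and compare.
		return "compare against base results; fail if regression > 5%"
	}
	// Step 3: master/release branch: publish results keyed by commit SHA.
	return "upload results to cloud storage"
}

func main() {
	fmt.Println(run(ciContext{isPR: true}))
	fmt.Println(run(ciContext{isPR: true, skipRequested: true}))
	fmt.Println(run(ciContext{isPR: false}))
}
```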

craig bot pushed a commit that referenced this issue Aug 9, 2024
127228: workflow: add cockroach-microbench-ci step in github actions r=rickystewart,herkolategan a=sambhav-jain-16

Fixes: #106661
Release note: None
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-29662

This change adds a new CI step which runs the `BenchmarkKV` microbenchmark for the current commit and compares the result with the base commit's results, which are stored in a cloud bucket.

This is needed so that minor perf regressions are caught in the PR itself. The current threshold is set to 5%.

There are roughly 3 steps:
1. Run the benchmarks.
2. On a PR branch, download the bench results for the latest commit of the target branch from cloud storage and compare; fail here if there's a regression.
3. On a push step (PR merge to master/release), upload the results to cloud storage keyed by the current commit SHA.

Currently, this change won't block the PR. However, it will still run the microbenchmarks and upload the results during the push step.

Co-authored-by: Sambhav Jain <[email protected]>
@craig craig bot closed this as completed in 8cde169 Aug 9, 2024
sambhav-jain-16 added a commit to sambhav-jain-16/cockroach that referenced this issue Nov 21, 2024
sambhav-jain-16 added a commit to sambhav-jain-16/cockroach that referenced this issue Nov 21, 2024
This change aims to make `cockroach-microbench-ci` a required step. It also adds a `performance-regression-expected` label that can be used to skip the comparison in case a regression is expected.

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-40772
Fixes: cockroachdb#106661

Release note: None
@tbg tbg reopened this Dec 3, 2024
@tbg tbg added the o-perf-efficiency Related to performance efficiency label Dec 3, 2024
@tbg
Member Author

tbg commented Dec 3, 2024

Reopened since we haven't solved this issue yet.

In light of the spirited discussion on #135899, I'd like to propose that we slightly change the scope of this issue away from "avoiding" small perf regressions by blocking merge jobs, toward providing high-quality signal about perf regressions during the PR workflow for a small (initially tiny) set of core benchmarks.

This should check the following boxes:

  • sensitivity and specificity: it should reliably detect regressions and improvements, avoiding false positives and negatives. (These goals are in tension with each other, but picking highly stable benchmarks and running them interleaved for enough iterations helps a lot.)
  • visibility: information should reach the author proactively, but should also be available to those revisiting the PR later.
  • actionability: the report contains easy-to-use reproduction instructions as well as artifacts that can obviate the need to reproduce in many cases (CPU and alloc profiles, for example, including delta profiles for before/after the change and instructions on how to view them).
  • timeliness: gathering the information must not delay CI becoming green. It should also operate on PR push CI, not merge CI, as before the merge is the best time to address issues.
  • notification hygiene: avoid spamming comment threads on repeat pushes to a PR, acknowledging that many PRs don't have a performance component. When there is no meaningful change, there should be no ping, just a green job on the CI matrix that the author can check if they'd like.

I think we can find 1-3 good initial benchmarks, sourced from BenchmarkKV, BenchmarkSysbench, and perhaps we can also look at one that more meaningfully exercises SQL/DistSQL (cc @mgartner you seem fond of BenchmarkEndToEnd; anything else come to mind?). I'd like to not get hung up on the benchmark selection, as that can vary over time and we could entertain all kinds of heuristics that select the set dynamically to avoid the checks becoming too slow. The goal should be that the benchmarks, including build, finish in 10-15m.

So assume we have good coverage for an initial set of important hot code paths with benchmarks that are fairly stable. We can run them benchdiff-style (say, ten reps each of the old and new binary, with a prescribed benchtime and count, run interleaved [old new old new ...]) and do the benchstat-style regression check. Allocation bytes/counts tend to be more stable than CPU time (in my experience), so we can set different thresholds for each.
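A minimal sketch of the interleaved scheduling and per-metric thresholds described above (illustrative numbers; real tooling would parse actual benchmark output):

```go
package main

import "fmt"

// interleave builds a benchdiff-style run order [old new old new ...] so
// that environmental drift (thermal throttling, noisy neighbors) affects
// both binaries roughly equally.
func interleave(reps int) []string {
	order := make([]string, 0, 2*reps)
	for i := 0; i < reps; i++ {
		order = append(order, "old", "new")
	}
	return order
}

func mean(xs []float64) float64 {
	var sum float64
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// deltaPct is the relative change of the new measurements vs the old ones,
// in percent (positive means slower for a ns/op-style metric).
func deltaPct(oldVals, newVals []float64) float64 {
	return (mean(newVals) - mean(oldVals)) / mean(oldVals) * 100
}

func main() {
	fmt.Println(interleave(3))
	// Per-metric thresholds: B/op and allocs/op tend to be more stable
	// than ns/op, so they can carry tighter limits (values illustrative).
	thresholds := map[string]float64{"ns/op": 5, "B/op": 1, "allocs/op": 1}
	d := deltaPct([]float64{100, 102, 98}, []float64{104, 106, 102})
	fmt.Printf("ns/op delta: %+.1f%% (threshold %.0f%%)\n", d, thresholds["ns/op"])
}
```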

If we find a regression (or we could do this unconditionally, too - doesn't take much time) we would also invoke the old and new benchmark an additional time and collect memory and CPU profiles. The report would then contain these alongside the invocations to re-run the comparison locally, as well as the right pprof invocations to view the before, the after, and the delta profiles (after minus before). For starters, a zip file with the right files plus a link to a confluence page for pprof commands should do the trick.

I'm a little unsure about notifications. I think a reasonable way to start is that we introduce this check without any notification on the PR; you'll have to click "Details" on the microbench-CI job to see what transpired. Once we've stabilized the check and are confident that it provides good signal, we can move to posting a (helpful) comment on the PR thread should anything have become measurably faster or slower.

Everything past this line is not something I'd consider in the initial scope of this issue.

In principle I would prefer (at some point) a scheme where we don't make multiple such comments but update the existing one. I could even see us inserting this information into the PR message comment itself (but that's likely a great thing to disagree with at least someone on). Adding a label to the PR could be another subtle way to "notify" that there's something of interest.

There are many next steps we can take after this. We can discuss making the checks required (perhaps validating this decision with additional benchmarking if the initial check detects a regression), perhaps re-framing what this means: rather than enforcing that "regressions aren't allowed", we would require the author to acknowledge the regression and provide a justification for why it's okay before proceeding to merge, meaning that we have a paper trail of how the author/reviewers thought about the regression. We can also discuss how regressions should be treated in bors (my hope would be that we just don't need that ever).

@mgartner
Collaborator

mgartner commented Dec 3, 2024

cc @mgartner you seem fond of BenchmarkEndToEnd; anything else come to mind?

I've mainly been using BenchmarkEndToEnd/kv* benchmarks to highlight perf gains/losses in common overhead shared by all queries. The queries are very simple, so overhead of checking the plan cache, converting a logical plan to a physical plan, setting up execution infrastructure, etc. stands out—it isn't hidden by complex optimization or execution that could dominate a benchmark.

I've also been using it simply because of my familiarity with it.

I think that BenchmarkSysbench should be similar to BenchmarkEndToEnd and we probably don't need the latter.

@rickystewart
Collaborator

rickystewart commented Dec 3, 2024

Adding a label to the PR could be another subtle way to "notify" that there's something of interest.

Just a thought: bors can be told to refuse to merge PRs with certain labels. So the job can add a regression-detected label that would block bors from merging, but in the event that you've determined the regression is unimportant/acceptable/etc., you can remove the label. It's an extra bit of bookkeeping to add to the "paper trail" concept you're discussing. I like that this is a manual action that you must deliberately take.
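If I remember bors-ng's configuration correctly, this maps onto the `block_labels` setting in `bors.toml` (worth double-checking against the bors-ng docs); a sketch, with a hypothetical label name:

```toml
# Hypothetical sketch: refuse to merge PRs carrying this label until a
# human removes it, creating the manual acknowledgment step described above.
block_labels = ["regression-detected"]
```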

@dt
Member

dt commented Dec 9, 2024


sambhav-jain-16 added a commit to sambhav-jain-16/cockroach that referenced this issue Dec 10, 2024
Following the discussions in cockroachdb#135899, this job is being disabled until we figure out how to remove the variance in microbenchmarks (cockroachdb#106661).

Epic: none

Release note: None
craig bot pushed a commit that referenced this issue Dec 10, 2024
137075: workflow: remove cockroach-microbench-ci r=srosenberg a=sambhav-jain-16

Following the discussions in #135899, this job is being removed until we figure out how to remove the variance in microbenchmarks (#106661).

Epic: none

Release note: None

Co-authored-by: Sambhav Jain <[email protected]>
@tbg
Member Author

tbg commented Dec 16, 2024

Here's an example of the kind of thing our tooling could point out and discourage the merge of: #137542

@srosenberg
Member

srosenberg commented Dec 17, 2024

Here's an example of the kind of thing our tooling could point out and discourage the merge of: #137542

Indeed, that was a fairly large regression, and I can confirm it was reflected in the weekly run of the microbenchmarks. The regressed commit just happens to be the latest on master at the time of the run; we'd obviously need to bisect to find the first bad commit.

[screenshot: weekly_microbenches_ci results]

Status: Done
7 participants