
perf: avoid small performance regressions #106661

Open
tbg opened this issue Jul 12, 2023 · 10 comments · Fixed by #127228
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) o-perf-efficiency Related to performance efficiency T-testeng TestEng Team

Comments

@tbg
Member

tbg commented Jul 12, 2023

Is your feature request related to a problem? Please describe.

We currently rely on changepoint detection and manual inspection on https://roachperf.crdb.dev/ to catch performance regressions. We are usually only able to claw back large regressions, which are bisected to a commit in a labor-intensive process (which we've been automating a bit, but still).

But looking at many graphs (like kv95) we see that we are continually backsliding. It's easy to see why: allocations are expensive, Go makes it hard to avoid accidental allocations, there is no forcing function in CI, et cetera.

Describe the solution you'd like

I'd like some way to avoid this continuous backsliding. My expectation is that the best way to achieve this is by preventing at least some classes of regressions in CI. For example, by comparing op/s on a small set of representative benchmarks against the parent commit, or by controlling for allocations (though regressions can also be introduced by just adding code!).
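As a rough illustration of the parent-commit comparison idea (a hypothetical helper, not existing CockroachDB tooling), a threshold check on a "higher is better" metric like op/s might look like:

```go
package main

import "fmt"

// regressed reports whether the new measurement is worse than the old one
// by more than threshold (e.g. 0.05 for 5%). For "higher is better" metrics
// like ops/sec, a drop counts as a regression.
func regressed(oldOpsPerSec, newOpsPerSec, threshold float64) bool {
	if oldOpsPerSec == 0 {
		return false
	}
	drop := (oldOpsPerSec - newOpsPerSec) / oldOpsPerSec
	return drop > threshold
}

func main() {
	// Parent commit did 10,000 ops/sec, PR commit does 9,300: a 7% drop.
	fmt.Println(regressed(10000, 9300, 0.05)) // drop = 0.07, exceeds 5%
	fmt.Println(regressed(10000, 9700, 0.05)) // drop = 0.03, within 5%
}
```

The real check would run the benchmark several times on each commit and compare aggregates rather than single samples, but the gating decision reduces to something like this.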

Describe alternatives you've considered

Additional context

Jira issue: CRDB-29662

@tbg tbg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-testeng TestEng Team labels Jul 12, 2023
@blathers-crl

blathers-crl bot commented Jul 12, 2023

cc @cockroachdb/test-eng

@nvanbenschoten
Member

I've been thinking through how something like this could work recently as well. One suggestion is that we may get better mileage for fine-grained regression detection from larger microbenchmarks than from roachtests. A good microbenchmark to look at is the BenchmarkKV suite in pkg/sql/tests. The suite has SQL and raw KV interface variants. We would prevent a class of regressions by running these benchmarks in CI and asserting on whether any or all of ns/op, B/op, and allocs/op regress.

As Tobi mentioned, we would need a way to signal to CI that a certain regression is expected and permitted for a given PR.

@srosenberg
Member

@herkolategan FYI. If the above checks out, maybe we could surface results of BenchmarkKV more prominently, as well as fail PRs which exhibit large regressions.

@sambhav-jain-16 sambhav-jain-16 self-assigned this Jul 2, 2024
sambhav-jain-16 added a commit to sambhav-jain-16/cockroach that referenced this issue Jul 8, 2024
Epic: none

Release note: None

Previously, cleaning benchmark output was only exposed internally in roachprod-microbench, but we now need it as part of a CI job, so a new utility command, "clean", has been added to roachprod-microbench.
See: cockroachdb#106661
craig bot pushed a commit that referenced this issue Jul 8, 2024
126810: roachprod-microbench: add clean command r=herkolategan a=sambhav-jain-16

Epic: none

Release note: None

This change adds a new utility command to `roachprod-microbench` as part of fixing #106661.

Co-authored-by: Sambhav Jain <[email protected]>
sambhav-jain-16 added a commit to sambhav-jain-16/cockroach that referenced this issue Jul 12, 2024
Epic: none

Release note: None

As part of fixing cockroachdb#106661, we need to run microbenchmarks in a CI job and compare them against those of the latest master branch commit.

For this purpose we want to re-use the `roachprod-microbench` compare command. However, the current implementation of the compare command doesn't do threshold comparison; this PR adds that capability via a flag.

It is a follow-up change to cockroachdb#126810, which exposed `roachprod-microbench`'s capability of cleaning the benchmark output.
sambhav-jain-16 added a commit to sambhav-jain-16/cockroach that referenced this issue Jul 16, 2024
Fixes: cockroachdb#106661
Release note: None
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-29662

This change adds a new CI step which runs the `BenchmarkKV` microbenchmark for the current commit and compares the result with the base commit's results, which are stored in a cloud bucket.

This is needed so that minor perf regressions are caught in the PR itself. The current threshold is set to 5%.

There are roughly 3 steps:
1. Run the benchmarks.
2. On a PR branch, download the base commit's bench results from cloud storage and compare; fail here if there's a regression.
3. On a release/master branch, push the results to cloud storage keyed by the current commit SHA.

In cases where a regression is expected, developers can add [PERF_REGRESSION_EXPECTED] to the PR title, which skips the comparison step (PR branches only).
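The three-step flow above can be sketched as a small decision function; the struct fields and messages here are hypothetical, purely to illustrate the branching, not the actual CI script:

```go
package main

import "fmt"

// ciContext captures the inputs the CI step branches on. All names are
// hypothetical, for illustration only.
type ciContext struct {
	isPR          bool
	skipRequested bool // e.g. "[PERF_REGRESSION_EXPECTED]" in the PR title
}

func run(ctx ciContext) string {
	// Step 1: run the benchmarks (stubbed here).
	if ctx.isPR {
		if ctx.skipRequested {
			return "comparison skipped: regression explicitly acknowledged"
		}
		// Step 2: fetch base results from cloud storage and compare.
		return "compare against base results; fail if regression > 5%"
	}
	// Step 3: master/release branch: publish results keyed by commit SHA.
	return "upload results to cloud storage"
}

func main() {
	fmt.Println(run(ciContext{isPR: true}))
	fmt.Println(run(ciContext{isPR: true, skipRequested: true}))
	fmt.Println(run(ciContext{isPR: false}))
}
```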

craig bot pushed a commit that referenced this issue Aug 9, 2024
127228: workflow: add cockroach-microbench-ci step in github actions r=rickystewart,herkolategan a=sambhav-jain-16

Fixes: #106661
Release note: None
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-29662

This change adds a new CI step which runs the `BenchmarkKV` microbenchmark for the current commit and compares the result with the base commit's results, which are stored in a cloud bucket.

This is needed so that minor perf regressions are caught in the PR itself. The current threshold is set to 5%.

There are roughly 3 steps:
1. Run the benchmarks.
2. On a PR branch, download the bench results for the latest commit of the target branch from cloud storage and compare; fail here if there's a regression.
3. On a push step (PR merge to master/release), upload the results to cloud storage keyed by the current commit SHA.

Currently, this change won't block the PR. However, it will still run the microbenchmarks and upload the results during the push step.

Co-authored-by: Sambhav Jain <[email protected]>
@craig craig bot closed this as completed in 8cde169 Aug 9, 2024
sambhav-jain-16 added a commit to sambhav-jain-16/cockroach that referenced this issue Nov 21, 2024
sambhav-jain-16 added a commit to sambhav-jain-16/cockroach that referenced this issue Nov 21, 2024
This change aims to make `cockroach-microbench-ci` a required step. It also adds a `performance-regression-expected` label that can be used to skip the comparison in case a regression is expected.

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-40772
Fixes: cockroachdb#106661

Release note: None
@tbg tbg reopened this Dec 3, 2024
@tbg tbg added the o-perf-efficiency Related to performance efficiency label Dec 3, 2024
@tbg
Member Author

tbg commented Dec 3, 2024

Reopened since we haven't solved this issue yet.

In light of the spirited discussion on #135899, I'd like to propose that we slightly change the scope of this issue away from "avoiding" small perf regressions by blocking merge jobs, toward providing high-quality signal about perf regressions during the PR workflow for a small (initially tiny) set of core benchmarks.

This should check the following boxes:

  • sensitivity and specificity: it should reliably detect regressions and improvements, avoiding false positives and negatives. (These goals are in tension with each other, but picking highly stable benchmarks and running them interleaved for enough iterations helps a lot.)
  • visibility: information should reach the author proactively, but should also be available to those revisiting the PR later.
  • actionability: the report contains easy-to-use reproduction instructions as well as artifacts that can obviate the need to reproduce in many cases (CPU and alloc profiles, for example, including delta profiles for before/after the change and instructions on how to view them).
  • timeliness: gathering the information must not delay CI becoming green. It should also operate on PR push CI, not merge CI, as before the merge is the best time to address issues.
  • notification hygiene: avoid spamming comment threads on repeat pushes to a PR, acknowledging that many PRs don't have a performance component. When there is no meaningful change, there should be no ping, just a green job on the CI matrix that the author can check if they'd like.

I think we can find 1-3 good initial benchmarks, sourced from BenchmarkKV, BenchmarkSysbench, and perhaps we can also look at one that more meaningfully exercises SQL/DistSQL (cc @mgartner you seem fond of BenchmarkEndToEnd; anything else come to mind?). I'd like to not get hung up on the benchmark selection, as that can vary over time and we could entertain all kinds of heuristics that select the set dynamically to avoid the checks becoming too slow. The goal should be that the benchmarks, including build, finish in 10-15m.

So assume we have good coverage for an initial set of important hot code paths with benchmarks that are fairly stable. We can run them benchdiff-style (say, ten reps each of the old and new binary, with a prescribed benchtime and count, run interleaved [old new old new ...]) and do the benchstat-style regression check. Allocation bytes/counts tend to be more stable than CPU time (in my experience), so we can set different thresholds for each.
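A minimal sketch of the interleaved scheduling and per-metric thresholds described above (illustrative numbers; real tooling would parse actual benchmark output):

```go
package main

import "fmt"

// interleave builds a benchdiff-style run order [old new old new ...] so
// that environmental drift (thermal throttling, noisy neighbors) affects
// both binaries roughly equally.
func interleave(reps int) []string {
	order := make([]string, 0, 2*reps)
	for i := 0; i < reps; i++ {
		order = append(order, "old", "new")
	}
	return order
}

func mean(xs []float64) float64 {
	var sum float64
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// deltaPct is the relative change of the new measurements vs the old ones,
// in percent (positive means slower for a ns/op-style metric).
func deltaPct(oldVals, newVals []float64) float64 {
	return (mean(newVals) - mean(oldVals)) / mean(oldVals) * 100
}

func main() {
	fmt.Println(interleave(3))
	// Per-metric thresholds: B/op and allocs/op tend to be more stable
	// than ns/op, so they can carry tighter limits (values illustrative).
	thresholds := map[string]float64{"ns/op": 5, "B/op": 1, "allocs/op": 1}
	d := deltaPct([]float64{100, 102, 98}, []float64{104, 106, 102})
	fmt.Printf("ns/op delta: %+.1f%% (threshold %.0f%%)\n", d, thresholds["ns/op"])
}
```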

If we find a regression (or we could do this unconditionally, too - doesn't take much time) we would also invoke the old and new benchmark an additional time and collect memory and CPU profiles. The report would then contain these alongside the invocations to re-run the comparison locally, as well as the right pprof invocations to view the before, the after, and the delta profiles (after minus before). For starters, a zip file with the right files plus a link to a confluence page for pprof commands should do the trick.

I'm a little unsure about notifications. I think a reasonable way to start is that we introduce this check without any notification on the PR; you'll have to click "Details" on the microbench-CI job to see what transpired. Once we've stabilized the check and are confident that it provides good signal, we can move to posting a (helpful) comment on the PR thread should anything have become measurably faster or slower.

Everything past this line is not something I'd consider in the initial scope of this issue.

In principle I would prefer (at some point) a scheme where we don't make multiple such comments but update the existing one. I could even see us inserting this information into the PR message comment itself (but that's likely a great thing to disagree with at least someone on). Adding a label to the PR could be another subtle way to "notify" that there's something of interest.

There are many next steps we can take after this. We can discuss making the checks required (perhaps validating this decision with additional benchmarking if the initial check detects a regression), perhaps re-framing what this means: rather than enforcing that "regressions aren't allowed", we would require the author to acknowledge the regression and provide a justification for why it's okay before proceeding to merge, meaning that we have a paper trail of how the author/reviewers thought about the regression. We can also discuss how regressions should be treated in bors (my hope would be that we just don't need that ever).

@mgartner
Collaborator

mgartner commented Dec 3, 2024

cc @mgartner you seem fond of BenchmarkEndToEnd; anything else come to mind?

I've mainly been using BenchmarkEndToEnd/kv* benchmarks to highlight perf gains/losses in common overhead shared by all queries. The queries are very simple, so overhead of checking the plan cache, converting a logical plan to a physical plan, setting up execution infrastructure, etc. stands out—it isn't hidden by complex optimization or execution that could dominate a benchmark.

I've also been using it simply because of my familiarity with it.

I think that BenchmarkSysbench should be similar to BenchmarkEndToEnd and we probably don't need the latter.

@rickystewart
Collaborator

rickystewart commented Dec 3, 2024

Adding a label to the PR could be another subtle way to "notify" that there's something of interest.

Just a thought: bors can be told to refuse to merge PRs with certain labels. So the job can add a regression-detected label that would block bors from merging, but in the event that you've determined the regression is unimportant/acceptable/etc., you can remove the label. It's an extra bit of bookkeeping to add to the "paper trail" concept you're discussing. I like that this is a manual action that you must deliberately take.
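If I remember bors-ng's configuration correctly, this maps onto the `block_labels` setting in `bors.toml` (worth double-checking against the bors-ng docs); a sketch, with a hypothetical label name:

```toml
# Hypothetical sketch: refuse to merge PRs carrying this label until a
# human removes it, creating the manual acknowledgment step described above.
block_labels = ["regression-detected"]
```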

@dt
Member

dt commented Dec 9, 2024


sambhav-jain-16 added a commit to sambhav-jain-16/cockroach that referenced this issue Dec 10, 2024
Following the discussions in cockroachdb#135899, this job is being disabled until we figure out how to remove the variance in microbenchmarks (cockroachdb#106661).

Epic: none

Release note: None
craig bot pushed a commit that referenced this issue Dec 10, 2024
137075: workflow: remove cockroach-microbench-ci r=srosenberg a=sambhav-jain-16

Following the discussions in #135899, this job is being removed until we figure out how to remove the variance in microbenchmarks (#106661).

Epic: none

Release note: None

Co-authored-by: Sambhav Jain <[email protected]>
@tbg
Member Author

tbg commented Dec 16, 2024

Here's an example of the kind of thing our tooling could point out and discourage the merge of: #137542

@srosenberg
Member

srosenberg commented Dec 17, 2024

Here's an example of the kind of thing our tooling could point out and discourage the merge of: #137542

Indeed, that was a fairly large regression, and I can confirm it was reflected in the weekly run of the microbenchmarks. The regressed commit just happens to be the latest on master at the time of the run; we'd obviously need to bisect to find the first bad commit.

[screenshot: weekly_microbenches_ci results]

Status: Done
7 participants