Speed up PR feedback #2802
Conversation
---
Do you want to separate out the linting and coverage into another workflow so that we can reduce the duplication and run it on both PRs and master?
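One way to cut that duplication is a reusable workflow. A minimal sketch, assuming a hypothetical `.github/workflows/lint.yml` (the `workflow_call` trigger is standard Github Actions syntax; the file name and job contents are placeholders):

```yaml
# .github/workflows/lint.yml (hypothetical name)
name: Lint
on:
  workflow_call:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: cargo clippy --all-targets

# A caller workflow (e.g. the PR workflow and the master workflow) would
# then reference it:
#   jobs:
#     lint:
#       uses: ./.github/workflows/lint.yml
```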
.github/workflows/benchmark.yml (outdated)

```yaml
    branches: [master]
    types: [opened, reopened, synchronize]
  push:
    branches: [master, staging, trying]
```
---
This won't work because the base for the benchmark is master itself.
---
So benchmark currently takes ~30 minutes: 14 minutes for the master run and 14 minutes for the PR run. Even a 14 minute run seems like a long time compared to the value we are getting.
I think we should do some combination of
- Remove the benchmark pipeline for now
- Re-evaluate our benchmarks if we can make them faster, generally
- Potentially identify a subset of benchmarks that are important for CI
- Identify alternative ways of tracking performance than running both master and PR
---
Moving this comment here to keep the conversation in one place
Why are we deleting this? I consider benchmarking every commit as important as coverage report and linting every commit. We just don't have good third party tools to properly consume this information yet. But I plan to gather the data and use it.
If need be, we can move this to master branch and compare by trying to checkout the previous commit on master. But if we are doing coverage on every PR, we should follow the same here. It's not like a PR is blocked on this or coverage.
---
> Why are we deleting this? I consider benchmarking every commit as important as coverage report and linting every commit. We just don't have good third party tools to properly consume this information yet. But I plan to gather the data and use it.
If we aren't using the information yet, then it's going to waste. Let's reconsider this if/when we do use this information.
Even if we can, we need to consider the role of the information and how to balance cost and benefit. The cost is taking up an executor for 30min any time it runs (for `master`, it's every update for every PR). Most PRs will be performance neutral. We can get most of the value by tracking `master`'s performance to see which commit introduced a slowdown.

For me, I'd much rather we track a PR's impact on code bloat. That is a metric a lot of users are complaining about, and it is an easy one to negatively impact.
> It's not like a PR is blocked on this or coverage.
It takes up one of the runners toward our cap, so it slows down progress even if it doesn't gate changes.
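Tracking code bloat could be sketched as a CI step using the third-party `cargo-bloat` tool (the flags shown are from its documented CLI; the step name and placement are assumptions):

```yaml
- name: Report binary size contributions
  run: |
    cargo install cargo-bloat
    # List the top crates by contribution to binary size, in release mode
    cargo bloat --release --crates
```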
---
Very much agree with this sentiment. The benefit (catching performance regressions) must outweigh the costs (CI time / runner allocation).
I think once we have tooling to utilize benchmarks, and alert or fail CI on statistically significant regressions (which in and of itself might be a hard thing to do...do GH action runners suffer from noisy neighbors, etc.?) then we can (re-)enable them for a specific subset of PRs (release, etc.).
I say this having caught big-ish performance regressions from small commits before, so I very much care about catching those. But even in the linked case, I didn't catch it from CI benchmarking data.
Something I like about only benchmarking on release candidates is that an almost statistically insignificant regression will slip by benchmarks on each commit (because we can't fail a PR for such a small regression that may just be noise), but add enough together and hopefully the release catches the sum of these regressions in a more noticeable fashion. Similar tactics could be used for other optimizations as well like compile time, binary size, etc.
---
> which in and of itself might be a hard thing to do...do GH action runners suffer from noisy neighbors, etc
Switching to Iai for instruction-count benchmarking would be a way to resolve this
https://bheisler.github.io/criterion.rs/book/iai/iai.html
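An Iai bench target is a plain function registered with the `iai::main!` macro (a minimal sketch following the crate's documented usage; it requires the `iai` dev-dependency and Valgrind, and runs only as a `cargo bench` target, so the file name and workload below are placeholders):

```rust
// benches/iai_bench.rs -- hypothetical bench target name
use iai::black_box;

// Stand-in workload; a real bench would call into clap's parsing.
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn bench_fibonacci() -> u64 {
    // black_box prevents the compiler from optimizing the call away.
    fibonacci(black_box(10))
}

// Registers the benchmark; Iai reports instruction counts via Cachegrind.
iai::main!(bench_fibonacci);
```

Because it measures instruction counts rather than wall time, the results are largely insensitive to shared-runner noise.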
> Something I like about only benchmarking on release candidates is that an almost statistically insignificant regression will slip by benchmarks on each commit (because we can't fail a PR for such a small regression that may just be noise), but add enough together and hopefully the release catches the sum of these regressions in a more noticeable fashion. Similar tactics could be used for other optimizations as well like compile time, binary size, etc.
This is a good point that I've learned in the past but forgot over time. You need to look at both item by item but also the bigger picture. Even better when you have a user-focused target that you are comparing to.
.github/workflows/ci-pr.yml (outdated)

```yaml
  fail-fast: false
  matrix:
    os: ["ubuntu-latest", "windows-latest"]
    rust: ["stable"]
```
---
IMHO this should always be `1.54.0` because that is what we should be developing in.
---
I had considered this, just not testing stable at all. However, I remembered when a dependency update went past our MSRV. That doesn't automatically force us to update our MSRV but we aren't prepared with a strategy for when that happens. Keeping the majority of CI on `stable` means most things will work until we do come up with a workaround.
---
Moving this comment to keep the conversation in one place
> That doesn't automatically force us to update our MSRV but we aren't prepared with a strategy for when that happens
We need to know when that happens. The main reasons I want MSRV here are:
1. stable is being tested on master.
2. MSRV is the most important thing people would miss when contributing because, as you said, they would probably develop on stable and thus get it working on stable.
---
> That doesn't automatically force us to update our MSRV but we aren't prepared with a strategy for when that happens

> We need to know when that happens
When designing pipelines, one of the things I consider is: "is this sending the right message to the right person?" MSRV failures caused by a release of a dependency are unrelated to existing PRs, and we should not block progress on a PR because of them.
> MSRV is most important thing people would miss when contributing
In my experience, 99% of MSRV problems are compilation problems and not runtime problems
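One way to send dependency-induced MSRV breakage to the right person is a scheduled job decoupled from PR runs — a minimal sketch, assuming `1.54.0` as the MSRV (the workflow name, cron schedule, and step layout are assumptions):

```yaml
# Hypothetical scheduled workflow: MSRV breakage caused by new dependency
# releases surfaces here instead of failing unrelated PRs.
name: MSRV
on:
  schedule:
    - cron: '0 6 * * 1'  # weekly, Monday 06:00 UTC

jobs:
  msrv:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: 1.54.0
          override: true
      # Compilation is where nearly all MSRV problems show up
      - run: cargo check --all-targets
```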
---
Forgot to add: we'll still get the runtime feedback when bors does the full build before merging into master, so we are still covered for corner cases.
---
> MSRV failures caused by a release of a dependency is unrelated to existing PRs and we should not block progress on a PR because of it
Agreed; however, they can also happen when someone submits a PR that uses features not present in the MSRV. It's happened a few times before, but those were all compile errors, not runtime errors. Which brings us to the next comment 😄
> In my experience, 99% of MSRV problems are compilation problems and not runtime problems
100% agree. I'd be hard pressed to think of a circumstance where the change results in MSRV still compiling but failing a test. I think due to Rust's backwards compatibility guarantees this should be almost a non-issue.
```yaml
        run: cargo test --all-targets --features "wrap_help yaml regex debug"
      - name: No features
        run: cargo test --all-targets --no-default-features --features "std cargo"
  msrv:
```
---
If we want to test stable, we can make this step stable. But I honestly don't feel like we should need another step. Let's just add `stable` to the rust matrix in the above step.
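Adding `stable` alongside the MSRV in one matrix could look like the following sketch (the surrounding job definition and the MSRV value `1.54.0` are taken from the discussion above):

```yaml
strategy:
  fail-fast: false
  matrix:
    os: [ubuntu-latest, windows-latest]
    rust: ["1.54.0", "stable"]
```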
---
I'm waiting on further discussion on this based on resolving the other discussion of focusing on stable vs MSRV
To clarify: my saying this was ready meant that all known code changes were resolved, I had responded to all input, and I was awaiting further input from reviewers before moving forward. In contrast, this started as a "WIP" experiment for me to try ideas.
```yaml
  uses: actions-rs/cargo@v1
  with:
    command: clippy
    args: --no-default-features --features "std cargo"
```
---
Don't you need `-p clap:3.0.0-beta.4` here? IIRC, clap conflicts with some of the dependencies and cargo couldn't figure out the correct version.
---
I switched it from running clippy on a single package to running on the workspace defaults.
.github/workflows/ci-pr.yml (outdated)

```yaml
  matrix:
    os: [ubuntu-latest, windows-latest]
    rust: [stable]
    flags: ["--no-default-features --features 'std cargo'", "--features 'wrap_help yaml regex'"]
```
---
We should name this flag `features` and call them as I did in ci.yml to maintain consistency.

Naming them here is important (instead of just providing the whole flag) because it improves the naming of the CI statuses and checks.
---
If we want to improve the display for `ci-pr.yml`, my recommendation would be distinct jobs with their own matrices for each case.

I find `ci.yml` complicated and do not consider consistency with it a prerequisite. Because Github Actions does not have a rich way for us to associate a label with a set of flags, we have 4 mutually exclusive core steps in the `test` job, and people have to map what "test" means under each scenario. As a contributor, I have found the "pretty" job names do not help me in figuring out what is going on (most of the time, they've been cut off). As someone modifying the pipeline, it feels brittle and hard to follow.

To be clear, I understand at least one reason why `ci.yml` needs to be so complicated: because of the complicated matrix, there is a lot of logic that would have to be duplicated if we split it out into separate jobs (the `exclude`s and target-specific setup). However, `ci-pr.yml` does not have nearly as complicated a matrix.
---
I think the ability to name the flag sets `all` or `none` definitely helps lower the bar to contribution. Sure, it's not a high priority, but if it's possible to keep that aspect in this PR without much trouble I'd prefer it as well.

Then we can move towards a better overall system in the future with a more concerted effort. Especially because we will definitely need some kind of system in place if (once) we get to a more `tokio`-esque modularity with many different flags.
---
I've applied my middle-ground option of distinct matrix jobs; let's see what we think of it.

Longer term, I wonder if we should look at a command runner, like `just` or `make`. That way we can pass features in as a parameter and have it dispatch the commands, rather than complicating the pipeline.

With that said, some caveats:
- We should ensure people still get Github Actions time reporting and do not lose the sequencing of steps
- We should make sure the tool echoes the command being run so people can easily run it locally if they don't have the command runner, or to give context to help them understand why it might have failed
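The command-runner idea could be sketched as a Makefile (the target and variable names are hypothetical): features come in as a parameter, and the recipe echoes each command before running it so it can be copied and reproduced locally:

```make
# Hypothetical Makefile; FEATURES would be passed by the pipeline, e.g.:
#   make test FEATURES="wrap_help yaml regex"
FEATURES ?= wrap_help yaml regex

.PHONY: test
test:
	# Without a leading '@', make echoes each command before running it,
	# so CI logs show exactly what to run locally.
	cargo test --all-targets --features "$(FEATURES)"
```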
Using `head_ref`, we are making it so PRs are all in the same group. When a new PR comes in (not just an update), it then cancels all other PRs. Switching to `ref` makes it so each PR is in its own concurrency group.

This drops us down to just a handful of jobs, allowing us full parallelism (Github caps max parallel jobs). This is dependent on us using bors to run the "ci" before merging into master. There is a balance in what to run. We should consider what is most likely to break for the widest variety of PRs. Contributors that expect an uncovered case to fail can always specify `@bors try`.

Motivation:
- Mac is similar enough to Linux that we only need to run one of them, and Linux has more parallel runners on Github.
- Since we deal with `OsStr`, test Windows because it's different than the others.
- People are most likely to make changes on `stable` and break support for MSRV, so we should verify that.
- Still test on `stable` to not block feedback if we run into problems with dependencies and our MSRV run.
- On the other hand, beta and nightly are less likely to break on an individual PR.
- Remove benchmarks because most changes are not performance sensitive and we aren't looking at the results enough to justify a 30 minute run.

Fixes clap-rs#2801
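The `ref`-based grouping described above, as it would appear in the workflow (this uses standard Github Actions `concurrency` syntax; the group name is an assumption):

```yaml
concurrency:
  # Per the discussion above, grouping by github.head_ref lumped PRs into
  # one group; github.ref is unique per PR, so each PR only cancels its
  # own in-progress runs.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```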
Since we are agreed on direction and with bors as a safety net, I'm moving forward with merging this PR and we can continue to resolve the conversations post-merge.
---
With how unstable the runners and systems are for benchmarking, the only way we can identify performance regressions is by relying on statistics, and especially trends, for which we need every commit to be benchmarked.

I don't understand why you guys are so against that. It is not like it is blocking the merging of PRs.
```yaml
  pull_request:
    branches: [master]
    types: [opened, reopened, synchronize]
concurrency:
```
---
Why is this removed?

I understand removing it for staging because it would never happen. But it should stay for trying and master, IIUC how bors operates. We don't want to tie up our runners with non-fresh head refs.
---
While the docs are unclear, my understanding is:
- `pull_request` describes which PRs will have CI run for them. The specified branch limits it so it only runs when a PR is against that branch.
- `push` is any commit into one of the specified branches, whether done by bors, manually clicking the merge button, or pushing from your machine.

For example, the bors documentation suggests the following for a single-tier CI:

```yaml
on:
  push:
    branches: [main, staging, trying]
  pull_request:
    branches: [main]
```
---
No, you misunderstood. This comment is for the `concurrency` key.
```diff
@@ -0,0 +1,87 @@
+name: CI-PR
```
---
This model relies on us only merging through bors, what if we create a dummy job that fails for branch protection to block merging?
---
IIRC bors actually needs us to completely disable branch protection, which I just did. I had it on because I wasn't actually using it. But maybe I am wrong; we can experiment later.

> This model relies on us only merging through bors, what if we create a dummy job that fails for branch protection to block merging?

Uhh.. That's a lot of reds on all our commits.
---
Been thinking more on this.

First, from the bors docs:

> You can check this on GitHub in your repository’s Settings tab, in the Branches section. The “master” branch can be protected, and since bors will usually be the only thing that commits directly to master, you can set it to require the “bors” Commit Status to push to master. Do not set the staging/trying branches to protected.
>
> When using protected branches, leave the Require pull request reviews before merging option unmarked, otherwise you’ll start to get a lot of 422 errors. If you want to enforce reviews on your Pull Requests and/or you’re using CODEOWNERS, require these options solely on bors with the respective options: required_approvals and use_codeowners. Also, make sure bors is included in the list allowed to push to the protected branch.

I believe the main protections that impact bors are
- Require a pull request before merging
- Restrict who can push to matching branches

The rest impact PRs, which bors doesn't use; that includes requiring status checks for PRs.

We should be able to add `CI-PR` and `bors` to our required status checks. This will help prevent merging without bors and, if a project admin feels they need to merge without bors, will help encourage them to wait on the status check.
---
I've updated our branch protections. So far, it's looking like it's working as expected.

I've gone ahead and disabled auto-merge support in Github to avoid forgetting and selecting that.