Speed up PR feedback #2802

Merged
merged 2 commits into from
Oct 7, 2021
Conversation

Member

@epage epage commented Oct 1, 2021

This drops us down to just a handful of jobs, allowing us full
parallelism (GitHub caps max parallel jobs). This is dependent on us
using bors to run the "ci" workflow before merging into master, so we get full feedback before master can go red.

There is a balance in what to run. We should consider what is most
likely to break for the widest variety of PRs. Contributors who expect
an uncovered case to fail can always run @bors try.

Motivation

  • Mac is similar enough to Linux that we only need to run one of them, and
    Linux has more parallel runners on GitHub.
  • Since we deal with OsStr, test Windows because it's different from
    the others.
  • People are most likely to make changes on stable and break support
    for MSRV, so we should verify that.
  • Still test on stable so we don't block feedback if we run into problems
    with dependencies on our MSRV run.
  • On the other hand, beta and nightly are less likely to break on an
    individual PR.
  • Remove benchmarks because most changes are not performance sensitive
    and we aren't looking at the results enough to justify a 30 minute run.

Result

  • 1 lint job (44s)
  • 4 test jobs (5-7m with 3-5m compile, 2-3m to test)
  • 1 MSRV job (3.5m)
  • 1 coverage job (12m) - critical path

Fixes #2801

Member

@pksunkara pksunkara left a comment
Do you want to separate out the linting and coverage into another workflow so that we can reduce the duplication and run it on both PR and master?

branches: [master]
types: [opened, reopened, synchronize]
push:
branches: [master, staging, trying]
Member

This won't work because the base for the benchmark is master itself.

Member Author

So benchmark currently takes ~30 minutes: 14 minutes for master and 14 minutes for the PR run. Even the 14 minute run seems like a pretty long time, compared to the value we are getting.

I think we should do some combination of

  • Remove the benchmark pipeline for now
  • Re-evaluate our benchmarks if we can make them faster, generally
  • Potentially identify a subset of benchmarks that are important for CI
  • Identify alternative ways of tracking performance than running both master and PR

Member Author

Moving this comment here to keep the conversation in one place

@pksunkara

Why are we deleting this? I consider benchmarking every commit as important as coverage report and linting every commit. We just don't have good third party tools to properly consume this information yet. But I plan to gather the data and use it.

If need be, we can move this to master branch and compare by trying to checkout the previous commit on master. But if we are doing coverage on every PR, we should follow the same here. It's not like a PR is blocked on this or coverage.

Member Author

Why are we deleting this? I consider benchmarking every commit as important as coverage report and linting every commit. We just don't have good third party tools to properly consume this information yet. But I plan to gather the data and use it.

If we aren't using the information yet, then it's going to waste. Let's reconsider this if/when we do use this information.

Even if we can, we need to consider the role of the information and how to balance cost and benefit. The cost is taking up an executor for 30 minutes any time it runs (for master, it's every update for every PR). Most PRs will be performance neutral. We can get most of the value by tracking master's performance to see which commit introduced a slowdown.

For me, I'd much rather we track a PR's impact on code bloat. That is a metric a lot of users are complaining about, and it is an easy one to negatively impact.

It's not like a PR is blocked on this or coverage.

It takes up one of the runners towards our cap, so they slow down progress, even if they don't gate changes.

Member

@kbknapp kbknapp Oct 4, 2021

Very much agree with this sentiment. The benefit (catching performance regressions) must outweigh the costs (CI time / runner allocation).

I think once we have tooling to utilize benchmarks, and alert or fail CI on statistically significant regressions (which in and of itself might be a hard thing to do...do GH action runners suffer from noisy neighbors, etc.?) then we can (re-)enable them for a specific subset of PRs (release, etc.).

I say this having caught big-ish performance regressions from small commits before, so I very much care about catching those. But even in the linked case, I didn't catch it via CI benchmarking data.

Something I like about only benchmarking on release candidates is that an almost statistically insignificant regression will slip by benchmarks on each commit (because we can't fail a PR for such a small regression that may just be noise), but add enough together and hopefully the release catches the sum of these regressions in a more noticeable fashion. Similar tactics could be used for other optimizations as well like compile time, binary size, etc.

Member Author

which in and of itself might be a hard thing to do...do GH action runners suffer from noisy neighbors, etc

Switching to Iai for instruction-count benchmarking would be one way to resolve this.

https://bheisler.github.io/criterion.rs/book/iai/iai.html
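As a rough illustration (not something this PR adds), an instruction-count benchmark job could be sketched in the workflow like this; the bench target name and the presence of an Iai dev-dependency are assumptions:

```yaml
# Hypothetical job sketch: run Iai instruction-count benchmarks,
# which are far less sensitive to noisy runners than wall-clock timing.
# The bench target name ("iai") is an assumption.
iai:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v2
    - uses: actions-rs/toolchain@v1
      with:
        toolchain: stable
        profile: minimal
        override: true
    - name: Instruction-count benchmarks
      run: cargo bench --bench iai
```

Because instruction counts are deterministic, comparing them across commits does not require running master and the PR back to back on the same runner.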

Something I like about only benchmarking on release candidates is that an almost statistically insignificant regression will slip by benchmarks on each commit (because we can't fail a PR for such a small regression that may just be noise), but add enough together and hopefully the release catches the sum of these regressions in a more noticeable fashion. Similar tactics could be used for other optimizations as well like compile time, binary size, etc.

This is a good point that I've learned in the past but forgot over time. You need to look both item by item and at the bigger picture. Even better when you have a user-focused target that you are comparing against.

fail-fast: false
matrix:
os: ["ubuntu-latest", "windows-latest"]
rust: ["stable"]
Member

IMHO this should always be 1.54.0 because that is what we should be developing in.

Member Author

I had considered this, just not testing stable at all. However, I remembered when a dependency update went past our MSRV. That doesn't automatically force us to update our MSRV but we aren't prepared with a strategy for when that happens. Keeping the majority of CI on stable means most things will work until we do come up with a workaround.
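For context, keeping the main matrix on stable while pinning a dedicated MSRV job (1.54.0 per the discussion above) might look roughly like this; the job and step names are illustrative, not what this PR ships:

```yaml
# Illustrative sketch: a dedicated MSRV job pinned to 1.54.0,
# kept separate from the stable test matrix so a dependency
# breaking MSRV doesn't take down all CI feedback at once.
msrv:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v2
    - uses: actions-rs/toolchain@v1
      with:
        toolchain: 1.54.0
        profile: minimal
        override: true
    - name: Check with MSRV
      run: cargo check --all-targets
```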

Member Author

Moving this comment to keep the conversation in one place

That doesn't automatically force us to update our MSRV but we aren't prepared with a strategy for when that happens

We need to know when that happens. The main reasons I want msrv here are:

1. stable is being tested on master.

2. MSRV is the most important thing people would miss when contributing because, as you said, they would probably develop on stable and thus only get it working on stable.

Member Author

@epage epage Oct 4, 2021

That doesn't automatically force us to update our MSRV but we aren't prepared with a strategy for when that happens

We need to know when that happens

When designing pipelines, one of the things I consider is "is this sending the right message to the right person?" MSRV failures caused by a release of a dependency are unrelated to existing PRs, and we should not block progress on a PR because of them.

MSRV is most important thing people would miss when contributing

In my experience, 99% of MSRV problems are compilation problems and not runtime problems.

Member Author

Forgot to add, we'll still get the runtime feedback when bors does the full build before merging into master. So we are still covered for corner cases.

Member

MSRV failures caused by a release of a dependency are unrelated to existing PRs, and we should not block progress on a PR because of them

Agreed, however they can also happen due to someone submitting a PR that utilizes features not present in the MSRV. It's happened a few times before, but those were all compile errors, not runtime errors. Which brings us to the next comment 😄

In my experience, 99% of MSRV problems are compilation problems and not runtime problems

100% agree. I'd be hard pressed to think of a circumstance where the change results in MSRV still compiling but failing a test. I think due to Rust's backwards compatibility guarantees this should be almost a non-issue.

run: cargo test --all-targets --features "wrap_help yaml regex debug"
- name: No features
run: cargo test --all-targets --no-default-features --features "std cargo"
msrv:
Member

If we want to test stable, we can make this step stable. But I honestly don't feel like we need another step. Let's just add stable to the rust matrix in the above step.

Member Author

I'm waiting on further discussion on this based on resolving the other discussion of focusing on stable vs MSRV

@epage epage force-pushed the ci branch 3 times, most recently from 52b58b8 to a06f69e Compare October 2, 2021 00:23
@epage epage force-pushed the ci branch 2 times, most recently from 6075705 to 670eb54 Compare October 2, 2021 23:47
@epage epage changed the title WIP: Speed up PR feedback Speed up PR feedback Oct 4, 2021
Member Author

epage commented Oct 4, 2021

Is there still some outstanding TBD items around the MSRV vs stable?

To clarify: my saying this was ready meant that all known code changes were resolved, I had responded to all input, and I was awaiting further input from reviewers before moving forward. In contrast, this started as a "WIP" experiment for me to try ideas.

I see we're adding the debug feature in msrv and also testing clap_derive UI tests, but what are the pros/cons to keeping MSRV its own step vs just including it in the matrix?

  • debug in ci.yml only does a cargo check, so this PR does the equivalent while being able to reuse existing compilation time (rather than doing it from scratch in a dedicated job)
    • With respect to performance, that is the fastest job and that step is only taking 22s
  • UI tests can only run under MSRV because they are coupled to a specific Rust version
    • In the future, I want to make this relationship more explicit by adding a feature for enabling UI tests, rather than a version check. In the past we've silently skipped UI tests because they got out of sync
  • "Including [MSRV] in the matrix" depends on what you mean by it
    • As mentioned in the PR summary, I think doing the majority of our validation on stable will give us the most information in the most circumstances (MSRV can be brittle to test)
    • Running both stable and MSRV on all matrix combinations and all matrix steps adds very little information, like running both Linux and Mac, but makes it more likely for us to hit our runner caps, serializing our builds
    • We could add selective MSRV runs that have selectively enabled steps, like what ci.yml does. This complicates the pipeline, making it harder to tell what happens in what condition, for only some minor de-duplication.

uses: actions-rs/cargo@v1
with:
command: clippy
args: --no-default-features --features "std cargo"
Member

Don't you need -p clap:3.0.0-beta.4 here? IIRC, clap conflicts with some of the dependencies and cargo couldn't figure out the correct version.

Member Author

I switched it from running clippy on a single package to running on the workspace defaults.

matrix:
os: [ubuntu-latest, windows-latest]
rust: [stable]
flags: ["--no-default-features --features 'std cargo'", "--features 'wrap_help yaml regex'"]
Member

We should name this flag features and call the values the same as I did in ci.yml to maintain consistency.

Naming them here is important (instead of just providing the whole flag) because it improves the naming of the CI statuses and checks.

Member Author

If we want to improve the display for ci-pr.yml, my recommendation would be distinct jobs with their own matrices for each case.

I find ci.yml complicated and do not consider consistency with it a prerequisite. Because GitHub Actions does not have a rich way for us to associate a label with a set of flags, we have 4 mutually exclusive core steps in the test job, and people have to map what "test" means under each scenario. As a contributor, I have found the "pretty" job names do not help me in figuring out what is going on (most of the time, they've been cut off). As someone modifying the pipeline, it feels brittle and hard to follow.

To be clear, I understand at least one reason why ci.yml needs to be so complicated: because of the complicated matrix, there is a lot of logic that would have to be duplicated if we split it out into separate jobs (the excludes and target-specific setup). However, ci-pr.yml does not have nearly as complicated a matrix.
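The "distinct jobs with their own matrices" idea can be sketched as follows; this is a sketch, not what this PR ships, with the feature sets taken from the snippets above and the job names invented for illustration:

```yaml
# Sketch only: one job per feature combination, so status names read
# like "test-default (ubuntu-latest)" instead of a raw flags string.
test-default:
  strategy:
    fail-fast: false
    matrix:
      os: [ubuntu-latest, windows-latest]
  runs-on: ${{ matrix.os }}
  steps:
    - uses: actions/checkout@v2
    - run: cargo test --all-targets --features "wrap_help yaml regex"

test-minimal:
  strategy:
    fail-fast: false
    matrix:
      os: [ubuntu-latest, windows-latest]
  runs-on: ${{ matrix.os }}
  steps:
    - uses: actions/checkout@v2
    - run: cargo test --all-targets --no-default-features --features "std cargo"
```

The trade-off is some duplication between jobs in exchange for status checks whose names carry meaning on their own.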

Member

I think the ability to name the flags (e.g. all or none) definitely helps lower the bar to contribution. Sure, it's not a high priority, but I think if it's possible to keep that aspect in this PR without much trouble I'd prefer it as well.

Then we can move towards a better overall system in the future with a more concerted effort. Especially because we definitely will need some kind of system in place if (once) we get to a more tokio-esque modularity with many different flags.

Member Author

I've applied my middle ground option of distinct matrix jobs; let's see what we think of it.

Longer term, I wonder if we should look at a command runner, like just or make. That way we can pass features in as a parameter and have it dispatch the commands, rather than complicating the pipeline.

With that said, some caveats

  • We should ensure people still get GitHub Actions time reporting and don't lose the sequencing of steps
  • We should make sure the tool echoes the command being run so people can easily run it locally if they don't have the command runner, or to give context to help them understand why it might have failed
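The command-runner idea could look something like this in the workflow; the `test` recipe name and its argument are hypothetical, assuming a justfile that echoes and runs the underlying cargo command:

```yaml
# Hypothetical sketch: the workflow passes the feature set to a `just`
# recipe instead of hard-coding a cargo invocation per matrix entry.
# The "test" recipe and its positional argument are assumptions.
- name: Install just
  run: cargo install just
- name: Test
  run: just test "wrap_help yaml regex"
```

This keeps the feature/flag mapping in one place (the justfile) that contributors can also run locally.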

epage added 2 commits October 6, 2021 19:16
Using `head_ref`, we are making it so PRs are all in the same group.
When a new PR comes in (not just an update), it then cancels all other
PRs.  Switching to `ref` makes it so each PR is in its own concurrency
group.
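The `ref`-based grouping described in this commit message can be sketched as (the exact group expression is an assumption, not quoted from the diff):

```yaml
# Sketch of the concurrency block: keying the group on github.ref gives
# each PR its own cancellation group, so a new PR no longer cancels
# in-flight runs of other PRs.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```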
This drops us down to just a handful of jobs, allowing us full
parallelism (github caps max parallel jobs).  This is dependent on us
using bors to run the "ci" before merging into master.

There is a balance in what to run.  We should consider what is most
likely to break for the widest variety of PRs.  Contributors that expect
an uncovered case to fail can always specify `@bors try`

Motivation
- Mac is similar enough to Linux, we only need to run one of them and
  Linux has more parallel runners on Github.
- Since we deal with `OsStr`, test Windows because it's different from
  the others.
- People are most likely to make changes on `stable` and break support
  for MSRV, so we should verify that
- Still test on `stable` to not block feedback if we run into problems
  with dependencies and our MSRV run.
- On the other hand, beta and nightly are less likely to break on an
  individual PR
- Remove benchmarks because most changes are not performance sensitive
  and we aren't looking at the results enough to justify a 30 minute run.

Fixes clap-rs#2801
@epage
Member Author

epage commented Oct 7, 2021

Since we are agreed on direction and with bors as a safety net, I'm moving forward with merging this PR and we can continue to resolve the conversations post-merge.

@epage epage merged commit 00f7fe5 into clap-rs:master Oct 7, 2021
@epage epage added the M-unreviewed Meta: Request for post-merge review. label Oct 7, 2021
@epage epage deleted the ci branch October 7, 2021 15:02
Member

@pksunkara pksunkara left a comment

With how unstable the runners and benchmarking systems are, the only way we can identify performance regressions is by relying on statistics, and especially trends, for which we need every commit to be benchmarked.

I don't understand why you guys are so against that. It is not like it is blocking merging of the PR.

pull_request:
branches: [master]
types: [opened, reopened, synchronize]
concurrency:
Member

Why is this removed?

I understand removing it for staging because it would never happen. But it should stay for trying and master, IIUC how bors operates. We don't want to tie up our runners with non-fresh head refs.

Member Author

While the docs are unclear, my understanding is

  • pull_request describes which PR will have CI run for it. The specified branch limits it so it only runs when a PR is against the specified branch.
  • push is any commit into one of the specified branches, whether done by bors, manually clicking the merge button, or pushing from your machine

For example, the bors documentation suggests the following for a single-tier CI:

on:
  push:
    branches: [main, staging, trying]
  pull_request:
    branches: [main]

Member

No, you misunderstood. This comment is for the concurrency key.

@@ -0,0 +1,87 @@
name: CI-PR
Member Author

This model relies on us only merging through bors, what if we create a dummy job that fails for branch protection to block merging?

#2802 (comment)

Member Author

IIRC Bors actually needs us to completely disable branch protection which I just did. I had it on because I wasn't actually using it. But maybe I am wrong. We can experiment later.

This model relies on us only merging through bors, what if we create a dummy job that fails for branch protection to block merging?

Uhh.. That's a lot of reds on all our commits.

#2802 (comment)

Member Author

@epage epage Oct 7, 2021

Been thinking more on this

First, from bors docs

You can check this on GitHub in your repository’s Settings tab, in the Branches section. The “master” branch can be protected, and since bors will usually be the only thing that commits directly to master, you can set it to require the “bors” Commit Status to push to master. Do not set the staging/trying branches to protected.

When using protected branches, leave the Require pull request reviews before merging option unmarked, otherwise you’ll start to get a lot of 422 errors. If you want to enforce reviews on your Pull Requests and/or you’re using CODEOWNERS, require these options solely on bors with the respective options: required_approvals and use_codeowners. Also, make sure bors is included in the list allowed to push to the protected branch.

I believe the main protections that impact bors are

  • Require a pull request before merging
  • Restrict who can push to matching branches

The rest impact PRs, which bors doesn't use; this includes requiring status checks for PRs.

We should be able to add CI-PR and bors to our required status checks. This will help prevent merging without bors and, if a project admin feels they need to merge without bors, will help encourage them to wait on the status check.

Member Author

I've updated our branch protections. So far, it's looking like it's working as expected.

I've gone ahead and disabled auto-merge support in Github to avoid forgetting and selecting that.

@pksunkara pksunkara removed the M-unreviewed Meta: Request for post-merge review. label Oct 9, 2021
@epage epage mentioned this pull request Oct 29, 2021
Successfully merging this pull request may close these issues.

Speed up CI with a two-tier system built on bors
3 participants