Speed up PR feedback #2802
Conversation
---
Do you want to separate out the linting and coverage into another workflow so that we can reduce the duplication and run it on both PRs and master?
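One way to cut that duplication is a reusable workflow. A minimal sketch, assuming a hypothetical `.github/workflows/lint.yml` (the `workflow_call` trigger is standard Github Actions syntax; the file name and job contents are placeholders):

```yaml
# .github/workflows/lint.yml (hypothetical name)
name: Lint
on:
  workflow_call:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: cargo clippy --all-targets

# A caller workflow (e.g. the PR workflow and the master workflow) would
# then reference it:
#   jobs:
#     lint:
#       uses: ./.github/workflows/lint.yml
```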
.github/workflows/benchmark.yml (outdated)

```yaml
    branches: [master]
    types: [opened, reopened, synchronize]
  push:
    branches: [master, staging, trying]
```
---
This won't work because the base for the benchmark is master itself.
---
So benchmark currently takes ~30 minutes: 14 minutes for the master run and 14 minutes for the PR run. Even a 14 minute run seems like a long time compared to the value we are getting.
I think we should do some combination of
- Remove the benchmark pipeline for now
- Re-evaluate our benchmarks if we can make them faster, generally
- Potentially identify a subset of benchmarks that are important for CI
- Identify alternative ways of tracking performance than running both master and PR
---
Moving this comment here to keep the conversation in one place
Why are we deleting this? I consider benchmarking every commit as important as coverage report and linting every commit. We just don't have good third party tools to properly consume this information yet. But I plan to gather the data and use it.
If need be, we can move this to master branch and compare by trying to checkout the previous commit on master. But if we are doing coverage on every PR, we should follow the same here. It's not like a PR is blocked on this or coverage.
---
> Why are we deleting this? I consider benchmarking every commit as important as coverage report and linting every commit. We just don't have good third party tools to properly consume this information yet. But I plan to gather the data and use it.
If we aren't using the information yet, then it's going to waste. Let's reconsider this if/when we do use this information.
Even if we can, we need to consider the role of the information and how to balance cost and benefit. The cost is taking up an executor for 30min any time it runs (for `master`, it's every update for every PR). Most PRs will be performance neutral. We can get most of the value by tracking `master`'s performance to see which commit introduced a slowdown.

For me, I'd much rather we track a PR's impact on code bloat. That is a metric a lot of users are complaining about, and it is an easy one to negatively impact.
> It's not like a PR is blocked on this or coverage.
It takes up one of the runners toward our cap, so it slows down progress even if it doesn't gate changes.
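Tracking code bloat could be sketched as a CI step using the third-party `cargo-bloat` tool (the flags shown are from its documented CLI; the step name and placement are assumptions):

```yaml
- name: Report binary size contributions
  run: |
    cargo install cargo-bloat
    # List the top crates by contribution to binary size, in release mode
    cargo bloat --release --crates
```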
---
Very much agree with this sentiment. The benefit (catching performance regressions) must outweigh the costs (CI time / runner allocation).
I think once we have tooling to utilize benchmarks, and alert or fail CI on statistically significant regressions (which in and of itself might be a hard thing to do...do GH action runners suffer from noisy neighbors, etc.?) then we can (re-)enable them for a specific subset of PRs (release, etc.).
I say this having caught big-ish performance regressions from small commits before, so I very much care about catching those. But even in the linked case, I didn't catch it from CI benchmarking data.
Something I like about only benchmarking on release candidates is that an almost statistically insignificant regression will slip by benchmarks on each commit (because we can't fail a PR for such a small regression that may just be noise), but add enough together and hopefully the release catches the sum of these regressions in a more noticeable fashion. Similar tactics could be used for other optimizations as well like compile time, binary size, etc.
---
> which in and of itself might be a hard thing to do...do GH action runners suffer from noisy neighbors, etc
Switching to Iai for instruction-count benchmarking would be a way to resolve this
https://bheisler.github.io/criterion.rs/book/iai/iai.html
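An Iai bench target is a plain function registered with the `iai::main!` macro (a minimal sketch following the crate's documented usage; it requires the `iai` dev-dependency and Valgrind, and runs only as a `cargo bench` target, so the file name and workload below are placeholders):

```rust
// benches/iai_bench.rs -- hypothetical bench target name
use iai::black_box;

// Stand-in workload; a real bench would call into clap's parsing.
fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn bench_fibonacci() -> u64 {
    // black_box prevents the compiler from optimizing the call away.
    fibonacci(black_box(10))
}

// Registers the benchmark; Iai reports instruction counts via Cachegrind.
iai::main!(bench_fibonacci);
```

Because it measures instruction counts rather than wall time, the results are largely insensitive to shared-runner noise.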
> Something I like about only benchmarking on release candidates is that an almost statistically insignificant regression will slip by benchmarks on each commit (because we can't fail a PR for such a small regression that may just be noise), but add enough together and hopefully the release catches the sum of these regressions in a more noticeable fashion. Similar tactics could be used for other optimizations as well like compile time, binary size, etc.
This is a good point that I've learned in the past but forgot over time. You need to look at both item by item but also the bigger picture. Even better when you have a user-focused target that you are comparing to.
.github/workflows/ci-pr.yml (outdated)

```yaml
  fail-fast: false
  matrix:
    os: ["ubuntu-latest", "windows-latest"]
    rust: ["stable"]
```
---
IMHO this should always be `1.54.0` because that is what we should be developing in.
---
I had considered this, just not testing stable at all. However, I remembered when a dependency update went past our MSRV. That doesn't automatically force us to update our MSRV but we aren't prepared with a strategy for when that happens. Keeping the majority of CI on `stable` means most things will work until we do come up with a workaround.
---
Moving this comment to keep the conversation in one place
> That doesn't automatically force us to update our MSRV but we aren't prepared with a strategy for when that happens
We need to know when that happens. The main reasons I want MSRV here are:
1. stable is being tested on master.
2. MSRV is the most important thing people would miss when contributing because, as you said, they would probably develop on stable and thus get it working on stable.
---
> That doesn't automatically force us to update our MSRV but we aren't prepared with a strategy for when that happens

> We need to know when that happens
When designing pipelines, one of the things I consider is: "is this sending the right message to the right person?" MSRV failures caused by a release of a dependency are unrelated to existing PRs, and we should not block progress on a PR because of them.
> MSRV is most important thing people would miss when contributing
In my experience, 99% of MSRV problems are compilation problems and not runtime problems
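One way to send dependency-induced MSRV breakage to the right person is a scheduled job decoupled from PR runs — a minimal sketch, assuming `1.54.0` as the MSRV (the workflow name, cron schedule, and step layout are assumptions):

```yaml
# Hypothetical scheduled workflow: MSRV breakage caused by new dependency
# releases surfaces here instead of failing unrelated PRs.
name: MSRV
on:
  schedule:
    - cron: '0 6 * * 1'  # weekly, Monday 06:00 UTC

jobs:
  msrv:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: 1.54.0
          override: true
      # Compilation is where nearly all MSRV problems show up
      - run: cargo check --all-targets
```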
---
Forgot to add: we'll still get the runtime feedback when bors does the full build before merging into master, so we are still covered for corner cases.
---
> MSRV failures caused by a release of a dependency is unrelated to existing PRs and we should not block progress on a PR because of it
Agreed; however, they can also happen when someone submits a PR that uses features not present in the MSRV. It's happened a few times before, but those were all compile errors, not runtime errors. Which brings us to the next comment 😄
> In my experience, 99% of MSRV problems are compilation problems and not runtime problems
100% agree. I'd be hard pressed to think of a circumstance where the change results in MSRV still compiling but failing a test. I think due to Rust's backwards compatibility guarantees this should be almost a non-issue.
```yaml
        run: cargo test --all-targets --features "wrap_help yaml regex debug"
      - name: No features
        run: cargo test --all-targets --no-default-features --features "std cargo"
  msrv:
```
---
If we want to test stable, we can make this step stable. But I honestly don't feel like we should need another step. Let's just add `stable` to the rust matrix in the above step.
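Adding `stable` alongside the MSRV in one matrix could look like the following sketch (the surrounding job definition and the MSRV value `1.54.0` are taken from the discussion above):

```yaml
strategy:
  fail-fast: false
  matrix:
    os: [ubuntu-latest, windows-latest]
    rust: ["1.54.0", "stable"]
```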
---
I'm waiting on further discussion on this based on resolving the other discussion of focusing on stable vs MSRV
To clarify: my saying this was ready meant that all known code changes were resolved, I had responded to all input, and I was awaiting further input from reviewers before moving forward. In contrast, this started as a "WIP" experiment for me to try ideas.
```yaml
  uses: actions-rs/cargo@v1
  with:
    command: clippy
    args: --no-default-features --features "std cargo"
```
---
Don't you need `-p clap:3.0.0-beta.4` here? IIRC, clap conflicts with some of the dependencies and cargo couldn't figure out the correct version.
---
I switched it from running clippy on a single package to running on the workspace defaults.
.github/workflows/ci-pr.yml (outdated)

```yaml
  matrix:
    os: [ubuntu-latest, windows-latest]
    rust: [stable]
    flags: ["--no-default-features --features 'std cargo'", "--features 'wrap_help yaml regex'"]
```
---
We should name this flag `features` and call them as I did in ci.yml to maintain consistency.

Naming them here is important (instead of just providing the whole flag) because it improves the naming of the CI statuses and checks.
---
If we want to improve the display for `ci-pr.yml`, my recommendation would be distinct jobs with their own matrices for each case.

I find `ci.yml` complicated and do not consider consistency with it a prerequisite. Because Github Actions does not have a rich way for us to associate a label with a set of flags, we have 4 mutually exclusive core steps in the `test` job, and people have to map what "test" means under each scenario. As a contributor, I have found the "pretty" job names do not help me in figuring out what is going on (most of the time, they've been cut off). As someone modifying the pipeline, it feels brittle and hard to follow.

To be clear, I understand at least one reason why `ci.yml` needs to be so complicated: because of the complicated matrix, there is a lot of logic that would have to be duplicated if we split it out into separate jobs (the `exclude`s and target-specific setup). However, `ci-pr.yml` does not have nearly as complicated a matrix.
---
I think the ability to name the flag sets `all` or `none` definitely helps lower the bar to contribution. Sure, it's not a high priority, but if it's possible to keep that aspect in this PR without much trouble I'd prefer it as well.

Then we can move towards a better overall system in the future with a more concerted effort. Especially because we will definitely need some kind of system in place if (once) we get to a more `tokio`-esque modularity with many different flags.
---
I've applied my middle-ground option of distinct matrix jobs; let's see what we think of it.

Longer term, I wonder if we should look at a command runner, like `just` or `make`. That way we can pass features in as a parameter and have it dispatch the commands, rather than complicating the pipeline.

With that said, some caveats:
- We should ensure people still get Github Actions time reporting and do not lose the sequencing of steps
- We should make sure the tool echoes the command being run so people can easily run it locally if they don't have the command runner, or to give context to help them understand why it might have failed
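The command-runner idea could be sketched as a Makefile (the target and variable names are hypothetical): features come in as a parameter, and the recipe echoes each command before running it so it can be copied and reproduced locally:

```make
# Hypothetical Makefile; FEATURES would be passed by the pipeline, e.g.:
#   make test FEATURES="wrap_help yaml regex"
FEATURES ?= wrap_help yaml regex

.PHONY: test
test:
	# Without a leading '@', make echoes each command before running it,
	# so CI logs show exactly what to run locally.
	cargo test --all-targets --features "$(FEATURES)"
```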
Using `head_ref`, we are making it so PRs are all in the same group. When a new PR comes in (not just an update), it then cancels all other PRs. Switching to `ref` makes it so each PR is in its own concurrency group.

This drops us down to just a handful of jobs, allowing us full parallelism (Github caps max parallel jobs). This is dependent on us using bors to run the "ci" before merging into master. There is a balance in what to run. We should consider what is most likely to break for the widest variety of PRs. Contributors that expect an uncovered case to fail can always specify `@bors try`.

Motivation:
- Mac is similar enough to Linux that we only need to run one of them, and Linux has more parallel runners on Github.
- Since we deal with `OsStr`, test Windows because it's different than the others.
- People are most likely to make changes on `stable` and break support for MSRV, so we should verify that.
- Still test on `stable` to not block feedback if we run into problems with dependencies and our MSRV run.
- On the other hand, beta and nightly are less likely to break on an individual PR.
- Remove benchmarks because most changes are not performance sensitive and we aren't looking at the results enough to justify a 30 minute run.

Fixes clap-rs#2801
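The `ref`-based grouping described above, as it would appear in the workflow (this uses standard Github Actions `concurrency` syntax; the group name is an assumption):

```yaml
concurrency:
  # Per the discussion above, grouping by github.head_ref lumped PRs into
  # one group; github.ref is unique per PR, so each PR only cancels its
  # own in-progress runs.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```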
Since we are agreed on direction and with bors as a safety net, I'm moving forward with merging this PR and we can continue to resolve the conversations post-merge.
---
With how unstable the runners and systems are for benchmarking, the only way we can identify performance regressions is by relying on statistics, and especially trends, for which we need every commit to be benchmarked.

I don't understand why you guys are so against that. It is not like it is blocking the merging of PRs.
```yaml
  pull_request:
    branches: [master]
    types: [opened, reopened, synchronize]
concurrency:
```
---
Why is this removed?

I understand removing it for staging because it would never happen. But it should stay for trying and master, IIUC how bors operates. We don't want to tie up our runners with non-fresh head refs.
---
While the docs are unclear, my understanding is:
- `pull_request` describes which PRs will have CI run for them. The specified branch limits it so it only runs when a PR is against that branch.
- `push` is any commit into one of the specified branches, whether done by bors, manually clicking the merge button, or pushing from your machine.

For example, the bors documentation suggests the following for a single-tier CI:

```yaml
on:
  push:
    branches: [main, staging, trying]
  pull_request:
    branches: [main]
```
---
No, you misunderstood. This comment is for the `concurrency` key.
```diff
@@ -0,0 +1,87 @@
+name: CI-PR
```
---
This model relies on us only merging through bors, what if we create a dummy job that fails for branch protection to block merging?
---
IIRC bors actually needs us to completely disable branch protection, which I just did. I had it on because I wasn't actually using it. But maybe I am wrong; we can experiment later.

> This model relies on us only merging through bors, what if we create a dummy job that fails for branch protection to block merging?

Uhh.. That's a lot of reds on all our commits.
---
Been thinking more on this.

First, from the bors docs:

> You can check this on GitHub in your repository’s Settings tab, in the Branches section. The “master” branch can be protected, and since bors will usually be the only thing that commits directly to master, you can set it to require the “bors” Commit Status to push to master. Do not set the staging/trying branches to protected.
>
> When using protected branches, leave the Require pull request reviews before merging option unmarked, otherwise you’ll start to get a lot of 422 errors. If you want to enforce reviews on your Pull Requests and/or you’re using CODEOWNERS, require these options solely on bors with the respective options: required_approvals and use_codeowners. Also, make sure bors is included in the list allowed to push to the protected branch.

I believe the main protections that impact bors are
- Require a pull request before merging
- Restrict who can push to matching branches

The rest impact PRs, which bors doesn't use; that includes requiring status checks for PRs.

We should be able to add `CI-PR` and `bors` to our required status checks. This will help prevent merging without bors and, if a project admin feels they need to merge without bors, will help encourage them to wait on the status check.
---
I've updated our branch protections. So far, it's looking like it's working as expected.

I've gone ahead and disabled auto-merge support in Github to avoid forgetting and selecting that.