Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a resume-ci label to issues? #40817

Open
benjamingr opened this issue Nov 29, 2020 · 15 comments
Open

Add a resume-ci label to issues? #40817

benjamingr opened this issue Nov 29, 2020 · 15 comments
Assignees
Labels
build Issues and PRs related to build files or the CI.

Comments

@benjamingr
Copy link
Member

Hey, not sure if this is the right place to ask (is it?)

There is a request-ci label which saves me the hassle of going to jenkins, entering all the details and starting a build for pull requests.

It would be useful to me if there was a similar label for resume-ci (when most builds pass but one is flakey or failed due to an unrelated infra issue).

Bonus points if it queues the "resume" even if the build didn't finish yet - but either way would save me time.

@github-actions
Copy link
Contributor

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@benjamingr
Copy link
Member Author

This would still be useful

@richardlau
Copy link
Member

I'm going to move this over to the core repo as CI triggering via labels is now done via Actions (https://github.com/nodejs/node/blob/master/.github/workflows/auto-start-ci.yml).

I'm not sure if it's possible to resume a build in Jenkins before it has completed (i.e. any queuing would have to be self-managed).

@richardlau richardlau transferred this issue from nodejs/build Nov 15, 2021
@richardlau richardlau added the build Issues and PRs related to build files or the CI. label Nov 15, 2021
@Trott
Copy link
Member

Trott commented Nov 16, 2021

I can see how this would seem to be useful on the surface, and maybe it really would be, but I wonder if it would encourage resuming CI runs without looking at what actually failed. I could be wrong, but I would be cautious about the way the more we make it so that we don't check actual CI runs and just keep re-running them until they pass, the less useful we make CI.

@Trott
Copy link
Member

Trott commented Nov 16, 2021

I can see how this would seem to be useful on the surface, and maybe it really would be, but I wonder if it would encourage resuming CI runs without looking at what actually failed. I could be wrong, but I would be cautious about the way the more we make it so that we don't check actual CI runs and just keep re-running them until they pass, the less useful we make CI.

Instead of effort put into making such a label work, I would prefer to see a large effort put into making CI more reliable so that resuming runs is the exception rather than the rule.

@RaisinTen
Copy link
Contributor

I wonder if it would encourage resuming CI runs without looking at what actually failed

The functionality already exists in Jenkins, so if someone wants to resume CI without looking at the failures, they would do it regardless of the label. I don't see how just adding the label for convenience would encourage more of that.

Instead of effort put into making such a label work, I would prefer to see a large effort put into making CI more reliable so that resuming runs is the exception rather than the rule.

I think people are already putting in the effort of making CI more reliable - there are currently 42 open issues and 427 resolved ones that have the flaky-test Issues and PRs related to the tests with unstable failures on the CI. label but flaky tests seem to pop up every now and then. Writing more tests increases the chances of introducing flaky ones.

@richardlau
Copy link
Member

If anyone wants to attempt this:

For resuming you will need to add functionality to node-core-utils. You would need to investigate if there is a Jenkins REST API end point for resuming a build.

@Trott
Copy link
Member

Trott commented Mar 3, 2022

The functionality already exists in Jenkins, so if someone wants to resume CI without looking at the failures, they would do it regardless of the label. I don't see how just adding the label for convenience would encourage more of that.

IIUC, the label means that Triagers can do it too, so it expands the pool of people that have access to the resume feature. This is good, but also has the potential for the unintended effect I noted.

I think people are already putting in the effort of making CI more reliable

Some people are, yes. But there is not a large coordinated effort like there was when the Testing WG existed, for example. And I think we need something like that. (You'll notice I'm not volunteering to do it, so I'm part of the problem here, for sure.)

@RaisinTen
Copy link
Contributor

IIUC, the label means that Triagers can do it too, so it expands the pool of people that have access to the resume feature. This is good, but also has the potential for the unintended effect I noted.

That shouldn't be necessary though. We can make use of the GH APIs to check if the person who added the label is a collaborator and then run the actions.

Some people are, yes. But there is not a large coordinated effort like there was when the Testing WG existed, for example. And I think we need something like that. (You'll notice I'm not volunteering to do it, so I'm part of the problem here, for sure.)

So we don't have a clear path to make CI more resilient? Just wanted to make sure if you were -1 about a PR landing that adds this feature at least for collaborators only.

@Trott
Copy link
Member

Trott commented Mar 5, 2022

Just wanted to make sure if you were -1 about a PR landing that adds this feature at least for collaborators only.

I'm not -1 on that or extending it to triagers.

@benjamingr
Copy link
Member Author

Hey, I'd like the TSC to discuss two things here (personal opinions below):

  • Should triagers be able to land things and run CI from the GitHub labels?
  • Should Node.js add a resume-ci label?

Regarding CI and landing things: Personally I think running CI posts some but not significant risks - and it means the TSC needs to decide how strict it wants to be about onboarding triagers . Landing things with the label is fine given NCU enforces reviews, green CI and a correct commit format.

Regarding resume-ci: Given the CI is flaky and has been for 5+ years at different points I do see a lot of value in such a label or streamlining how CI runs with flaky tests better (that is: not just mark them flaky but logic like: "If this test fails everywhere and in only one architecture and is not in a file changed in this PR - automatically don't fail the CI on it")

@benjamingr benjamingr added the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label Mar 5, 2022
@Ayase-252
Copy link
Member

I think the TSC had decided that triagers are eligible to use commit queue to land PR.

Refs: nodejs/TSC#1039 (comment)

@benjamingr
Copy link
Member Author

Great I didn't know that! Then just ci and the resume-ci label

@mhdawson
Copy link
Member

is not in a file changed in this PR

@Trott I'm wondering how we would do that in general. Mapping tests to which files they might be affected by is not obvious to me.

The resume ci label seems useful to me as well, I share the concern over making sure we look at failures before resuming, but that might be handled by documenting or surfacing existing documentation that sets the expectations. I'd personally be fine with an expectation that you look at failures make sure there are open issues for them before resuming.

@mhdawson
Copy link
Member

Discussed in previous TSC meeting, see https://github.com/nodejs/TSC/blob/main/meetings/2022-03-10.md#nodejsnode and in #42125

Removing from TSC agenda as we think questions have been answered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Issues and PRs related to build files or the CI.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants
@Trott @benjamingr @richardlau @Ayase-252 @MoLow @mhdawson @RaisinTen and others