[JENKINS-67099] Make trim labels more selective when we're operating on selected nodes #5882

Vlatombe · 2021-11-04T16:56:22Z

This follows up #5402 and #5412. We've been noticing some provisioning problems with the new trimLabels implementation because it calls Cloud#canProvision far more often than before.

This reworks the implementation so that local operations (such as addNode and removeNode) only trim the labels relevant to the affected nodes. These operations are called often when using ephemeral nodes from clouds that come and go in rapid succession.
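To make the idea concrete, here is a minimal, hypothetical Java model. The `LabelRegistry` class and its methods are illustrative assumptions, not the real `Jenkins`/`Nodes` API. Adding or removing a node re-evaluates only that node's labels (plus those of any node it replaces), while passing `null` re-evaluates everything, as a periodic self-healing pass would. The `expensiveChecks` counter stands in for costly per-label work such as `Cloud#canProvision` calls.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of selective label trimming.
// LabelRegistry and its methods are illustrative only, not the Jenkins API.
final class LabelRegistry {
    private final Map<String, Set<String>> nodeLabels = new LinkedHashMap<>(); // node name -> labels
    private final Set<String> activeLabels = new TreeSet<>();                  // labels currently known
    int expensiveChecks = 0; // stands in for costly per-label work (e.g. asking clouds)

    // Stand-in for the expensive evaluation of a single label.
    private boolean stillInUse(String label) {
        expensiveChecks++;
        for (Set<String> labels : nodeLabels.values()) {
            if (labels.contains(label)) {
                return true;
            }
        }
        return false;
    }

    // Re-evaluate only the given labels; null means "every label" (periodic, self-healing pass).
    void trimLabels(Set<String> candidates) {
        Set<String> toCheck = new HashSet<>();
        if (candidates == null) {
            toCheck.addAll(activeLabels);
            nodeLabels.values().forEach(toCheck::addAll);
        } else {
            toCheck.addAll(candidates);
        }
        for (String label : toCheck) {
            if (stillInUse(label)) {
                activeLabels.add(label);
            } else {
                activeLabels.remove(label);
            }
        }
    }

    void addNode(String name, Set<String> labels) {
        Set<String> affected = new HashSet<>(labels);
        Set<String> replaced = nodeLabels.put(name, new HashSet<>(labels));
        if (replaced != null) {
            affected.addAll(replaced); // a replaced node may have carried labels that are now unused
        }
        trimLabels(affected);
    }

    void removeNode(String name) {
        Set<String> removed = nodeLabels.remove(name);
        if (removed != null) {
            trimLabels(removed); // only the removed node's labels can have become unused
        }
    }

    Set<String> activeLabels() {
        return activeLabels;
    }
}
```

In this model, adding or removing a node costs one check per affected label rather than one per known label; the full pass is reserved for the periodic trim.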

See JENKINS-67099.

Proposed changelog entries

Only apply the trimLabels operation to the affected nodes when adding or removing nodes.

Proposed upgrade guidelines

N/A

Submitter checklist

(If applicable) Jira issue is well described
Changelog entries and upgrade guidelines are appropriate for the audience affected by the change (users or developers, depending on the change).
- Fill in the Proposed changelog entries section only if there are breaking changes or other changes which may require extra steps from users during the upgrade
Appropriate autotests or explanation to why this change has no tests
For dependency updates: links to external changelogs and, if possible, full diffs

Desired reviewers

@jenkinsci/core

Maintainer checklist

Before the changes are marked as ready-for-merge:

There are at least 2 approvals for the pull request and no outstanding requests for change
Conversations in the pull request are over OR it is explicit that a reviewer does not block the change
Changelog entries in the PR title and/or Proposed changelog entries are correct
Proper changelog labels are set so that the changelog can be generated automatically
If the change needs additional upgrade steps from users, upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the PR title. (example)
If it would make sense to backport the change to LTS, a Jira issue must exist, be a Bug or Improvement, and be labeled as lts-candidate to be considered (see query).

core/src/main/java/jenkins/model/Jenkins.java

jglick · 2021-11-04T17:22:35Z

see #5402

core/src/main/java/jenkins/model/Nodes.java

jtnord

Left a couple of code comment suggestions, but otherwise looks good to me.

core/src/main/java/jenkins/model/Jenkins.java

jtnord · 2021-11-10T11:05:26Z


@@ -2239,14 +2239,34 @@ public void setNodes(final List<? extends Node> n) throws IOException {
     * but we also call this periodically to self-heal any data out-of-sync issue.
     */
    /*package*/ void trimLabels() {
+        trimLabels((Set) null);


Would be cleaner as trimLabels(everyLabelInJenkinsAgentOrCloud), but I understand the performance shortcut is desired here.
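The `(Set) null` cast in the hunk above deserves a note. A plausible reason (an assumption; only the cast itself is visible here) is that `trimLabels` has another single-argument overload, for example a varargs one. In that case a bare `trimLabels(null)` is ambiguous: Java's first overload-resolution phase treats the varargs parameter as an array, so both methods accept `null` and neither is more specific. The hypothetical sketch below reproduces that shape with `String` node names; the class and method bodies are illustrative, not the Jenkins signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical overload layout (not the actual Jenkins signatures).
final class TrimOverloads {
    final List<String> calls = new ArrayList<>(); // records which path ran

    // Full trim: null is a sentinel for "every label", so the full set is never built up front.
    void trimLabels() {
        trimLabels((Set<String>) null); // a bare trimLabels(null) would not compile here (ambiguous)
    }

    // Trim only the labels of the named nodes (node lookup elided in this sketch).
    void trimLabels(String... nodeNames) {
        calls.add("nodes:" + String.join(",", nodeNames));
    }

    // Trim the given labels, or all labels when includedLabels is null.
    void trimLabels(Set<String> includedLabels) {
        calls.add(includedLabels == null ? "all" : "labels:" + includedLabels.size());
    }
}
```

The reviewer's alternative, passing an explicit set of every label, would remove the sentinel at the cost of computing that set on every full trim.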

Co-authored-by: James Nord <[email protected]>

res0nance

LGTM

jglick · 2021-11-15T17:35:08Z

Were you able to verify that this implementation improves performance?

Vlatombe · 2021-11-16T09:04:32Z

Not in real conditions.

…on selected nodes (jenkinsci#5882) Co-authored-by: James Nord <[email protected]> (cherry picked from commit 4d5a979)

basil · 2022-04-07T20:31:55Z

Possible regression: JENKINS-68155

Calling setLabels on an agent will not persist the node. In older versions of Jenkins the tests were less flaky, because adding any Node caused all labels to be re-evaluated: when creating a few agents and adding labels in a loop, creating the last agent would at least deterministically ensure that all previous agents' labels were correct. However, since 2.332 (jenkinsci/jenkins#5882) only the labels of a node being added or removed are updated, and the agents were created without labels, which were only added later. This caused the tests to be flaky depending on when the periodic `trimLabels` was called (or on other timing-related factors). This was discovered by enabling a LoggerRule for `hudson.model.Queue` and observing that the builds would time out because not all the agents had the expected labels, e.g.
```
12.023 [id=141] FINEST hudson.model.Queue#maintain: JobOffer[ #1] rejected part of demo #1: ‘Jenkins’ doesn’t have label ‘foo’
12.023 [id=141] FINEST hudson.model.Queue#maintain: JobOffer[slave0 #0] is a potential candidate for task part of demo #1
12.024 [id=141] FINEST hudson.model.Queue#maintain: JobOffer[slave2 #0] rejected part of demo #1: ‘slave2’ doesn’t have label ‘foo’
12.024 [id=141] FINEST hudson.model.Queue#maintain: JobOffer[slave1 #0] rejected part of demo #1: ‘slave1’ doesn’t have label ‘foo’
12.024 [id=141] FINEST hudson.model.Queue#maintain: JobOffer[ #0] rejected part of demo #1: ‘Jenkins’ doesn’t have label ‘foo’
12.024 [id=141] FINEST hudson.model.Queue#maintain: JobOffer[slave3 #0] rejected part of demo #1: ‘slave3’ doesn’t have label ‘foo’
```
from `reuseNodesWithSameLabelsInParallelStages`.

Additionally, creating agents and waiting for them to come online is slow. A pipeline will start and then wait for the node to be available, so we can do other things whilst the agent is connecting. For the case where we need a number of agents connected before we start to run the pipeline, we now create all the agents before waiting for them to connect.
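The failure mode described above can be reproduced with a small, hypothetical model. The `LabelIndex` class is an illustration, not Jenkins code: its index is refreshed only for the labels of nodes being added, plus an occasional full refresh standing in for the periodic `trimLabels`. An agent added without labels and labelled afterwards stays invisible to a `foo` lookup until the full refresh runs, whereas an agent created with its labels pre-set is indexed immediately.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: selective refresh on add, plus an occasional full refresh.
final class LabelIndex {
    private final Map<String, Set<String>> nodes = new LinkedHashMap<>(); // node -> labels (mutable)
    private final Map<String, Set<String>> index = new HashMap<>();       // label -> nodes, as last computed

    void addNode(String name, Set<String> labels) {
        nodes.put(name, new HashSet<>(labels));
        refresh(labels); // only the new node's labels are re-indexed
    }

    // Changes labels in place without any refresh, like setLabels without persisting the node.
    void setLabels(String name, Set<String> labels) {
        Set<String> current = nodes.get(name);
        current.clear();
        current.addAll(labels);
    }

    // Stand-in for the periodic full trim: re-index every label seen anywhere.
    void periodicFullRefresh() {
        Set<String> all = new HashSet<>(index.keySet());
        nodes.values().forEach(all::addAll);
        refresh(all);
    }

    private void refresh(Set<String> labels) {
        for (String label : labels) {
            Set<String> holders = new TreeSet<>();
            nodes.forEach((n, ls) -> { if (ls.contains(label)) holders.add(n); });
            if (holders.isEmpty()) index.remove(label); else index.put(label, holders);
        }
    }

    Set<String> nodesFor(String label) {
        return index.getOrDefault(label, Set.of());
    }
}
```

This is why creating agents with their labels already set removes the dependency on when the periodic pass happens to run.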


Calling setLabels on an agent will not persist the node. In older versions of Jenkins the tests were less flaky, because adding any Node caused all labels to be re-evaluated: when creating a few agents and adding labels in a loop, creating the last agent would at least deterministically ensure that all previous agents' labels were correct. However, since 2.332 (jenkinsci/jenkins#5882) only the labels of a node being added or removed are updated, and the agents were created without labels, which were only added later. This caused the tests to be flaky depending on when the periodic trimLabels was called (or on other timing-related factors). Additionally, creating agents and waiting for them to come online is slow. A job can be scheduled and will then wait for the node to be available, so we can do other things whilst the agent is connecting.
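The speed-up is the usual start-everything-then-wait pattern. Below is a hypothetical sketch in which agent connection is simulated with `Thread.sleep`; `AgentStartup` and `connect` are illustrative names. In a real JenkinsRule-based test the corresponding steps would be creating each agent and later waiting for it to come online (jenkins-test-harness offers helpers such as `createSlave` and `waitOnline`), but this sketch does not use them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for agent startup: each "connection" just sleeps.
final class AgentStartup {
    static String connect(String name, long millis) {
        try {
            Thread.sleep(millis); // simulated connection latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return name + ":online";
    }

    // Start every agent first, then block until all of them report online.
    static List<String> createAllThenWait(List<String> names, long millis) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, names.size()));
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (String name : names) {
                pending.add(CompletableFuture.supplyAsync(() -> connect(name, millis), pool));
            }
            List<String> online = new ArrayList<>();
            for (CompletableFuture<String> f : pending) {
                online.add(f.join()); // connections overlap because all of them were already started
            }
            return online;
        } finally {
            pool.shutdown();
        }
    }
}
```

With four agents at 300 ms each, waiting one by one would take at least 1.2 s, while this version takes roughly the latency of a single connection.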

In older versions of Jenkins the tests were less flaky, because adding any Node caused all labels to be re-evaluated: when creating a few agents and adding labels in a loop, creating the last agent would at least deterministically ensure that all previous agents' labels were correct. However, since 2.332 (jenkinsci/jenkins#5882) only the labels of a node being added or removed are updated, and the agents were created without labels, which were only added later. This has the potential to make tests flaky depending on when the periodic trimLabels was called (or on other timing-related factors). Additionally, creating agents and waiting for them to come online is slow. A job can be scheduled, and the pipeline will run and then wait for the node to be available, so we can do other things whilst the agent is connecting.



Make trim labels more selective when we're operating on selected nodes

e1f3ca8

jtnord reviewed Nov 4, 2021


core/src/main/java/jenkins/model/Jenkins.java (outdated)

jtnord reviewed Nov 4, 2021


core/src/main/java/jenkins/model/Nodes.java (outdated)

Vlatombe added 5 commits November 5, 2021 15:02

Remove unused variable

82abd6d

n can be null

82713a7

Don't filter in stream in order to get a writable iterator

dc8f944

Refactoring

ff971a5

Apply @teilo's suggestion

857e17a

Vlatombe changed the title ~~Make trim labels more selective when we're operating on selected nodes~~ [JENKINS-67099] Make trim labels more selective when we're operating on selected nodes Nov 10, 2021

Vlatombe marked this pull request as ready for review November 10, 2021 09:39

Vlatombe requested a review from jtnord November 10, 2021 09:43

jtnord approved these changes Nov 10, 2021


Apply suggestions from @jtnord code review

1d17bb5

Co-authored-by: James Nord <[email protected]>

jtnord approved these changes Nov 10, 2021


jtnord requested review from a team and res0nance November 10, 2021 14:17

res0nance approved these changes Nov 10, 2021


timja approved these changes Nov 10, 2021


jtnord added the ready-for-merge label Nov 11, 2021

timja added the rfe label Nov 13, 2021

timja merged commit 4d5a979 into jenkinsci:master Nov 13, 2021

Vlatombe deleted the trim-labels-selected-nodes branch November 16, 2021 09:04

timja mentioned this pull request Jan 20, 2022

JENKINS-67635 consider agent label expressions when applying trimLabels #6193

Merged


jtnord mentioned this pull request May 11, 2022

Deflake and speed up parts of ExecutorStepTest jenkinsci/workflow-durable-task-step-plugin#224

Merged


jtnord mentioned this pull request May 11, 2022

create agents with labels pre-set jenkinsci/maven-plugin#267

Merged


jtnord mentioned this pull request May 11, 2022

calling setLabels on an agent will not persist the node. jenkinsci/pipeline-model-definition-plugin#527

Merged

jtnord mentioned this pull request May 11, 2022

address potential flake in GithubAppCredentialsTest jenkinsci/github-branch-source-plugin#562

Closed

