[JENKINS-67099] Make trim labels more selective when we're operating on selected nodes #5882
Conversation
see #5402
Left a couple of code comment suggestions, but otherwise looks good to me.
@@ -2239,14 +2239,34 @@ public void setNodes(final List<? extends Node> n) throws IOException {
     * but we also call this periodically to self-heal any data out-of-sync issue.
     */
    /*package*/ void trimLabels() {
        trimLabels((Set) null);
Would be cleaner as `trimLabels(everyLabelInJenkinsAgentOrCloud)`, but I understand the performance shortcut is desired here.
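The "null means every label" shortcut being discussed can be sketched roughly as follows. This is a simplified, hypothetical model for illustration — `LabelRegistry`, its fields, and the node-count bookkeeping are invented here and are not the actual Jenkins `trimLabels` implementation:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Toy model of selective label trimming: a label with zero matching
 * nodes gets removed, and callers may restrict which labels to check.
 */
class LabelRegistry {
    // label name -> number of nodes currently carrying that label
    private final Map<String, Integer> labels = new HashMap<>();

    void addLabel(String name, int nodeCount) {
        labels.put(name, nodeCount);
    }

    /** Trim every label (the old, expensive behaviour). */
    void trimLabels() {
        trimLabels(null);
    }

    /**
     * Trim only the given labels; a null set means "all labels",
     * which is the performance shortcut mentioned in the review.
     */
    void trimLabels(Set<String> includedLabels) {
        Set<String> candidates = includedLabels != null
                ? includedLabels
                : new HashSet<>(labels.keySet());
        for (String label : candidates) {
            Integer count = labels.get(label);
            if (count != null && count == 0) {
                labels.remove(label); // no node matches: drop the label
            }
        }
    }

    boolean contains(String name) {
        return labels.containsKey(name);
    }
}
```

Passing the explicit "every label" set would read more clearly at the call site, but the null sentinel avoids materializing that set on the hot path.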
Co-authored-by: James Nord <[email protected]>
LGTM
Were you able to verify that this implementation improves performance?
Not in real conditions.
…on selected nodes (jenkinsci#5882) Co-authored-by: James Nord <[email protected]> (cherry picked from commit 4d5a979)
Possible regression: JENKINS-68155
Calling setLabels on an agent will not persist the node. In older versions of Jenkins the tests would be less flaky, as adding any Node would cause all labels to be re-evaluated; so when creating a few agents and adding labels in a loop, the last one created would at least deterministically ensure that all previous agents' labels were correct. However, since 2.332 (jenkinsci/jenkins#5882) only labels that are part of a node being added or removed are updated, and the agents here were created without labels, which were added later. This caused tests to be flaky depending on when the periodic `trimLabels` was called (or at least on other timing-related things).

This was discovered by enabling a LoggerRule for hudson.model.Queue and observing that the builds would time out, as not all the agents would have the expected labels, e.g.

```
12.023 [id=141] FINEST hudson.model.Queue#maintain: JobOffer[ jenkinsci#1] rejected part of demo jenkinsci#1: 'Jenkins' doesn't have label 'foo'
12.023 [id=141] FINEST hudson.model.Queue#maintain: JobOffer[slave0 #0] is a potential candidate for task part of demo jenkinsci#1
12.024 [id=141] FINEST hudson.model.Queue#maintain: JobOffer[slave2 #0] rejected part of demo jenkinsci#1: 'slave2' doesn't have label 'foo'
12.024 [id=141] FINEST hudson.model.Queue#maintain: JobOffer[slave1 #0] rejected part of demo jenkinsci#1: 'slave1' doesn't have label 'foo'
12.024 [id=141] FINEST hudson.model.Queue#maintain: JobOffer[ #0] rejected part of demo jenkinsci#1: 'Jenkins' doesn't have label 'foo'
12.024 [id=141] FINEST hudson.model.Queue#maintain: JobOffer[slave3 #0] rejected part of demo jenkinsci#1: 'slave3' doesn't have label 'foo'
```

from `reuseNodesWithSameLabelsInParallelStages`.

Additionally, creating agents and waiting for them to come online is slow. A pipeline will start and then wait for the node to be available, so we can do other things whilst the agent is connecting.

For the case where we need a number of agents connected before we start to run the pipeline, we now create all the agents before waiting for them to connect.
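The "create all agents first, then wait for them to connect" speedup can be illustrated with a small simulation. Everything here is invented for illustration — `SimulatedAgent` stands in for an agent whose connection takes time in the background; it is not the Jenkins or JenkinsRule API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Stand-in for an agent that takes ~100 ms to "connect" in the background. */
class SimulatedAgent {
    private final CompletableFuture<Void> online = new CompletableFuture<>();

    SimulatedAgent() {
        // connection proceeds asynchronously from the moment of creation
        CompletableFuture.delayedExecutor(100, TimeUnit.MILLISECONDS)
                .execute(() -> online.complete(null));
    }

    void waitOnline() {
        online.join(); // block until the simulated connection finishes
    }
}

class CreateThenWait {
    /** Create every agent first, then wait: the connections overlap. */
    static long connectAll(int count) {
        long start = System.nanoTime();
        List<SimulatedAgent> agents = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            agents.add(new SimulatedAgent()); // starts connecting immediately
        }
        for (SimulatedAgent a : agents) {
            a.waitOnline();
        }
        return (System.nanoTime() - start) / 1_000_000; // elapsed ms
    }
}
```

With four agents, the total wait is roughly one connection latency rather than four back-to-back, which is the effect the test change is after.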
* deflake and speedup ExecutorStepTest tests

* Update src/test/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepTest.java

Co-authored-by: Jesse Glick <[email protected]>
This follows up #5402 and #5412. We've been noticing some provisioning problems with the new `trimLabels` implementation because it calls `Cloud#canProvision` way more than before.

This is reworking the implementation to only affect the labels relevant to the affected nodes on local operations (such as `addNode`, `removeNode`), which are called often when using ephemeral nodes from clouds that come and go in rapid succession.

See JENKINS-67099.
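A rough model of that selectivity: on add/remove, only the labels the affected node carries are re-checked, instead of every label known to the instance. Class and method names below are invented for illustration — this is not the real `Jenkins#addNode`/`removeNode` code:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy node registry where label trimming only visits affected labels. */
class NodeRegistry {
    private final Map<String, Set<String>> labelToNodes = new HashMap<>();
    private final Map<String, Set<String>> nodeToLabels = new HashMap<>();

    void addNode(String node, Set<String> labels) {
        nodeToLabels.put(node, labels);
        for (String l : labels) {
            labelToNodes.computeIfAbsent(l, k -> new HashSet<>()).add(node);
        }
        trimLabels(labels); // only the labels this node affects
    }

    void removeNode(String node) {
        Set<String> labels = nodeToLabels.remove(node);
        if (labels == null) {
            return;
        }
        for (String l : labels) {
            Set<String> owners = labelToNodes.get(l);
            if (owners != null) {
                owners.remove(node);
            }
        }
        trimLabels(labels); // again, only the affected labels
    }

    private void trimLabels(Set<String> candidates) {
        for (String l : candidates) {
            Set<String> owners = labelToNodes.get(l);
            if (owners != null && owners.isEmpty()) {
                labelToNodes.remove(l); // label no longer matches any node
            }
        }
    }

    boolean hasLabel(String l) {
        return labelToNodes.containsKey(l);
    }
}
```

With ephemeral cloud agents coming and going rapidly, trimming per-node label sets avoids re-evaluating (and re-probing `Cloud#canProvision` for) every label on each churn event.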
Proposed changelog entries

- Limit the `trimLabels` operation to affected nodes when adding or removing them.

Proposed upgrade guidelines

N/A
Submitter checklist

- Fill in the `Proposed changelog entries` section only if there are breaking changes or other changes which may require extra steps from users during the upgrade

Desired reviewers
@jenkinsci/core
Maintainer checklist

Before the changes are marked as `ready-for-merge`:

- `Proposed changelog entries` are correct
- `upgrade-guide-needed` label is set and there is a `Proposed upgrade guidelines` section in the PR title. (example)
- `lts-candidate` to be considered (see query).