Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-67099] Make trim labels more selective when we're operating on selected nodes #5882

Merged
merged 7 commits into from
Nov 13, 2021

Conversation

Vlatombe
Copy link
Member

@Vlatombe Vlatombe commented Nov 4, 2021

This follows up #5402 and #5412. We've been noticing some provisioning problems with the new trimLabels implementation because it calls Cloud#canProvision way more than before.

This is reworking the implementation to only affect labels relevant to affected nodes on local operations (such as addNode, removeNode), which are being called often when using ephemeral nodes from clouds that come and go in rapid succession.

See JENKINS-67099.

Proposed changelog entries

  • Only apply trimLabels operation to affected nodes when adding or removing them.

Proposed upgrade guidelines

N/A

Submitter checklist

  • (If applicable) Jira issue is well described
  • Changelog entries and upgrade guidelines are appropriate for the audience affected by the change (users or developer, depending on the change). Examples
    • Fill-in the Proposed changelog entries section only if there are breaking changes or other changes which may require extra steps from users during the upgrade
  • Appropriate autotests or explanation to why this change has no tests
  • For dependency updates: links to external changelogs and, if possible, full diffs

Desired reviewers

@jenkinsci/core

Maintainer checklist

Before the changes are marked as ready-for-merge:

  • There are at least 2 approvals for the pull request and no outstanding requests for change
  • Conversations in the pull request are over OR it is explicit that a reviewer does not block the change
  • Changelog entries in the PR title and/or Proposed changelog entries are correct
  • Proper changelog labels are set so that the changelog can be generated automatically
  • If the change needs additional upgrade steps from users, upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the PR title. (example)
  • If it would make sense to backport the change to LTS, a Jira issue must exist, be a Bug or Improvement, and be labeled as lts-candidate to be considered (see query).

@jglick
Copy link
Member

jglick commented Nov 4, 2021

see #5402

@Vlatombe Vlatombe changed the title Make trim labels more selective when we're operating on selected nodes [JENKINS-67099] Make trim labels more selective when we're operating on selected nodes Nov 10, 2021
@Vlatombe Vlatombe marked this pull request as ready for review November 10, 2021 09:39
@Vlatombe Vlatombe requested a review from jtnord November 10, 2021 09:43
Copy link
Member

@jtnord jtnord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a couple of code comment suggestions, but otherwise looks good to me.

core/src/main/java/jenkins/model/Jenkins.java Show resolved Hide resolved
core/src/main/java/jenkins/model/Jenkins.java Outdated Show resolved Hide resolved
@@ -2239,14 +2239,34 @@ public void setNodes(final List<? extends Node> n) throws IOException {
* but we also call this periodically to self-heal any data out-of-sync issue.
*/
/*package*/ void trimLabels() {
trimLabels((Set) null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be cleaner as trimLabels(eveyLabelInJenkinsAgentOrCloud) but I understand the performance shortcut is desired here.

@jtnord jtnord requested review from a team and res0nance November 10, 2021 14:17
Copy link
Contributor

@res0nance res0nance left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jtnord jtnord added the ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback label Nov 11, 2021
@timja timja added the rfe For changelog: Minor enhancement. use `major-rfe` for changes to be highlighted label Nov 13, 2021
@timja timja merged commit 4d5a979 into jenkinsci:master Nov 13, 2021
@jglick
Copy link
Member

jglick commented Nov 15, 2021

Were you able to verify that this implementation improves performance?

@Vlatombe
Copy link
Member Author

Not in real conditions.

@Vlatombe Vlatombe deleted the trim-labels-selected-nodes branch November 16, 2021 09:04
cathychan pushed a commit to cathychan/jenkins that referenced this pull request Dec 17, 2021
…on selected nodes (jenkinsci#5882)

Co-authored-by: James Nord <[email protected]>
(cherry picked from commit 4d5a979)
@basil
Copy link
Member

basil commented Apr 7, 2022

Possible regression: JENKINS-68155

jtnord added a commit to jtnord/workflow-durable-task-step-plugin that referenced this pull request May 11, 2022
calling setLabels on an agent will not persist the node.

in older versions of Jenkins the tests would be less flaky as adding any
Node would cause all labels to be re-evaluated, so when creating a few
agents and adding labels in a loop the last one created would at least
deterministically ensure that all previous agents labels where correct.

However since 2.332 (jenkinsci/jenkins#5882)
only labels part of a node added or removed would be updated, and when
creating the agents they where created without labels, which where added
later.

This caused tests to be flaky depending on when the periodic
`trimLabels` was called (or at least on other timing related things)

THis was discovered by enabling a loggerrule for hudson.model.queue and
observing that the builds would timeout as not all the agents would have
the expected nodes.

e.g.

```
  12.023 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[ jenkinsci#1] rejected part of demo jenkinsci#1: ?Jenkins? doesn?t have label ?foo?
  12.023 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave0 #0] is a potential candidate for task part of demo jenkinsci#1
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave2 #0] rejected part of demo jenkinsci#1: ?slave2? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave1 #0] rejected part of demo jenkinsci#1: ?slave1? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[ #0] rejected part of demo jenkinsci#1: ?Jenkins? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave3 #0] rejected part of demo jenkinsci#1: ?slave3? doesn?t have label ?foo?
```

from `reuseNodesWithSameLabelsInParallelStages`

Additionally creating agents and waiting for them to come oneline is
slow.  A pipeline will start and then wait for the node to be available,
so we can do other things whilst the agent is connecting.

For the case where we need a number of agents connected before we start
to run the pipeline, we now create iall the agents before waiting for them to connect.
jtnord added a commit to jtnord/workflow-durable-task-step-plugin that referenced this pull request May 11, 2022
calling setLabels on an agent will not persist the node.

in older versions of Jenkins the tests would be less flaky as adding any
Node would cause all labels to be re-evaluated, so when creating a few
agents and adding labels in a loop the last one created would at least
deterministically ensure that all previous agents labels where correct.

However since 2.332 (jenkinsci/jenkins#5882)
only labels part of a node added or removed would be updated, and when
creating the agents they where created without labels, which where added
later.

This caused tests to be flaky depending on when the periodic
`trimLabels` was called (or at least on other timing related things)

THis was discovered by enabling a loggerrule for hudson.model.queue and
observing that the builds would timeout as not all the agents would have
the expected nodes.

e.g.

```
  12.023 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[ jenkinsci#1] rejected part of demo jenkinsci#1: ?Jenkins? doesn?t have label ?foo?
  12.023 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave0 #0] is a potential candidate for task part of demo jenkinsci#1
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave2 #0] rejected part of demo jenkinsci#1: ?slave2? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave1 #0] rejected part of demo jenkinsci#1: ?slave1? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[ #0] rejected part of demo jenkinsci#1: ?Jenkins? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave3 #0] rejected part of demo jenkinsci#1: ?slave3? doesn?t have label ?foo?
```

from `reuseNodesWithSameLabelsInParallelStages`

Additionally creating agents and waiting for them to come oneline is
slow.  A pipeline will start and then wait for the node to be available,
so we can do other things whilst the agent is connecting.

For the case where we need a number of agents connected before we start
to run the pipeline, we now create iall the agents before waiting for them to connect.
jglick added a commit to jenkinsci/workflow-durable-task-step-plugin that referenced this pull request May 11, 2022
* deflake and speedup ExecutorStepTest tests

calling setLabels on an agent will not persist the node.

in older versions of Jenkins the tests would be less flaky as adding any
Node would cause all labels to be re-evaluated, so when creating a few
agents and adding labels in a loop the last one created would at least
deterministically ensure that all previous agents labels where correct.

However since 2.332 (jenkinsci/jenkins#5882)
only labels part of a node added or removed would be updated, and when
creating the agents they where created without labels, which where added
later.

This caused tests to be flaky depending on when the periodic
`trimLabels` was called (or at least on other timing related things)

THis was discovered by enabling a loggerrule for hudson.model.queue and
observing that the builds would timeout as not all the agents would have
the expected nodes.

e.g.

```
  12.023 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[ #1] rejected part of demo #1: ?Jenkins? doesn?t have label ?foo?
  12.023 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave0 #0] is a potential candidate for task part of demo #1
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave2 #0] rejected part of demo #1: ?slave2? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave1 #0] rejected part of demo #1: ?slave1? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[ #0] rejected part of demo #1: ?Jenkins? doesn?t have label ?foo?
  12.024 [id=141]       FINEST  hudson.model.Queue#maintain: JobOffer[slave3 #0] rejected part of demo #1: ?slave3? doesn?t have label ?foo?
```

from `reuseNodesWithSameLabelsInParallelStages`

Additionally creating agents and waiting for them to come oneline is
slow.  A pipeline will start and then wait for the node to be available,
so we can do other things whilst the agent is connecting.

For the case where we need a number of agents connected before we start
to run the pipeline, we now create iall the agents before waiting for them to connect.

* Update src/test/java/org/jenkinsci/plugins/workflow/support/steps/ExecutorStepTest.java

Co-authored-by: Jesse Glick <[email protected]>

Co-authored-by: Jesse Glick <[email protected]>
jtnord added a commit to jtnord/maven-plugin that referenced this pull request May 11, 2022
calling setLabels on an agent will not persist the node.

in older versions of Jenkins the tests would be less flaky as adding any
Node would cause all labels to be re-evaluated, so when creating a few
agents and adding labels in a loop the last one created would at least
deterministically ensure that all previous agents labels where correct.

However since 2.332 (jenkinsci/jenkins#5882)
only labels part of a node added or removed would be updated, and when
creating the agents they where created without labels, which where added
later.

This caused tests to be flaky depending on when the periodic
trimLabels was called (or at least on other timing related things)

Additionally creating agents and waiting for them to come oneline is
slow. A job can be scheduled and will then wait for the node to be available,
so we can do other things whilst the agent is connecting.
jtnord added a commit to jtnord/pipeline-model-definition-plugin that referenced this pull request May 11, 2022
in older versions of Jenkins the tests would be less flaky as adding any
Node would cause all labels to be re-evaluated, so when creating a few
agents and adding labels in a loop the last one created would at least
deterministically ensure that all previous agents labels where correct.

However since 2.332 (jenkinsci/jenkins#5882)
only labels part of a node added or removed would be updated, and when
creating the agents they where created without labels, which where added
later.

This has the potential to cause tests to be flaky depending on when the periodic
trimLabels was called (or at least on other timing related things)

Additionally creating agents and waiting for them to come oneline is
slow. A job can be scheduled and the pipeline will run and then wait for the node to be
available,
so we can do other things whilst the agent is connecting.
jtnord added a commit to jtnord/github-branch-source-plugin that referenced this pull request May 11, 2022
calling setLabels on an agent will not persist the node.

in older versions of Jenkins the tests would be less flaky as adding any
Node would cause all labels to be re-evaluated, so when creating a few
agents and adding labels in a loop the last one created would at least
deterministically ensure that all previous agents labels where correct.

However since 2.332 (jenkinsci/jenkins#5882)
only labels part of a node added or removed would be updated, and when
creating the agents they where created without labels, which where added
later.

This has the potential to cause tests to be flaky depending on when the periodic
trimLabels was called (or at least on other timing related things)

Additionally creating agents and waiting for them to come oneline is
slow. A job can be scheduled and the pipeline will run and then wait for the node to be
available,
so we can do other things whilst the agent is connecting.
olamy pushed a commit to jenkinsci/maven-plugin that referenced this pull request May 12, 2022
calling setLabels on an agent will not persist the node.

in older versions of Jenkins the tests would be less flaky as adding any
Node would cause all labels to be re-evaluated, so when creating a few
agents and adding labels in a loop the last one created would at least
deterministically ensure that all previous agents labels where correct.

However since 2.332 (jenkinsci/jenkins#5882)
only labels part of a node added or removed would be updated, and when
creating the agents they where created without labels, which where added
later.

This caused tests to be flaky depending on when the periodic
trimLabels was called (or at least on other timing related things)

Additionally creating agents and waiting for them to come oneline is
slow. A job can be scheduled and will then wait for the node to be available,
so we can do other things whilst the agent is connecting.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback rfe For changelog: Minor enhancement. use `major-rfe` for changes to be highlighted
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants