Fix 322 and 363 #376

Merged
merged 4 commits into from
Jun 28, 2023

pdk27
Collaborator

@pdk27 pdk27 commented Jun 27, 2023

Changes:

MaxTotalUses related:

  • Fix the maxTotalUses decrement logic: limited uses are no longer decremented all the way to -1, because -1 means unlimited
  • Leave maxTotalUses alone and track remainingUses correctly
    • Make maxTotalUses read-only
    • Allow remainingUses to be decremented only by 1
  • Add logs in the post-job action to expose tasks terminated with problems
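
To illustrate the intended counter semantics described in the list above, here is a minimal sketch. The class `UsageTracker` and its method names are hypothetical and are not the plugin's actual API; the only details taken from this PR are that -1 means unlimited, that maxTotalUses is read-only, and that remainingUses is decremented one step at a time without ever reaching -1.

```java
// Hypothetical sketch; names are illustrative and not taken from the plugin source.
final class UsageTracker {
    // -1 is the sentinel for "unlimited uses" (as stated in this PR).
    static final int UNLIMITED = -1;

    // Configured limit; read-only after construction.
    private final int maxTotalUses;
    // Mutable counter that tracks how many builds are left.
    private int remainingUses;

    UsageTracker(int maxTotalUses) {
        if (maxTotalUses < UNLIMITED) {
            throw new IllegalArgumentException("maxTotalUses must be -1 (unlimited) or >= 0");
        }
        this.maxTotalUses = maxTotalUses;
        this.remainingUses = maxTotalUses;
    }

    int getMaxTotalUses() {
        return maxTotalUses;
    }

    int getRemainingUses() {
        return remainingUses;
    }

    // Decrements by exactly 1. A limited counter stops at 0 so it can never
    // turn into -1 and be mistaken for "unlimited"; an unlimited counter is untouched.
    void decrementRemainingUses() {
        if (remainingUses > 0) {
            remainingUses--;
        }
    }

    // True only for a limited agent that has used up all of its builds.
    boolean isExhausted() {
        return remainingUses == 0;
    }
}
```

With this shape, an agent configured with a limit of 2 stays at 0 remaining uses no matter how many further decrements are attempted, and an unlimited agent stays at -1.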

Tracking all termination reasons:

  • Add a flag to track manual termination of agents
  • Add an integration test for all termination reasons in RetentionStrategy#check

Fix #363:

  • Terminate scheduled instances ONLY IF they are idle
    See this test for new expected behavior.
EC2FleetCloud info INFO: FleetCloud [label] Set target capacity to '2'
EC2FleetCloud info INFO: FleetCloud [label] Scheduling instance 'i-1' for termination on cloud com.amazon.jenkins.ec2fleet.EC2FleetCloud@22f34197 because of reason: Agent idle for too long
EC2FleetCloud info INFO: FleetCloud [label] Scheduling instance 'i-2' for termination on cloud com.amazon.jenkins.ec2fleet.EC2FleetCloud@22f34197 because of reason: MaxTotalUses exhausted for agent
EC2FleetCloud info INFO: FleetCloud [label] Scheduling instance 'i-3' for termination on cloud com.amazon.jenkins.ec2fleet.EC2FleetCloud@22f34197 because of reason: Agent deleted
EC2FleetCloud info INFO: FleetCloud [label] Skipping termination of the following instances until the next update cycle, as they are still busy doing some work: [i-2].
EC2FleetCloud info INFO: FleetCloud [label] Set target capacity to '0'
EC2FleetCloud info INFO: FleetCloud [label] Removing Jenkins nodes before terminating corresponding EC2 instances
EC2FleetCloud info INFO: FleetCloud [label] Terminated instances: {i-1=Agent idle for too long, i-3=Agent deleted}
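
The behavior in the log above (i-2 is skipped because it is still busy, while i-1 and i-3 are terminated) can be sketched roughly as follows. This is an illustration under assumptions, not the plugin's code: `IdleOnlyTermination`, `Result`, and `selectForTermination` are hypothetical names, and the idle check is passed in as a plain predicate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch; names are illustrative and not taken from the plugin source.
final class IdleOnlyTermination {

    // Outcome of one update cycle.
    static final class Result {
        // instanceId -> termination reason, for instances that were idle.
        final Map<String, String> terminated = new LinkedHashMap<>();
        // Busy instances that stay scheduled until the next update cycle.
        final List<String> skipped = new ArrayList<>();
    }

    // Splits the scheduled instances into those that may be terminated now
    // (idle) and those that must wait (still running work).
    static Result selectForTermination(Map<String, String> scheduled,
                                       Predicate<String> isIdle) {
        Result result = new Result();
        for (Map.Entry<String, String> entry : scheduled.entrySet()) {
            if (isIdle.test(entry.getKey())) {
                result.terminated.put(entry.getKey(), entry.getValue());
            } else {
                result.skipped.add(entry.getKey());
            }
        }
        return result;
    }
}
```

The skipped instances remain in the scheduled set, so they are picked up again on a later cycle once they become idle.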

Fix #322 (fixes part of the problem):

  • Fix lost state (instanceIdsToTerminate) on configuration change
    • Recheck persisted fields for all reasons for termination in RetentionStrategy#check
  • Add and fix tests
  • Add an integration test: a configuration change leads to lost state; the changes in this PR rebuild the lost state and terminate instances previously marked for termination
    See this integration test for new expected behavior / fix.
1 tasks submitted for label momo
scheduled task 1, waiting 5 sec
   3.283 [id=610]	INFO	c.a.j.e.EC2RetentionStrategy#taskAccepted: Agent i-2 has 1 builds left
   4.922 [id=610]	INFO	c.a.j.e.EC2RetentionStrategy#postJobAction: Build test0 #1 completed successfully on agent i-2. TimeSpentInQueue: 2s, duration: 1s.

1 tasks submitted for label momo
scheduled task 2, waiting 5 sec
   5.975 [id=608]	INFO	c.a.j.e.EC2RetentionStrategy#taskAccepted: Agent i-1 has 1 builds left
   7.501 [id=608]	INFO	c.a.j.e.EC2RetentionStrategy#postJobAction: Build test1 #1 completed successfully on agent i-1. TimeSpentInQueue: 0s, duration: 1s.

1 tasks submitted for label momo
scheduled task 3, waiting 5 sec
  10.987 [id=625]	INFO	c.a.j.e.EC2RetentionStrategy#taskAccepted: maxTotalUses drained - suspending agent i-2 after current build
  12.072 [id=625]	INFO	c.a.j.e.EC2RetentionStrategy#postJobAction: Build test2 #1 completed successfully on agent i-2. TimeSpentInQueue: 0s, duration: 1s.
  12.072 [id=625]	INFO	c.a.j.e.EC2RetentionStrategy#postJobAction: Calling scheduleToTerminate for node i-2 due to exhausted maxTotalUses.
  12.072 [id=625]	INFO	c.a.j.ec2fleet.EC2FleetCloud#info: testCloud [momo] Scheduling instance 'i-2' for termination on cloud com.amazon.jenkins.ec2fleet.EC2FleetCloud@1820a4f4 because of reason: MaxTotalUses exhausted for agent

HtmlForm[<form method="post" autocomplete="off" name="config" action="configure">]
[HtmlTextInput[<input default="FleetCloud" name="_.name" type="text" class="setting-input   " value="testCloud">]]


  24.517 [id=565]	INFO	c.a.j.e.u.EC2FleetCloudAwareUtils#reassign: Trying to reassign Jenkins computer:testCloud i-1 Builds left: 1 
  24.518 [id=565]	INFO	c.a.j.e.u.EC2FleetCloudAwareUtils#reassign: Trying to reassign Jenkins computer:testCloud i-2 Builds left: 0 

  26.890 [id=557]	INFO	c.a.j.ec2fleet.EC2FleetCloud#info: testCloud [momo] Scheduling instance 'i-2' for termination on cloud com.amazon.jenkins.ec2fleet.EC2FleetCloud@1820a4f4 because of reason: MaxTotalUses exhausted for agent
  26.891 [id=557]	INFO	c.a.j.ec2fleet.EC2FleetCloud#info: testCloud [momo] Set target capacity to '1'
  26.891 [id=557]	INFO	c.a.j.ec2fleet.EC2FleetCloud#info: testCloud [momo] Removing Jenkins nodes before terminating corresponding EC2 instances
  26.892 [id=557]	INFO	c.a.j.ec2fleet.EC2FleetCloud#info: testCloud [momo] Terminated instances: {i-2=MaxTotalUses exhausted for agent}

Result of the integration test before this PR, showing lost state:

1 tasks submitted for label momo
scheduled task 1, waiting 5 sec
  15.332 [id=79]	INFO	c.a.j.e.EC2RetentionStrategy#taskAccepted: Agent i-2 has 1 builds left
  17.095 [id=79]	INFO	c.a.j.e.EC2RetentionStrategy#postJobAction: Agent i-2 is still in use by more than one (1) executors.

1 tasks submitted for label momo
scheduled task 2, waiting 5 sec
  17.994 [id=76]	INFO	c.a.j.e.EC2RetentionStrategy#taskAccepted: Agent i-1 has 1 builds left
  19.526 [id=76]	INFO	c.a.j.e.EC2RetentionStrategy#postJobAction: Agent i-1 is still in use by more than one (1) executors.

1 tasks submitted for label momo
scheduled task 3, waiting 5 sec
  23.001 [id=110]	INFO	c.a.j.e.EC2RetentionStrategy#taskAccepted: maxTotalUses drained - suspending agent after current build i-2
  24.090 [id=110]	INFO	c.a.j.e.EC2RetentionStrategy#postJobAction: Calling scheduleToTerminate for node i-2 due to maxTotalUses (0)
  24.090 [id=110]	INFO	c.a.j.ec2fleet.EC2FleetCloud#info: testCloud [momo] Scheduling instance 'i-2' for termination on cloud com.amazon.jenkins.ec2fleet.EC2FleetCloud@6d5ee02b with force: true

HtmlForm[<form method="post" autocomplete="off" name="config" action="configure">]
[HtmlTextInput[<input default="FleetCloud" name="_.name" type="text" class="setting-input   " value="testCloud">]]
  38.912 [id=22]	INFO	c.a.j.e.u.EC2FleetCloudAwareUtils#reassign: Trying to reassign Jenkins computer:testCloud i-1 Builds left: 1 
  38.912 [id=22]	INFO	c.a.j.e.u.EC2FleetCloudAwareUtils#reassign: Trying to reassign Jenkins computer:testCloud i-2 Builds left: 0 

Old cloud: com.amazon.jenkins.ec2fleet.EC2FleetCloud@580bb94c,new cloud: com.amazon.jenkins.ec2fleet.EC2FleetCloud@62cd79a6
Old cloud#InstanceIdsToTerminate: [i-2],new cloud#InstanceIdsToTerminate: [] 
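
The lost state shown above (the new cloud object starts with an empty InstanceIdsToTerminate) and the idea of rebuilding it from persisted agent fields can be sketched as follows. This is only an illustration under assumptions: `LostStateRebuild`, `Agent`, and `Cloud` are hypothetical stand-ins, and only the maxTotalUses reason is rechecked here, whereas the PR describes rechecking all termination reasons in RetentionStrategy#check.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not taken from the plugin source.
final class LostStateRebuild {

    // Minimal stand-in for an agent whose counters are persisted with the node.
    static final class Agent {
        final String instanceId;
        final int maxTotalUses;  // -1 means unlimited
        final int remainingUses; // survives a cloud configuration change

        Agent(String instanceId, int maxTotalUses, int remainingUses) {
            this.instanceId = instanceId;
            this.maxTotalUses = maxTotalUses;
            this.remainingUses = remainingUses;
        }
    }

    // Minimal stand-in for a cloud object; a configuration change replaces it
    // with a fresh instance whose map is empty.
    static final class Cloud {
        final Map<String, String> instanceIdsToTerminate = new LinkedHashMap<>();

        void scheduleToTerminate(String instanceId, String reason) {
            instanceIdsToTerminate.putIfAbsent(instanceId, reason);
        }
    }

    // Periodic check: re-derives the termination decision from the agent's
    // persisted fields instead of trusting the cloud's in-memory state.
    static void check(Cloud cloud, List<Agent> agents) {
        for (Agent agent : agents) {
            boolean limited = agent.maxTotalUses != -1;
            if (limited && agent.remainingUses <= 0) {
                cloud.scheduleToTerminate(agent.instanceId,
                        "MaxTotalUses exhausted for agent");
            }
        }
    }
}
```

Because the decision is recomputed from the agents on every check, replacing the cloud object no longer loses instances that were already marked for termination.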

Testing:

  • Tested the changes in my Jenkins setup with a snapshot version of the plugin.
  • Added unit and integration tests.

@pdk27 pdk27 added the do not merge Don't merge this (at least not yet) label Jun 27, 2023
@LikithaVemulapalli LikithaVemulapalli mentioned this pull request Jun 28, 2023
pdk27 added 4 commits June 28, 2023 11:11
add a flag to track termination of agents by plugin
[fix] Fix maxtotaluses decrement logic

add logs in post job action to expose tasks terminated with problems

jenkinsci#322

add and fix tests
…and rebuilding lost state to terminate instances previously marked for termination
@pdk27 pdk27 force-pushed the fix-322-and-363 branch from 5fbda1d to 8059f1c Compare June 28, 2023 16:12
@pdk27 pdk27 marked this pull request as ready for review June 28, 2023 16:29
Collaborator

@LikithaVemulapalli LikithaVemulapalli left a comment

/lgtm

@pdk27 pdk27 merged commit 46d6731 into jenkinsci:master Jun 28, 2023
@pdk27 pdk27 added this to the 2.7.0 milestone Jun 29, 2023
Labels
do not merge Don't merge this (at least not yet)
2 participants