Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(pruneOrphan): convert nomad expirtyTime from ns to ms #170

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

arsiesys
Copy link

@arsiesys arsiesys commented Dec 19, 2022

We are using Nomad 1.4.3 and jenkins 2.361.4.

I noticed that the pruneOrphan feature wasn't working on our side.
After digging, I noticed two anomalies:

  • The submitTime returned in the nomad Job Definition by the nomad API is using nanoseconds and the plugins seems to assume it's Milliseconds. Which means the parsed date was incorrect and the prune never happening.
  • The pluggin add the timeout value to the parsed expirtyTime but do not attribute the output to a variable, as a consequence, it wasn't used.

I address the previous anomalies by converting the parsed values to milliseconds and update the expirtyTime variable with the added timeout.

I did a small fix that seems to work on our side based on my first tests:

Dec 19, 2022 9:17:45 PM FINE org.jenkinsci.plugins.nomad.NomadCloud
Found Orphaned Node: jenkins-xx-xxx-4c83d5ce4910a6 in namespace default in region global

I am far away of being an expert with Java. If this PR need some "touch up", please help! :)

Thanks !

@@ -177,8 +177,8 @@ private void pruneOrphanedWorkers(NomadWorkerTemplate template) {
String jobNamespace = worker.getJobSummary().getNamespace();
JSONObject job = this.nomad.getRunningWorker(jobSummary.getJobID(), jobNamespace);
String jobRegion = job.getString("Region");
Instant expiryTime = Instant.ofEpochMilli(job.getLong("SubmitTime"));
expiryTime.plusSeconds(this.workerTimeout * 60);
Instant expiryTime = Instant.ofEpochMilli(Math.round(job.getLong("SubmitTime")/1000000));
Copy link
Member

@j3t j3t Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

You have to fix the "SpotBugs-finding" first in order to get the build green. Suggestion, just remove Math.round like below:

Instant expiryTime = Instant.ofEpochMilli(job.getLong("SubmitTime") / 1000000);
expiryTime = expiryTime.plusSeconds(this.workerTimeout * 60L);

Copy link
Author

@arsiesys arsiesys Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used math.Float because ofEpochMilli is expececting a long and the divide to convert to ns generate a number with decimals result of a BigDecimal type. The Math.Round solved both issues (remove the decimals + return a long).
I pushed a different way to do it (and tested it), hope it's a better way ! :D

groovy.lang.MissingMethodException: No signature of method: static java.time.Instant.toEpochMilli() is applicable for argument types: (java.math.BigDecimal) values: [1671487054166.230409]
Possible solutions: toEpochMilli(), ofEpochMilli(long)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adjusted the expiry time calculation so that it is more clear. Could you please check whether this is working for you?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's working :D !

@arsiesys
Copy link
Author

Also, I think this fix should come with a warning.. As this feature wasn't working before.
In the case of multiple separated jenkins instance using the same nomad configuration and mostly the same "job prefix" name, instance A could remove the job of the instance B if they use the same prefix. Each jenkins instance using the same nomad cluster should use differents prefix in their nomad templates.

@j3t j3t force-pushed the fix/prune_orphan_expiryTime branch from 7ef8ac8 to 79a6c7d Compare December 21, 2022 09:51
@j3t
Copy link
Member

j3t commented Dec 21, 2022

Also, I think this fix should come with a warning.. As this feature wasn't working before.

Makes sense, we could add something like this to the release notes I guess.

In the case of multiple separated jenkins instance using the same nomad configuration and mostly the same "job prefix" name, instance A could remove the job of the instance B if they use the same prefix. Each jenkins instance using the same nomad cluster should use differents prefix in their nomad templates.

tbh, I'm not familiar with the prune feature but I guess you are right, that could be the case. We should add a warning which is displayed right to the prune option so that the user is at least aware of the risk. Maybe we could also migrate the cloud configurations so that the prune feature is deactivated for all existing configurations and must be activated actively.

@aarnaud
Copy link

aarnaud commented Mar 29, 2023

Hi,
When this PR can be merge, it's a bugfix we need because nomad jobs aren't clear correctly.
If the there is breaking changes / behaviour changes we can change the version to increment the major version ?

@aarnaud
Copy link

aarnaud commented Mar 29, 2023

I tested and running this patch in our environment its work well

@arsiesys arsiesys force-pushed the fix/prune_orphan_expiryTime branch from 79a6c7d to cc0811d Compare March 29, 2023 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants