-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Terminate the instance when 404 occured. #489
Conversation
… Else jenkins nodes cannot be deleted if there is no corresponding instance in the cloud.
… of zombie offline nodes in jenkins that failed to start in the cloud.
src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineComputerLauncher.java
Show resolved
Hide resolved
// return immediately, hoping for the best. | ||
cloud.getClient().terminateInstanceAsync(cloud.getProjectId(), zone, name); | ||
if (instanceExistsInCloud) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed? Is it just to avoid an unnecessary error message in the logs in case the instance does not exist in the cloud?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is based on another branch (see #470) but I don't think this condition is useful. Theoretically we could even verify the instance exists in the cloud but when running the terminateInstanceAsync
the instance could have been removed. I will remove that part.
Please @Theoderich if we're missing cases, please tell us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done into f6fc88c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of the fix for #471 and should not be removed.
The Problem:
When Jenkins tries to start a node and the quotas in the target cloud do not allow it, the instance is not started in the cloud but Jenkins thinks that a node was started. Now there is a Zombie node in Jenkins.
When trying to delete this Zombie node, Jenkins calls cloud.getClient().terminateInstanceAsync
which throws an Exception since the Instance does not exist in the cloud. The Delete is aborted and the zombie node stays, forever undeletable.
The Solution:
I added a check if the node actually exists in the cloud before calling cloud.getClient().terminateInstanceAsync
. If the node does not exist as an instance in the cloud, there is no need to terminate it and the deletion can proceed.
Why not just catch the exception thrown by cloud.getClient().terminateInstanceAsync
?
It is not clear from the exception why the terminate operation failed. If the instance exists in the cloud and the termination failed for some other reason, we do not want to delete the node in Jenkins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for the detailed explanation.
I've reverted the changes.
src/main/java/com/google/jenkins/plugins/computeengine/ComputeEngineComputerLauncher.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there some effective unit tests for this?
I don't think we can write any unit test for this (apart from maybe using Wiremock but this seems huge...). |
… it still exists." This reverts commit f6fc88c.
@Artmorse can you please fix the build failure? Seems like SpotBugs issues? I just labeled the issue as |
@Artmorse merged. CD is enabled, so a release should be cut automatically within the next minutes. |
This PR is based on #470.
This wants to fix the #469 issue.
Manual testing
I've configured a GCE cloud with the specific label
gcp-ce
.I've created a dummy
hello-world
pipeline using GCE nodes (using the labelgcp-ce
).I've provisionned a node from the cloud.
Once the compute engine up and running, I ran the
hello-world
pipeline and it working as expected.Then I've deleted it using the GCP UI the node is well terminated and removed from Jenkins.
I re-ran the pipeline and a new node has been provisionned.
Submitter checklist