[JENKINS-35246] Kubernetes agents not getting deleted in Jenkins after pods are deleted #217
@@ -212,6 +212,9 @@
at io.fabric8.kubernetes.client.utils.Serialization.<clinit>(Serialization.java:37)
-->
<pluginFirstClassLoader>true</pluginFirstClassLoader>
<systemProperties>
<hudson.slaves.NodeProvisioner.initialDelay>0</hudson.slaves.NodeProvisioner.initialDelay>
👍
public Cloud getCloud() {
    return Jenkins.getInstance().getCloud(getCloudName());
@Nonnull
public KubernetesCloud getCloud() {
I have to admit that I don't understand enough about this, but could this be a binary compatibility issue?
I restored the original method.
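For context on the compatibility concern: the JVM resolves a call by the method's full descriptor, which includes the return type, so changing `getCloud()` from returning `Cloud` to returning `KubernetesCloud` would break callers compiled against the old signature. Below is a minimal sketch of one compatible approach, keeping the original method and adding a narrower accessor. The classes are simplified stand-ins rather than the plugin's real types, the `CLOUDS` map is a hypothetical replacement for the Jenkins lookup, and the body of `getKubernetesCloud()` is an assumption (only the name appears in this diff).

```java
import java.util.HashMap;
import java.util.Map;

public class BinaryCompatSketch {
    // Simplified stand-ins for the Jenkins types (hypothetical, for illustration only).
    static class Cloud {
        final String name;
        Cloud(String name) { this.name = name; }
    }

    static class KubernetesCloud extends Cloud {
        KubernetesCloud(String name) { super(name); }
    }

    // Hypothetical registry standing in for Jenkins.getInstance().getCloud(name).
    static final Map<String, Cloud> CLOUDS = new HashMap<>();

    static class Slave {
        private final String cloudName;
        Slave(String cloudName) { this.cloudName = cloudName; }

        // Original signature kept unchanged, so already-compiled callers still link.
        public Cloud getCloud() {
            return CLOUDS.get(cloudName);
        }

        // New accessor with the narrower type (assumed implementation).
        // Fails loudly if the cloud is missing or is not a Kubernetes cloud.
        public KubernetesCloud getKubernetesCloud() {
            Cloud cloud = getCloud();
            if (!(cloud instanceof KubernetesCloud)) {
                throw new IllegalStateException("Not a Kubernetes cloud: " + cloudName);
            }
            return (KubernetesCloud) cloud;
        }
    }

    public static void main(String[] args) {
        CLOUDS.put("k8s", new KubernetesCloud("k8s"));
        Slave slave = new Slave("k8s");
        System.out.println(slave.getCloud().name);
        System.out.println(slave.getKubernetesCloud().name);
    }
}
```

Old callers keep using `getCloud()`, while new code inside the plugin can use the typed accessor without casting.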
Just a few minor things..
public PodTemplate getTemplate() {
    return template;
}

public KubernetesSlave(PodTemplate template, String nodeDescription, KubernetesCloud cloud, String labelStr)
Maybe I am wrong, but isn't this unused now and could/should be marked @Deprecated?
    computer.disconnect(OfflineCause.create(new Localizable(HOLDER, "offline")));
    return;
}
KubernetesCloud cloud = getKubernetesCloud();
What happens here if the cloud no longer exists, e.g. because it was deleted by a user?
@@ -0,0 +1,465 @@
package org.csanchez.jenkins.plugins.kubernetes; |
File is missing a license header, AFAIK @carlossg (or CloudBees?) requires them.
@@ -24,68 +24,18 @@

package org.csanchez.jenkins.plugins.kubernetes;

import static org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud.*;
You still have a few unused imports here.
Thanks, @marvinthepa, all fixed now.
This fixes the provisioning lifecycle for KubernetesSlave. The problem was that ProvisioningCallback was adding the node to Jenkins, whereas NodeProvisioner is expected to do this once the future completes. When a node was used only briefly, it could be re-added to Jenkins after its termination had already occurred. Moving the provisioning logic to a ComputerLauncher honors the lifecycle properly, since ComputerLauncher#launch is called after the NodeProvisioner has finished its job, eliminating the risk of a race condition.
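The ordering problem described above can be sketched as a deterministic replay of one interleaving. This is not Jenkins code: the `Set` is a hypothetical stand-in for the list of nodes registered in Jenkins, the node name is made up, and each statement stands for one lifecycle event in the order the description implies.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class ProvisioningLifecycleSketch {
    // Hypothetical node name used only for this illustration.
    static final String NODE = "pod-agent-1";

    // Old behavior: the provisioning callback registers the node itself,
    // and the provisioner registers it again when the future completes.
    static Set<String> oldLifecycle() {
        Set<String> nodes = new LinkedHashSet<>();
        nodes.add(NODE);     // callback adds the node; a build can start immediately
        nodes.remove(NODE);  // short build finishes, node is terminated and removed
        nodes.add(NODE);     // provisioner processes the completed future and re-adds it
        return nodes;        // the terminated node is still registered
    }

    // New behavior: only the provisioner registers the node, and the
    // launcher does the actual work afterwards.
    static Set<String> newLifecycle() {
        Set<String> nodes = new LinkedHashSet<>();
        nodes.add(NODE);     // provisioner adds the node once the future completes
        // the launcher runs only now, then the build uses the node
        nodes.remove(NODE);  // build finishes, node is terminated and removed
        return nodes;        // nothing left behind
    }

    public static void main(String[] args) {
        System.out.println("old: " + oldLifecycle()); // prints "old: [pod-agent-1]"
        System.out.println("new: " + newLifecycle()); // prints "new: []"
    }
}
```

In the old ordering the node can be terminated before the provisioner has added it, which is what leaves a stale entry; in the new ordering termination can only follow registration.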
…termination call.
@Vlatombe is it possible that the changes in this pull request allow a slave to connect to the master before all containers in that pod have successfully started? I noticed this behavior while testing another PR, and this one looks a bit suspicious in this regard. I have to admit that I didn't check closely, though.
@marvinthepa It's possible since the launch is now asynchronous. Setting
Prevent builds from starting when they might try to use containers that haven't started (or will never start due to errors). This reverts a change in behavior introduced in jenkinsci#217.
Hello,
https://issues.jenkins-ci.org/browse/JENKINS-35246