Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-35246] Kubernetes agents not getting deleted in Jenkins after pods are deleted #217

Merged
merged 11 commits into from
Sep 11, 2017

Conversation

Vlatombe
Copy link
Member

@Vlatombe Vlatombe commented Sep 5, 2017

This fixes the provisioning lifecycle for KubernetesSlave. The problem
was ProvisioningCallback was adding the node to Jenkins but it is
expected to be done by NodeProvisioner once the future completes.

This could lead in cases of short node usage in cases where the node would
get readded to Jenkins after the termination had occurred.

Moving the provisioning logic to a ComputerLauncher honors the lifecycle
properly since ComputerLauncher#launch is called after the
NodeProvisioner has finished its job, eliminating risk of race
condition.

https://issues.jenkins-ci.org/browse/JENKINS-35246

@carlossg carlossg changed the title Move ProvisioningCallback logic to KubernetesLauncher [JENKINS-35246] Kubernetes agents not getting deleted in Jenkins after pods are deleted Sep 5, 2017
@Vlatombe Vlatombe force-pushed the provisioning_lifecycle_fix branch 2 times, most recently from d50c07a to 6f56cf9 Compare September 5, 2017 14:24
@@ -212,6 +212,9 @@
at io.fabric8.kubernetes.client.utils.Serialization.<clinit>(Serialization.java:37)
-->
<pluginFirstClassLoader>true</pluginFirstClassLoader>
<systemProperties>
<hudson.slaves.NodeProvisioner.initialDelay>0</hudson.slaves.NodeProvisioner.initialDelay>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

public Cloud getCloud() {
return Jenkins.getInstance().getCloud(getCloudName());
@Nonnull
public KubernetesCloud getCloud() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit that I don't understand enough about this, but could this be a binary compatibility issue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I restored the original method.

Copy link
Contributor

@marvinthepa marvinthepa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor things..


public PodTemplate getTemplate() {
return template;
}

public KubernetesSlave(PodTemplate template, String nodeDescription, KubernetesCloud cloud, String labelStr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I am wrong, but isn't this unused now and could/should be marked @Deprecated?

computer.disconnect(OfflineCause.create(new Localizable(HOLDER, "offline")));
return;
}
KubernetesCloud cloud = getKubernetesCloud();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens here if the cloud no longer exists, e.g. because it was deleted by a user?

@@ -0,0 +1,465 @@
package org.csanchez.jenkins.plugins.kubernetes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File is missing a license header, AFAIK @carlossg (or CloudBees?) requires them.

@@ -24,68 +24,18 @@

package org.csanchez.jenkins.plugins.kubernetes;

import static org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still have a few unused imports here.

@Vlatombe
Copy link
Member Author

Thanks, @marvinthepa, all fixed now.

This fixes the provisioning lifecycle for KubernetesSlave. The problem
was ProvisioningCallback was adding the node to Jenkins but it is
expected to be done by NodeProvisioner once the future completes.

This could lead in cases of short node usage in cases where the node would
get readded to Jenkins after the termination had occurred.

Moving the provisioning logic to a ComputerLauncher honors the lifecycle
properly since ComputerLauncher#launch is called after the
NodeProvisioner has finished its job, eliminating risk of race
condition.
@carlossg carlossg merged commit aea6ecd into jenkinsci:master Sep 11, 2017
@marvinthepa
Copy link
Contributor

@Vlatombe is it possible that the changes from this pull request lead to the fact that a slave can now connect to the master before all containers in that pod have successfully started?

I have noticed this behavior when testing another PR, and this one looks a bit suspicious in this regard.. I have to admit that I didn't check closely, though.

@Vlatombe
Copy link
Member Author

@marvinthepa It's possible since the launch is now asynchronous. Setting computer.setAcceptingTasks(false); at the beginning of the launcher and computer.setAcceptingTasks(true); at the end should do the trick, since the logic already checks that all containers within the pod are launched.

marvinthepa pushed a commit to marvinthepa/kubernetes-plugin that referenced this pull request Sep 21, 2017
Prevent to start builds that might try to use containers
that haven't started (or will never start due to errors).

This reverts a change in behavior introduced in jenkinsci#217.
@dpyoung2
Copy link

Hello,
Will this fix be included in the next release of the plugin? If yes, when is the next release?

carlossg pushed a commit that referenced this pull request Oct 16, 2017
Prevent to start builds that might try to use containers
that haven't started (or will never start due to errors).

This reverts a change in behavior introduced in #217.
@Vlatombe Vlatombe deleted the provisioning_lifecycle_fix branch December 23, 2020 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants