Do not ask API server for a `Pod` during deletion unnecessarily #1304

jglick · 2023-01-31T23:16:59Z

Not tested yet, but hoping this would avoid

java.net.SocketTimeoutException: connect timed out
        at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
        at …
        at java.base/java.net.Socket.connect(Socket.java:609)
        at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
        at …
Caused: java.io.IOException: connect timed out
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:533)
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:570)
        at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleGet(OperationSupport.java:482)
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleGet(BaseOperation.java:742)
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.getMandatory(BaseOperation.java:177)
Caused: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: […]  in namespace: […]  failed.
        at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:159)
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.getMandatory(BaseOperation.java:182)
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:144)
        at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.get(BaseOperation.java:93)
        at org.csanchez.jenkins.plugins.kubernetes.KubernetesSlave._terminate(KubernetesSlave.java:313)
        at …

Probably the actual deletion would fail later anyway, but why make two API calls when one suffices (assuming you have not actually configured OnFailure)?
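To illustrate the idea (a minimal sketch only, not the code in this PR): if the `Pod` lookup is wrapped in a memoizing supplier, the `get` request is issued only when the retention policy actually inspects the pod. Every name below (`LazyPodSketch`, `RetentionPolicy`, `NEVER_RETAIN`, `RETAIN_ON_FAILURE`, `countGetCalls`) is hypothetical, and a plain string stands in for the pod phase so that the example runs without the fabric8 client.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyPodSketch {
    /** Wraps a supplier so the underlying lookup runs at most once, and only on demand. */
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private T value;
            private boolean loaded;

            @Override
            public synchronized T get() {
                if (!loaded) {
                    value = delegate.get();
                    loaded = true;
                }
                return value;
            }
        };
    }

    /** Hypothetical stand-in for a pod retention policy; it may or may not look at the pod. */
    interface RetentionPolicy {
        boolean shouldDelete(Supplier<String> podPhase);
    }

    /** Always delete: never needs to know the pod's state. */
    static final RetentionPolicy NEVER_RETAIN = phase -> true;

    /** Keep failed pods: has to inspect the pod's state. */
    static final RetentionPolicy RETAIN_ON_FAILURE = phase -> !"Failed".equals(phase.get());

    /** Returns how many simulated "get Pod" requests a termination with this policy makes. */
    static int countGetCalls(RetentionPolicy policy, String phase) {
        AtomicInteger getCalls = new AtomicInteger();
        // Assumption: the lambda stands in for the API server round trip.
        Supplier<String> lazyPhase = memoize(() -> {
            getCalls.incrementAndGet();
            return phase;
        });
        if (policy.shouldDelete(lazyPhase)) {
            // The single delete request would be issued here.
        }
        return getCalls.get();
    }

    public static void main(String[] args) {
        System.out.println(countGetCalls(NEVER_RETAIN, "Running"));      // prints 0
        System.out.println(countGetCalls(RETAIN_ON_FAILURE, "Running")); // prints 1
    }
}
```

With a policy that never looks at the pod, a timeout on `get` cannot occur during termination because the request is never made; only the delete call remains.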

Vlatombe · 2023-02-01T07:40:58Z

Jenkinsfile

-                sh 'mvn -B -ntp -Dset.changelist -Dmaven.test.failure.ignore clean install'
-                infra.prepareToPublishIncrementals()
-                junit 'target/surefire-reports/*.xml'
+        retry(count: 3, conditions: [kubernetesAgent(handleNonKubernetes: true), nonresumable()]) {


[offtopic] `handleNonKubernetes: true` feels weird; shouldn't the condition be renamed?

Well, this is to handle the scenario on ci.jenkins.io where we are requesting a label which is currently handled by the kubernetes plugin, but admins have left open the possibility of switching to some other cloud if performance merits it.

Vlatombe · 2023-02-01T07:45:06Z

rebuilding

Vlatombe · 2023-02-01T07:46:54Z

Flake

 16.155 [run In Pod With Restart With Multiple Container Calls #1] run-in-pod-with-restart-with-multiple-container-calls-1-2-przqw has been removed for 15 sec, assuming it is not coming back

Would that cause an ABORTED build status even though the build seems able to complete its steps?

Related: jenkinsci/workflow-durable-task-step-plugin#180

Prior to that

   1.124 [run In Pod With Restart With Multiple Container Calls #1] Waiting for reconnection of run-in-pod-with-restart-with-multiple-container-calls-1-2-przqw before proceeding with build
  13.929 [id=194]	INFO	h.TcpSlaveAgentListener$ConnectionHandler#run: Connection #6 failed: java.io.EOFException
  14.047 [id=196]	INFO	h.TcpSlaveAgentListener$ConnectionHandler#run: Accepted JNLP4-connect connection #7 from /10.244.0.32:32956

Maybe 15 seconds here is a bit too short, considering the 10-second delay between retries? https://github.com/jenkinsci/remoting/blob/952d22bc5673950f53a949b49fbe1427d99324d9/src/main/java/hudson/remoting/Engine.java#L688

jglick · 2023-02-01T13:38:59Z

> Maybe 15 seconds here is a bit too short

Probably; trying to make routine tests run in a reasonable amount of time. FYI jenkinsci/workflow-durable-task-step-plugin#284

jglick · 2023-02-27T17:08:48Z

FTR some other PRs related to pod deletion: #1095, #1227

Do not ask API server for a Pod during deletion unnecessarily

685fa8b

jglick requested a review from a team as a code owner January 31, 2023 23:16

jglick added the bug Bug Fixes label Jan 31, 2023

jdk11 branch may fail spuriously, so retry

cd43a3c

Vlatombe reviewed Feb 1, 2023


Vlatombe closed this Feb 1, 2023

Vlatombe reopened this Feb 1, 2023

Vlatombe approved these changes Feb 1, 2023


Vlatombe enabled auto-merge February 1, 2023 07:52

Vlatombe mentioned this pull request Feb 1, 2023

Added new retention policy "onJobFailure" #1265

Open

6 tasks

Vlatombe merged commit a982397 into jenkinsci:master Feb 1, 2023

jglick deleted the lazy-pod branch February 1, 2023 13:39
