[JENKINS-63164] Clear CpsBodyExecution.thread when the body completes #367

dwnusbaum · 2020-07-22T14:59:32Z

See JENKINS-63164. This more or less reimplements #245. #279 replaced #245 with a different approach that works better but only affects the parallel and load steps.

JENKINS-63164 is the root cause of RestartingLoadStepTest.updatedBindingsOnRestart being flaky, see #366.

I am really not sure about the best way to fix the issue. See #367 (comment) for some discussion of other options.

dwnusbaum · 2020-07-22T15:14:29Z

src/main/java/org/jenkinsci/plugins/workflow/cps/CpsBodyExecution.java

+                // may still be reachable (e.g. closures that outlive the body). If the CpsBodyExecution is still part
+                // of the program, we need to make sure that we do not reference to a dead CpsThread because that may
+                // cause resumption issues (JENKINS-63164).
+                thread = null;


I'm not sure about this. Some other ideas I had:

Update CpsThreadGroup.run so that when it removes a thread from CpsThreadGroup.threads it calls some new method CpsThread.destroy or similar that nulls out fields (mainly contextVariables) in CpsThread in case something is still referencing it, and so that if the CpsThread is still live then things work as they did before the patch. Might fix other issues as well that we aren't aware of.

Modify groovy-cps so that the Env captured by CpsClosure does not reference things like CallEnv.returnAddress from nested environments. Not sure about the correctness/feasibility of this; it seems complicated. At least in the simple case here we only need the locals from the nested environments, and if we only stored those variables there would be no issues, but perhaps there could be references from the locals to some of the fields in CallEnv, and maybe we need to maintain the actual Env references as-is for semantics.

If we stick with the current approach, some methods like CpsBodyExecution.isLaunched might need to be updated to account for the fact that thread can now go from null to non-null and then back to null.
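To illustrate the isLaunched concern: if launch status is derived from whether the thread field is set (an assumption for this sketch), then clearing the field on completion makes a finished body look like one that never started. A minimal sketch of tracking the launch separately follows. BodySketch and WorkerSketch are hypothetical stand-ins, not the actual CpsBodyExecution or CpsThread code.

```java
// Hypothetical stand-ins; these are NOT the real workflow-cps classes.
final class WorkerSketch {
    final int id;

    WorkerSketch(int id) {
        this.id = id;
    }
}

final class BodySketch {
    private WorkerSketch thread; // null before launch, and null again after completion
    private boolean launched;    // explicit flag, so clearing `thread` does not hide the launch

    synchronized void launch(WorkerSketch t) {
        if (launched) {
            throw new IllegalStateException("already launched");
        }
        thread = t;
        launched = true;
    }

    synchronized void complete() {
        // Drop the reference so a retained body cannot keep a dead worker reachable.
        thread = null;
    }

    // Naive check: reports false again once the body has completed.
    synchronized boolean isLaunchedNaive() {
        return thread != null;
    }

    // Check that survives the null -> non-null -> null transition.
    synchronized boolean isLaunched() {
        return launched;
    }
}
```

Whether a separate flag is the right fix for the real class is a judgment call; the sketch only shows why a null check alone stops being sufficient.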

The first option sounds better to me. Is there a downside to that?

I'm not really sure, I can prototype it. If CpsBodyExecution.thread is actually accessible from the program as a captured variable in the closure's environment, then nulling out all of the fields in CpsThread could result in some weird behavior/NPEs, but I suspect that the CpsThread is not actually accessible.
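For reference, the first option could look roughly like the sketch below. ThreadSketch, GroupSketch, destroy, and reapDeadThreads are hypothetical names chosen for illustration; this is not the real CpsThread or CpsThreadGroup code, and the real classes hold more state than a single map.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-ins; these are NOT the real workflow-cps classes.
final class ThreadSketch {
    final int id;
    Map<String, Object> contextVariables = new HashMap<>(); // potentially large state
    boolean alive = true;

    ThreadSketch(int id) {
        this.id = id;
    }

    // Hypothetical cleanup hook: release state that should not be retained
    // (or serialized) once the thread has left its group.
    void destroy() {
        contextVariables = null;
    }
}

final class GroupSketch {
    final Map<Integer, ThreadSketch> threads = new HashMap<>();

    void add(ThreadSketch t) {
        threads.put(t.id, t);
    }

    // Remove finished threads and clear their fields, in case something
    // outside the group (for example a completed body) still references them.
    void reapDeadThreads() {
        Iterator<ThreadSketch> it = threads.values().iterator();
        while (it.hasNext()) {
            ThreadSketch t = it.next();
            if (!t.alive) {
                it.remove();
                t.destroy();
            }
        }
    }
}
```

Live threads are untouched in this sketch, so behavior for them stays the same. The NPE risk mentioned above would only arise if something dereferenced contextVariables on a thread that had already been destroyed.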

Noting some other ideas that came up in a discussion today in case they are useful to someone in the future:

Replacing CpsBodyExecution.thread with CpsBodyExecution.threadId and only holding a transient reference to the CpsThread itself, loading the value by looking up the threadId in CpsThreadGroup. Not sure if this would be possible, and it would require some adjustments to maintain serialization compatibility.

Hooking into RiverWriter or similar code in workflow-support and just not serializing CpsThreads that are not part of the root CpsThreadGroup. Seems complicated, and I'm not sure about compatibility.
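A rough sketch of the threadId idea, using plain Java serialization semantics and hypothetical stand-in classes (ThreadRefSketch, GroupLookupSketch, BodyByIdSketch). It ignores the compatibility adjustments mentioned above and is not based on how the real classes are persisted.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; these are NOT the real workflow-cps classes.
final class ThreadRefSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    final int id;

    ThreadRefSketch(int id) {
        this.id = id;
    }
}

final class GroupLookupSketch {
    private final Map<Integer, ThreadRefSketch> threads = new HashMap<>();

    void add(ThreadRefSketch t) {
        threads.put(t.id, t);
    }

    void remove(int id) {
        threads.remove(id);
    }

    ThreadRefSketch find(int id) {
        return threads.get(id);
    }
}

final class BodyByIdSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final int threadId;               // the only persisted link to the thread
    private transient ThreadRefSketch cached; // skipped by Java serialization

    BodyByIdSketch(ThreadRefSketch t) {
        this.threadId = t.id;
        this.cached = t;
    }

    // Resolve against the group on every call, so a thread that has been
    // removed is reported as null instead of being kept alive by this body.
    ThreadRefSketch resolve(GroupLookupSketch group) {
        cached = group.find(threadId);
        return cached;
    }
}
```

With this shape, the body never contributes a thread to the serialized object graph; only the group decides which threads are persisted.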

dwnusbaum · 2020-07-22T15:16:44Z

src/test/java/org/jenkinsci/plugins/workflow/cps/CpsBodyExecutionTest.java

+            SemaphoreStep.success("wait/1", null);
+            r.assertBuildStatusSuccess(r.waitForCompletion(b));


Weird diff, but all I did here was remove the redundant call to SemaphoreStep.waitForStart and the comment/code regarding "Before the fix for JENKINS-53709", since jenkinsci/workflow-durable-task-step-plugin#104 changed the behavior.

dwnusbaum · 2020-07-22T16:23:28Z

Incrementals publisher failed (InterruptedException in BourneShellScript.launchWithCookie) but the tests are passing so I am marking this as ready for review.

jglick

I am afraid I do not understand the issues well enough to comment.

bitwiseman

Similar to Jesse, I'm not sure how to evaluate this or #368. They both look reasonable, but maybe we could discuss a bit further.

dwnusbaum · 2020-07-23T17:50:34Z

Closing in favor of #368.

dwnusbaum added 2 commits July 22, 2020 10:54

[JENKINS-63164] Clear CpsBodyExecution.thread when the body completes

d82ee23

[JENKINS-63164] Rename variable in test case

2d16929

dwnusbaum commented Jul 22, 2020

dwnusbaum marked this pull request as ready for review July 22, 2020 16:23

dwnusbaum mentioned this pull request Jul 22, 2020

Try to diagnose why RestartingLoadStepTest.updatedBindingsOnRestart is flaky #366

Closed

dwnusbaum requested review from jglick, olamy, bitwiseman and car-roll July 22, 2020 16:28

jglick reviewed Jul 22, 2020

dwnusbaum mentioned this pull request Jul 22, 2020

[JENKINS-63164] Clear fields in CpsThread when thread is removed from CpsThreadGroup #368

Merged

bitwiseman closed this Jul 22, 2020

bitwiseman reopened this Jul 22, 2020

bitwiseman reviewed Jul 22, 2020

dwnusbaum closed this Jul 23, 2020

dwnusbaum deleted the JENKINS-63164 branch July 23, 2020 17:50

jglick mentioned this pull request Aug 24, 2024

Clear CpsBodyExecution.thread #925

Merged
