-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-63164] Clear CpsBodyExecution.thread when the body completes #367
Conversation
// may still be reachable (e.g. closures that outlive the body). If the CpsBodyExecution is still part | ||
// of the program, we need to make sure that we do not reference to a dead CpsThread because that may | ||
// cause resumption issues (JENKINS-63164). | ||
thread = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this. Some other ideas I had:
- Update
CpsThreadGroup.run
so that when it removes a thread fromCpsThreadGroup.threads
it calls some new methodCpsThread.destroy
or similar that nulls out fields (mainlycontextVariables
) inCpsThread
in case something is still referencing it, and so that if theCpsThread
is still live then things work as they did before the patch. Might fix other issues as well that we aren't aware of. - Modify
groovy-cps
so that theEnv
captured byCpsClosure
does not reference things likeCallEnv.returnAddress
from nested environments. Not sure about correctness/feasibility of this, seems complicated. At least in the simple case here we only need the locals from the nested environments and if we only stored those variables there would be no issues, but perhaps there could be a references from the locals to some of the fields inCallEnv
, and maybe we need to maintain the actualEnv
references as-is for semantics.
If we stick with the current approach, some methods like CpsBodyExecution.isLaunched
might need to be updated to account for the fact that thread
can now go from null to non-null and then back to null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first option sounds better to me. Is there a downside to that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first option sounds better to me. Is there a downside to that?
I'm not really sure, I can prototype it. If CpsBodyExecution.thread
is actually accessible from the program as a captured variable in the closure's environment, then nulling out all of the fields in CpsThread
could result in some weird behavior/NPEs, but I suspect that the CpsThread
is not actually accessible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting some other ideas that came up in a discussion today in case they are useful to someone in the future:
- Replacing
CpsBodyExecution.thread
withCpsBodyExecution.threadId
and only holding a transient reference to theCpsThread
itself, loading the value by looking up thethreadId
inCpsThreadGroup
. Not sure if this would be possible, and would require some adjustments to maintain serialization compatiblity. - Hooking into
RiverWriter
or similar code inworkflow-support
and just not serializingCpsThread
s that are not part of the rootCpsThreadGroup
. Seems complicated, and I'm not sure about compatibility.
SemaphoreStep.success("wait/1", null); | ||
r.assertBuildStatusSuccess(r.waitForCompletion(b)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird diff, but all I did here was removed the redundant call to SemaphoreStep.waitForStart
and the comment/code regarding "Before the fix for JENKINS-53709" since jenkinsci/workflow-durable-task-step-plugin#104 changed the behavior.
Incrementals publisher failed ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am afraid I do not understand the issues well enough to comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to Jesse, I'm not sure how to evaluate this or #368. They both look reasonable, but maybe we could discuss a bit further.
Closing in favor of #368. |
See JENKINS-63164. Kind of reimplements #245 after #279 replaced it with a different approach that works better than #245 but only affects the
parallel
andload
steps.JENKINS-63164 is the root cause of
RestartingLoadStepTest.updatedBindingsOnRestart
being flaky, see #366.I am really not sure about the best way to fix the issue. See #367 (comment) for some discussion of other options.