-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bindings/python: destroy future after use #2553
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good and thanks for the PR!
We run python code through the "black" formatter in travis-ci and it's called out a need for this change:
--- src/bindings/python/flux/job.py 2019-11-22 10:18:42.824332 +0000
+++ src/bindings/python/flux/job.py 2019-11-22 10:18:56.523516 +0000
@@ -53,6 +53,5 @@
future = submit_async(flux_handle, jobspec, priority, flags)
jid = submit_get_id(future)
future.destroy()
return jid
-
Couple other minor nits if you don't mind:
- fix misspelling of "slowdowns" in commit message
- add "bindings/python:" prefix to commit title
For future reference, and for what it's worth, we put up these documents describing flux project practices:
85a2d42
to
ae6ea79
Compare
Awesome! Thanks! |
Following a suggestion by @grondo, the `submit_get_id`-future sees delayed / no garbage collection and causes a significant slowdown due to apparent limits in the future layer. This commit adds explicit destruction after future completion, thus avoiding that issue.
0d06d92
to
99687aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thanks again! I'll set the merge-when-passing label and then mergify will take care of the merge.
Uh oh, looks like one of the builders failed in
That runs a python script So we'd better run that down before this goes in (not putting that on you @andre-merzky) |
Out of my depth here, but a66ead4 relates to the lifetimes of futures in the python bindings. Maybe @SteVwonder will have some thoughts on this pr? Edit: fixed commit ref |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @andre-merzky for posting this fix. Some inline comments below. For the one suggested change to work, the __clear
method in wrapper.py
will need to be renamed to _clear
(single underscore).
Using the _clear
method should avoid the memory corruption we are seeing in Travis since the _clear
method has some extra protections to ensure the C destructor is only called once.
def destroy(self): | ||
self.pimpl.destroy() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to not include this public method. I previously attempted to make this work, but exposing it publically opens up all sorts of edge-cases. I'd prefer to leverage the _clear
method in the Wrapper
class.
@@ -51,4 +51,6 @@ def submit_get_id(future): | |||
|
|||
def submit(flux_handle, jobspec, priority=lib.FLUX_JOB_PRIORITY_DEFAULT, flags=0): | |||
future = submit_async(flux_handle, jobspec, priority, flags) | |||
return submit_get_id(future) | |||
jid = submit_get_id(future) | |||
future.destroy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
future.destroy() | |
future.pimpl._clear() |
Rename patch: diff --git a/src/bindings/python/flux/wrapper.py b/src/bindings/python/flux/wrapper.py
index 4a58e7f12..a74d090f5 100644
--- a/src/bindings/python/flux/wrapper.py
+++ b/src/bindings/python/flux/wrapper.py
@@ -326,7 +326,7 @@ class Wrapper(WrapperBase):
setattr(self, name, new_method)
return new_method
- def __clear(self):
+ def _clear(self):
# avoid recursion
if hasattr(self, "_handle") and self._handle is not None:
handle = self._handle
@@ -354,14 +354,14 @@ class Wrapper(WrapperBase):
)
)
if self._handle is not None:
- self.__clear()
+ self._clear()
self._handle = h
def __del__(self):
- self.__clear()
+ self._clear()
def __enter__(self):
return self
def __exit__(self, type_arg, value, unused):
- self.__clear()
+ self._clear() |
I gave @SteVwonder's suggested modification to this PR a try this morning and it seems to both get past the 500 job hang and pass Edit: although this pylint error pops up in travis:
|
Thanks @garlick for giving it a spin. Adding |
I think ultimately we need to fix the circular reference so that futures exposed to users aren’t leaked, but if this issue is blocking @andre-merzky, then I have no issue with this patch for now. |
Thanks for that! But I am still prototyping and quite a bit away from experiments or production, and don't mind waiting. I am using a version of this PR right now, so I am not stuck. |
Allow future.pimpl._clear() to be called from other parts of the binding by dropping an underscore from its prefix. The _clear method has some extra protections to ensure the C destructor is only called once, thus future.pimpl._clear() is the preferred way to explicitly dispose of a future. This is a prereq for explicit cleanup of futures in the job() class. Originally proposed in flux-framework#2553.
Problem: job.submit() and job.wait() both seem to leak futures. The futures in these "synchronous" methods go out of scope and thus should be automatically destroyed, but due to a circular reference alluded to by @SteVwonder in flux-framework#2549, they persist. As a workaround, explicitly call _clear() on the futures in these methods. Credit goes to @andre-merzky for proposing the first version of this patch in flux-framework#2553, based on a suggestion by @grondo, with changes proposed by @SteVwonder. Group effort! Fixes flux-framework#2549.
Allow future.pimpl._clear() to be called from other parts of the binding by dropping an underscore from its prefix. The _clear method has some extra protections to ensure the C destructor is only called once, thus future.pimpl._clear() is the preferred way to explicitly dispose of a future. This is a prereq for explicit cleanup of futures in the job() class. Originally proposed in flux-framework#2553.
Problem: job.submit() and job.wait() both seem to leak futures. The futures in these "synchronous" methods go out of scope and thus should be automatically destroyed, but due to a circular reference alluded to by @SteVwonder in flux-framework#2549, they persist. As a workaround, explicitly call _clear() on the futures in these methods. Credit goes to @andre-merzky for proposing the first version of this patch in flux-framework#2553, based on a suggestion by @grondo, with changes proposed by @SteVwonder. Group effort! Fixes flux-framework#2549.
#2563 just went in and included bits gathered from this PR, so closing this one. Thanks @andre-merzky for running this down and proposing the initial fix! |
Following a suggestion by @grondo, the
submit_get_id
-future seesdelayed / no garbage collection and causes a significant slowdowns due to
apparent limits in the future layer. This commit adds explicit
destruction after future completion, thus avoiding that issue.
I am not familiar with your PR policies: what branch to point to, what additional information to provide, etc. Please let me know this is off...
This closes #2549