Catch subprocess.TimeoutError once #196
This was the root cause of several jobs with the same features being submitted, even though they had already been submitted
For reference, related is the recent change in PR #191 on how to handle |
I think you mean |
I already asked in #191 whether catching |
As I wrote before, I wonder, why do we actually do this change? Why not keep the original behavior, i.e. just not handle |
Thanks for the comments Albert. I didn't see @michelwi's and your comments. Indeed, this was the source of an issue. I see that the return value is correctly handled below. I will therefore leave the code as it was, catching the |
I.e. you update this PR here, to remove the |
Yes! I did that just now :)
This was the root cause of several jobs with the same features being submitted, even though they had already been submitted:
- The gateway option is in use and there's a TimeoutError here.
- ssh gw-02 squeue finishes with error -1. Sisyphus then reaches here safely. It hasn't crashed with a TimeoutError because we haven't propagated it.
- No TimeoutError is raised after calling self.system_call() because it has already been handled inside self.system_call() and hasn't been propagated. Therefore, the process skips the except block here. If retval == -1, Sisyphus doesn't care either!

In my view, the problem comes from not handling errors that could have been detected from the return values. Therefore, this PR correctly handles return codes different from zero, and leaves the handling of subprocess exceptions to self.system_call().
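For illustration, here is a minimal Python sketch of the failure mode described above. It is not the Sisyphus code: `system_call` is a hypothetical stand-in for `self.system_call()`, `queue_state_is_known` is a made-up caller, and the return shape `(out, err, retval)` is an assumption. Note that the exception Python's `subprocess` module raises on a timeout is `subprocess.TimeoutExpired`.

```python
import subprocess
import sys


def system_call(command, timeout=30):
    """Hypothetical stand-in for self.system_call(): handles the timeout itself."""
    try:
        proc = subprocess.run(command, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The timeout is swallowed here, so callers never see the exception;
        # they only see a return value of -1.
        return [], [], -1
    return proc.stdout.splitlines(), proc.stderr.splitlines(), proc.returncode


def queue_state_is_known(command, timeout=30):
    """Hypothetical caller: return True only if the queue query succeeded."""
    out, err, retval = system_call(command, timeout=timeout)
    if retval != 0:
        # Without this check, a failed query looks like an empty queue,
        # and jobs that are already queued could be submitted again.
        return False
    return True
```

With this split, an `except` block around the call in the caller would never run, so the return code is the only signal the caller can act on.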