Python: future.is_ready and future.wait_for(0) always return False #2729
For the returned … In this case if you change … If you describe your use case, perhaps we can help find a good solution.
Ah, OK. I thought it might be a bug. My use case is as follows: I want to execute M total job steps as quickly as possible, subject to the constraint that I can only have N of them out at any one time. Usually what I do is loop over my N existing job steps, and check whether they've finished. Something like the following:
Do you want to block waiting for any job to complete? If so, @cmoussa1 and @garlick put together a workflow example that matches your use case pretty well. It uses the …

If you want to do the checks asynchronously, I don't think the wait interface example that I just referenced is what you want.
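The "keep N jobs in flight, block until any one of them finishes" pattern can be sketched with Python's standard `concurrent.futures` module. This is only an analogy under stated assumptions: `job_step` and `run_throttled` are hypothetical names, a thread pool stands in for Flux, and each "job" just sleeps briefly. In Flux itself the equivalent would be submitting jobs with `waitable=True` and waiting for any waitable job.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def job_step(i):
    """Hypothetical stand-in for one job step; a real step would be a Flux job."""
    time.sleep(0.01)
    return i


def run_throttled(total, max_running):
    """Run `total` steps with at most `max_running` outstanding at any time.

    Returns the step results (in completion order) and the peak number of
    steps that were in flight at once.
    """
    results = []
    next_id = 0
    in_flight = set()
    peak = 0
    with ThreadPoolExecutor(max_workers=max_running) as pool:
        while next_id < total or in_flight:
            # Top up until max_running steps are outstanding (or none remain).
            while next_id < total and len(in_flight) < max_running:
                in_flight.add(pool.submit(job_step, next_id))
                next_id += 1
            peak = max(peak, len(in_flight))
            # Block until at least one outstanding step finishes, then reap
            # every step that is done; the rest stay in flight.
            done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            for fut in done:
                results.append(fut.result())
    return results, peak


results, peak = run_throttled(total=20, max_running=5)
print(len(results), peak)  # prints: 20 5
```

Blocking in `wait(..., return_when=FIRST_COMPLETED)` avoids busy-polling each outstanding step: the loop only wakes up when something has actually finished.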
Yeah, that example is straight from our testsuite: … There are some other nice examples of the … BTW, if you change the … Also, as the example shows, you can wait for any waitable job if you leave off the …

Example derived from your use case (submit 20 jobs with 5 running at a time). Note the …

```python
import time
import flux, flux.job

remaining = 20
running = 0

def check_completed_future(future):
    # wait_for() raises EnvironmentError if the future is not
    # fulfilled within 1 second; treat that as "not finished yet".
    try:
        future.wait_for(1)
    except EnvironmentError:
        return False
    else:
        return True

def submit_job(h, jobspec):
    global remaining
    global running
    # waitable=True lets flux.job.wait_async() reap this job later
    flux.job.submit(h, jobspec, waitable=True)
    remaining = remaining - 1
    running = running + 1

h = flux.Flux()
jobspec = flux.job.JobspecV1.from_command(["sleep", "1"],
                                          num_tasks=1,
                                          cores_per_task=1)

# submit 5 jobs
for i in range(5):
    submit_job(h, jobspec)

# no jobid given, so this waits for any waitable job
future = flux.job.wait_async(h)
while running:
    if check_completed_future(future):
        # future.get guaranteed not to block now:
        jobid, success, msg = flux.job.wait_get_status(future)
        print("job {} finished: {}".format(jobid, msg.decode('utf-8')))
        running = running - 1
        if remaining > 0:
            submit_job(h, jobspec)
        # get a new future
        future = flux.job.wait_async(h)
print("done!")
```
Unfortunately, when I run the above test it works, but I get the errors described in #2671.
Thanks @grondo and @SteVwonder. I did want to wait asynchronously for each job to complete, but only because that fits the existing infrastructure I've built around CLIs, where the only way I have to tell whether a job has completed is to poll the … The …
I think you've moved on to the Python API, which is faster and more flexible than the Flux CLI, but I wanted to point out that …

```bash
#!/bin/bash
count=10
echo submitting
for i in $(seq 1 $count); do
    flux mini submit --flags=waitable /bin/true
done
echo waiting
for i in $(seq 1 $count); do
    flux job wait
done
```
Actually, it sounds like we're going to keep using the CLI when our code runs under Python 2 (#2405), so that's pretty helpful. Thanks!
It seems that when I submit a job and then repeatedly try to check (without waiting) whether the job is done, it never completes. If I do wait, however, it does work.
Here's a little testing script I made. When I executed it, it ran indefinitely. The same happened if I replaced `while not check_completed_future(job_completion_future):` with `while not job_completion_future.is_ready():`.

Tracing the issue, I found that calls to `future.is_ready()` fall back to `future_is_ready (flux_future_t *f)` in `future.c`. I'm guessing the `result_valid` attribute of the `flux_future_t` struct is set if I wait for a positive amount of time, but is never set if I don't wait. (I assumed it would be set by some other thread, but I guess not.)
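The hypothesis above can be illustrated with a toy model. It is a hypothetical simplification, not Flux's implementation: `ToyHandle`, `ToyFuture`, `deliver` and `dispatch` are invented names. The model assumes that a response only fulfills a future when a dispatch loop (standing in for the reactor) runs, that `is_ready()` merely reads the `result_valid` flag, and that `wait_for(0)` returns immediately when the result is not yet valid, which matches the behavior described in the issue title.

```python
from collections import deque


class ToyHandle:
    """Hypothetical stand-in for a broker connection (not Flux's API).

    Responses that "arrive" sit in an inbox until dispatch() runs, loosely
    mimicking messages waiting on a socket until a reactor processes them.
    """

    def __init__(self):
        self.inbox = deque()   # arrived but unprocessed (tag, payload) pairs
        self.pending = {}      # tag -> ToyFuture still awaiting its response

    def deliver(self, tag, payload):
        # A response lands; nothing fulfills the future yet.
        self.inbox.append((tag, payload))

    def dispatch(self):
        # One "reactor" pass: match queued responses to waiting futures.
        while self.inbox:
            tag, payload = self.inbox.popleft()
            fut = self.pending.pop(tag, None)
            if fut is not None:
                fut.result = payload
                fut.result_valid = True


class ToyFuture:
    def __init__(self, handle, tag):
        self.handle = handle
        self.result = None
        self.result_valid = False
        handle.pending[tag] = self

    def is_ready(self):
        # Only reads the flag; never runs dispatch(), so a loop that polls
        # this alone can spin forever.
        return self.result_valid

    def wait_for(self, timeout):
        # Assumption: a zero timeout returns immediately without dispatching.
        if not self.result_valid and timeout == 0:
            return False
        # Positive timeout: run dispatch (a real reactor would block for up
        # to `timeout` seconds; this toy just does a single pass).
        self.handle.dispatch()
        return self.result_valid


h = ToyHandle()
f = ToyFuture(h, tag=1)
h.deliver(1, "job finished")
print(f.is_ready(), f.wait_for(0), f.wait_for(1.0))  # prints: False False True
```

In this model, spinning on `is_ready()` or `wait_for(0)` never makes progress; only a call that lets the dispatch loop run (here, `wait_for()` with a positive timeout, as in the example earlier in the thread) fulfills the future.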