
Atomic job submit and job eventing subscription and etc #1534

Closed
dongahn opened this issue May 23, 2018 · 12 comments

@dongahn
Member

dongahn commented May 23, 2018

@grondo suggested opening an issue from #1332 (comment)

As I was helping @koning, we encountered two issues (or perhaps just nuances) in his use of the job.submit RPC.

He is currently using the JSC interface similar to this example code.

The first thing he noticed was that he couldn't easily pass the jobid of interest (only one job) at the time jsc.notify_status was called, because the jobid only comes back with the response to job.submit.

As @grondo suggested, this can also be worked around by calling job.create first and then job.submit-nocreate. For simplicity, though, he currently stores the jobid as a member variable of a Python class, which happens before reactor_run() and thus before the first invocation of his job status callback.

The second thing: his use case is such that the job wrapper only needs to monitor the state changes of a single job, but this JSC method invokes the job status callback for each state change of every job, which would perturb job execution.

We could hide this by exposing a richer abstraction through JSC, but I was wondering if we should actually consider changing the topic string for a job event to include the jobid, so that one can subscribe to state events for a specific job.
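
For illustration, if the topic did include the jobid, a wrapper could subscribe to just its own job, something like this sketch (the job.state.&lt;jobid&gt; topic is the proposal here, not current behavior):

```c
#include <flux/core.h>
#include <inttypes.h>
#include <stdio.h>

/* Assuming an open handle h and a jobid obtained from the job.submit
 * response. The "job.state.<jobid>" topic is hypothetical. */
void subscribe_to_my_job (flux_t *h, int64_t jobid)
{
    char topic[64];
    snprintf (topic, sizeof (topic), "job.state.%" PRId64, jobid);
    if (flux_event_subscribe (h, topic) < 0)
        perror ("flux_event_subscribe");
}
```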

These are certainly not high priority, but I'm raising them so that we can factor them into our new execution system design.

@garlick
Member

garlick commented May 23, 2018

(thinking of the new interface) what about an RPC interface for submit where you can optionally get multiple responses, one for each state transition, until a terminal state is reached?

he currently stores the jobid as a member variable of a Python class, which happens before reactor_run() and thus before the first invocation of his job status callback.

When you subscribe to events but delay entering the reactor, those events are not lost; they are queued in the flux_t handle. I think as long as you subscribe, then submit the job, then enter the reactor and filter events based on the jobid, you should pick up all the state transitions.

The downside is you get a lot of events that you must ignore, but maybe OK for now?
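
A minimal sketch of that ordering, assuming an event per state change; the topic string and payload keys are illustrative rather than the exact wreck-era names, and error handling is mostly omitted:

```c
#include <flux/core.h>
#include <inttypes.h>
#include <stdio.h>

static int64_t jobid = -1;

/* Drop events for other jobs; only report transitions for ours. */
static void state_cb (flux_t *h, flux_msg_handler_t *mh,
                      const flux_msg_t *msg, void *arg)
{
    int64_t id;
    const char *state;
    if (flux_event_unpack (msg, NULL, "{s:I s:s}",
                           "jobid", &id, "state", &state) < 0
        || id != jobid)
        return;
    printf ("job %" PRId64 " -> %s\n", id, state);
}

int main (void)
{
    flux_t *h = flux_open (NULL, 0);
    struct flux_match match = FLUX_MATCH_EVENT;
    match.topic_glob = "job.state";               /* illustrative topic */
    flux_msg_handler_t *mh = flux_msg_handler_create (h, match,
                                                      state_cb, NULL);
    flux_msg_handler_start (mh);

    /* 1. Subscribe first: matching events now queue in the flux_t handle. */
    flux_event_subscribe (h, "job.state");

    /* 2. Submit, and learn the jobid from the response.
     * (Request payload elided; a real submit includes the job description.) */
    flux_future_t *f = flux_rpc (h, "job.submit", "{}", FLUX_NODEID_ANY, 0);
    flux_rpc_get_unpack (f, "{s:I}", "jobid", &jobid);
    flux_future_destroy (f);

    /* 3. Enter the reactor; queued and future events reach state_cb. */
    flux_reactor_run (flux_get_reactor (h), 0);
    flux_close (h);
    return 0;
}
```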

@garlick
Member

garlick commented May 23, 2018

what about an RPC interface for submit where you can optionally get multiple responses, one for each state transition, until a terminal state is reached?

Or maybe a better interface for this use case would be to mimic fork/exec(2) and wait(2): one API call to submit a job and obtain a jobid, and another to block until the jobid reaches a terminal state and then return exit code(s) or other errors. Do users need to track the intermediate states?
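
A possible shape, sketched as declarations only (both function names are invented for illustration; nothing like them exists yet):

```c
#include <flux/core.h>
#include <stdint.h>

/* Hypothetical two-call interface mirroring fork/exec(2) + wait(2). */

/* Submit a job; return its jobid right away (or -1 on error). */
int64_t flux_job_submit (flux_t *h, const char *jobspec);

/* Block until jobid reaches a terminal state; on success fill *status
 * with the exit code(s) and return 0, or return -1 on error. */
int flux_job_wait (flux_t *h, int64_t jobid, int *status);
```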

@grondo
Copy link
Contributor

grondo commented May 23, 2018

Or maybe a better interface for this use case would be to mimic fork/exec(2) and wait(2)

I kind of like this idea -- users or tools may also want to track a stopped state when starting a job for debugger attach, but wait(2) also traditionally handles that.

As a counterpoint, it does seem unnecessarily limiting to restrict the job state(s) you can reliably watch for. However, I'm not sure whether you were proposing that we limit the job states users can watch, or just that we provide a high-level wait(2)-style function that runs a callback on a subset of job states (complete, stopped) while still allowing low-level calls to watch any job state.

@garlick
Member

garlick commented May 23, 2018

I guess no reason to limit the states that can be waited for.

I should also note that, if implemented as proposed in RFC 16, status in the KVS will be a log, so calling wait on a state that has already been passed through can be handled reliably, without races.
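
A toy illustration of why the log makes this race-free (the string-array log representation here is invented; RFC 16 would define the real format):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Because state history is append-only, a wait request for a state the
 * job already passed through can be answered by scanning history,
 * instead of hoping to catch a future event. */
static bool state_already_reached (const char **log, size_t len,
                                   const char *state)
{
    for (size_t i = 0; i < len; i++)
        if (strcmp (log[i], state) == 0)
            return true;    /* fulfill the wait immediately */
    return false;           /* otherwise, watch for the next transition */
}
```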

There is a list of states in RFC 16 that probably needs to be amended, too (no stopped state, for example).

@garlick
Member

garlick commented May 23, 2018

However, I'm not sure whether you were proposing that we limit the job states users can watch, or just that we provide a high-level wait(2)-style function that runs a callback on a subset of job states (complete, stopped)?

I was thinking of the wait function as a regular RPC that blocks until a particular state (or one of a set of states) is reached, and then is fulfilled. Rather than implementing a new callback, it would return a future and offer a "get" function for retrieving the state and any other information associated with the state transition (like exit code or other failure).
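
On the caller side, that might look something like the following sketch, where the job.wait topic and its payload keys are invented for illustration:

```c
#include <flux/core.h>
#include <stdio.h>

/* Assuming an open handle h and a known jobid. "job.wait" is a
 * hypothetical RPC topic, as are the "state" and "exit_code" keys. */
int wait_for_complete (flux_t *h, int64_t jobid)
{
    flux_future_t *f = flux_rpc_pack (h, "job.wait", FLUX_NODEID_ANY, 0,
                                      "{s:I s:s}",
                                      "jobid", jobid,
                                      "state", "complete");
    const char *state;
    int exit_code;
    /* The synchronous get blocks until the future is fulfilled, i.e.
     * until the state is reached; an async caller would register a
     * continuation with flux_future_then() instead. */
    if (flux_rpc_get_unpack (f, "{s:s s:i}",
                             "state", &state, "exit_code", &exit_code) < 0) {
        flux_future_destroy (f);
        return -1;
    }
    printf ("reached %s, exit code %d\n", state, exit_code);
    flux_future_destroy (f);
    return 0;
}
```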

@grondo
Contributor

grondo commented May 23, 2018

I should also note that, if implemented as proposed in RFC 16, status in the KVS will be a log, so calling wait on a state that has already been passed through can be handled reliably, without races.

Yeah, I meant to say that above: we might be trying to fix a problem we'll no longer have after replacing the current system. If the low-level calls to watch for job states work reliably, then we can always build higher-level functionality from them. (For example, the submit-and-wait call you propose could be built from the submit() API coupled with wait-for-complete.)

One property of wait(2) we'd want to avoid is that only the parent can wait for a child. In this system multiple actors will need to be able to wait on job states and possibly even collect exit status.

There is a list of states in RFC 16 that probably needs to be amended, too (no stopped state, for example).

Is the KVS job schema RFC the right place to enumerate all job states?

This could be a bad idea, but I was wondering if we want to allow job states to be extensible anyway. For example, a different implementation of a job shell might have more or fewer states than what we propose in an RFC. Of course, there would need to be a list of required states, but allowing providers to offer new states might allow some flexibility and experimentation.

@grondo
Contributor

grondo commented May 23, 2018

I was thinking of the wait function as a regular RPC that blocks until a particular state (or one of a set of states) is reached, and then is fulfilled. Rather than implementing a new callback, it would return a future and offer a "get" function for retrieving the state and any other information associated with the state transition (like exit code or other failure).

Yes, that seems straightforward and is what I assumed. I was more asking whether you were proposing this as the only way to wait on job events, and whether the function would be limited to only the "complete" state, as in the wait system call.

Returning "other information associated with state transition" seems like it might be tricky, but I can see how it might be necessary in order to avoid other races.

@dongahn
Member Author

dongahn commented May 23, 2018

Mimicking fork/exec(2) and wait(2) is a good idea. We need to put good mechanisms and APIs in place to avoid races, though, in particular when multiple users are allowed to call wait.

We also need to think about whether wait should be limited to a single job or cover a group of jobs. I'd like the same or related interfaces to be used for the single and multiple cases. Unix wait also has variants like waitpid(2), or waiting for all descendant pids.

@grondo
Contributor

grondo commented May 23, 2018

If the future returned by the wait RPC has a wait_get_jobid accessor, or if the jobid is returned in the payload and can be fetched with unpack(), then it will be simple to call wait on multiple jobids and use the same continuation callback.

Also, it might be really useful to have filtering done on the job-manager side (presuming this is the service that responds to wait RPCs), so that a wait RPC can be initiated with not only jobid filtering but also other filters like username.
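
For example, something like this sketch (same caveat: job.wait and its payload keys are hypothetical):

```c
#include <flux/core.h>
#include <inttypes.h>
#include <stdio.h>

/* One continuation services every outstanding wait, because the jobid
 * comes back in each response payload. */
static void wait_cb (flux_future_t *f, void *arg)
{
    int64_t id;
    const char *state;
    if (flux_rpc_get_unpack (f, "{s:I s:s}",
                             "jobid", &id, "state", &state) == 0)
        printf ("job %" PRId64 " reached %s\n", id, state);
    flux_future_destroy (f);
}

void wait_on_all (flux_t *h, const int64_t *jobids, int njobs)
{
    for (int i = 0; i < njobs; i++) {
        flux_future_t *f = flux_rpc_pack (h, "job.wait", FLUX_NODEID_ANY, 0,
                                          "{s:I}", "jobid", jobids[i]);
        flux_future_then (f, -1., wait_cb, NULL);  /* same cb for all */
    }
}
```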

@garlick
Member

garlick commented May 23, 2018

Excellent comments!

Do we need an abstraction something like "process groups" to allow waiting/signaling groups of jobs?

@grondo
Contributor

grondo commented May 23, 2018

Do we need an abstraction something like "process groups" to allow waiting/signaling groups of jobs?

@dongahn had mentioned that same thought before, and my initial reaction was to use a sub-instance if you need to treat a group of jobs as an ensemble. However, after having time to think about it, there may be use cases that can't be handled this way (for example, backfilling a series of small jobs treated as a group might be more efficient than trying to run them in a sub-instance). Therefore, I do think adding process groups is a great idea.

grondo added a commit to grondo/flux-core that referenced this issue Feb 5, 2019
The wreck exec system is worthless, remove it along with associated
commands, tests, and support code.

Since libjsc doesn't work without wreck, it is removed as well.

Fixes flux-framework#1984

Closes flux-framework#1947
Closes flux-framework#1618
Closes flux-framework#1595
Closes flux-framework#1593
Closes flux-framework#1534
Closes flux-framework#1468
Closes flux-framework#1443
Closes flux-framework#1438
Closes flux-framework#1419
Closes flux-framework#1410
Closes flux-framework#1407
Closes flux-framework#1393
Closes flux-framework#915
Closes flux-framework#894
Closes flux-framework#866
Closes flux-framework#833
Closes flux-framework#774
Closes flux-framework#772
Closes flux-framework#335
Closes flux-framework#249
@grondo
Contributor

grondo commented Feb 13, 2019

closed by #1988

@grondo grondo closed this as completed Feb 13, 2019