rfc21: add offline job state
chu11 committed Dec 13, 2021
1 parent 5a0d59e commit b9550e8
Showing 3 changed files with 130 additions and 89 deletions.
5 changes: 4 additions & 1 deletion data/spec_21/states.dot
@@ -12,7 +12,7 @@ digraph states {
DEPEND;
PRIORITY;
SCHED;
RUN;
{rank=same; RUN; OFFLINE;}
CLEANUP;
}

@@ -25,6 +25,9 @@ digraph states {

SCHED -> PRIORITY [label="flux-restart"]

RUN -> OFFLINE [xlabel="disconnect"]
OFFLINE -> RUN [xlabel="reconnect"]

edge [weight=0 color="red"];

DEPEND -> CLEANUP [label="exception"];
171 changes: 85 additions & 86 deletions data/spec_21/states.svg
43 changes: 41 additions & 2 deletions spec_21.rst
@@ -110,6 +110,14 @@ RUN
job shells have been started, and a ``finish`` event once all the job shells
have exited. The state transitions to CLEANUP.

OFFLINE
The job was started, but the job manager has lost track of it due
to an error (for example, a system crash). The job manager is
attempting to reconnect to the running job. A ``disconnect``
event is logged on transition into this state. A ``reconnect``
event is logged once tracking has been reestablished, and the job
re-enters the RUN state.
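
The RUN and OFFLINE transitions described above can be sketched as a small lookup table. This is only an illustration, not the job manager's implementation: the names ``JobState``, ``TRANSITIONS``, ``next_state``, and ``replay`` are hypothetical, only the three transitions named in the text are modeled, and leaving the state unchanged on any other event is an assumption of the sketch.

```python
from enum import Enum


class JobState(Enum):
    """Subset of job states needed for this sketch (hypothetical type)."""
    RUN = "RUN"
    OFFLINE = "OFFLINE"
    CLEANUP = "CLEANUP"


# (current state, event name) -> next state.
# Only transitions named in the surrounding text are listed.
TRANSITIONS = {
    (JobState.RUN, "disconnect"): JobState.OFFLINE,
    (JobState.OFFLINE, "reconnect"): JobState.RUN,
    (JobState.RUN, "finish"): JobState.CLEANUP,
}


def next_state(state, event_name):
    """Return the state after event_name.

    Assumption: events without an entry in TRANSITIONS leave the
    state unchanged. The specification may define other behavior.
    """
    return TRANSITIONS.get((state, event_name), state)


def replay(event_names, start=JobState.RUN):
    """Apply a sequence of event names, starting from `start`."""
    state = start
    for name in event_names:
        state = next_state(state, name)
    return state
```

For example, ``replay(["disconnect", "reconnect", "finish"])`` passes through OFFLINE, returns to RUN, and ends in CLEANUP.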

CLEANUP
The job has completed or an exception has occurred. Under normal termination,
the job manager waits for notification from the exec service that job
@@ -133,10 +141,10 @@ PENDING
The job is in DEPEND, PRIORITY, or SCHED states.

RUNNING
The job is in RUN or CLEANUP states.
The job is in RUN, OFFLINE, or CLEANUP states.

ACTIVE
The job is in DEPEND, PRIORITY, SCHED, RUN, or CLEANUP states.
The job is in DEPEND, PRIORITY, SCHED, RUN, OFFLINE, or CLEANUP states.
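
After this change, the virtual states can be expressed as plain sets of state names. The following is a minimal sketch that assumes states are represented as strings; the constant names and the ``virtual_states`` helper are hypothetical and only restate the membership lists above.

```python
# Virtual state membership as listed in the updated text.
PENDING = frozenset({"DEPEND", "PRIORITY", "SCHED"})
RUNNING = frozenset({"RUN", "OFFLINE", "CLEANUP"})
# ACTIVE lists exactly the members of PENDING and RUNNING combined.
ACTIVE = PENDING | RUNNING


def virtual_states(state):
    """Return the names of the virtual states that contain `state`."""
    names = []
    if state in PENDING:
        names.append("PENDING")
    if state in RUNNING:
        names.append("RUNNING")
    if state in ACTIVE:
        names.append("ACTIVE")
    return names
```

With these sets, a job in OFFLINE counts as both RUNNING and ACTIVE, which matches the two updated definitions.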


Exceptions
@@ -391,6 +399,37 @@ status
{"timestamp":1552594348.0,"name":"epilog-finish","context":{"description":"/usr/sbin/job-epilog.sh", "status":0}}
Disconnect Event
^^^^^^^^^^^^^^^^

The job manager has lost track of a running job.

The following keys are OPTIONAL in the event context object:

id
(long long) job ID

Example:

.. code:: json

   {"timestamp":1636747761.5495925,"name":"disconnect","context":{"id":341835776000}}

Reconnect Event
^^^^^^^^^^^^^^^

The job manager has reconnected to the job shells.

The context SHALL be empty.

Example:

.. code:: json

   {"timestamp":1636747761.827836,"name":"reconnect"}

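
A consumer of the event log could check the ``disconnect`` and ``reconnect`` context rules roughly as follows. This is a sketch using Python's standard ``json`` module; ``parse_event`` is a hypothetical helper, and treating a missing ``context`` key as an empty context is an assumption based on the ``reconnect`` example.

```python
import json


def parse_event(line):
    """Parse one event line and check the context rules sketched here.

    Returns a (timestamp, name, context) tuple.
    """
    event = json.loads(line)
    name = event["name"]
    # Assumption: an absent "context" key means an empty context.
    context = event.get("context", {})

    if name == "reconnect" and context:
        # The text states the reconnect context SHALL be empty.
        raise ValueError("reconnect context must be empty")

    if name == "disconnect":
        # "id" is OPTIONAL; when present it should be an integer job ID.
        job_id = context.get("id")
        if job_id is not None and not isinstance(job_id, int):
            raise ValueError("disconnect id must be an integer")

    return event["timestamp"], name, context


# The two example events from the text above.
disconnect_line = (
    '{"timestamp":1636747761.5495925,"name":"disconnect",'
    '"context":{"id":341835776000}}'
)
reconnect_line = '{"timestamp":1636747761.827836,"name":"reconnect"}'

disconnect = parse_event(disconnect_line)
reconnect = parse_event(reconnect_line)
```

Here ``disconnect[2]`` holds the optional job ID and ``reconnect[2]`` is an empty dictionary.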
Free Event
^^^^^^^^^^
