runtime: Add an 'event' operation for subscribing to pushes #508

proc(5) describes the following state entries in proc/[pid]/stat [1]
(for modern kernels):

* R Running
* S Sleeping in an interruptible wait
* D Waiting in uninterruptible disk sleep
* Z Zombie
* T Stopped (on a signal)
* t Tracing stop
* X Dead

and ps(1) has a bit more context [2] (for modern kernels):

* D uninterruptible sleep (usually IO)
* R running or runnable (on run queue)
* S interruptible sleep (waiting for an event to complete)
* T stopped by job control signal
* t stopped by debugger during the tracing
* X dead (should never be seen)
* Z defunct ("zombie") process, terminated but not reaped by its parent

So I expect "stopped" to mean "process still exists but is paused, e.g. by SIGSTOP". And I expect "exited" to mean "process has finished and is either a zombie or dead".

After this commit, 'git grep -i stop' only turns up poststop-hook stuff, a reference in principles.md, a "stoppage" in LICENSE, and some ChangeLog entries.

Also replace "container's process" with "container process" to match usage in the rest of the repository. After this commit:

  $ git grep -i "container process" | wc -l
  16
  $ git grep -i "container's process" | wc -l
  1

Also reword status entries to avoid "running", which is less precise in our spec (e.g. it also includes "sleeping", "waiting", ...).

Also removes a "them" leftover from a partial plural -> singular reroll of be59415 (Split create and start, 2016-04-01, opencontainers#384).

[1]: http://man7.org/linux/man-pages/man5/proc.5.html
[2]: http://man7.org/linux/man-pages/man1/ps.1.html

Signed-off-by: W. Trevor King <[email protected]>
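To make the "stopped" versus "exited" distinction concrete, here is a minimal sketch that pulls the state letter out of a /proc/[pid]/stat line and groups it the way this message describes. The function names and the catch-all "alive" label are hypothetical and used only for illustration; nothing here is part of the spec.

```python
def stat_state(stat_line: str) -> str:
    """Return the single-letter state field from a /proc/[pid]/stat line."""
    # The second field (comm) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split after the last ')'.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def classify(state: str) -> str:
    """Group a state letter using the wording from this commit message."""
    if state in ("T", "t"):
        return "stopped"  # still exists but is paused, e.g. by SIGSTOP
    if state in ("Z", "X"):
        return "exited"   # finished: zombie or dead
    return "alive"        # hypothetical label covering R, S, and D
```

For example, `classify(stat_state("1234 (sh) T 1 1234"))` yields "stopped".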

To distinguish between "we're still setting this container up" and "we're finished setting up; you can call 'start' if you like".

Also reference the lifecycle steps, because you can't be too explicit.

Signed-off-by: W. Trevor King <[email protected]>

Because during the 'creating' phase we may not have a container process yet (e.g. if we're still reading the configuration or setting up cgroups), and in the 'stopped' phase the PID is no longer meaningful.

Signed-off-by: W. Trevor King <[email protected]>
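One way a runtime might apply this when rendering state is sketched below. The JSON key names "id" and "pid" and the helper name are assumptions for illustration; the only rule taken from this message is that a PID is left out while the status is 'creating' or 'stopped'.

```python
import json


def render_state(container_id, status, pid=None):
    """Serialize a state object, including the PID only when meaningful."""
    state = {"id": container_id, "status": status}
    # No container process may exist yet while creating, and the PID is no
    # longer meaningful once the container has stopped.
    if status not in ("creating", "stopped") and pid is not None:
        state["pid"] = pid
    return json.dumps(state, sort_keys=True)
```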

The current 'state' operation allows callers to poll for the state, but for some workflows polling is inefficient (how frequently do you poll to balance the cost of polling against the timeliness of the response?) and push notifications make more sense.

The runtime's 'create' process is in a unique position to detect these status transitions.

* As the actor carrying out container creation, it should have a clear idea of when that creation completes (for the 'created' event).
* It may set up a communication channel with the container process to orchestrate creation, and that channel may be used to report the start event.
* It knows (or is) the parent of the container process, and POSIX's wait(3), waitpid(3), and waitid(3) only work for child processes [1,2]. From [1]:

    Nothing in this volume of POSIX.1-2008 prevents an implementation from providing extensions that permit a process to get status from a grandchild or any other process, but a process that does not use such extensions must be guaranteed to see status from only its direct children.

  So the runtime can set up (or be) the parent waiting on the container process, and arrange for the 'stopped' event to be published on container exit.

I've tried to phrase the requirements conservatively to allow for runtimes that have to poll their kernel or some such to notice these changes. I see the following runtime-support cases:

a. The runtime can easily supply a push-based event operation. In this case, exposing that operation to callers facilitates push-based workflows without much cost.

b. The runtime cannot supply a push-based event operation, and has to emulate it by polling. In this case, the runtime can pick a polling strategy that makes sense to its maintainers, and callers who aren't satisfied with that strategy can roll their own state poller without a big efficiency hit (in a lesser of two evils way). The requirement is currently worded so weakly that a runtime would be compliant with:

   1. Container process dies at noon.
   2. User calls 'state' at 5pm.
   3. Runtime checks kernel, and sees that the container process is dead.
   4. Runtime publishes 'stopped' event with a 5pm (and some microseconds) timestamp.
   5. Runtime returns state to the user with 'stopped' in 'status'.

   which is a pretty low bar.

c. The runtime could supply a push-based event operation, but it would be a lot of work. These runtimes can use polling (b) as a quick-and-dirty solution until someone has time to implement a push-based solution (a). The improvement is transparent to users, who can use the same event operation throughout and passively reap the benefits of the implementation improvements.

Without an operation like this, higher levels that need to trigger on these transitions without polling need to exercise a lot of control over the system:

* They must be the parent of (or on Linux, the nearest PR_SET_CHILD_SUBREAPER ancestor of [3]) the create process, and the create process needs to exit after creation completes, if they want to block on the 'created' event.
* They must proxy all 'start' and 'delete' requests if they want to block on the 'started' or 'deleted' events.
* On Linux, they must be the nearest ancestor of the create operation to set PR_SET_CHILD_SUBREAPER if they want to block on the 'stopped' event.

By making 'event' a runtime requirement, we allow for efficient cross-platform push-based workflows while avoiding the need for tight orchestrator gate-keeping. Runtimes that have implementation difficulties have an easy out that allows their callers to benefit from future implementation improvements. And callers that are not satisfied can always fall back to polling state or the proxy/waitid/PR_SET_CHILD_SUBREAPER approaches.

[1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html
[2]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitid.html
[3]: http://man7.org/linux/man-pages/man2/prctl.2.html

Signed-off-by: W. Trevor King <[email protected]>
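Case (b) can be sketched as a small status differ. This is only an illustration under stated assumptions: the class name, the `read_status` and `publish` callables, and the event dictionary layout are all hypothetical, and no real runtime is claimed to work this way. The point it shows is that an emulated event carries the time the change was noticed, not the time it happened, which is exactly the noon versus 5pm gap described above.

```python
import time


class PollingEventSource:
    """Emulate push events by comparing successive status readings."""

    def __init__(self, read_status, publish, clock=time.time):
        self._read_status = read_status  # hypothetical: returns a status string
        self._publish = publish          # hypothetical: receives an event dict
        self._clock = clock              # injectable so tests can fake time
        self._last = None

    def poll(self):
        """Read the status once and publish an event if it changed."""
        status = self._read_status()
        if status != self._last:
            # The timestamp is the detection time; the real transition may
            # have happened long before this poll.
            self._publish({"event": status, "timestamp": self._clock()})
            self._last = status
        return status
```

A runtime in this position would call `poll()` on whatever schedule its maintainers choose, including lazily from inside a 'state' request. Transitions that begin and end between two polls are not seen by this sketch.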

This is all very generic, and I expect more details to land in the runtime API specification. Requirements around the buffer-request semantics (e.g. "buffer for 15 seconds" or "buffer the last 5 events" or "buffer the 'created' event") seemed out of place at the level of detail in this specification. The goal is to allow for:

  $ funC create --event-buffer created ID &
  $ funC event --event created ID && hook1 && hook2 && funC start ID
  $ fg

to support blocking on the event without racing on "maybe the 'created' event happened before the 'event' operation attached". Given the small number of required events, this buffering should not be a large resource concern, and '--event-buffer created' would only ever require a single event to be buffered per container.

There is still a race on "maybe the container has already been destroyed and a second container has been created with the same ID before the 'event' operation attached", but that seems much less likely (especially since the caller is free to pick UUIDs).

Signed-off-by: W. Trevor King <[email protected]>
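The buffering idea can be sketched in a few lines. This is a single-threaded illustration with hypothetical names (`EventBuffer`, `publish`, `subscribe`), not the semantics of any real runtime: it holds at most one event per buffered type and replays it to a subscriber that attaches late.

```python
import queue


class EventBuffer:
    """Hold the latest event of selected types for late subscribers."""

    def __init__(self, buffered=()):
        self._buffered_types = set(buffered)
        self._held = {}         # event type -> most recent buffered event
        self._subscribers = []  # (event type, queue.Queue) pairs

    def publish(self, event_type, payload=None):
        event = {"event": event_type, "payload": payload}
        if event_type in self._buffered_types:
            self._held[event_type] = event  # one event per type, per container
        for wanted, q in self._subscribers:
            if wanted == event_type:
                q.put(event)

    def subscribe(self, event_type):
        """Return a queue that will receive events of the given type."""
        q = queue.Queue()
        if event_type in self._held:
            q.put(self._held[event_type])  # replay for a late subscriber
        self._subscribers.append((event_type, q))
        return q
```

With `EventBuffer(buffered={"created"})`, a subscriber that attaches after 'created' was published still finds the event waiting in its queue. Without buffering, the same subscriber would get an empty queue, which is the race described above.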

Commits on Jun 3, 2016

Commits on Jun 23, 2016