runtime: Add an 'event' operation for subscribing to pushes #508

Closed
wants to merge 5 commits into from

Commits on Jun 3, 2016

  1. runtime: Replace "process is stopped" with "process exits"

    proc(5) describes the following state entries in /proc/[pid]/stat [1]
    (for modern kernels):
    
    * R Running
    * S Sleeping in an interruptible wait
    * D Waiting in uninterruptible disk sleep
    * Z Zombie
    * T Stopped (on a signal)
    * t Tracing stop
    * X Dead
    
    and ps(1) has a bit more context [2] (for modern kernels):
    
    * D uninterruptible sleep (usually IO)
    * R running or runnable (on run queue)
    * S interruptible sleep (waiting for an event to complete)
    * T stopped by job control signal
    * t stopped by debugger during the tracing
    * X dead (should never be seen)
    * Z defunct ("zombie") process, terminated but not reaped by its
      parent
    
    So I expect "stopped" to mean "process still exists but is paused,
    e.g. by SIGSTOP".  And I expect "exited" to mean "process has finished
    and is either a zombie or dead".
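
A minimal sketch of that distinction, assuming the Linux /proc/[pid]/stat layout described in proc(5). The helper names `parse_stat_state` and `classify` and the sample line are hypothetical illustrations, not part of the commit or the specification:

```python
def parse_stat_state(stat_line):
    """Return the one-letter state field from a /proc/[pid]/stat line."""
    # Field 2 (comm) is parenthesized and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    _, _, rest = stat_line.rpartition(")")
    return rest.split()[0]


def classify(state):
    """Map a proc(5) state letter to the vocabulary used in this commit."""
    if state in ("T", "t"):
        return "stopped"  # process still exists but is paused (e.g. by SIGSTOP)
    if state in ("Z", "X"):
        return "exited"   # process has finished: zombie or dead
    return "active"       # R, S, D: running, sleeping, or waiting


# Hypothetical stat line with an awkward comm field.
sample = "1234 (my (odd) proc) T 1 1234 1234 0 -1"
print(classify(parse_stat_state(sample)))
```

The "active" label is only a placeholder for this sketch; the commit itself avoids "running" because that word is less precise.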
    
    After this commit, 'git grep -i stop' only turns up poststop-hook
    stuff, a reference in principles.md, a "stoppage" in LICENSE, and some
    ChangeLog entries.
    
    Also replace "container's process" with "container process" to match
    usage in the rest of the repository.  After this commit:
    
      $ git grep -i "container process" | wc -l
      16
      $ git grep -i "container's process" | wc -l
      1
    
    Also reword status entries to avoid "running", which is less precise
    in our spec (e.g. it also includes "sleeping", "waiting", ...).
    
    Also remove a "them" left over from a partial plural -> singular
    reroll of be59415 (Split create and start, 2016-04-01, opencontainers#384).
    
    [1]: http://man7.org/linux/man-pages/man5/proc.5.html
    [2]: http://man7.org/linux/man-pages/man1/ps.1.html
    
    Signed-off-by: W. Trevor King <[email protected]>
    wking committed Jun 3, 2016
    5816f31

Commits on Jun 23, 2016

  1. runtime: Add 'creating' to state status

    To distinguish between "we're still setting this container up" and
    "we're finished setting up; you can call 'start' if you like".
    
    Also reference the lifecycle steps, because you can't be too explicit.
    
    Signed-off-by: W. Trevor King <[email protected]>
    wking committed Jun 23, 2016
    f0bbefb
  2. runtime: Only require 'pid' in the state for created/running statuses

    Because during the 'creating' phase we may not have a container
    process yet (e.g. if we're still reading the configuration or setting
    up cgroups), and in the 'stopped' phase the PID is no longer
    meaningful.
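
The rule can be sketched as a small check, assuming the state is a dictionary with a `status` key and an optional `pid` key. The function name `check_pid` and the sample states are hypothetical:

```python
# Statuses for which this commit keeps 'pid' required.
PID_REQUIRED_STATUSES = {"created", "running"}


def check_pid(state):
    """Return an error string if the pid rule is violated, else None."""
    status = state["status"]
    if status in PID_REQUIRED_STATUSES and "pid" not in state:
        return "status %r requires a pid" % status
    # 'creating' and 'stopped' states may omit the pid.
    return None


print(check_pid({"id": "demo", "status": "creating"}))
print(check_pid({"id": "demo", "status": "running"}))
```

A state that carries a `pid` in the 'creating' or 'stopped' phases still passes this check, since the commit only relaxes the requirement.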
    
    Signed-off-by: W. Trevor King <[email protected]>
    wking committed Jun 23, 2016
    1a962c0
  3. runtime: Add an 'event' operation for subscribing to pushes

    The current 'state' operation allows callers to poll for the state,
    but for some workflows polling is inefficient (how frequently do you
    poll to balance the cost of polling against the timeliness of the
    response?) and push notifications make more sense.
    
    The runtime's 'create' process is in a unique position to detect these
    status transitions.
    
    * As the actor carrying out container creation, it should have a clear
      idea of when that creation completes (for the 'created' event).
    
    * It may set up a communication channel with the container process to
      orchestrate creation, and that channel may be used to report the
      start event.
    
    * It knows (or is) the parent of the container process, and POSIX's
      wait(3), waitpid(3), and waitid(3) only work for child processes
      [1,2].  From [1]:
    
        Nothing in this volume of POSIX.1-2008 prevents an implementation
        from providing extensions that permit a process to get status from
        a grandchild or any other process, but a process that does not use
        such extensions must be guaranteed to see status from only its
        direct children.
    
      So the runtime can set up (or be) the parent waiting on the container
      process, and arrange for the 'stopped' event to be published on
      container exit.
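
The last point can be sketched as follows, assuming a POSIX system where the waiting process starts the container process as its direct child. The event dictionary layout (including `exitCode`) and the name `wait_and_publish` are hypothetical; the commit does not define an event format:

```python
import datetime
import subprocess
import sys


def wait_and_publish(container_id, argv, publish):
    """Start argv as a direct child, block until it exits, publish 'stopped'."""
    child = subprocess.Popen(argv)
    # Popen.wait() waits on the child PID; this works only because the
    # process is our direct child.
    exit_code = child.wait()
    event = {
        "type": "stopped",
        "id": container_id,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "exitCode": exit_code,  # hypothetical field, for illustration only
    }
    publish(event)
    return event


events = []
# A short-lived Python child stands in for the container process.
wait_and_publish("demo", [sys.executable, "-c", "raise SystemExit(3)"], events.append)
```

Here `events.append` stands in for whatever delivery mechanism a runtime would use to reach subscribers.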
    
    I've tried to phrase the requirements conservatively to allow for
    runtimes that have to poll their kernel or some such to notice these
    changes.  I see the following runtime-support cases:
    
    a. The runtime can easily supply a push-based event operation.  In
       this case, exposing that operation to callers facilitates push-based
       workflows without much cost.
    
    b. The runtime cannot supply a push-based event operation, and has to
       emulate it by polling.  In this case, the runtime can pick a
       polling strategy that makes sense to its maintainers, and callers
       who aren't satisfied with that strategy can roll their own state
       poller without a big efficiency hit (in a lesser of two evils way).
    
       The requirement is currently worded so weakly that a runtime would
       be compliant with:
    
       1. Container process dies at noon.
       2. User calls 'state' at 5pm.
       3. Runtime checks kernel, and sees that the container process is
          dead.
       4. Runtime publishes 'stopped' event with a 5pm (and some
          microseconds) timestamp.
       5. Runtime returns state to the user with 'stopped' in 'status'.
    
       which is a pretty low bar.
    
    c. The runtime could supply a push-based event operation, but it would
       be a lot of work.  These runtimes can use polling (b) as a
       quick-and-dirty solution until someone has time to implement a
       push-based solution (a).  The improvement is transparent to users,
       who can use the same event operation throughout and passively reap
       the benefits of the implementation improvements.
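
The low-bar scenario in case (b) can be sketched like this, assuming a runtime that only checks the kernel when 'state' is called. The class `PolledStopDetector`, the fake clock, and the event layout are hypothetical:

```python
import datetime


class PolledStopDetector:
    """Emulate the 'stopped' event by checking liveness on each 'state' call."""

    def __init__(self, is_alive, publish, clock):
        self._is_alive = is_alive    # callable: asks the kernel about the process
        self._publish = publish      # callable: delivers an event to subscribers
        self._clock = clock          # callable: returns the current time
        self._published = False

    def state(self):
        """Handle a 'state' call; publish 'stopped' once when death is noticed."""
        if self._is_alive():
            return "running"
        if not self._published:
            self._published = True
            # The timestamp is the detection time, not the time of death.
            self._publish({"type": "stopped", "timestamp": self._clock()})
        return "stopped"


noon = datetime.datetime(2016, 6, 23, 12, 0)
five_pm = datetime.datetime(2016, 6, 23, 17, 0)
world = {"alive": True, "now": datetime.datetime(2016, 6, 23, 9, 0)}
events = []
detector = PolledStopDetector(lambda: world["alive"], events.append, lambda: world["now"])

first = detector.state()                 # morning: process is alive, no event
world.update(alive=False, now=noon)      # 1. container process dies at noon
world.update(now=five_pm)                # 2. user calls 'state' at 5pm
second = detector.state()                # 3-5. runtime notices, publishes, returns
```

After this run the only published event is 'stopped' with the 5pm timestamp, which matches the weak wording described above.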
    
    Without an operation like this, higher levels that need to trigger on
    these transitions without polling need to exercise a lot of control
    over the system:
    
    * They must be the parent of (or on Linux, the nearest
      PR_SET_CHILD_SUBREAPER ancestor of [3]) the create process, and the
      create process needs to exit after creation completes, if they want
      to block on the 'created' event.
    * They must proxy all 'start' and 'delete' requests if they want to
      block on the 'started' or 'deleted' events.
    * On Linux, they must be the nearest ancestor of the create operation
      to set PR_SET_CHILD_SUBREAPER if they want to block on the 'stopped'
      event.
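
A Linux-only sketch of the subreaper approach from the last bullet, assuming a kernel with PR_SET_CHILD_SUBREAPER (Linux 3.4 or later) and calling prctl(2) through ctypes. The helper names are hypothetical, and the forked processes merely stand in for the create process and the container process:

```python
import ctypes
import os

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>


def become_subreaper():
    """Mark this process as a child subreaper."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, ctypes.c_ulong(1)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def spawn_orphan(exit_code):
    """Fork an intermediate child that forks a grandchild and then exits.

    Returns the grandchild's PID after the intermediate child has been
    reaped, so the grandchild has been reparented to the nearest subreaper.
    """
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # Intermediate process (stands in for the create process).
        os.close(read_fd)
        grandchild = os.fork()
        if grandchild == 0:
            # Grandchild (stands in for the container process).
            os.close(write_fd)
            os._exit(exit_code)
        os.write(write_fd, str(grandchild).encode())
        os._exit(0)
    os.close(write_fd)
    grandchild = int(os.read(read_fd, 32))
    os.close(read_fd)
    os.waitpid(child, 0)  # reap the intermediate process
    return grandchild


become_subreaper()
orphan_pid = spawn_orphan(7)
# The orphan is now our child, so waitpid can block on its exit.
_, status = os.waitpid(orphan_pid, 0)
orphan_exit_code = os.WEXITSTATUS(status)
```

Without the `become_subreaper()` call, the orphan would normally be reparented elsewhere (typically to PID 1) and the final `waitpid` would fail, which is the kind of control the list above says higher levels would otherwise need.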
    
    By making 'event' a runtime requirement, we allow for efficient
    cross-platform push-based workflows while avoiding the need for tight
    orchestrator gate-keeping.  Runtimes that have implementation
    difficulties have an easy out that allows their callers to benefit
    from future implementation improvements.  And callers that are not
    satisfied can always fall back to polling state or the
    proxy/waitid/PR_SET_CHILD_SUBREAPER approaches.
    
    [1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html
    [2]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitid.html
    [3]: http://man7.org/linux/man-pages/man2/prctl.2.html
    
    Signed-off-by: W. Trevor King <[email protected]>
    wking committed Jun 23, 2016
    54ae256
  4. runtime: Support event buffering

    This is all very generic, and I expect more details to land in the
    runtime API specification.  Requirements around the buffer-request
    semantics (e.g. "buffer for 15 seconds" or "buffer the last 5 events"
    or "buffer the 'created' event") seemed out of place at the level of
    detail in this specification.
    
    The goal is to allow for:
    
      $ funC create --event-buffer created ID &
      $ funC event --event created ID && hook1 && hook2 && funC start ID
      $ fg
    
    This supports blocking on the event without racing on "maybe the
    'created' event happened before the 'event' operation attached".
    
    Given the small number of required events, this buffering should not
    be a large resource concern, and '--event-buffer created' would only
    ever require a single event to be buffered per container.
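
A minimal in-memory sketch of that buffering behaviour, assuming a per-container channel that retains only the requested event types. The class `EventChannel` and its methods are hypothetical and say nothing about how a runtime would actually store or deliver events:

```python
class EventChannel:
    """Per-container event channel that retains selected event types."""

    def __init__(self, buffered_types=()):
        self._buffered_types = set(buffered_types)
        self._buffer = []
        self._subscribers = []

    def publish(self, event_type):
        """Deliver an event to current subscribers, buffering it if requested."""
        if event_type in self._buffered_types:
            self._buffer.append(event_type)
        for callback in self._subscribers:
            callback(event_type)

    def subscribe(self, callback):
        """Replay buffered events to a late subscriber, then attach it."""
        for event_type in self._buffer:
            callback(event_type)
        self._subscribers.append(callback)


# With buffering: 'created' fires before the subscriber attaches, but is replayed.
buffered = EventChannel(buffered_types=["created"])
buffered.publish("created")
seen = []
buffered.subscribe(seen.append)

# Without buffering: the same ordering loses the event.
unbuffered = EventChannel()
unbuffered.publish("created")
missed = []
unbuffered.subscribe(missed.append)
```

With `buffered_types=["created"]` at most one event per container sits in the buffer, in line with the resource argument above.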
    
    There is still a race on "maybe the container has already been
    destroyed and a second container has been created with the same ID
    before the 'event' operation attached", but that seems much less
    likely (especially since the caller is free to pick UUIDs).
    
    Signed-off-by: W. Trevor King <[email protected]>
    wking committed Jun 23, 2016
    fae7e99