-
Notifications
You must be signed in to change notification settings - Fork 550
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: Add an 'event' operation for subscribing to pushes #508
Conversation
proc(5) describes the following state entries in proc/[pid]/stat [1] (for modern kernels): * R Running * S Sleeping in an interruptible wait * D Waiting in uninterruptible disk sleep * Z Zombie * T Stopped (on a signal) * t Tracing stop * X Dead and ps(1) has a bit more context [2] (for modern kernels): * D uninterruptible sleep (usually IO) * R running or runnable (on run queue) * S interruptible sleep (waiting for an event to complete) * T stopped by job control signal * t stopped by debugger during the tracing * X dead (should never be seen) * Z defunct ("zombie") process, terminated but not reaped by its parent So I expect "stopped" to mean "process still exists but is paused, e.g. by SIGSTOP". And I expect "exited" to mean "process has finished and is either a zombie or dead". After this commit, 'git grep -i stop' only turns up poststop-hook stuff, a reference in principles.md, a "stoppage" in LICENSE, and some ChangeLog entries. Also replace "container's process" with "container process" to match usage in the rest of the repository. After this commit: $ git grep -i "container process" | wc -l 16 $ git grep -i "container's process" | wc -l 1 Also reword status entries to avoid "running", which is less precise in our spec (e.g. it also includes "sleeping", "waiting", ...). Also removes a "them" leftover from a partial plural -> singular reroll of be59415 (Split create and start, 2016-04-01, opencontainers#384). [1]: http://man7.org/linux/man-pages/man5/proc.5.html [2]: http://man7.org/linux/man-pages/man1/ps.1.html Signed-off-by: W. Trevor King <[email protected]>
To distinguish between "we're still setting this container up" and "we're finished setting up; you can call 'start' if you like". Also reference the lifecycle steps, because you can't be too explicit Signed-off-by: W. Trevor King <[email protected]>
Because during the 'creating' phase we may not have a container process yet (e.g. if we're still reading the configuration or setting up cgroups), and in the 'stopped' phase the PID is no longer meaningful. Signed-off-by: W. Trevor King <[email protected]>
The current 'state' operation allows callers to poll for the state, but for some workflows polling is inefficient (how frequently do you poll to balance the cost of polling against the timeliness of the response?) and push notifications make more sense. The runtime's 'create' process is in a unique position to detect these status transitions. * As the actor carrying out container creation, it should have a clear idea of when that creation completes (for the 'created' event). * It may setup a communication channel with the container process to orchestrate creation, and that channel may be used to report the start event. * It knows (or is) the parent of the container process, and POSIX's wait(3), waitpid(3), and waitid(3) only work for child processes [1,2]. From [1]: Nothing in this volume of POSIX.1-2008 prevents an implementation from providing extensions that permit a process to get status from a grandchild or any other process, but a process that does not use such extensions must be guaranteed to see status from only its direct children. So the runtime can setup (or be) the parent waiting on the container process, and arrange for the 'stopped' event to be published on container exit. I've tried to phrase the requirements conservatively to allow for runtimes that have to poll their kernel or some such to notice these changes. I see the following runtime-support cases: a. The runtime can easily supply a push-based event operation. In this case, exposing that operation to callers faciliates push-based workflows without much cost. b. The runtime cannot supply a push-based event operation, and has to emulate it by polling. In this case, the runtime can pick a polling strategy that makes sense to its maintainers, and callers who aren't satisfied with that strategy can roll their own state poller without a big efficiency hit (in a lesser of two evils way). The requirement is currently worded so weakly that a runtime would be compliant with: 1. Container process dies at noon. 2. User calls 'state' at 5pm. 3. Runtime checks kernel, and sees that the container process is dead. 4. Runtime publishes 'stopped' event with a 5pm (and some microseconds) timestamp. 5. Runtime returns state to the user with 'stopped' in 'status'. which is a pretty low bar. c. The runtime could supply a push-based event operation, but it would be a lot of work. These runtimes can use polling (b) as a quick-and-dirty solution until someone has time to implement a push-based solution (a). The improvement is transparent to users, who can use the same event operation throughout and passively reap the benefits of the implementation improvements. Without an operation like this, higher levels that need to trigger on these transitions without polling need to exercise a lot of control over the system: * They must be the parent of (or on Linux, the nearest PR_SET_CHILD_SUBREAPER ancestor of [3]) the create process, and the create process needs to exit after creation completes, if they want to block on the 'created' event. * They must proxy all 'start' and 'delete' requests if they want to block on the 'started' or 'deleted' events. * On Linux, they must be the nearest ancestor of the create operation to set PR_SET_CHILD_SUBREAPER if they want to block on the 'stopped' event. By making 'event' a runtime requirement, we allow for efficient cross-platform push-based workflows while avoiding the need for tight orchestrator gate-keeping. Runtimes that have implementation difficulties have an easy out that allows their callers to benefit from future implemenation improvements. And callers that are not satisfied can always fall back to polling state or the proxy/waitid/PR_SET_CHILD_SUBREAPER approaches. [1]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/wait.html [2]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitid.html [3]: http://man7.org/linux/man-pages/man2/prctl.2.html Signed-off-by: W. Trevor King <[email protected]>
This is all very generic, and I expect more details to land in the runtime API specification. Requirements around the buffer-request semantics (e.g. "buffer for 15 seconds" or "buffer the last 5 events" or "buffer the 'created' event") seemed out of place at the level of detail in this specification. The goal is to allow for: $ funC create --event-buffer created ID & $ funC event --event created ID && hook1 && hook2 && funC start ID $ fg To support blocking on the event without racing on "maybe the 'created' event happened before the 'event' operation attached". Given the small number of required events, this buffering should not be a large resource concern, and '--event-buffer created' would only ever require a single event to be buffered per container. There is still a race on "maybe the container has already been destroyed and a second container has been created with the same ID before the 'event' operation attached", but that seems much less likely (especially since the caller is free to pick UUIDs). Signed-off-by: W. Trevor King <[email protected]>
I don't like this. For runC, it would mean that we'd have to make a long-running daemon to implement this functionality. IMO that defeats the whole point of runC -- it's a single binary you can use without needing any setup. If we make it so that runC doesn't exit after starting a container, then I'm still iffy about it. |
On Mon, Jul 11, 2016 at 07:12:09PM -0700, Aleksa Sarai wrote:
Why is that? In ccon, where I'm happy with “the start-socket exists” event-handling-process where each parent outlives its child. But runtimes that value process event-handling host process If you don't want extra server-side stuff for runC, you could put And if you want to invest the minimum dev time, you can always fall
I agree that a system without this ‘event’ operation is lighter |
I'm good with the idea of an event model.. but agree we should keep tools like runc lower in the stack. For example, runC is not an orchestrator its a low level component. Thus I think the idea has been to keep this runtime spec at said lower level. Just thinking out loud.. can't we map the event notifications to Mrunal's hooks and store event subscriptions as annotations? IOW maybe the spec discussion should avoid describing how the implementation of the event model should be provided. |
On Tue, Jul 12, 2016 at 07:05:15AM -0700, Mike Brown wrote:
Can you unpack “event registrations”? Do you just mean “the event "annotations": { I like the state subdir approach better 1, since you don't have to
I'm not sure where you're going with “should be provided”. I think But if this PR lands, I think the API spec (#511 / #513) should |
/s/event registrations/event subscriptions/ |
+1 to what @cyphar said on why we shouldn't do this |
Catch up with be59415 (Split create and start, 2016-04-01, opencontainers#384). I'm not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. I have a proposal for an 'event' operation [1] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [2]. The "SHOULD exit as quickly as possible after ..." wording is going to be hard to enforce (which is why it's not MUST), but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. The 'create' ID argument is required (while the old 'start' ID argument was optional) because the runC approach requires an explicit 'delete' call to cleanup the container. With a long-running 'create' that could automatically cleanup, you could have an optional ID to support users that don't care what ID is used (because they know they won't be calling 'state', 'start', etc.). One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. The ptrace idea is from Mrunal [3]. [1]: opencontainers#508 [2]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [3]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
On Tue, Jul 12, 2016 at 01:02:21PM -0700, Michael Crosby wrote:
Responding to @cyphar, I pushed back on this requiring a long-running Does that address your concerns about long-running processes? If not, |
Catch up with be59415 (Split create and start, 2016-04-01, opencontainers#384). I'm not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. I have a proposal for an 'event' operation [1] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [2]. The "SHOULD exit as quickly as possible after ..." wording is going to be hard to enforce (which is why it's not MUST), but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. The 'create' ID argument is required (while the old 'start' ID argument was optional) because the runC approach requires an explicit 'delete' call to cleanup the container. With a long-running 'create' that could automatically cleanup, you could have an optional ID to support users that don't care what ID is used (because they know they won't be calling 'state', 'start', etc.). One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. The ptrace idea is from Mrunal [3]. [1]: opencontainers#508 [2]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [3]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 Signed-off-by: W. Trevor King <[email protected]>
Along the lines of the create/start split we've been discussing for the OCI [1,2]. Pull some common functionality into a libccon, although I don't intend to make that a more than an internal helper. Post-error cleanup in ccon-cli is pretty slim, since the process is just about to die anyway. I'll probably go back through and add proper cleanup later. serve_socket gets too deeply nested for my taste, but it's not quite to the point where I'd pull out the request handling into a handle_start_request helper or some such. Mostly because I'd have to either setup a new buf/iov/msg in the helper's stack or pass that all through from serve_socket ;). A few notes on the tests: I don't have a stand-alone 'wait' on my system (it's built into most shells [3]), but I've added an explicit check for it because POSIX doesn't require it to be built in. The waiting in the create/start tests is a bit awkward, but here's the reasoning for the various steps: * Put the socket in a 'sock' subdirectory so we don't have to mess with inotifywait's --exclude (when what we really want is an --include). In most cases, the socket would be the first thing created in the test directory, but the process.host test will create a pivot-root.* before creating the socket. * Clear a 'wait' file before launching the inotifywait/start subshell and append to it from that subshell so grep has something to look at (even if it racily starts looking before the subshell processes the inotifywait line). * Block on a busy-wait grep until inotifywait sets up its watcher. This ensures we don't call ccon and create the socket before the watcher is looking. The zero-second sleep while we wait for inotifywait to setup its watcher is busy, but POSIX doesn't require 'sleep' to support non-integer times [4]. All of these issues could be abstracted out into an 'event' command [5], but they're fine for a proof-of-concept create/start split. [1]: https://groups.google.com/a/opencontainers.org/d/msg/dev/qWHoKs8Fsrk/k55FQrBzBgAJ Subject: Re: Splitting Start into create() and run() Date: Thu, 24 Mar 2016 15:04:14 -0700 Message-ID: <[email protected]> [2]: opencontainers/runtime-spec#384 Subject: Split create and start [3]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html#tag_17_06 [4]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sleep.html [5]: opencontainers/runtime-spec#508 Subject: runtime: Add an 'event' operation for subscribing to pushes
On Sat, Jul 16, 2016 at 04:03:17PM -0700, W. Trevor King wrote:
And to make the OCI integration more concrete too, I have a ccon $ echo 'echo hi; exit' | ccon-oci create --id container-1 >output & If you use a second terminal for the event/start call, you can skip term1 $ ccon-oci create --id container-1 term2 $ ccon-oci event -He created container-1 && ccon-oci start container-1 …create unblocks on term1… That seems pretty clean to me, and doesn't require much implementation |
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Hopefully-Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Partially-Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
Today I stumbled across some higher-level work discussion polling
vs. subscription: kubernetes/kubernetes#8756,
kubernetes/kubernetes#16831.
|
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was recieved and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise]. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30 [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [stdio-ioctl]: opencontainers#513 (comment) [stdio-surprise]: opencontainers#513 (comment) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise]. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30 [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [stdio-ioctl]: opencontainers#513 (comment) [stdio-surprise]: opencontainers#513 (comment) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise]. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30 [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [stdio-ioctl]: opencontainers#513 (comment) [stdio-surprise]: opencontainers#513 (comment) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise]. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30 [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [stdio-ioctl]: opencontainers#513 (comment) [stdio-surprise]: opencontainers#513 (comment) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise]. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30 [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [stdio-ioctl]: opencontainers#513 (comment) [stdio-surprise]: opencontainers#513 (comment) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise]. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30 [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [stdio-ioctl]: opencontainers#513 (comment) [stdio-surprise]: opencontainers#513 (comment) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Versioning The command-line interface is largely orthogonal to the config format, and config authors and runtime callers may be entirely different sets of people. Zhang Wei called for more explicit versioning for the CLI [interface-versioning], and the approach taken here follows the approach taken by Python's email package [python-email-version]. Wedging multiple, independently versioned entities into a single repository can be awkward, but earlier proposals to put the CLI in its own repository [separate-repository-proposed] were unsuccessful because compliance testing requires both a CLI and a config specification [separate-repository-refused]. Trevor doesn't think that's a solid reason [separate-repository-refusal-rebutted], but discussion along that line stalled out, so the approach taken here is to keep both independently versioned entities in the same repository. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise]. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30 [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [interface-versioning]: opencontainers#513 (comment) [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [python-email-version]: https://docs.python.org/3/library/email.html#package-history [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [separate-repository-proposed]: opencontainers#513 (comment) [separate-repository-refused]: opencontainers#513 (comment) [separate-repository-refusal-rebutted]: opencontainers#513 (comment) [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [stdio-ioctl]: opencontainers#513 (comment) [stdio-surprise]: opencontainers#513 (comment) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Versioning The command-line interface is largely orthogonal to the config format, and config authors and runtime callers may be entirely different sets of people. Zhang Wei called for more explicit versioning for the CLI [interface-versioning], and the approach taken here follows the approach taken by Python's email package [python-email-version]. Wedging multiple, independently versioned entities into a single repository can be awkward, but earlier proposals to put the CLI in its own repository [separate-repository-proposed] were unsuccessful because compliance testing requires both a CLI and a config specification [separate-repository-refused]. Trevor doesn't think that's a solid reason [separate-repository-refusal-rebutted], but discussion along that line stalled out, so the approach taken here is to keep both independently versioned entities in the same repository. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise]. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30 [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [interface-versioning]: opencontainers#513 (comment) [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [python-email-version]: https://docs.python.org/3/library/email.html#package-history [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [separate-repository-proposed]: opencontainers#513 (comment) [separate-repository-refused]: opencontainers#513 (comment) [separate-repository-refusal-rebutted]: opencontainers#513 (comment) [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [stdio-ioctl]: opencontainers#513 (comment) [stdio-surprise]: opencontainers#513 (comment) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Versioning The command-line interface is largely orthogonal to the config format, and config authors and runtime callers may be entirely different sets of people. Zhang Wei called for more explicit versioning for the CLI [interface-versioning], and the approach taken here follows the approach taken by Python's email package [python-email-version]. Wedging multiple, independently versioned entities into a single repository can be awkward, but earlier proposals to put the CLI in its own repository [separate-repository-proposed] were unsuccessful because compliance testing requires both a CLI and a config specification [separate-repository-refused]. Trevor doesn't think that's a solid reason [separate-repository-refusal-rebutted], but discussion along that line stalled out, so the approach taken here is to keep both independently versioned entities in the same repository. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise]. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30 [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [interface-versioning]: opencontainers#513 (comment) [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [python-email-version]: https://docs.python.org/3/library/email.html#package-history [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [separate-repository-proposed]: opencontainers#513 (comment) [separate-repository-refused]: opencontainers#513 (comment) [separate-repository-refusal-rebutted]: opencontainers#513 (comment) [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [stdio-ioctl]: opencontainers#513 (comment) [stdio-surprise]: opencontainers#513 (comment) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
This should be post-1.0 |
I agree, I'm tagging it to v1.1.0. |
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Versioning The command-line interface is largely orthogonal to the config format, and config authors and runtime callers may be entirely different sets of people. Zhang Wei called for more explicit versioning for the CLI [interface-versioning], and the approach taken here follows the approach taken by Python's email package [python-email-version]. Wedging multiple, independently versioned entities into a single repository can be awkward, but earlier proposals to put the CLI in its own repository [separate-repository-proposed] were unsuccessful because compliance testing requires both a CLI and a config specification [separate-repository-refused]. Trevor doesn't think that's a solid reason [separate-repository-refusal-rebutted], but discussion along that line stalled out, so the approach taken here is to keep both independently versioned entities in the same repository. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise]. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30 [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [interface-versioning]: opencontainers#513 (comment) [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [python-email-version]: https://docs.python.org/3/library/email.html#package-history [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [separate-repository-proposed]: opencontainers#513 (comment) [separate-repository-refused]: opencontainers#513 (comment) [separate-repository-refusal-rebutted]: opencontainers#513 (comment) [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [stdio-ioctl]: opencontainers#513 (comment) [stdio-surprise]: opencontainers#513 (comment) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
REJECT Reading over this again I'm putting in my formal reject for this. It is totally about implementation and if something wants to do this it can but it has no place specifying this because it places too many implantation constrains on the runtimes and is basically telling runtime devs that they need daemons or other fsnotify logic to build an events system. Keep the spec small and useful and build higher level things on top or else we will just have a bloated spec that it unusable. |
You can also implement this by polling state (approach 3 in my initial post). And the runtime portion doesn't have to use inotify (it just writes events to a directory). The inotify watcher can be a completely separate project.
But as the spec stands there is no portable way to collect the container's exit code. We need something like this PR as a base for that; you can't build it into a higher level. Of course, it could be part of a parallel spec, in which case folks who wanted a portable way to collect the container's exit code would look for both runtime-spec compliance and event-operation compliance, but I think we should have a clearer reason for splitting those apart when the bar is as low as “polling state” or “writing events to a directory on a platform that supports inotify”. |
@wking Can you explain how this race can be handled with nothing but a directory full of files?
As far as I can tell, there's no nice way of removing events from the head of the event queue without having an actual queue structure (which the VFS is not). And reading/writing to a single file is not a good idea. The only solution I could come up with is basically having a single-file database, which is way overkill... |
On Thu, Feb 02, 2017 at 07:15:59PM -0800, Aleksa Sarai wrote:
@wking Can you explain how this race can be handled with nothing but
a directory full of files?
> To support blocking on the event without racing on “maybe the
> created event happened before the event operation attached”.
Setup your inotify watcher and then read out the contents of the
directory (shell implementation here [1]).
[1]: https://github.com/wking/inotify-pubsub/blob/83fd18ddd88979e43f256182ffe70314cde9180b/inotify-sub#L104-L128
|
From my post back in July.. "Just thinking out loud.. can't we map the event notifications to Mrunal's hooks and store event subscriptions as annotations?" @wking given Michael's strong Reject, what do you think of turning this idea into a code sample for using the hooks to show how runc can be extended, tested, monitored...? |
Hooks won't let you But if you see a way to handle this with hooks, please file it as an alternative PR. I just don't see it at the moment. |
@wking does that mean you don't want to make this a code sample? The inability to collect exit codes as referenced by your opencontainers/runtime-tools#305 should be addressed. @mrunalp are we missing a hook here? If we're going to have hooks to cover the lifecycle of containers that would be an obvious one one to have. But that's separate from should we specify these events in the runtime spec. |
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. ## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). # Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Versioning The command-line interface is largely orthogonal to the config format, and config authors and runtime callers may be entirely different sets of people. Zhang Wei called for more explicit versioning for the CLI [interface-versioning], and the approach taken here follows the approach taken by Python's email package [python-email-version]. Wedging multiple, independently versioned entities into a single repository can be awkward, but earlier proposals to put the CLI in its own repository [separate-repository-proposed] were unsuccessful because compliance testing requires both a CLI and a config specification [separate-repository-refused]. Trevor doesn't think that's a solid reason [separate-repository-refusal-rebutted], but discussion along that line stalled out, so the approach taken here is to keep both independently versioned entities in the same repository. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise]. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30 [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [interface-versioning]: opencontainers#513 (comment) [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [python-email-version]: https://docs.python.org/3/library/email.html#package-history [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [separate-repository-proposed]: opencontainers#513 (comment) [separate-repository-refused]: opencontainers#513 (comment) [separate-repository-refusal-rebutted]: opencontainers#513 (comment) [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [stdio-ioctl]: opencontainers#513 (comment) [stdio-surprise]: opencontainers#513 (comment) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
Catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers/runtime-spec#384). One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). I still likes the long-running 'create' API because it makes collecting the exit code easier. I've proposed an 'event' operation [1] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after the 2016-07-13 meeting was to table that while we land docs for the runC API [2], and runC has an early-exit create [3]. The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. The ptrace idea in this commit is from Mrunal [4]. [1]: opencontainers/runtime-spec#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [2]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [3]: opencontainers/runc#827 Summary: Implement create and start [4]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 Signed-off-by: W. Trevor King <[email protected]>
Catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers/runtime-spec#384). One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). I still likes the long-running 'create' API because it makes collecting the exit code easier. I've proposed an 'event' operation [1] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after the 2016-07-13 meeting was to table that while we land docs for the runC API [2], and runC has an early-exit create [3]. The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. The ptrace idea in this commit is from Mrunal [4]. [1]: opencontainers/runtime-spec#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [2]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [3]: opencontainers/runc#827 Summary: Implement create and start [4]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 Signed-off-by: W. Trevor King <[email protected]>
folks are generally in agreement that this is not needed in at the runtime layer, but would be a fine feature to happen higher in the stack |
On Wed, Mar 08, 2017 at 02:38:51PM -0800, Vincent Batts wrote:
folks are generally in agreement that this is not needed in at the
runtime layer, but would be a fine feature to happen higher in the
stack
The problem is that, short of polling state, there's no way to add
this higher up the stack. For more on why, see “The runtime's create
process is in a unique position to detect these status transitions” in
[1].
If you have some insider knowledge about the runtime implementation
(e.g. you know the container process will be an ancestor of the
runtime ‘create’ process and you are on Linux) the ‘create’ caller can
reach around the runtime by setting PR_SET_CHILD_SUBREAPER or similar
(see “Without an operation like this, higher levels that need to
trigger on these transitions without polling need to exercise a lot of
control over the system” in [1]).
And again, as I pointed out in the “I see the following
runtime-support cases” section of [1], runtimes can always supply this
operation by polling state themselves if that's really the best they
can do. And polling state clearly needs no daemons or fsnotify logic
which @crosbymichael seems strongly against [2].
So I'm really not seeing where this is such an issue, and I think it's
unfortunate that the OCI has no plans to provide a portable way to
collect the container's exit code or block on container process exit.
In the meantime, I expect runtime-tools to continue avoiding the
create/start split [3] opencontainers/runtime-tools#305. But I guess
we'll see if/when we actually start compliance-testing runtimes…
[1]: #508 (comment)
[2]: #508 (comment)
[3]: https://github.com/opencontainers/runtime-tools/blob/8392e59c7c11b146462e20701c912242f83cf110/cmd/oci-runtime-tool/runtime_validate.go#L74
|
And this is what containerd does, with subreaper set here, the |
Along the lines of the create/start split we've been discussing for the OCI [1,2]. Pull some common functionality into a libccon, although I don't intend to make that a more than an internal helper. Post-error cleanup in ccon-cli is pretty slim, since the process is just about to die anyway. I'll probably go back through and add proper cleanup later. serve_socket gets too deeply nested for my taste, but it's not quite to the point where I'd pull out the request handling into a handle_start_request helper or some such. Mostly because I'd have to either setup a new buf/iov/msg in the helper's stack or pass that all through from serve_socket ;). A few notes on the tests: I don't have a stand-alone 'wait' on my system (it's built into most shells [3]), but I've added an explicit check for it because POSIX doesn't require it to be built in. The waiting in the create/start tests is a bit awkward, but here's the reasoning for the various steps: * Put the socket in a 'sock' subdirectory so we don't have to mess with inotifywait's --exclude (when what we really want is an --include). In most cases, the socket would be the first thing created in the test directory, but the process.host test will create a pivot-root.* before creating the socket. * Clear a 'wait' file before launching the inotifywait/start subshell and append to it from that subshell so grep has something to look at (even if it racily starts looking before the subshell processes the inotifywait line). * Block on a busy-wait grep until inotifywait sets up its watcher. This ensures we don't call ccon and create the socket before the watcher is looking. The zero-second sleep while we wait for inotifywait to setup its watcher is busy, but POSIX doesn't require 'sleep' to support non-integer times [4]. All of these issues could be abstracted out into an 'event' command [5], but they're fine for a proof-of-concept create/start split. [1]: https://groups.google.com/a/opencontainers.org/d/msg/dev/qWHoKs8Fsrk/k55FQrBzBgAJ Subject: Re: Splitting Start into create() and run() Date: Thu, 24 Mar 2016 15:04:14 -0700 Message-ID: <[email protected]> [2]: opencontainers/runtime-spec#384 Subject: Split create and start [3]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html#tag_17_06 [4]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sleep.html [5]: opencontainers/runtime-spec#508 Subject: runtime: Add an 'event' operation for subscribing to pushes
The current state operation allows callers to poll for the state, but for some workflows polling is inefficient (how frequently do you poll to balance the cost of polling against the timeliness of the response?) and push notifications make more sense.
The runtime's create process is in a unique position to detect these status transitions.
As the actor carrying out container creation, it should have a clear idea of when that creation completes (for the
created
event).It may setup a communication channel with the container process to orchestrate creation, and that channel may be used to report the start event.
It knows (or is) the parent of the container process, and POSIX's wait(3), waitpid(3), and waitid(3) only work for child processes. From wait(3):
So the runtime can setup (or be) the parent waiting on the container process, and arrange for the 'stopped' event to be published on container exit.
I've tried to phrase the requirements conservatively to allow for runtimes that have to poll their kernel or some such to notice these changes. I see the following runtime-support cases:
The runtime can easily supply a push-based event operation. In this case, exposing that operation to callers faciliates push-based workflows without much cost.
The runtime cannot supply a push-based event operation, and has to emulate it by polling. In this case, the runtime can pick a polling strategy that makes sense to its maintainers, and callers who aren't satisfied with that strategy can roll their own state poller without a big efficiency hit (in a lesser of two evils way).
The requirement is currently worded so weakly that a runtime would be compliant with:
stopped
event with a 5pm (and some microseconds) timestamp.stopped
instatus
.which is a pretty low bar.
The runtime could supply a push-based event operation, but it would be a lot of work. These runtimes can use polling (2) as a quick-and-dirty solution until someone has time to implement a push-based solution (1). The improvement is transparent to users, who can use the same event operation throughout and passively reap the benefits of the implementation improvements.
Without an operation like this, higher levels that need to trigger on these transitions without polling need to exercise a lot of control over the system:
PR_SET_CHILD_SUBREAPER
ancestor of) the create process, and the create process needs to exit after creation completes, if they want to block on thecreated
event.started
ordeleted
events.PR_SET_CHILD_SUBREAPER
if they want to block on thestopped
event.By making
event
a runtime requirement, we allow for efficient cross-platform push-based workflows while avoiding the need for tight orchestrator gate-keeping. Runtimes that have implementation difficulties have an easy out that allows their callers to benefit from future implemenation improvements. And callers that are not satisfied can always fall back to polling state or the proxy/waitid
/PR_SET_CHILD_SUBREAPER
approaches.The buffering requirement is very generic, and I expect more details to land in the runtime API specification. Requirements around the buffer-request semantics (e.g. “buffer for 15 seconds” or “buffer the last 5 events” or “buffer the
created
event”) seemed out of place at the level of detail in this specification.The goal is to allow for:
To support blocking on the event without racing on “maybe the
created
event happened before the event operation attached”.Given the small number of required events, this buffering should not be a large resource concern, and
--event-buffer created
would only ever require a single event to be buffered per container.There is still a race on “maybe the container has already been destroyed and a second container has been created with the same ID before the event operation attached”, but that seems much less likely (especially since the caller is free to pick UUIDs).
This builds on #507, so that should be reviewed first.
UPDATE: note that this pub/sub model does not require an additional long-running process. For an example implementation based on shared access to a directory (like runC uses for state) and inotify, see wking/inotify-pubsub.