Cleanup bundle.md #210
On Fri, Oct 02, 2015 at 07:56:16AM -0700, Doug Davis wrote:
Agreed, although I'm fine if runtimes support extensions to select
I don't see the point of requiring this 3, but yeah, I think that's
I don't think we require that for rootfs, since the path from
+1.
In bundle.md it says: "A single rootfs directory MUST be in the same directory as the config.json". I don’t personally have a preference yet - was just trying to keep the current state of things and get clarity. -Doug
On Fri, Oct 02, 2015 at 10:50:32AM -0700, Doug Davis wrote:
Well that's pretty unambiguous ;), so I'd guess language like that is
It does not specify how to transfer a container between computers, how to discover containers, or assign names or versions to them.
Any distribution method capable of preserving the original layout of a container, as specified here, is considered compliant.
The definition of a bundle is only concerned with how a container, and its configuration data, are stored on a local file system so that it can be consumed by a compliant runtime.
Issues such as distribution, including how to transfer a container between runtimes, assigning names, versioning of bundle, or discovery of bundles are out of scope of this specification.
You mean out of scope for this section?
On Mon, Oct 05, 2015 at 09:45:31AM -0700, Mrunal Patel wrote:
> +Issues such as distribution, including how to transfer a
> container between runtimes, assigning names, versioning of bundle,
> or discovery of bundles are out of scope of this specification.
>
> You mean out of scope for this section?
Maybe we should just drop this line to punt on this issue, because I
think it's out of scope for this specification. Lots of back and
forth on this point in [1] ;).
Yeah, taking the line out for now may not be a bad idea.
I'm not following. These topics are out of scope of the "specification" right now, no? I know the charter discussions are happening to decide for sure, but as of now, they're out of scope. So, I don't see the issue with saying what's out of scope. What am I missing?
On Mon, Oct 05, 2015 at 10:27:00AM -0700, Doug Davis wrote:
> +Issues such as distribution, including how to transfer a
> container between runtimes, assigning names, versioning of bundle,
> or discovery of bundles are out of scope of this specification.
>
> I'm not following. These topics are out of scope of the
> "specification" right now, no? I know the charter discussions are
> happening to decide for sure, but as of now, they're out of
> scope. So, I don't see the issue with saying what's out of scope.
> What am I missing?
To me “out of scope” is explicitly “we will never cover this”, and I'd
use “unspecified” for “we will cover this at some point, but haven't
got around to it yet”. With that phrasing, I think we all agree that
these distribution issues are currently unspecified, but we disagree
about whether they're out of scope for this spec.
But I don't have sources to cite for that distinction, so maybe it's
just me ;).
that is the distinction. Discoverability is not explicitly out of scope.
I think what's there is more appropriate since all we can say is a reflection of the charter - and as of now it's "out of scope" - we will never cover this w/o a charter change. Using a word like "unspecified" could be interpreted as "we're choosing not to define it at this time but we could later", but that's not true w/o a charter change.
@vbatts : what's in or out of scope is still TBD. As much as I think the current scope of what's in this doc & PR says is too limited, I'm trying to go with what's agreeable for "in scope" for now and then will let the charter discussions change this doc later.
not. the charter currently says
d. Decentralized. Discovery of container images should be simple and facilitate a federated namespace and distributed retrieval.
So there should not be wording that something like this is "out of scope"
To be honest, I have no idea what a section titled "OCI Values" even means and I've been pushing to fix that because some people interpret it as just fluff filler text that's non-normative, while others (as you've hinted) view it as part of our "in scope list of work items". Because if it's our list of work items, then when 'c' includes "image auditing" and "cryptographic primitives", where is that in our work?
But I was actually hoping to avoid charter discussions as part of this PR. I'm ok with removing this sentence for now but immediately after the charter is finalized I think we need to add something like it back in so people have the right level of expectations w.r.t. what our goals are.
ya, i would remove for now until the charter is finished, otherwise this pr LGTM
questionable text removed - all comments addressed - I think
LGTM
- One or more content directories
- A configuration file
1. `config.json` : immutable, host independent configuration.
This file, which MUST be named `config.json`, contains settings that are host independent and application specific such as security permissions, environment variables and arguments.
“security permissions” seems vague. Do you mean `linux.capabilities`? I'm not sure how that separates out from cgroups which are also sometimes about permissions (e.g. the device cgroup is just about permissions), but I'd probably drop that entry from your list of examples. However, after dropping it, the only remaining entries are about `process`, so I think we should maybe just stop after “application specific” ;).
Not sure who wrote that phrase originally. @vbatts @crosbymichael any thoughts? Remove examples or is there another example we should use?
it's just an example, being vague is ok.
On Mon, Oct 05, 2015 at 03:40:11PM -0700, Michael Crosby wrote:
> +This file, which MUST be named `config.json`, contains settings
> that are host independent and application specific such as
> security permissions, environment variables and arguments.
>
> it's just an example, being vague is ok.
I think it's misleading, because almost all of the security stuff is
going to be handled in runtime.json. If we mean Linux capabilities,
it seems easy enough to just say that.
But whatever, there's a link to the config specs right after each
bullet point, so folks can see what's actually in the files ;).
@duglin, ‘git log -p -G 'security permissions'’ points to 7232e4b
(specs: introduce the concept of a runtime.json, 2015-07-30, #88).
The only inline comment on anything in its general vicinity was the
conversation I had with @philips [1] that led to me filing #107 with
a definition for “application” (since removed from that PR [2]).
The goal is that the bundle can be moved as a unit to another machine and run the same application if `runtime.json` is removed or reconfigured.
3. A directory representing the root filesystem of the container.
While the name of this directory may be arbitrary, users should consider using a conventional name, such as `rootfs`.
This directory will be referenced from within the `config.json` file.
s/will/MUST/ ?
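For readers following the layout rules in this diff, here is a minimal sketch of how a caller might check them before handing a directory to a runtime. It is not taken from any runtime's code: `check_bundle` is a hypothetical helper, and it assumes the root filesystem is referenced through a `root.path` entry in `config.json`.

```python
import json
from pathlib import Path


def check_bundle(bundle_dir):
    """Return a list of layout problems for bundle_dir (empty when none found)."""
    bundle = Path(bundle_dir)
    config_path = bundle / "config.json"
    # The file MUST be named config.json and live at the top of the bundle.
    if not config_path.is_file():
        return ["config.json is missing"]

    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Assumption: the root filesystem is referenced through root.path.
    root = config.get("root")
    root_path = root.get("path") if isinstance(root, dict) else None
    if not isinstance(root_path, str) or not root_path:
        return ["config.json does not reference a root filesystem"]

    # The directory name is arbitrary, but it has to be a single path
    # component so that it sits next to config.json.
    parts = Path(root_path).parts
    if len(parts) != 1 or parts[0] == ".." or Path(root_path).is_absolute():
        return ["root.path %r is not a single directory name" % root_path]
    if not (bundle / root_path).is_dir():
        return ["root filesystem directory %r is missing" % root_path]
    return []
```

The function returns messages instead of raising so that a caller can report the first problem it finds in whatever way it prefers.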
Mainly just moved stuff around, but also tried to add some clarity around what is required w.r.t. naming and location of files/dirs. Signed-off-by: Doug Davis <[email protected]>
LGTM
Some history behind bundle requirements:

* 77d44b1 (Update runtime.md, 2015-06-16) lands the initial reference to a root filesystem, requiring a relative path. It also lands the "bundle" construct, which at this point includes content directories, signatures, and the configuration file. The content directories "at least" include the root filesystem.
* 5d2eb18 (*: re-org the spec, 2015-06-24) shifts the bundle docs to bundle.md and demotes signatures to "other related content".
* 91f5ad7 (bundle.md: various updates to latest spec, 2015-07-02, opencontainers#55) finishes the signature demotion and strengthens the root-inclusion requirement with another "must include".
* 7232e4b (specs: introduce the concept of a runtime.json, 2015-07-30, opencontainers#88) split out runtime.json, required the root directory to exist at `rootfs`, and dropped most references to "content directories".
* 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) kept the requirement for a rootfs directory in the bundle root, but relaxed the name requirement to allow other single-component names (e.g. `my-rootfs`). Dropped the last reference to "content directories".
* cb2da54 (config: Single, unified config file, 2015-12-28, opencontainers#284) rolled runtime.json back into config.json.
* b2e9154 (Remove requirement for rootfs path to be relative, 2016-04-22, opencontainers#394) allowed absolute paths for root.path and removed some "same directory" language while leaving other "same directory" language.

I think the root filesystem should be optional [1], but even folks who disagree on that point have come to the conclusion that it doesn't need to be in the bundle [2]. opencontainers#394 seems partially unfinished, but I think the intention was clear.

Once you relax the "bundle must contain the root filesystem" requirement, the only thing that the bundle must contain is config.json. It doesn't seem to be worth the trouble to name a "bundle" construct if its only meaning is "the directory that holds config.json", so this commit removes all remaining references to the term "bundle".

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/6ZKMNWujDhU
     Subject: Dropping the rootfs requirement and restoring arbitrary bundle content
     Date: Wed, 26 Aug 2015 12:54:47 -0700
     Message-ID: <[email protected]>
[2]: opencontainers#389 (comment)

Signed-off-by: W. Trevor King <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example [listen-fds-description].
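As a rough illustration of the LISTEN_FDS convention discussed above, the sketch below shows how a receiving process could work out which descriptors it inherited. It follows the sd_listen_fds convention that passed descriptors start at 3; the helper name `inherited_fds` is made up for this example, and it deliberately ignores LISTEN_PID, matching the choice described above. It is not runC's implementation.

```python
SD_LISTEN_FDS_START = 3  # first inherited descriptor in the sd_listen_fds convention


def inherited_fds(environ):
    """Return the descriptor numbers announced through LISTEN_FDS."""
    raw = environ.get("LISTEN_FDS")
    if raw is None:
        # Nothing was passed in addition to the standard streams.
        return []
    count = int(raw)  # raises ValueError on non-numeric input
    if count < 0:
        raise ValueError("LISTEN_FDS must not be negative: %r" % raw)
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

A caller would typically pass `os.environ` here; taking the mapping as a parameter keeps the sketch easy to exercise without touching the real environment.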
## state

[state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
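Returning to the encoding note in the state section: the RFC 4627 section 3 hint can be sketched as below. It relies on the first two characters of the text being ASCII, which holds for state output because it is a JSON object. The function names are illustrative only, and a caller that trusts the runtime to follow the "Character encodings" section would not need this fallback at all.

```python
import json


def detect_json_encoding(data):
    """Guess the Unicode encoding of JSON bytes from the null pattern of the first four octets."""
    if len(data) < 4:
        return "utf-8"
    nulls = tuple(octet == 0 for octet in data[:4])
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Anything else (xx xx xx xx) is treated as UTF-8.
    return patterns.get(nulls, "utf-8")


def load_state(data):
    """Decode state output of unknown Unicode encoding and parse it as JSON."""
    return json.loads(data.decode(detect_json_encoding(data)))
```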
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

    The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

    ### {command name}

    {one-line description}

    * *Options:* ...
        ...
    * *Exit code:* ...

    {additional notes}

    #### Example

    {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

    List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

    You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document.

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing.
We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

    The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

    In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

    In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.
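The standard-streams and event-trigger notes above combine into a simple caller pattern: run the command with a null stdin, block until it exits, and only then fire the follow-up activity. The sketch below is one way a caller might do that; `run_then_trigger` and the `on_exit` callback are names invented for this example, and the argument vector is left to the caller because the exact command line is defined elsewhere.

```python
import subprocess
import sys


def run_then_trigger(argv, on_exit):
    """Run argv with a null stdin, then call on_exit with the exit code."""
    completed = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,  # safe because the command must not read stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # The command's exit is the trigger, so the callback fires only now.
    on_exit(completed.returncode)
    return completed
```

For a quick trial without a runtime installed, `run_then_trigger([sys.executable, "-c", "print('created')"], print)` stands in for a real command; the longer that command lingers after finishing its work, the later the callback runs.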
# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

    10:56 < crosbymichael> if the main process dies in the container, all other process are killed
    ...
    10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
    10:58 < crosbymichael> julz: yes, that is how its implemented
    ...
    10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin].
But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.
* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].
* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger.
With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 
[listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? 
[runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 
Message-ID: <[email protected]> Hopefully-Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Partially-Reviewed-by: Jesse Butler <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example [listen-fds-description].
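As a rough illustration of the LISTEN_FDS convention discussed above, the following Python sketch shows how a container process might work out which descriptors it inherited. It assumes the sd_listen_fds convention that passed descriptors start at 3. The helper name `inherited_fds` is hypothetical (it is not part of runC or of this spec), and, like the commit, it does not look at LISTEN_PID.

```python
import os

# First inherited descriptor under the sd_listen_fds convention
# (0, 1, and 2 are the standard streams).
LISTEN_FDS_START = 3


def inherited_fds(environ=None):
    """Return the file descriptors announced via LISTEN_FDS.

    Hypothetical helper for illustration only. LISTEN_PID is deliberately
    not checked, mirroring the choice described above.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("LISTEN_FDS", "")
    try:
        count = int(raw)
    except ValueError:
        return []  # unset or malformed: treat as "nothing was passed"
    if count < 0:
        return []
    return list(range(LISTEN_FDS_START, LISTEN_FDS_START + count))
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.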
## state

[state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
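To make the soft/hard stop distinction concrete, here is a minimal Python sketch of the TERM-then-KILL pattern on a POSIX host. It signals a child process directly rather than going through a runtime's 'kill' command, and the `stop` helper and its `grace_seconds` parameter are names invented for this illustration.

```python
import signal
import subprocess


def stop(proc, grace_seconds=5.0):
    """Soft-stop a child with SIGTERM, escalating to SIGKILL on timeout.

    Hypothetical helper for illustration. Returns the child's exit status
    as reported by Popen.wait().
    """
    proc.send_signal(signal.SIGTERM)  # soft stop: the process may clean up
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # hard stop: SIGKILL on POSIX, cannot be caught
        return proc.wait()
```

A process that installs a SIGTERM handler gets `grace_seconds` to shut down cleanly before the hard stop arrives, which is the preparation the borrowed wording asks bundle-authors to be able to make.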
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

  List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document.

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing.
We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.
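The caller-side consequences of the "Standard streams" and "Event triggers" notes above can be sketched in Python: exec the command with a null stdin, block until it exits, and treat that exit as the trigger for follow-up work. The `run_and_wait` helper is a hypothetical name, and the argument vector is left entirely to the caller so that no particular runtime command line is assumed.

```python
import subprocess


def run_and_wait(argv):
    """Run a command with a null stdin and block until it exits.

    Hypothetical helper for illustration. Returns a tuple of
    (exit_code, stdout_bytes, stderr_bytes); the command's exit is the
    caller's trigger for any post-create or post-start activity.
    """
    result = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,  # safe if the command never reads stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,  # non-zero exits are returned, not raised
    )
    return result.returncode, result.stdout, result.stderr
```

Any short-lived command can stand in for a runtime when trying this out, for example `run_and_wait([sys.executable, "-c", "print('ok')"])` after `import sys`. Because the helper only returns once the child has exited, a runtime that lingers after finishing its work delays the caller by the same amount.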
# Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. 
But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. 
With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 
[listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? 
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. 
## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). 
# Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. 
# Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. 
The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. 
via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.
* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is Trevor's preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].
* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace].
Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].
* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].
* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]:
https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
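
The freeze, KILL, unfreeze ordering quoted under "Process cleanup" in the commit message above can be sketched roughly as follows. This is a minimal illustration, not runC's implementation: it assumes a cgroup v1 freezer hierarchy (a `freezer.state` file and a `cgroup.procs` file in the cgroup directory), and the function name `kill_cgroup` and its injectable `kill` parameter are hypothetical. A real freezer may report FREEZING for a while before reaching FROZEN, which this sketch does not wait for.

```python
import os
import signal
from pathlib import Path


def kill_cgroup(cgroup_dir, kill=os.kill):
    """Freeze a cgroup, SIGKILL every member, then thaw it.

    Freezing first means no member can fork a new process between
    reading the PID list and delivering the signals.
    """
    cgroup = Path(cgroup_dir)
    state = cgroup / "freezer.state"
    state.write_text("FROZEN")  # assumption: cgroup v1 freezer interface
    try:
        pids = [int(pid) for pid in (cgroup / "cgroup.procs").read_text().split()]
        for pid in pids:
            kill(pid, signal.SIGKILL)
    finally:
        # Always thaw, so the pending KILLs are actually delivered.
        state.write_text("THAWED")
    return pids
```

The `kill` parameter exists only so the ordering can be exercised against a scratch directory without signalling real processes.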
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description].
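
To illustrate the LISTEN_FDS convention discussed above, here is a small Python sketch of how a receiving process might work out which descriptors it inherited. It relies on the sd_listen_fds convention that passed descriptors start at 3; the helper name `inherited_fds` is hypothetical, and, matching the choice described above, it deliberately ignores LISTEN_PID.

```python
import os

# Per the sd_listen_fds convention, passed descriptors start right after stderr.
SD_LISTEN_FDS_START = 3


def inherited_fds(environ=None):
    """Return the descriptor numbers announced through LISTEN_FDS.

    An unset variable means no descriptors were passed.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("LISTEN_FDS")
    if raw is None:
        return []
    count = int(raw)
    if count < 0:
        raise ValueError("LISTEN_FDS must be non-negative")
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

With `LISTEN_FDS=2` in the environment, the function reports descriptors 3 and 4.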
## state

[state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
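
The RFC 4627 §3 hints mentioned in the 'state' notes above rest on the first two characters of an object or array always being ASCII, so the pattern of zero octets in the first four bytes identifies the Unicode encoding. A minimal Python sketch, assuming the caller has the raw state output as bytes (the function name `guess_json_encoding` is hypothetical):

```python
import json


def guess_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON object or array from its first four
    octets, following the null-octet table in RFC 4627 section 3."""
    if len(data) < 4:
        return "utf-8"
    nulls = tuple(octet == 0 for octet in data[:4])
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(nulls, "utf-8")           # xx xx xx xx


def load_state(data: bytes):
    """Decode and parse state output whose encoding was not declared."""
    return json.loads(data.decode(guess_json_encoding(data)))
```

This only works because the state output is known to be an object; a bare JSON string or number would not guarantee two leading ASCII characters.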
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

    The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

    ### {command name}

    {one-line description}

    * *Options:* ...
    ...
    * *Exit code:* ...

    {additional notes}

    #### Example

    {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

    List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

    You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") does not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].
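
The hyphen rule above is what lets a runtime tell an unrecognized command from an unrecognized option by looking at a single token. A rough Python sketch of that distinction follows; the function name `classify_token` and the option set passed to it are hypothetical, and how a runtime reacts to each case is left open, as in the text.

```python
def classify_token(token, known_commands, known_options):
    """Classify one command-line token.

    Because command names never start with a hyphen, the leading
    character alone separates options from commands.
    """
    if token.startswith("-"):
        name = token.split("=", 1)[0]  # drop any "=value" suffix
        return "option" if name in known_options else "unrecognized option"
    return "command" if token in known_commands else "unrecognized command"


# Command names from this commit message; the option set is illustrative only.
COMMANDS = {"create", "start", "state", "kill", "delete"}
OPTIONS = {"--help"}
```

A runtime that wants to fail fast could exit as soon as `classify_token` reports an unrecognized command, while still tolerating or rejecting unrecognized options as it sees fit.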
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. 
## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). 
# Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. 
# Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. 
The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container, all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtime APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g.
via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.
* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].
* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace].
Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].
* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].
* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]:
https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to set up the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example [listen-fds-description].
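To make the LISTEN_FDS discussion concrete, here is a minimal Python sketch of how a runtime-caller might prepare the environment before invoking 'create'. The helper names `listen_fds_env` and `listen_fd_numbers` are hypothetical. The starting descriptor 3 follows the systemd convention (SD_LISTEN_FDS_START) described in [sd_listen_fds]. Dropping any inherited LISTEN_PID is an assumption of this sketch, chosen to mirror the decision above to leave LISTEN_PID out of the caller's side.

```python
SD_LISTEN_FDS_START = 3  # systemd convention: passed descriptors begin at fd 3


def listen_fds_env(n_fds, base_env=None):
    """Return an environment for the runtime that advertises n_fds open fds.

    LISTEN_PID is deliberately not set by the caller (assumption of this
    sketch); any inherited value is dropped so it cannot refer to the
    wrong process.
    """
    env = dict(base_env or {})
    env.pop("LISTEN_PID", None)
    if n_fds > 0:
        env["LISTEN_FDS"] = str(n_fds)
    else:
        env.pop("LISTEN_FDS", None)
    return env


def listen_fd_numbers(n_fds):
    """List the descriptor numbers the container process would expect."""
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + n_fds))
```

A caller would still need to arrange for the sockets to actually sit at those descriptor numbers when the runtime is executed; that plumbing is platform-specific and is not shown here.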
## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
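The RFC 4627 hint mentioned in the 'state' notes above can be sketched in a few lines of Python. This is an illustrative reading of RFC 4627 section 3, not part of the specification: because the first two characters of a JSON object are ASCII, the pattern of zero octets in the first four bytes suggests the Unicode encoding. The function name `detect_json_encoding` is hypothetical, and inputs shorter than four bytes are simply assumed to be UTF-8.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of JSON text from its first four octets.

    Follows the null-octet patterns listed in RFC 4627 section 3.
    Anything that matches no pattern (or is too short) is treated as
    UTF-8, which is an assumption of this sketch.
    """
    head = data[:4]
    if len(head) < 4:
        return "utf-8"
    nulls = tuple(octet == 0 for octet in head)
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(nulls, "utf-8")
```

A caller could pass the raw stdout bytes from 'state' to this function and then decode with the returned codec name before parsing the JSON.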
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

  List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") does not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].
# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.
* Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container, all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces.
It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtime APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.
* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;).
On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].
* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].
* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].
* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]:
wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e.
  rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. 
It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). 
On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. 
[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: 
wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. 
rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to set up the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description].
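As a rough illustration of the LISTEN_FDS convention discussed above, here is a minimal Python sketch of how a container process might work out which descriptors it inherited. The function name `inherited_fds` is hypothetical, and the handling of a missing LISTEN_PID is an assumption of this sketch that may differ from what sd_listen_fds(3) does; the starting descriptor 3 follows the sd_listen_fds convention.

```python
SD_LISTEN_FDS_START = 3  # first passed descriptor in the sd_listen_fds convention


def inherited_fds(environ, pid):
    """Return the descriptor numbers advertised through LISTEN_FDS.

    environ is a mapping such as os.environ and pid is the caller's own PID.
    If LISTEN_PID is present it has to name this process.  Treating a missing
    LISTEN_PID as acceptable is an assumption of this sketch, made because
    the text above leaves LISTEN_PID out of the command-line API.
    """
    listen_pid = environ.get("LISTEN_PID")
    if listen_pid is not None and int(listen_pid) != pid:
        return []  # the descriptors were meant for some other process
    count = int(environ.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


if __name__ == "__main__":
    # A container process running as PID 1 with two descriptors passed in.
    print(inherited_fds({"LISTEN_FDS": "2", "LISTEN_PID": "1"}, pid=1))
```

With LISTEN_PID fixed at 1, as runC appears to do, the check only succeeds when the container process really is PID 1 in its own PID namespace.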
## state

[state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
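The soft/hard stop pairing behind the TERM/KILL requirement can be sketched from the caller's side. This is only an illustration on POSIX systems: it signals a plain child process directly rather than invoking a runtime's 'kill' command, and the names `stop` and `demo` and the grace period are hypothetical.

```python
import signal
import subprocess
import sys


def stop(proc, grace_seconds):
    """Soft-stop proc with SIGTERM, then hard-stop with SIGKILL if needed."""
    proc.send_signal(signal.SIGTERM)  # soft stop: the process may clean up
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGKILL)  # hard stop: cannot be caught
        return proc.wait()


def demo():
    """Stop a child that ignores SIGTERM, so the hard stop is exercised."""
    child_code = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
    )
    proc.stdout.readline()  # wait until the child has installed its handler
    status = stop(proc, grace_seconds=0.5)
    proc.stdout.close()
    return status


if __name__ == "__main__":
    # On POSIX a negative return code is the number of the fatal signal.
    print(demo())
```

A bundle author who installs a SIGTERM handler gets the chance to exit within the grace period; on platforms without POSIX signals the runtime would have to substitute its documented equivalents.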
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
      ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

  List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") does not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].
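To make the hyphen rule concrete, here is a small Python sketch of how a runtime could tell an unrecognized command from an unrecognized option. It is not taken from any runtime: the function name `first_command` is hypothetical, the command set is illustrative, and it assumes global options carry their values in --name=value form so that an option never consumes the following token.

```python
# Illustrative set of specified commands; a real runtime may support more.
KNOWN_COMMANDS = {"create", "start", "state", "kill", "delete"}


def first_command(args):
    """Classify the first non-option token in args.

    Because command names never start with a hyphen, any token that does
    start with one is an option (recognized or not), and the first token
    that does not is the command.
    """
    for index, arg in enumerate(args):
        if arg.startswith("-"):
            continue  # an option, never a command
        status = "known" if arg in KNOWN_COMMANDS else "unrecognized"
        return status, arg, args[index + 1:]
    return "missing", None, []


if __name__ == "__main__":
    print(first_command(["--debug", "state", "abc"]))
    print(first_command(["--format=protobuf", "frobnicate"]))
```

A runtime that wants to fail fast can exit as soon as the status is "unrecognized", while one that prefers to stay lenient is free to keep going.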
# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior when they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.

* Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container, all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtime APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state].
  This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.

* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks.
  The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example [listen-fds-description].
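
As a rough illustration of the LISTEN_FDS convention discussed above, here is a minimal Python sketch of what a runtime caller might compute before exec'ing the runtime. The helper name `listen_fds_env` is hypothetical and not part of any runtime or spec; the only outside fact assumed is the systemd convention that passed descriptors start at fd 3 (`SD_LISTEN_FDS_START` in sd_listen_fds). LISTEN_PID is not set, mirroring the choice described above.

```python
import os

# systemd convention: passed descriptors are numbered consecutively from 3.
SD_LISTEN_FDS_START = 3


def listen_fds_env(fds, base_env=None):
    """Return (env, fd_map) for handing open descriptors to a child.

    env is a copy of the environment with LISTEN_FDS set to the number
    of descriptors. fd_map maps each target descriptor number (3, 4, ...)
    to the caller's currently open descriptor that should be dup2()'d
    there before exec. LISTEN_PID is not set by this sketch.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LISTEN_FDS"] = str(len(fds))
    fd_map = {SD_LISTEN_FDS_START + i: fd for i, fd in enumerate(fds)}
    return env, fd_map
```

A caller would apply `fd_map` with `os.dup2` in the child (or the equivalent in its own language) and pass `env` when exec'ing the runtime; that step is left out here because it depends on how the caller spawns processes.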
## state

[state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
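
To make the soft/hard stop distinction above concrete, here is a minimal Python sketch of the TERM-then-KILL pattern on a POSIX platform. It signals an ordinary child process directly; it does not invoke any runtime's 'kill' command, and the function name `soft_then_hard_stop` and the grace period are assumptions made for illustration.

```python
import signal
import subprocess


def soft_then_hard_stop(proc, grace_seconds=5.0):
    """Try a soft stop (SIGTERM), then fall back to a hard stop (SIGKILL).

    proc is a subprocess.Popen instance. Returns the child's return
    code; on POSIX this is the negated signal number when the child was
    terminated by a signal.
    """
    proc.send_signal(signal.SIGTERM)  # soft stop: the process may clean up
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # hard stop: SIGKILL cannot be caught or ignored
        return proc.wait()
```

On a platform without POSIX signals, the two steps would be replaced by whatever soft and hard stop mechanisms the runtime documents.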
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

    The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

    ### {command name}

    {one-line description}

    * *Options:* ...
    ...
    * *Exit code:* ...

    {additional notes}

    #### Example

    {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

    List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

    You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") does not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].
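
The hyphen rule above can be sketched as a small classifier. This is a hypothetical illustration in Python, not code from any runtime: the function name `classify_argument` and the `KNOWN_COMMANDS` set (built from the commands named in this commit message) are assumptions.

```python
# Commands named in this commit message; a real runtime may support more.
KNOWN_COMMANDS = {"create", "start", "state", "kill", "delete"}


def classify_argument(arg):
    """Classify the first non-global argument on the command line.

    Because command names never start with a hyphen, a leading hyphen
    always means "option", so an unknown word can be reported as an
    unrecognized command without being mistaken for an option.
    """
    if arg.startswith("-"):
        return "option"
    if arg in KNOWN_COMMANDS:
        return "command"
    return "unrecognized-command"
```

A runtime that wants to fail fast could exit non-zero as soon as this returns "unrecognized-command"; one that supports extensions could dispatch to them instead.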
# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

    The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

    In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

    In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.

* Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

    10:56 < crosbymichael> if the main process dies in the container, all other process are killed
    ...
    10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
    10:58 < crosbymichael> julz: yes, that is how its implemented
    ...
    10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtime APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state].
  This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.

* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks.
  The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. 
## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). 
# Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. 
# Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. 
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 
10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. 
  This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].
* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].
* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].
* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks.
  The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
    Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
    Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
    Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
    moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
    Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
    Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
    Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
    RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
    Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
    Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
    Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
    Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
    Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
    opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
    Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
    106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
    Subject: Single, unified config file (i.e. rolling back specs#88)
    Date: Wed, 4 Nov 2015 09:53:20 -0800
    Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
    Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
    Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
    Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
    Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
    7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
    Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
    Date: Thu, 26 May 2016 11:03:29 -0700
    Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example [listen-fds-description].
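
As a rough illustration of the LISTEN_FDS convention discussed above, here is a minimal Python sketch of how a container process might work out which descriptors it was handed. It assumes the systemd convention that passed descriptors start at 3 (SD_LISTEN_FDS_START in sd_listen_fds) and, in line with the choice described above, it does not consult LISTEN_PID. The helper name `inherited_fds` is hypothetical and is not part of runC or the spec.

```python
import os

# First passed descriptor under the systemd convention (SD_LISTEN_FDS_START).
LISTEN_FDS_START = 3


def inherited_fds(environ=None):
    """Return the descriptor numbers announced via LISTEN_FDS.

    Hypothetical helper: LISTEN_PID is deliberately not checked, mirroring
    the decision to leave it out of the command-line API.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("LISTEN_FDS", "")
    if not raw:
        # Nothing was passed, so there are no extra descriptors to use.
        return []
    count = int(raw)
    if count < 0:
        raise ValueError("LISTEN_FDS must be non-negative")
    return list(range(LISTEN_FDS_START, LISTEN_FDS_START + count))


if __name__ == "__main__":
    print(inherited_fds({"LISTEN_FDS": "2"}))
```

A caller that sets LISTEN_FDS=2 is therefore promising that descriptors 3 and 4 are open in the container process.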
## state

[state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
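
To make the soft-stop preparation mentioned above a little more concrete, here is a minimal Python sketch of a container process installing a SIGTERM handler so that a 'kill' with the TERM signal can trigger a graceful shutdown. It assumes a POSIX platform and that it runs in the main thread; the names `StopRequest` and `install_soft_stop_handler` are hypothetical. On a platform without POSIX signals, whatever substitute the runtime documents would take the handler's place.

```python
import os
import signal


class StopRequest:
    """Record that a soft stop was requested (hypothetical helper)."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        # Keep the handler tiny: just note the request so the main loop
        # can finish in-flight work and exit on its own terms.
        self.requested = True


def install_soft_stop_handler():
    """Install a SIGTERM handler and return the object tracking requests."""
    stop = StopRequest()
    signal.signal(signal.SIGTERM, stop.handle)
    return stop


if __name__ == "__main__":
    stop = install_soft_stop_handler()
    # Simulate the runtime delivering a soft stop to this process.
    os.kill(os.getpid(), signal.SIGTERM)
    print("soft stop requested:", stop.requested)
```

A process that never installs such a handler gets the default TERM behavior, and KILL remains the hard stop that cannot be caught.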
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

    The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

    ### {command name}

    {one-line description}

    * *Options:* ...
    ...
    * *Exit code:* ...

    {additional notes}

    #### Example

    {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

    List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

    You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") does not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].
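
The hyphen rule above can be illustrated with a small Python sketch of how a runtime might split its arguments. This is an illustration under stated assumptions, not the specified parsing algorithm: the function `split_argv`, the `KNOWN_COMMANDS` set, the `UnrecognizedCommand` exception, and the `--debug` option used in the usage line are all hypothetical, and global options that take a separate value argument are not handled.

```python
# Illustrative command set only; a real runtime defines its own.
KNOWN_COMMANDS = {"create", "start", "state", "kill", "delete"}


class UnrecognizedCommand(Exception):
    """Raised to fail fast on a command the runtime does not know."""


def split_argv(args):
    """Split args into (global_options, command, command_args).

    Because command names never start with a hyphen, the first token
    without a leading hyphen must be the command name.
    """
    global_options = []
    for index, token in enumerate(args):
        if token.startswith("-"):
            # Still in the global-options prefix; collect and keep going.
            global_options.append(token)
            continue
        if token not in KNOWN_COMMANDS:
            # Optional fail-fast path for an unrecognized command.
            raise UnrecognizedCommand(token)
        return global_options, token, list(args[index + 1:])
    # No command was given at all.
    return global_options, None, []


if __name__ == "__main__":
    print(split_argv(["--debug", "state", "c1"]))
```

Since an unknown token is either hyphen-prefixed (an option) or not (a command), the runtime can tell the two failure cases apart without guessing.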
# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

    The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

    In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

    In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior when they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 
10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also drops the file-descriptor docs from runtime-linux. It's unclear how these apply to runtimes APIs that are not based on the command line / execve, and the functionality is covered by the more tightly scoped LISTEN_FDS wording in the command-line docs. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. 
This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. 
The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. 
rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. 
## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). 
# Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. 
# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.
* Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.
* Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container, all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but:

* The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control.
* The command line entry is more useful for runtime-callers than the old wording, because it tells you how to set up the file descriptors instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin].
  But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.
* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].
* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger.
  With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].
* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].
* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]:
  https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to set up the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example [listen-fds-description].
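For readers unfamiliar with the LISTEN_FDS convention, the following Python sketch shows how a receiving process might work out which descriptors were passed to it. It assumes systemd's convention that passed descriptors start at file descriptor 3 (SD_LISTEN_FDS_START), and it deliberately ignores LISTEN_PID, in line with the notes above. The function name `listen_fds` is hypothetical.

```python
import os
from typing import List, Mapping

# systemd's convention: the first passed descriptor is fd 3,
# right after stdin (0), stdout (1), and stderr (2).
SD_LISTEN_FDS_START = 3


def listen_fds(environ: Mapping[str, str] = os.environ) -> List[int]:
    """Return the descriptor numbers advertised by LISTEN_FDS.

    An unset, malformed, or negative LISTEN_FDS is treated as "no
    descriptors passed". LISTEN_PID is intentionally not checked here.
    """
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    if count < 0:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

For example, with LISTEN_FDS=2 in the environment the process would expect to find open descriptors 3 and 4. A runtime-caller would do the mirror image: open the descriptors, arrange for them to land on 3 and up, and set LISTEN_FDS before invoking the runtime.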
## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). 
# Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. 
# Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. # Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. 
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 
10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. 
But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. 
With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. [ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: 
https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. 
[runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to set up the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example [listen-fds-description].
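As a rough illustration of the LISTEN_FDS convention discussed above, the receiving process can work out which descriptors it inherited from the variable alone. This is a minimal sketch, not runC's implementation: the helper name `inherited_fds` is hypothetical, and the starting descriptor 3 follows the systemd sd_listen_fds convention. LISTEN_PID is deliberately ignored, matching the choice described above.

```python
import os

# First passed descriptor under the sd_listen_fds convention (after stdin/stdout/stderr).
LISTEN_FDS_START = 3


def inherited_fds(environ=None):
    """Return the descriptor numbers announced via LISTEN_FDS (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        count = 0  # treat a malformed value as "nothing was passed"
    count = max(count, 0)
    return list(range(LISTEN_FDS_START, LISTEN_FDS_START + count))


if __name__ == "__main__":
    print(inherited_fds({"LISTEN_FDS": "2"}))
```

A caller that sets LISTEN_FDS=2 is therefore promising that descriptors 3 and 4 are open in the process it launches.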
## state

[state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
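The soft/hard stop pairing above can be illustrated with an ordinary child process on a POSIX system. This is only a sketch under stated assumptions: it signals a plain subprocess rather than calling any runtime's 'kill' command, and the `stop` helper, the grace period, and the stubborn stand-in child are all hypothetical.

```python
import subprocess
import sys

# Stand-in child that ignores the soft stop, so the hard stop gets exercised.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def stop(proc, grace_seconds=0.5):
    """Soft stop with TERM, then fall back to a hard stop with KILL."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        return proc.wait()


def demo():
    proc = subprocess.Popen(
        [sys.executable, "-c", STUBBORN_CHILD], stdout=subprocess.PIPE
    )
    proc.stdout.readline()  # wait until the child has started ignoring TERM
    code = stop(proc)
    proc.stdout.close()
    return code


if __name__ == "__main__":
    print(demo())
```

On platforms without POSIX signals the same two-step shape would apply, with the platform's own soft and hard stop mechanisms substituted for TERM and KILL.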
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

  List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") does not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].
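To make the hyphen rule concrete, here is a minimal sketch of how a runtime might tell an unrecognized command apart from an option. The `classify` helper and the `KNOWN_COMMANDS` set are assumptions for illustration (the set lists commands named in this commit message), and real argument parsing would also have to handle options that take separate values.

```python
# Commands mentioned in this commit message; a real runtime may support more.
KNOWN_COMMANDS = {"create", "start", "state", "kill", "delete"}


def classify(token):
    """Classify one command-line token (hypothetical helper)."""
    if token.startswith("-"):
        # Command names never start with hyphens, so this must be an option.
        return "option"
    if token in KNOWN_COMMANDS:
        return "command"
    # Not an option and not known: the runtime may fail fast here.
    return "unrecognized command"


if __name__ == "__main__":
    for token in ("--format=protobuf", "state", "frobnicate"):
        print(token, "->", classify(token))
```

Because the leading hyphen alone decides the first branch, a runtime can reject `frobnicate` immediately without guessing whether it was meant as an extension option.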
# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that.

The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.
* Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.
* Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container, all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but:

* The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control.
* The command line entry is more useful for runtime-callers than the old wording, because it tells you how to set up the file descriptors instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin].
  But since [runc-bundle-option] landed on 2015-11-16, runC has replaced its --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.
* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].
* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger.
  With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].
* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].
* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]:
https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to set up the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example [listen-fds-description].
## state

[state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
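Returning to the 'state' notes above: the RFC 4627 §3 hint works because the first two characters of the state output are ASCII (it is a JSON object), so the pattern of null octets in the first four bytes reveals the encoding. The sketch below is one way a caller might apply that hint; the function name `detect_json_encoding` is hypothetical, and it returns Python codec names.

```python
def detect_json_encoding(data):
    """Guess the Unicode encoding of JSON text from its first four octets."""
    if len(data) < 4:
        return "utf-8"  # too short to show a null pattern; assume the default
    nulls = tuple(octet == 0 for octet in data[:4])
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return patterns.get(nulls, "utf-8")           # xx xx xx xx


if __name__ == "__main__":
    state = '{"id": "example-container"}'
    for codec in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
        print(codec, "->", detect_json_encoding(state.encode(codec)))
```

The sketch ignores byte-order marks, so a caller facing output that starts with one would need to check for it before applying the null-pattern test.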
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

  List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") does not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].
# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that.

The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise].
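As a sketch of what the stdin guarantee buys a generic caller, the helper below launches a command with a null stdin and collects its output. The helper name `run_without_stdin` is hypothetical, and the demonstration uses the Python interpreter as a stand-in for a runtime command so that it runs anywhere.

```python
import subprocess
import sys


def run_without_stdin(argv):
    """Run argv with stdin attached to the null device and capture its output."""
    return subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,  # safe only if the command never reads stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )


if __name__ == "__main__":
    # Stand-in for a runtime invocation; a real caller would pass the runtime's argv.
    result = run_without_stdin([sys.executable, "-c", "print('ok')"])
    print(result.returncode, result.stdout.strip())
```

A command that did try to read stdin would see an immediate end-of-file here, which is exactly the situation the "MUST NOT" wording lets callers stop worrying about.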
# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.
* Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.
* Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container, all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but:

* The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control.
* The command line entry is more useful for runtime-callers than the old wording, because it tells you how to set up the file descriptors instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle].
  The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced its --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.
* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].
* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. 
[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to set up the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example [listen-fds-description].
## state

[state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
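
To make the RFC 4627 hints mentioned under 'state' concrete, here is a minimal Go sketch of the null-byte pattern check from section 3 of that RFC. The function name detectJSONEncoding and the UTF-8 fallback for inputs shorter than four bytes are assumptions for this example, not part of the spec. The check only works because the state output is an object, so its first two characters are ASCII.

```go
package main

import "fmt"

// detectJSONEncoding guesses the Unicode encoding of a JSON text from
// the pattern of null bytes in its first four octets (RFC 4627, section 3):
//
//	00 00 00 xx  UTF-32BE
//	00 xx 00 xx  UTF-16BE
//	xx 00 00 00  UTF-32LE
//	xx 00 xx 00  UTF-16LE
//	xx xx xx xx  UTF-8
func detectJSONEncoding(b []byte) string {
	if len(b) < 4 {
		// Assumption for this sketch: too short to classify, treat as UTF-8.
		return "UTF-8"
	}
	switch {
	case b[0] == 0 && b[1] == 0 && b[2] == 0 && b[3] != 0:
		return "UTF-32BE"
	case b[0] == 0 && b[1] != 0 && b[2] == 0 && b[3] != 0:
		return "UTF-16BE"
	case b[0] != 0 && b[1] == 0 && b[2] == 0 && b[3] == 0:
		return "UTF-32LE"
	case b[0] != 0 && b[1] == 0 && b[2] != 0 && b[3] == 0:
		return "UTF-16LE"
	default:
		return "UTF-8"
	}
}

func main() {
	// A state-like object in UTF-8, and the start of one in UTF-16LE.
	fmt.Println(detectJSONEncoding([]byte(`{"id": "example"}`)))
	fmt.Println(detectJSONEncoding([]byte{'{', 0, '"', 0}))
}
```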
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

  List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") does not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].
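
As a rough illustration of why the hyphen rule helps, here is a small Go sketch of how a runtime could tell an unrecognized command from an unrecognized option. The classify function and the sets of known commands and options are hypothetical; a real runtime would use its own parser and may accept extensions beyond what the spec names.

```go
package main

import (
	"fmt"
	"strings"
)

// classify reports how a runtime might treat a single argument, given
// the commands and options it knows about. Because command names never
// start with a hyphen, the leading character alone separates the cases.
func classify(arg string, commands, options map[string]bool) string {
	if strings.HasPrefix(arg, "-") {
		// Compare only the name part of --name=value.
		name := strings.SplitN(arg, "=", 2)[0]
		if options[name] {
			return "option"
		}
		return "unrecognized option"
	}
	if commands[arg] {
		return "command"
	}
	return "unrecognized command"
}

func main() {
	// Illustrative sets only; not a complete list from the spec.
	commands := map[string]bool{"create": true, "start": true, "state": true, "kill": true, "delete": true}
	options := map[string]bool{"--bundle": true, "--pid-file": true}

	for _, arg := range []string{"state", "--bundle=/path", "--format=protobuf", "frobnicate"} {
		fmt.Printf("%s: %s\n", arg, classify(arg, commands, options))
	}
}
```

A runtime that wants to fail fast can exit as soon as classify returns "unrecognized command", while still being free to accept options it has added as extensions.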
# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise].
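
Going back to the "Character encodings" section, here is a minimal Go sketch of how a caller might work out which encoding the operating-system conventions imply on a POSIX system. The function name localeEncoding is made up for this example. It assumes the usual POSIX precedence (a non-empty LC_ALL wins, then LC_CTYPE, then LANG) and the language[_territory][.codeset][@modifier] locale-name layout; it does not try to resolve locales such as "C" that name no codeset.

```go
package main

import (
	"fmt"
	"strings"
)

// localeEncoding returns the codeset part of the effective character-type
// locale, or "" when the locale does not name one. getenv is injected so
// the lookup can be exercised without touching the real environment.
func localeEncoding(getenv func(string) string) string {
	for _, name := range []string{"LC_ALL", "LC_CTYPE", "LANG"} {
		value := getenv(name)
		if value == "" {
			// Unset or empty: fall through to the next variable.
			continue
		}
		// Drop an optional @modifier, then take what follows the dot.
		if i := strings.IndexByte(value, '@'); i >= 0 {
			value = value[:i]
		}
		if i := strings.IndexByte(value, '.'); i >= 0 {
			return value[i+1:]
		}
		return "" // e.g. "C" or "POSIX": no codeset named
	}
	return ""
}

func main() {
	env := map[string]string{"LANG": "en_US.UTF-8"}
	fmt.Println(localeEncoding(func(k string) string { return env[k] }))
}
```

In a real caller, os.Getenv can be passed directly as the getenv argument.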
# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.
* Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.
* Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container, all other process are killed
  ...
10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. 
The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. 
* Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. 
[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30 [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: 
http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. 
rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [stdio-ioctl]: opencontainers#513 (comment) [stdio-surprise]: opencontainers#513 (comment) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. 
## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). 
# Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. 
# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

    The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

    In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

    In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because a runtime is unlikely to change its behavior just because it is unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise].
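From the caller's side, the stdin guarantee above means a command can be launched with stdin attached to the null device. The following is a minimal sketch using Python's `subprocess` module; `run_without_stdin` is a hypothetical helper, and a throwaway Python child stands in for a real runtime command.

```python
import subprocess
import sys


def run_without_stdin(argv):
    """Run argv with stdin connected to the null device, capturing output."""
    return subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,  # the command must not depend on stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )


if __name__ == "__main__":
    # Stand-in for a runtime invocation; it never reads from stdin.
    result = run_without_stdin([sys.executable, "-c", "print('ok')"])
    print(result.returncode, result.stdout)
```

A command that did try to read would see an immediate end-of-file rather than blocking, which is why a compliant runtime cannot hang a generic caller this way.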
# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.

* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.

* Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

    10:56 < crosbymichael> if the main process dies in the container, all other process are killed
    ...
    10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
    10:58 < crosbymichael> julz: yes, that is how its implemented
    ...
    10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but:

* The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control.

* The command line entry is more useful for runtime-callers than the old wording, because it tells you how to set up the file descriptors instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle].
  The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'.

* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.
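To make the console-socket choices discussed in this message more concrete (a SOCK_SEQPACKET Unix socket, a string 'type' field on every message, and a response message), here is a minimal sketch of one request/response exchange over a socket pair. The JSON encoding, the `message` field on error responses, and the function names are assumptions made for this example; the actual wire format is defined by the specification, not by this sketch. It also assumes a platform that supports SOCK_SEQPACKET for Unix sockets, such as Linux.

```python
import json
import socket


def handle_one(conn):
    """Server side: read one message and answer with a response message."""
    request = json.loads(conn.recv(4096).decode("utf-8"))
    if request.get("type") == "terminal":
        reply = {"type": "response"}
    else:
        # Hypothetical error shape: lets the client bail out cleanly.
        reply = {"type": "response", "message": "unsupported message type"}
    conn.send(json.dumps(reply).encode("utf-8"))


def exchange(message):
    """Send one message over a SOCK_SEQPACKET pair and return the reply."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    with client, server:
        # SOCK_SEQPACKET preserves message boundaries: one send, one recv.
        client.send(json.dumps(message).encode("utf-8"))
        handle_one(server)
        return json.loads(client.recv(4096).decode("utf-8"))


if __name__ == "__main__":
    print(exchange({"type": "terminal"}))
    print(exchange({"type": "unknown"}))
```

Because each `send` arrives as exactly one `recv`, neither side needs its own framing, which is the practical benefit of the message-oriented socket type over SOCK_STREAM.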
[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description].
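For readers unfamiliar with the LISTEN_FDS convention, the sketch below shows how a receiving process can work out which descriptors were passed to it: they are numbered consecutively starting at 3, and LISTEN_FDS gives the count. The `listen_fds` helper is a hypothetical name for this example, and in line with the note above it does not consult LISTEN_PID.

```python
import os

# First passed descriptor, as documented for sd_listen_fds().
SD_LISTEN_FDS_START = 3


def listen_fds(environ=None):
    """Return the descriptor numbers announced via LISTEN_FDS.

    Hypothetical helper: ignores LISTEN_PID and treats a missing or
    malformed LISTEN_FDS value as "no descriptors passed".
    """
    environ = os.environ if environ is None else environ
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    count = max(count, 0)
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


if __name__ == "__main__":
    # With LISTEN_FDS=2, the passed descriptors are 3 and 4.
    print(listen_fds({"LISTEN_FDS": "2"}))
```

With this numbering, the next descriptor after the passed ones is `LISTEN_FDS + 3`, which is the slot the console-socket discussion in this message refers to.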
## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). 
# Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. 
# Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise]. 
# Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 
10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. 
The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. 
* Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. 
[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But a PID file is a common way of interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example [listen-fds-description].
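As a rough illustration of the LISTEN_FDS convention discussed above, here is a minimal Python sketch of how a runtime might work out which descriptors it was handed. The function name `passed_fds` is hypothetical, and the starting descriptor 3 is taken from the sd_listen_fds convention; LISTEN_PID is deliberately not consulted, mirroring the choice made in this commit.

```python
# First passed descriptor, following the sd_listen_fds(3) convention.
SD_LISTEN_FDS_START = 3


def passed_fds(environ):
    """Return the descriptor numbers announced by LISTEN_FDS.

    `passed_fds` is a hypothetical helper for illustration only.
    LISTEN_PID is intentionally ignored, as in this commit.
    """
    count = int(environ.get("LISTEN_FDS", "0"))
    if count < 0:
        raise ValueError("LISTEN_FDS must not be negative")
    # Descriptors 0-2 are the standard streams, so passed ones start at 3.
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

A runtime would call this with `os.environ`; passing a plain dict keeps the sketch easy to exercise.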
## state

[state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
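For the 'state' encoding hints mentioned above, the RFC 4627 section 3 approach can be sketched as follows. Because the state output is an object, its first two characters are ASCII, so the pattern of zero octets in the first four bytes identifies the Unicode encoding. The function name `detect_json_encoding` is hypothetical; this is only a sketch of the hint, not something the spec requires of callers.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of JSON text from its first four octets.

    Follows the null-octet table in RFC 4627 section 3, which assumes the
    first two characters are ASCII (true for a JSON object).
    """
    if len(data) < 4:
        # Too short to apply the table; fall back to the JSON default.
        return "utf-8"
    nulls = tuple(octet == 0 for octet in data[:4])
    table = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return table.get(nulls, "utf-8")              # xx xx xx xx
```

A caller could then decode with `data.decode(detect_json_encoding(data))` before handing the text to a JSON parser.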
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
      ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

  List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document.

The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") does not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands], because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].
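To make the hyphen rule concrete, here is a small Python sketch of how a runtime might separate global options from the command name. The helper `split_command` is hypothetical, and it assumes global options are written in the single-token `--name` or `--name=value` form; a runtime that accepts `--name value` would need to know which options take arguments.

```python
def split_command(argv):
    """Split argv into (global options, command name, remaining arguments).

    Hypothetical helper: relies on command names never starting with a
    hyphen, so the first non-hyphen token is the command.
    """
    global_options = []
    for index, arg in enumerate(argv):
        if arg.startswith("-"):
            global_options.append(arg)
            continue
        return global_options, arg, argv[index + 1:]
    # Only options were given; there is no command to dispatch.
    return global_options, None, []
```

With this split, a runtime that does not recognize the command name can fail fast without first having to interpret options it has never heard of.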
# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise].
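From the caller's side, the stdin and encoding notes above might be exercised roughly like this. The sketch runs a command with a null stdin and decodes its stdout using the locale's preferred encoding. The `run_state` helper and the stand-in "runtime" (a Python one-liner that prints a JSON object) are assumptions for illustration; no real runtime's output is being reproduced.

```python
import json
import locale
import subprocess
import sys


def run_state(argv):
    """Run a state-like command with a null stdin and parse its JSON output.

    Hypothetical helper: decodes stdout with the operating system's
    preferred encoding, as the "Character encodings" notes suggest.
    """
    proc = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,  # safe because the command must not read stdin
        stdout=subprocess.PIPE,
        check=True,
    )
    encoding = locale.getpreferredencoding(False)
    return json.loads(proc.stdout.decode(encoding))


# Stand-in for a runtime's 'state' command; not a real runtime invocation.
fake_runtime = [
    sys.executable,
    "-c",
    "print('{\"id\": \"example\", \"status\": \"created\"}')",
]
state = run_state(fake_runtime)
```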
# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.

* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.

* Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container, all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but:

* The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control.

* The command line entry is more useful for runtime-callers than the old wording, because it tells you how to set up the file descriptors instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle].
  The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'.

* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.
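For readers unfamiliar with the waitid() approach mentioned in the long-running 'create' entry above, here is a minimal Python sketch of a parent collecting a direct child's exit code. The `run_and_wait` helper is hypothetical and stands in for a long-running 'create'; the child here simply exits with a chosen code instead of running a container process. It assumes a Unix platform where Python exposes `os.fork` and `os.waitid`.

```python
import os


def run_and_wait(exit_code):
    """Fork a child that exits with exit_code and collect it with waitid().

    Hypothetical stand-in for a long-running 'create': only a parent can
    wait on its child this way, which is why an early-exit 'create' needs
    other, platform-specific mechanisms.
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately without running any cleanup handlers.
        os._exit(exit_code)
    # Parent: block until the child has exited, then read its status.
    info = os.waitid(os.P_PID, pid, os.WEXITED)
    if info.si_code != os.CLD_EXITED:
        raise RuntimeError("child did not exit normally")
    return info.si_status
```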
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. 
## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). 
# Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. 
# Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise]. 
# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.
* Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.
* Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container, all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but:

* The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control.
* The command line entry is more useful for runtime-callers than the old wording, because it tells you how to set up the file descriptors instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced its --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.
* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].
* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'.
* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].
* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].
* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.
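The console-socket choices described earlier in this message (SOCK_SEQPACKET, a string 'type' field, and response messages) can be illustrated with a minimal sketch. It assumes JSON-encoded messages and a Linux host; the 'response' type name, the 'error' field, and the helper names are hypothetical and are not taken from the specification, and the sketch leaves out passing the terminal file descriptor itself.

```python
import json
import socket

# Message types this hypothetical server understands.
SUPPORTED_TYPES = {"terminal"}


def handle_request(raw: bytes) -> bytes:
    """Build a response for one request message (field names are assumptions)."""
    request = json.loads(raw.decode("utf-8"))
    response = {"type": "response"}
    if request.get("type") not in SUPPORTED_TYPES:
        # An unsupported type gets an error, so the client can bail out.
        response["error"] = "unsupported type: {}".format(request.get("type"))
    return json.dumps(response).encode("utf-8")


def round_trip(message: dict) -> dict:
    """Send one message over a SOCK_SEQPACKET pair and return the response."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    with client, server:
        # SOCK_SEQPACKET preserves message boundaries, so one send() on one
        # side corresponds to one recv() on the other; no framing is needed.
        client.send(json.dumps(message).encode("utf-8"))
        server.send(handle_request(server.recv(4096)))
        return json.loads(client.recv(4096).decode("utf-8"))
```

Because the type is a string, a server in this sketch gains a new message type by adding an entry to SUPPORTED_TYPES, while older clients keep sending only the types they know.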
[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to set up the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But a PID file is a common way to interface with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example [listen-fds-description].
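As a rough illustration of the LISTEN_FDS convention discussed above, the following sketch computes which descriptors a runtime would treat as inherited. It assumes the sd_listen_fds(3) numbering, where passed descriptors start at 3, and it reads the 'LISTEN_FDS + 3' console-socket wording used elsewhere in this message as "the first descriptor after the inherited ones". The function names are hypothetical, and LISTEN_PID is ignored, in line with the notes above.

```python
import os

# sd_listen_fds(3) convention: inherited descriptors start right after stdio.
SD_LISTEN_FDS_START = 3


def listen_fds_count(environ=None):
    """Return the number of descriptors announced via LISTEN_FDS (0 if unset)."""
    environ = os.environ if environ is None else environ
    count = int(environ.get("LISTEN_FDS", "0"))
    if count < 0:
        raise ValueError("LISTEN_FDS must not be negative")
    return count


def inherited_fds(environ=None):
    """Return the inherited descriptor numbers, lowest first."""
    count = listen_fds_count(environ)
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def console_socket_fd(environ=None):
    """Return the descriptor at 'LISTEN_FDS + 3' (an assumed reading)."""
    return SD_LISTEN_FDS_START + listen_fds_count(environ)
```

With LISTEN_FDS=2 in the environment, descriptors 3 and 4 are the inherited ones and the next free slot in this sketch is 5.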
## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
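The RFC 4627 hints mentioned in the state notes above can be sketched as follows. RFC 4627 section 3 observes that the first two characters of a JSON text are always ASCII, so the pattern of null octets in the first four octets distinguishes UTF-8, UTF-16, and UTF-32. This is only an illustration of that heuristic for a caller that cannot trust the locale; the function names are hypothetical, and falling back to UTF-8 for inputs shorter than four octets is an assumption of this sketch.

```python
import json

# Null-octet patterns for the first four octets, from RFC 4627 section 3.
# True marks a null octet, False a non-null octet.
_PATTERNS = {
    (True, True, True, False): "utf-32-be",   # 00 00 00 xx
    (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
    (False, True, True, True): "utf-32-le",   # xx 00 00 00
    (False, True, False, True): "utf-16-le",  # xx 00 xx 00
}


def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON object serialized without a BOM."""
    if len(data) < 4:
        return "utf-8"  # assumption: too short for the heuristic
    nulls = tuple(octet == 0 for octet in data[:4])
    return _PATTERNS.get(nulls, "utf-8")      # xx xx xx xx


def decode_state(data: bytes) -> dict:
    """Decode state output whose encoding is not known in advance."""
    return json.loads(data.decode(detect_json_encoding(data)))
```

The heuristic holds for the state output because it is always an object, so the first character is '{' and the second is ASCII as well.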
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

  List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document.

The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") does not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].
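The reasoning behind "Command names MUST NOT start with hyphens" can be illustrated with a small sketch: because the first argument without a leading hyphen must be the command, a runtime can tell an unrecognized command apart from an unrecognized option and may fail fast on the former. The sketch assumes every global option is a single token (such as '--name' or '--name=value'); the '--debug' option in the tests and the helper name are hypothetical, while the command set lists the commands named in this message.

```python
# Commands named in this commit message.
KNOWN_COMMANDS = {"create", "start", "state", "kill", "delete"}


def split_command(argv):
    """Split arguments into (global options, command, command arguments).

    Assumes each global option is one token starting with a hyphen, so the
    first token without a leading hyphen has to be the command name.
    """
    global_options = []
    for index, arg in enumerate(argv):
        if arg.startswith("-"):
            # Possibly a runtime-specific extension; left for the runtime
            # to interpret.
            global_options.append(arg)
            continue
        if arg not in KNOWN_COMMANDS:
            # Optional fail-fast path for unrecognized commands.
            raise ValueError("unrecognized command: {}".format(arg))
        return global_options, arg, list(argv[index + 1:])
    raise ValueError("no command given")
```

A runtime that prefers not to fail fast could instead hand the unknown command to an extension mechanism; the spec wording allows either choice.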
# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior when they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels that the stdio file-descriptor pass-through is surprising [stdio-surprise].
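From the caller's side, the stdin and character-encoding rules above might be used as in the following sketch. It assumes a generic caller written in Python that runs a command with a null stdin and decodes its output with the locale's preferred encoding; the helper name is hypothetical and the argument list is whatever runtime command line the caller wants to run.

```python
import locale
import subprocess


def run_command(argv):
    """Run a command with a null stdin and decode its standard streams.

    The command is expected not to read stdin, so attaching the null
    device is safe. Output is decoded with the operating system's locale
    encoding, mirroring the "punt to the operating system" rule.
    """
    result = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    encoding = locale.getpreferredencoding(False)
    return (
        result.returncode,
        result.stdout.decode(encoding),
        result.stderr.decode(encoding),
    )
```

A non-zero return code paired with text on stderr matches the error-reporting style described for 'create' in this message.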
# Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 
10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. 
The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. 
* Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. 
[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration [console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30 [container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376 Subject: Add initial pass at a cmd line spec [container-pid-not-portable]: opencontainers#459 Subject: [ Runtime ] Allow for excluding pid from state [coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html [distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167 Subject: Clarity for commands vs global options [docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230 moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997) [event]: opencontainers#508 Subject: runtime: Add an 'event' operation for subscribing to pushes [extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56 [github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists [linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml [linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39 [listen-fds-description]: opencontainers/runc#231 (comment) Subject: Systemd integration with runc, for on-demand socket activation [markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list [mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15 [mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54 [no-version]: 
http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75 [optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617 Subject: Use RFC 2119's keywords (MUST, MAY, ...) [pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings [posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html [punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79 [rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3 [rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A RFC 7518 is currently identical to 7519. [rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1 [runc-bundle-option]: opencontainers/runc#373 Subject: adding support for --bundle [runc-config-via-stdin]: opencontainers/runc#202 Subject: Can runc take its configuration on stdin? [runc-listen-fds]: opencontainers/runc#231 Subject: Systemd integration with runc, for on-demand socket activation [runc-required-config-file]: opencontainers/runc#310 (comment) Subject: specifying a spec file on cmd line? [runc-socket-type-version]: opencontainers/runc#1018 (comment) Subject: Consoles, consoles, consoles. [runc-start-id]: opencontainers/runc#541 opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541) [runtime-spec-caller-api-agnostic]: opencontainers#113 (comment) Subject: Add fd section for linux container process [runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210) [sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html [single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY Subject: Single, unified config file (i.e. 
rolling back specs#88) Date: Wed, 4 Nov 2015 09:53:20 -0800 Message-ID: <[email protected]> [solaris-kernel-state]: wking/oci-command-line-api#3 (comment) Subject: Drop exec, pause, resume, and signal [start-pr-bundle]: wking/oci-command-line-api#11 Subject: start: Change --config and --runtime to --bundle [start-pr-state]: wking/oci-command-line-api#14 Subject: start: Add a --state option [state-pr]: wking/oci-command-line-api#16 Subject: runtime: Add a 'state' command [state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61 7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225) [stdio-ioctl]: opencontainers#513 (comment) [stdio-surprise]: opencontainers#513 (comment) [syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting [systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69 [util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html [wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8 [wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms [windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356) Date: Thu, 26 May 2016 11:03:29 -0700 Message-ID: <[email protected]> Signed-off-by: Julian Friedman <[email protected]> Hopefully-Signed-off-by: Mike Brown <[email protected]> Signed-off-by: W. Trevor King <[email protected]> Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. 
## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). 
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
      ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

  List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document.

The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") does not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].
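The hyphen rule above means a runtime can tell an unrecognized command from an unrecognized option by looking at the first character alone. A rough sketch of that classification follows; the `classify` function, the `--help` entry in `KNOWN_OPTIONS`, and the return strings are all hypothetical, and the command set simply lists the commands named in this message.

```python
# Commands named in this commit message; the option set is purely illustrative.
KNOWN_COMMANDS = {"create", "start", "state", "kill", "delete"}
KNOWN_OPTIONS = {"--help"}  # hypothetical placeholder, not taken from the spec


def classify(token):
    """Classify one command-line token (hypothetical helper)."""
    if token.startswith("-"):
        name = token.split("=", 1)[0]  # drop any "=value" suffix
        return "option" if name in KNOWN_OPTIONS else "unrecognized option"
    return "command" if token in KNOWN_COMMANDS else "unrecognized command"


if __name__ == "__main__":
    for token in ("state", "--format=protobuf", "frobnicate"):
        print(token, "->", classify(token))
```

A runtime that wants to fail fast could exit as soon as it sees "unrecognized command", while one that supports extensions could hand the token to its own dispatcher instead.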
# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels that the stdio file-descriptor pass-through is surprising [stdio-surprise].
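As an illustration of the standard-stream and encoding notes above, a generic caller might run a command with a null stdin and decode its output using the locale's encoding. This is only a sketch: `run_with_null_stdin` is a hypothetical helper, and the child used in the demonstration is a stand-in Python one-liner rather than a real runtime.

```python
import locale
import subprocess
import sys


def run_with_null_stdin(argv):
    """Run argv with stdin attached to the null device (hypothetical helper).

    Returns the exit code and stdout decoded with the locale's preferred
    encoding, in the spirit of punting encodings to the operating system.
    """
    result = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    encoding = locale.getpreferredencoding(False)
    return result.returncode, result.stdout.decode(encoding)


if __name__ == "__main__":
    # Stand-in child that does read stdin, to show that it sees EOF right away.
    child = [sys.executable, "-c", "import sys; sys.stdout.write(repr(sys.stdin.read()))"]
    print(run_with_null_stdin(child))
```

A command that honors the "MUST NOT attempt to read from its stdin" wording would behave the same whether stdin is null or closed, which is what makes this kind of generic wrapper safe.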
# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.
* Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.
* Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container, all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but:

* The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control.
* The command line entry is more useful for runtime-callers than the old wording, because it tells you how to set up the file descriptors instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle].
  The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.
* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].
* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'.
* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].
* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].
* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.
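For readers who want a feel for the console-socket choices described earlier (SOCK_SEQPACKET, a string 'type' field, and response messages), here is a small request/response sketch over a socket pair. It assumes a Linux host where AF_UNIX SOCK_SEQPACKET is available. Only the 'type' field and the 'terminal' message type come from the text; the response layout (`"type": "response"`, `"ok"`, `"error"`) and the helper names are hypothetical, and the sketch omits whatever payload a real 'terminal' message would carry.

```python
import json
import socket


def handle_request(conn):
    """Read one message and answer it (hypothetical server side)."""
    request = json.loads(conn.recv(4096).decode("utf-8"))
    if request.get("type") == "terminal":
        response = {"type": "response", "ok": True}  # hypothetical layout
    else:
        # Lets the client bail out when the type is not supported.
        response = {"type": "response", "ok": False, "error": "unsupported type"}
    conn.send(json.dumps(response).encode("utf-8"))


def round_trip(message):
    """Send one message over a SOCK_SEQPACKET pair and return the reply."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    with client, server:
        client.send(json.dumps(message).encode("utf-8"))
        handle_request(server)
        return json.loads(client.recv(4096).decode("utf-8"))


if __name__ == "__main__":
    print(round_trip({"type": "terminal"}))
    print(round_trip({"type": "bogus"}))
```

Because the socket is message-oriented, each `recv` returns exactly one message, so no framing or length prefix is needed in the sketch.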

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]:
  http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e.
  rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. 
## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). 
# Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. 
# Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. and the POSIX §6.2 link is for: In other locales, the presence, meaning, and representation of any additional characters are locale-specific. # Standard streams The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later. Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise]. 
# Console socket protocol Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices: * SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type. * A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning. * Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type. * Add a sub-package specs-go/socket. Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded. # Event triggers The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be. For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message. # Lifecycle notes These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification. ## Process cleanup On IRC on 2015-09-15 (with PDT timestamps): 10:56 < crosbymichael> if the main process dies in the container, all other process are killed ... 
10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies? 10:58 < crosbymichael> julz: yes, that is how its implemented ... 10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races ## Container IDs for namespace joiners You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'. # Other changes This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs. Both affect runtime authors, but: * The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control. * The command line entry is more useful for runtime-callers than the old wording, because it tells you how to setup the file descriptors instead of just telling you that they MAY be setup. I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific. # Avenues pursued but rejected (for now) * Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. 
The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction. * Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state]. This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state]. * Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process. 
* Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'. * Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family. But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc]. * Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version]. * Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output. 
[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But a PID file is a common way of interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example [listen-fds-description].
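As a rough illustration of the LISTEN_FDS convention discussed above, the following Python sketch shows how a process might work out which inherited file descriptors it was handed. It assumes the systemd convention described in [sd_listen_fds], where passed descriptors start at 3, and, like this commit, it deliberately ignores LISTEN_PID. The helper name `inherited_fds` is hypothetical and not part of runC or the spec.

```python
import os

# First passed descriptor under the sd_listen_fds convention
# (0, 1, and 2 are the standard streams).
SD_LISTEN_FDS_START = 3


def inherited_fds(environ=None):
    """Return the descriptor numbers announced through LISTEN_FDS.

    Hypothetical helper: LISTEN_PID is intentionally not checked,
    mirroring the choice described in the commit message.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("LISTEN_FDS", "0")
    try:
        count = int(raw)
    except ValueError:
        # Treat an unparsable value as "no descriptors passed".
        count = 0
    count = max(count, 0)
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


if __name__ == "__main__":
    # With LISTEN_FDS=2, descriptors 3 and 4 are the passed ones.
    print(inherited_fds({"LISTEN_FDS": "2"}))
```

Taking the environment as a parameter keeps the helper easy to exercise without touching the real process environment.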
## state

[state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
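The RFC 4627 section 3 hint mentioned under 'state' above can be sketched in Python. Because the state output is a JSON object, its first two characters are ASCII, so the pattern of zero octets in the first four bytes identifies the Unicode encoding form. This is only an illustration: the function name `guess_json_encoding` is hypothetical, and the sketch assumes the text carries no byte-order mark.

```python
def guess_json_encoding(data):
    """Guess the encoding of JSON text from its first four octets.

    Follows the null-octet table in RFC 4627 section 3. Inputs shorter
    than four octets fall back to UTF-8 (an assumption of this sketch).
    """
    if len(data) < 4:
        return "utf-8"
    # True where the octet is zero, e.g. (True, True, True, False)
    # for the pattern 00 00 00 xx.
    nulls = tuple(octet == 0 for octet in data[:4])
    table = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    return table.get(nulls, "utf-8")              # xx xx xx xx


if __name__ == "__main__":
    state = '{"id": "example"}'
    for codec in ("utf-8", "utf-16-le", "utf-32-be"):
        encoded = state.encode(codec)
        print(codec, "->", guess_json_encoding(encoded))
```

A caller could decode the runtime's stdout with the guessed codec before handing it to a JSON parser.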
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
    ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

  List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Versioning

The command-line interface is largely orthogonal to the config format, and config authors and runtime callers may be entirely different sets of people. Zhang Wei called for more explicit versioning for the CLI [interface-versioning], and the approach taken here follows the approach taken by Python's email package [python-email-version]. Wedging multiple, independently versioned entities into a single repository can be awkward, but earlier proposals to put the CLI in its own repository [separate-repository-proposed] were unsuccessful because compliance testing requires both a CLI and a config specification [separate-repository-refused]. Trevor doesn't think that's a solid reason [separate-repository-refusal-rebutted], but discussion along that line stalled out, so the approach taken here is to keep both independently versioned entities in the same repository.
# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document.

The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") does not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.
and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.
* Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.
* Add a sub-package specs-go/socket.
  Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container, all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs.
Both affect runtime authors, but:

* The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control.
* The command line entry is more useful for runtime-callers than the old wording, because it tells you how to set up the file descriptors instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.
* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state].
  This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].
* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'.
* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family.
  But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].
* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].
* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[interface-versioning]: opencontainers#513 (comment)
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[python-email-version]: https://docs.python.org/3/library/email.html#package-history
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[separate-repository-proposed]: opencontainers#513 (comment)
[separate-repository-refused]: opencontainers#513 (comment)
[separate-repository-refusal-rebutted]: opencontainers#513 (comment)
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
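To make the console socket protocol choices in the commit message above more concrete, here is a minimal Python sketch of a request/response exchange over a SOCK_SEQPACKET Unix socket. Only the string 'type' field, the 'terminal' message type, and the existence of response messages come from the commit message; the "response" and "error" type values, the "message" field, and the function names are hypothetical, and the sketch leaves out any file-descriptor passing the real protocol may involve.

```python
import json
import socket

# Message types this hypothetical server understands.
KNOWN_TYPES = {"terminal"}


def handle_request(raw):
    """Build a response dict for one request message (illustrative only)."""
    try:
        request = json.loads(raw.decode("utf-8"))
        message_type = request["type"]
    except (ValueError, KeyError, TypeError):
        return {"type": "error", "message": "malformed request"}
    if not isinstance(message_type, str) or message_type not in KNOWN_TYPES:
        # An unknown type gets an explicit error, so the client can bail out.
        return {
            "type": "error",
            "message": "unsupported type: {!r}".format(message_type),
        }
    return {"type": "response"}


def demo_exchange(request):
    """Send one request over a SOCK_SEQPACKET pair and return the reply."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    with client, server:
        # Each send() is delivered as one whole message to one recv().
        client.send(json.dumps(request).encode("utf-8"))
        reply = handle_request(server.recv(4096))
        server.send(json.dumps(reply).encode("utf-8"))
        return json.loads(client.recv(4096).decode("utf-8"))


if __name__ == "__main__":
    print(demo_exchange({"type": "terminal"}))
    print(demo_exchange({"type": "resize"}))
```

Because SOCK_SEQPACKET preserves message boundaries, neither side needs length prefixes or delimiters, which is the motivation given for choosing it over SOCK_STREAM. Note that SOCK_SEQPACKET Unix sockets are not available on every platform, so `demo_exchange` is expected to work on Linux but may fail elsewhere.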
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. 
## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). 
# Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Versioning The command-line interface is largely orthogonal to the config format, and config authors and runtime callers may be entirely different sets of people. Zhang Wei called for more explicit versioning for the CLI [interface-versioning], and the approach taken here follows the approach taken by Python's email package [python-email-version]. Wedging multiple, independently versioned entities into a single repository can be awkward, but earlier proposals to put the CLI in its own repository [separate-repository-proposed] were unsuccessful because compliance testing requires both a CLI and a config specification [separate-repository-refused]. Trevor doesn't think that's a solid reason [separate-repository-refusal-rebutted], but discussion along that line stalled out, so the approach taken here is to keep both independently versioned entities in the same repository. 
# Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. 
and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior just because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and finds the stdio file-descriptor pass-through surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.
* Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.
* Add a sub-package specs-go/socket.
  Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container, all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs.
Both affect runtime authors, but:

* The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control.
* The command line entry is more useful for runtime-callers than the old wording, because it tells you how to set up the file descriptors instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.
* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state].
  This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].
* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'.
* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family.
  But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].
* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].
* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[interface-versioning]: opencontainers#513 (comment)
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[python-email-version]: https://docs.python.org/3/library/email.html#package-history
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[separate-repository-proposed]: opencontainers#513 (comment)
[separate-repository-refused]: opencontainers#513 (comment)
[separate-repository-refusal-rebutted]: opencontainers#513 (comment)
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-signals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to set up the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are OK with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example [listen-fds-description].
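
As a rough illustration of the caller side of the LISTEN_FDS convention, the Python sketch below hands a list of open descriptors to a child process as descriptors 3, 4, ... and sets LISTEN_FDS to their count, leaving LISTEN_PID unset as discussed above. The helper name spawn_with_listen_fds and the fork/exec structure are assumptions made for illustration; they are not taken from the spec or from runC, and a real caller would exec the runtime's 'create' command where this sketch execs an arbitrary argv. POSIX only.

```python
import fcntl
import os

# First descriptor number used by the sd_listen_fds(3) convention.
SD_LISTEN_FDS_START = 3


def spawn_with_listen_fds(argv, fds):
    """Run argv with `fds` visible as 3, 4, ... and LISTEN_FDS set.

    Hypothetical helper: returns the child's exit code, or the negated
    signal number if the child was killed by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child: never return into the caller's Python code.
        try:
            # Park copies above the target range first, so that moving one
            # descriptor into place cannot clobber another one we still need.
            floor = SD_LISTEN_FDS_START + len(fds)
            parked = [fcntl.fcntl(fd, fcntl.F_DUPFD, floor) for fd in fds]
            for offset, fd in enumerate(parked):
                # os.dup2 marks the new descriptor inheritable by default.
                os.dup2(fd, SD_LISTEN_FDS_START + offset)
                os.close(fd)
            env = dict(os.environ, LISTEN_FDS=str(len(fds)))
            os.execvpe(argv[0], argv, env)
        finally:
            # Only reached if setup or exec failed.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

The child sees only the count, so it has to assume the descriptors are contiguous starting at 3; that is why the helper renumbers them instead of passing the caller's original descriptor numbers through.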
## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
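
To make the soft/hard stop pairing concrete, here is a minimal Python sketch of what a caller might do with the two required signals. It signals an ordinary child process as a stand-in for a container; the helper name soft_then_hard_stop and the grace period are assumptions for illustration. A real caller would go through the runtime's kill command rather than signalling a process directly, and on platforms without POSIX signals the runtime's documented substitute would take the place of TERM and KILL.

```python
import signal
import subprocess


def soft_then_hard_stop(proc, grace=5.0):
    """Ask `proc` to stop with SIGTERM, then force it with SIGKILL.

    Hypothetical helper: `proc` is a subprocess.Popen object and `grace`
    is how many seconds to wait for the soft stop before escalating.
    Returns the process return code (negative signal number on POSIX
    when the process was terminated by a signal).
    """
    # Soft stop: the process may catch TERM and shut down cleanly.
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # Hard stop: KILL cannot be caught or ignored.
        proc.send_signal(signal.SIGKILL)
        return proc.wait()
```

A bundle author who installs a SIGTERM handler gets the whole grace period to clean up; one who ignores TERM is still stopped once the caller escalates.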
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

  List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Versioning

The command-line interface is largely orthogonal to the config format, and config authors and runtime callers may be entirely different sets of people. Zhang Wei called for more explicit versioning for the CLI [interface-versioning], and the approach taken here follows the approach taken by Python's email package [python-email-version].

Wedging multiple, independently versioned entities into a single repository can be awkward, but earlier proposals to put the CLI in its own repository [separate-repository-proposed] were unsuccessful because compliance testing requires both a CLI and a config specification [separate-repository-refused]. Trevor doesn't think that's a solid reason [separate-repository-refusal-rebutted], but discussion along that line stalled out, so the approach taken here is to keep both independently versioned entities in the same repository.
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such).

Trevor still likes the long-running 'create' API because it makes collecting the exit code easier; see the entry under rejected-for-now avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example [listen-fds-description].
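As a rough illustration of the LISTEN_FDS convention discussed above, the following Python sketch maps the variable to descriptor numbers on the receiving side. The starting descriptor 3 is the sd_listen_fds convention [sd_listen_fds]; the helper name `listen_fds` is hypothetical, and the code is an assumption-laden sketch rather than anything taken from runC or the spec. Like the text above, it does not consult LISTEN_PID.

```python
SD_LISTEN_FDS_START = 3  # first passed descriptor in the sd_listen_fds convention


def listen_fds(environ):
    """Return the descriptor numbers announced by LISTEN_FDS (hypothetical helper)."""
    count = int(environ.get("LISTEN_FDS", "0"))
    if count < 0:
        raise ValueError("LISTEN_FDS must not be negative")
    # Descriptors 0-2 are the standard streams, so passed descriptors start at 3.
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

With `LISTEN_FDS=2` in the environment, the helper reports descriptors 3 and 4; with the variable unset, it reports none.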
## state

[state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7158 [rfc7158-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7158-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler).
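Returning to the encoding hints mentioned in the 'state' notes: RFC 4627 section 3 observes that a JSON text starts with two ASCII characters, so the pattern of null octets in the first four octets identifies the encoding. A minimal Python sketch of that heuristic follows. The function name `detect_json_encoding` is hypothetical, the fallback for very short inputs is an assumption, and a real caller would probably prefer the operating-system conventions described under "Character encodings".

```python
# Null-octet patterns from RFC 4627 section 3 (True means the octet is 0x00).
_NULL_PATTERNS = {
    (True, True, True, False): "utf-32-be",
    (True, False, True, False): "utf-16-be",
    (False, True, True, True): "utf-32-le",
    (False, True, False, True): "utf-16-le",
    (False, False, False, False): "utf-8",
}


def detect_json_encoding(data):
    """Guess the encoding of a JSON object serialized as bytes (hypothetical helper)."""
    if len(data) < 4:
        # Assumption: anything this short (for example b"{}") is treated as UTF-8.
        return "utf-8"
    pattern = tuple(octet == 0 for octet in data[:4])
    try:
        return _NULL_PATTERNS[pattern]
    except KeyError:
        raise ValueError("unrecognized null-octet pattern") from None
```

Because the state output is known to be an object, its first character is always `{`, which is what makes the heuristic applicable here.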
# Command style

Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]:

    The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

    ### {command name}

    {one-line description}

    * *Options:* ...
    ...
    * *Exit code:* ...

    {additional notes}

    #### Example

    {example}

The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]:

    List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab...

Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]:

    You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Versioning

The command-line interface is largely orthogonal to the config format, and config authors and runtime callers may be entirely different sets of people. Zhang Wei called for more explicit versioning for the CLI [interface-versioning], and the approach taken here follows the approach taken by Python's email package [python-email-version].

Wedging multiple, independently versioned entities into a single repository can be awkward, but earlier proposals to put the CLI in its own repository [separate-repository-proposed] were unsuccessful because compliance testing requires both a CLI and a config specification [separate-repository-refused]. Trevor doesn't think that's a solid reason [separate-repository-refusal-rebutted], but discussion along that line stalled out, so the approach taken here is to keep both independently versioned entities in the same repository.
# Global options

This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document.

The last line in this section makes it explicit that any later specifications (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit.

# Character encodings

Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous.

Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targeted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does):

    The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

    In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

    In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that. The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients).
  Aleksa favored splitting this identifier into integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.
* Response messages, so the client can tell whether the request was received and processed successfully or not.
  That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.
* Add a sub-package specs-go/socket.
  Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

    10:56 < crosbymichael> if the main process dies in the container, all other process are killed
    ...
    10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
    10:58 < crosbymichael> julz: yes, that is how its implemented
    ...
    10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs.
Both affect runtime authors, but:

* The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits).
  It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control.
* The command line entry is more useful for runtime-callers than the old wording, because it tells you how to set up the file descriptors instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle].
  The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin].
  But since [runc-bundle-option] landed on 2015-11-16, runC has replaced its --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;).
  It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.
* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state].
  This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry].
  But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;).
  On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].
* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set.
  When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID.
  However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here.
  We can revisit this if we regain a long-running 'create' process.
* Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol.
  The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'.
* Having a long-running 'create' process.
  Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code.
  The ptrace idea in this commit is from Mrunal [mrunal-ptrace].
  Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger.
  With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family.
# Commands ## create The --bundle [start-pr-bundle] and --pid-file options and ID argument [runc-start-id] match runC's interface. One benefit of the early-exit 'create' is that the exit code does not conflate container process exits with "failed to setup the sandbox" exits. We can take advantage of that and use non-zero 'create' exits to allow stderr writing (so the runtime can log errors while dying without having to successfully connect to syslog or some such). Trevor still likes the long-running 'create' API because it makes collecting the exit code easier, see the entry under rejected-for-now avenues at the end of this commit message. ### --pid-file You can get the PID by calling 'state' [container-pid-from-state], and container PIDs may not be portable [container-pid-not-portable]. But it's a common way for interfacing with init systems like systemd [systemd-pid], and for this first pass at the command line API folks are ok with some Linux-centrism [linux-centric]. ### Document LISTEN_FDS for passing open file descriptors This landed in runC with [runc-listen-fds], but the bundle-author <-> runtime specs explicitly avoided talking about how this is set (since the bundle-author didn't care about the runtime-caller <-> runtime interface) [runtime-spec-caller-api-agnostic]. This commit steps away from that agnosticism. Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out, since he doesn't see how the runtime-caller would choose anything other than 1 for its value. It seems like something that a process would have to set for itself (because guessing the PID of a child before spawning it seems racy ;). In any event, the runC implementation seems to set this to 1 regardless of what systemd passes to it [listen-fds-description]. We've borrowed Shishir's wording for the example [listen-fds-description]. 
## state [state-pr] Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand on the definition of our ops, 2015-10-13, opencontainers#225, v0.4.0). The state example is adapted from runtime.md, but we defer the actual specification of the JSON to that file. The encoding for the output JSON (and all standard-stream activity) is covered by the "Character encodings" section. In cases where the runtime ignores the SHOULD (still technically compliant), RFC 7159 makes encoding detection reasonably straightforward [rfc7159-s8.1]. The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although these were dropped in RFC 7518 [rfc7518-aA], probably as a result of removing the constraint that "JSON text" be an object or array [rfc7518-aA]). The hints should still apply to the state output, because we know it will be an object. If that ends up being too dicey and we want to certify runtimes that do not respect their operating-system conventions, we can add an --encoding option later. ## kill Partially catch up with opencontainers/runtime-spec@be594153 (Split create and start, 2016-04-01, opencontainers#384). The interface is based on POSIX [posix-kill], util-linux [util-linux-kill], and GNU coreutils [coreutils-kill]. The TERM/KILL requirement is a minimum portability requirement for soft/hard stops. Windows lacks POSIX signals [windows-signals], and currently supports soft stops in Docker with whatever is behind hcsshim.ShutdownComputeSystem [docker-hcsshim]. The docs we're landing here explicitly allow that sort of substitution, because we need to have soft/hard stop on those platforms but *can't* use POSIX signals. They borrow wording from opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for platform.os and .arch, 2016-05-19, opencontainers#441) to recommend runtime authors document the alternative technology so bundle-authors can prepare (e.g. by installing the equivalent to a SIGTERM signal handler). 
# Command style Use imperative phrasing for command summaries, to follow the practice recommended by Python's PEP 257 [pep257-docstring]: The docstring is a phrase ending in a period. It prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ...". The commands have the following layout: ### {command name} {one-line description} * *Options:* ... ... * *Exit code:* ... {additional notes} #### Example {example} The four-space list indents follow opencontainers/runtime-spec@7795661 (runtime.md: Fix sub-bullet indentation, 2016-06-08, opencontainers#495). From [markdown-syntax]: List items may consist of multiple paragraphs. Each subsequent paragraph in a list item must be indented by either 4 spaces or one tab... Trevor expects that's intended to be read with "block element" instead of "paragraph", in which case it applies to nested lists too. And while GitHub supports two-space indents [github-lists]: You can create nested lists by indenting lines by two spaces. it seems that pandoc does not. # Versioning The command-line interface is largely orthogonal to the config format, and config authors and runtime callers may be entirely different sets of people. Zhang Wei called for more explicit versioning for the CLI [interface-versioning], and the approach taken here follows the approach taken by Python's email package [python-email-version]. Wedging multiple, independently versioned entities into a single repository can be awkward, but earlier proposals to put the CLI in its own repository [separate-repository-proposed] were unsuccessful because compliance testing requires both a CLI and a config specification [separate-repository-refused]. Trevor doesn't think that's a solid reason [separate-repository-refusal-rebutted], but discussion along that line stalled out, so the approach taken here is to keep both independently versioned entities in the same repository. 
# Global options This section is intended to allow runtimes to extend the command line API with additional options and commands as they see fit without interfering with the commands and options specified in this document. The last line in this section makes it explicit that any later specification (e.g. "MUST print the state JSON to its stdout") do not apply to cases where the caller has included an unspecified option or command (e.g. --format=protobuf). For extensive discussion on this point see [extensions-unspecified]. With regard to the statement "Command names MUST NOT start with hyphens", the rationale behind this decision is to distinguish unrecognized commands from unrecognized options [distinguish-unrecognized-commands] because we want to allow (but not require) runtimes to fail fast when faced with an unrecognized command [optional-fail-fast]. # Long options Use GNU-style long options to avoid ambiguous, one-character options in the spec, while still allowing the runtime to support one-character options with packing. We don't specify one-character options in this spec, because portable callers can use the long form, and not specifying short forms leaves runtimes free to assign those as they see fit. # Character encodings Punt to the operating system for character encodings. Without this, the character set for the state JSON or other command output seemed too ambiguous. Trevor wishes there were cleaner references for the {language}.{encoding} locales like en_US.UTF-8 and UTF-8. But [wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't find a more targetted UTF-8 link than just dropping folks into a Unicode chapter (which is what [wikipedia-utf-8] does): The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011) With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95. The TR35 link is for: In addition, POSIX locales may also specify the character encoding, which requires the data to be transformed into that target encoding. 
and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller can safely exec the command with a closed or null stdin, and not have to worry about the command blocking or crashing because of that.

The stdout spec for start/delete is more lenient, because runtimes are unlikely to change their behavior because they are unable to write to stdout. If this assumption proves troublesome, we may have to tighten it up later.

Aleksa Sarai also raised concerns over the safety of potentially giving the container process access to terminal ioctl escapes [stdio-ioctl] and feels like the stdio file-descriptor pass-through is surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a message-based protocol, so it seems more natural to use a message-oriented socket type.
* A string 'type' field for all messages, so we can add additional message types in the future without breaking backwards compatibility (new console-socket servers will still support old clients). Aleksa favored splitting this identifier into an integer 'type' and 'version' fields [runc-socket-type-version], but I don't see the point if they're both opaque integers without internal structure. And I expect this protocol to be stable enough that it's not worth involving SemVer and its structured versioning.
* Response messages, so the client can tell whether the request was received and processed successfully or not. That gives the client a way to bail out if, for example, the server does not support the 'terminal' message type.
* Add a sub-package specs-go/socket.
  Even though there aren't many new types, these are fairly different from the rest of specs-go and that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but with the runC model, clients rely on the command exits to trigger post-create and post-start activity. The longer the runtime hangs around after completing its action, the laggier those triggers will be.

For an alternative event trigger approach, see the discussion of an 'event' command in the rejected-for-now avenues at the end of this commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer be true. But they were true at one point, and informed the development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container, all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime namespace or completely joins another set of existing namespaces. It seems odd to call that a new "container", but the ID is really more of a process ID, and less of a container ID. The "container" phrasing is just a useful hint that there might be some isolation going on. And we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md into runtime.md and the command-line docs.
Both affect runtime authors, but:

* The runtime.md entry is more useful for bundle authors than the old wording, because it gives them confidence that the runtime caller will have the power to set these up as they see fit (within POSIX's limits). It is also API-agnostic, so bundle authors know they won't have to worry about which API will be used to launch the container before deciding whether it is safe to rely on runtime-caller file-descriptor control.
* The command line entry is more useful for runtime-callers than the old wording, because it tells you how to set up the file descriptors instead of just telling you that they MAY be set up.

I moved the bundle-author language from runtime-linux.md to runtime.md because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config' and '--runtime', but this commit uses '--bundle' [start-pr-bundle]. The single config file change [single-config-proposal] went through, but Trevor would also like to be able to pipe a config into the 'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path) [runc-config-via-stdin], and he has a working example that supports this without difficulty [ccon-config-via-stdin]. But since [runc-bundle-option] landed on 2015-11-16, runC has replaced their --config-file and --runtime-file flags with --bundle, and the current goal of this API is "keeping as much similarity with the existing runC command-line as possible", not "makes sense to Trevor" ;). It looks like runC was reacting [runc-required-config-file] to strict wording in the spec [runtime-spec-required-config-file], so we might be able to revisit this if/when we lift that restriction.
* Having 'start' (now 'create') take a --state option to write state to a file [start-pr-state].
  This is my preferred approach to sharing container state, since it punts a persistent state registry to higher-level tooling [punt-state-registry]. But runtime-spec currently requires the runtime to maintain such a registry [state-registry], and we don't need two ways to do that ;). On systems like Solaris, the kernel maintains a registry of container IDs directly, so they don't need an external registry [solaris-kernel-state].
* Having 'start' (now 'create') take an --id option instead of a required ID argument, and requiring the runtime to generate a unique ID if the option was not set. When there is a long-running host process waiting on the container process to perform cleanup, the runtime-caller may not need to know the container ID. However, runC has been requiring a user-specified ID since [runc-start-id], and this spec follows the early-exit 'create' from [runc-create-start], so we require one here. We can revisit this if we regain a long-running 'create' process.
* Having 'create' take a '--console-socket PATH' option (required when process.terminal is true) with a path to a SOCK_SEQPACKET Unix socket for use with the console-socket protocol. The current 'LISTEN_FDS + 3' approach was proposed by Michael Crosby [console-socket-fd], but Trevor doesn't have a clear idea of the motivation for the change and would have preferred '--console-socket FD'.
* Having a long-running 'create' process. Trevor is not a big fan of this early-exit 'create', which requires platform-specific magic to collect the container process's exit code. The ptrace idea in this commit is from Mrunal [mrunal-ptrace]. Trevor has a proposal for an 'event' operation [event] which would provide a convenient created trigger. With 'event' in place, we don't need the 'create' process exit to serve as that trigger, and could have a long-running 'create' that collects the container process exit code using the portable waitid() family.
  But the consensus after this week's meeting was to table that while we land docs for the runC API [mimic-runc].
* Having a 'version' command to make it easy for a caller to report which runtime they're using. But we don't have a use-case that makes it strictly necessary for interop, so we're leaving it out for now [no-version].
* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced code blocks. The 'sh' keyword comes from [linguist-languages]. But the new fenced code blocks are shell sessions, not scripts, and we don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling, 2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[interface-versioning]: opencontainers#513 (comment)
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[python-email-version]: https://docs.python.org/3/library/email.html#package-history
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7158 is currently identical to 7159.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1, 2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[separate-repository-proposed]: opencontainers#513 (comment)
[separate-repository-refused]: opencontainers#513 (comment)
[separate-repository-refusal-rebutted]: opencontainers#513 (comment)
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <[email protected]>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13, opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <[email protected]>

Signed-off-by: Julian Friedman <[email protected]>
Hopefully-Signed-off-by: Mike Brown <[email protected]>
Signed-off-by: W. Trevor King <[email protected]>
Reviewed-by: Jesse Butler <[email protected]>
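
As a rough illustration of the console-socket choices in the commit message above (a SOCK_SEQPACKET Unix socket, a string 'type' field on every message, and a response so the client can bail out on an unsupported type), here is a minimal Python sketch. The JSON encoding, the "response" type name, and the "error" field are assumptions made for this example and are not taken from the spec or from runC; the sketch also does not pass any terminal file descriptor. It assumes a Linux host, where AF_UNIX supports SOCK_SEQPACKET.

```python
import json
import socket


def handle_request(raw: bytes) -> bytes:
    """Build a response for one request message (hypothetical wire format)."""
    try:
        message = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        message = None
    if not isinstance(message, dict):
        reply = {"type": "response", "error": "malformed message"}
    elif message.get("type") == "terminal":
        # The only request type this toy server understands.
        reply = {"type": "response"}
    else:
        # Unknown types get an error so the client can bail out.
        reply = {"type": "response", "error": "unsupported message type"}
    return json.dumps(reply).encode("utf-8")


def round_trip(request: dict) -> dict:
    """Send one request over a SOCK_SEQPACKET pair and return the response."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    with client, server:
        # SOCK_SEQPACKET preserves message boundaries, so one send()
        # on one side is received as one message on the other.
        client.send(json.dumps(request).encode("utf-8"))
        server.send(handle_request(server.recv(4096)))
        return json.loads(client.recv(4096).decode("utf-8"))
```

Because the type is an opaque string, a client talking to an older server sees an error response for a type the server does not know, rather than a hang or a silently ignored request.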
Instead of leading off with links to a bunch of other places, notes on the Go tags, etc., make things more inviting by leading off with a big-picture summary of what the configuration is about.

Also drop the config.json existence MUST because:

1. This section defines the configuration format, and doesn't need to be tied to a particular filename.
2. The bundle spec (in bundle.md) already has:

     This REQUIRED file MUST reside in the root of the bundle directory and MUST be named `config.json`.

The config.md line may have been useful when it was added (77d44b1, Update runtime.md, 2015-07-16). But since the bundle.md line landed in 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210), I think it's been redundant.

Signed-off-by: W. Trevor King <[email protected]>
Because:

1. This section defines the configuration format, and doesn't need to be tied to a particular filename.
2. The bundle spec (in bundle.md) already has:

     This REQUIRED file MUST reside in the root of the bundle directory and MUST be named `config.json`.

The config.md line may have been useful when it was added (77d44b1, Update runtime.md, 2015-07-16). But since the bundle.md line landed in 106ec2d (Cleanup bundle.md, 2015-10-02, opencontainers#210), I think it's been redundant.

Signed-off-by: W. Trevor King <[email protected]>
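
For readers who want to see what the quoted bundle.md requirement means in practice, here is a small Python sketch of a caller-side check. The function name `find_bundle_config` is hypothetical and is not part of any runtime or of the spec; the sketch only checks that a file named `config.json` sits in the root of the bundle directory, and does not validate its contents.

```python
import os
import tempfile


def find_bundle_config(bundle_dir: str) -> str:
    """Return the path to the bundle's config.json, or raise if it is missing."""
    # The quoted requirement: the file lives in the bundle root
    # and is named exactly config.json.
    config_path = os.path.join(bundle_dir, "config.json")
    if not os.path.isfile(config_path):
        raise FileNotFoundError(
            f"bundle {bundle_dir!r} has no config.json in its root"
        )
    return config_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as bundle:
        with open(os.path.join(bundle, "config.json"), "w") as handle:
            handle.write("{}")
        print(find_bundle_config(bundle))
```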
Mainly just moved stuff around, but also tried to add some clarity around
what is required w.r.t. naming and location of files/dirs.
But to highlight some things that I want to make sure we ALL agree on:
rootfs
Signed-off-by: Doug Davis [email protected]