Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement create and start #827

Merged
merged 11 commits into from
Jun 3, 2016

Conversation

crosbymichael
Copy link
Member

@crosbymichael crosbymichael commented May 16, 2016

This implements create and start in runc without the need for a unix socket or other complexity. It implements it by blocking the init process waiting on a SIGCONT before the users process is started.

This does not remove hooks, that can be done separately. This also updates the libcontainer stats to correctly report if the container is created vs running(user code).

It does not bind mount namespaces, if you want namespaces to be bind mounted then you can write code to bind them after create returns and before calling start.

It retains the current functionality of start today by adding a runc run command that does the same workflow as today.

Closes #506

@julz
Copy link
Contributor

julz commented May 16, 2016

Nice.

@crosbymichael
Copy link
Member Author

@julz thanks, plz take the time to play with it, runc is fully operational. lets make sure that it will fit and work with all of our needs but I think this implementation is simple and clean and will work fine.


status, err := startContainer(context, spec)
if err != nil {
if err := container.Signal(syscall.SIGCONT); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably needs at least a container.Status() guard like you added to delete in 572055d, since you only want to send SIGCONT to runtime code (and not send it after user-specified code has been executed). That's not quite enough though, since “the pending signal set is preserved across an execve(2)”. So “check the state and send SIGCONT if it was ‘created’” is going to be racy, and that race may result in user code seeing the extra SIGCONTs. A more robust solution would lock a resource (setting a flag in the state registry? Hold a Unix socket open?) when triggering code execution to avoid racing between two ‘start’ calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also simply reset the signal handler before doing the execve.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Mon, May 16, 2016 at 09:00:22PM -0700, Kenfe-Mickaël Laventure wrote:

We can also simply reset the signal handler before doing the execve.

That happens automatically, no? Also from signal(7):

“During an execve(2), the dispositions of handled signals are reset
to the default; the dispositions of ignored signals are left
unchanged.”

What needs to happen is that after a SIGCONT is received, you block
(somehow) ‘create’ from sending further SIGCONTs, then consume any
SIGCONTs from the pending queue, and then execve the user code.

@crosbymichael crosbymichael added this to the 0.2.0 milestone May 17, 2016
@hqhq
Copy link
Contributor

hqhq commented May 20, 2016

So if host rebooted, all created containers would be gone? Is that acceptable?

@wking
Copy link
Contributor

wking commented May 20, 2016

On Fri, May 20, 2016 at 01:54:26AM -0700, Qiang Huang wrote:

So if host rebooted, all created containers would be gone? Is that
acceptable?

That's fine for me (it's how all my other processes work ;). If you
want to restore a container, have an init system set it up on boot
(possibly using checkpoint/restore, depending on how much you want to
preserve). But that all seems like it's out of scope for the runtime
layer.

@mrunalp
Copy link
Contributor

mrunalp commented May 20, 2016

@hqhq I think that is okay as this isn't the same as docker create. As long as we are clear in the spec, it should be fine.

@vishh
Copy link
Contributor

vishh commented May 20, 2016

One of the use-cases for hooks was to customize mounts on demand. Would it be possible to not pivot root as part of create and switch root only on start?

@wking
Copy link
Contributor

wking commented May 20, 2016

On Fri, May 20, 2016 at 03:31:07PM -0700, Vish Kannan wrote:

Would it be possible to not pivot root as part of create and switch
root only on start?

I don't think that's a good idea, because you may be using the
container process to hold open a namespace. In that case, there's no
reason to call ‘start’, but you still want the pivoted root in the
mount namespace.

I'm still not clear on why the pre-pivot mounts need to be dynamic,
instead of setting them up in config.json's mounts during pre-create
processing.

@crosbymichael
Copy link
Member Author

FYI

This change makes us require go 1.6 because previous versions of go would not let you handle SIGCONT, it just ignores it and blocks forever.

Device: "mqueue",
Flags: defaultMountFlags,
},
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this temporarily commented out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bug

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from the issue of mqueue not working on debian kernels on the CI in userns

// ContainerDestroyed - Container no longer exists,
// ConfigInvalid - config is invalid,
// ContainerPaused - Container is paused,
// Systemerror - System error.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SystemError

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably comment for Start should be changed to describe why it's waiting on signal.

@LK4D4
Copy link
Contributor

LK4D4 commented May 25, 2016

I see that all test were changed so they use old version. Maybe couple of tests for create/start are needed.

@duglin
Copy link

duglin commented May 25, 2016

would it be bad if we had the console flag default to /dev/pts/ptmx for runc create? From a UX perspective it would be easier to do that than to force all users to remember to supply it. W/o some value for that flag I think runc fails anyway.

@crosbymichael
Copy link
Member Author

@duglin No, because not all containers use a TTY.

@duglin
Copy link

duglin commented May 25, 2016

@crosbymichael how do I do a runc create w/o a --console flag? I can't seem to get it to not error out.

@crosbymichael
Copy link
Member Author

@duglin does your spec have terminal true?

@duglin
Copy link

duglin commented May 25, 2016

ah that was it - thanks

@duglin
Copy link

duglin commented May 25, 2016

:-) we could still have it default to that value when terminal is true

@LK4D4
Copy link
Contributor

LK4D4 commented May 25, 2016

@crosbymichael I see timeout on CI and somehow it hangs now :/

@crosbymichael
Copy link
Member Author

@cyphar thanks!

@mrunalp
Copy link
Contributor

mrunalp commented Jun 3, 2016

Tested. LGTM.

@hqhq
Copy link
Contributor

hqhq commented Jun 3, 2016

LGTM

Approved with PullApprove

@cyphar
Copy link
Member

cyphar commented Jun 3, 2016

One of the biggest issues I can see right off the bat is that runc create doesn't work for user namespaced containers because of #814 (you can't set any console path). While I'm okay with that being the case (for the moment while I work on fixing #814 -- which should be considered as blocking 1.0), we should definitely make it an explicit error rather than "here's some cryptic error message" (we can remove the message once we fix the console bug).

As an aside, it's quite annoying that you can't even specify --console /dev/null ... We need to fix this.

/cc @crosbymichael

@crosbymichael
Copy link
Member Author

@cyphar that is unrelated to this PR.

I really don't think you all understand what --console does or is for if you are trying to pass it dev/null. I'll show you want to do in another PR.

@crosbymichael crosbymichael merged commit c5060ff into opencontainers:master Jun 3, 2016
@crosbymichael crosbymichael deleted the create-start branch June 3, 2016 17:38
@cyphar
Copy link
Member

cyphar commented Jun 6, 2016

@crosbymichael explained on IRC that --console doesn't take a master, it takes a slave path and does no setup (which makes sense if you read the code). So the complaints about --console are not accurate, and our integration tests usage of --console is incorrect.

wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Feb 3, 2017
This gives us a more portable way to discover the container exit code
(vs. requiring callers to use subreapers [1] or other
platform-specific approaches which require knowledge of the runtime
implementation).

[1]: opencontainers/runc#827 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/oci-command-line-api that referenced this pull request Feb 9, 2017
I added this as an option in 5033c59 (Add an --id option to 'start',
2015-09-15), because some callers might want to leave ID generation to
the runtime.  When there is a long-running host process waiting on the
container process to perform cleanup, the runtime-caller may not need
to know the container ID.  However, runC has been requiring a
user-specified ID since [1], and the coming create/start split will
follow the early-exit 'create' from [2], so require an ID here.  We
can revisit this if we regain a long-running 'create' process.

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'start' (which
will become 'create').

[1]: opencontainers/runc#541
     opencontainers/runc@a7278cad (Require container id as arg1,
     2016-02-08, opencontainers/runc#541)
[2]: opencontainers/runc#827
     Summary: Implement create and start

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/oci-command-line-api that referenced this pull request Feb 9, 2017
Catch up with opencontainers/runtime-spec@be594153 (Split create and
start, 2016-04-01, opencontainers/runtime-spec#384).

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).

I still likes the long-running 'create' API because it makes
collecting the exit code easier.  I've proposed an 'event' operation
[1] which would provide a convenient created trigger.  With 'event' in
place, we don't need the 'create' process exit to serve as that
trigger, and could have a long-running 'create' that collects the
container process exit code using the portable waitid() family.  But
the consensus after the 2016-07-13 meeting was to table that while we
land docs for the runC API [2], and runC has an early-exit create [3].

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

The ptrace idea in this commit is from Mrunal [4].

[1]: opencontainers/runtime-spec#508
     Subject: runtime: Add an 'event' operation for subscribing to pushes
[2]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[3]: opencontainers/runc#827
     Summary: Implement create and start
[4]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/oci-command-line-api that referenced this pull request Feb 9, 2017
I added this as an option in 5033c59 (Add an --id option to 'start',
2015-09-15), because some callers might want to leave ID generation to
the runtime.  When there is a long-running host process waiting on the
container process to perform cleanup, the runtime-caller may not need
to know the container ID.  However, runC has been requiring a
user-specified ID since [1], and the coming create/start split will
follow the early-exit 'create' from [2], so require an ID here.  We
can revisit this if we regain a long-running 'create' process.

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'start' (which
will become 'create').

[1]: opencontainers/runc#541
     opencontainers/runc@a7278cad (Require container id as arg1,
     2016-02-08, opencontainers/runc#541)
[2]: opencontainers/runc#827
     Summary: Implement create and start

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/oci-command-line-api that referenced this pull request Feb 9, 2017
Catch up with opencontainers/runtime-spec@be594153 (Split create and
start, 2016-04-01, opencontainers/runtime-spec#384).

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).

I still likes the long-running 'create' API because it makes
collecting the exit code easier.  I've proposed an 'event' operation
[1] which would provide a convenient created trigger.  With 'event' in
place, we don't need the 'create' process exit to serve as that
trigger, and could have a long-running 'create' that collects the
container process exit code using the portable waitid() family.  But
the consensus after the 2016-07-13 meeting was to table that while we
land docs for the runC API [2], and runC has an early-exit create [3].

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

The ptrace idea in this commit is from Mrunal [4].

[1]: opencontainers/runtime-spec#508
     Subject: runtime: Add an 'event' operation for subscribing to pushes
[2]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[3]: opencontainers/runc#827
     Summary: Implement create and start
[4]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54

Signed-off-by: W. Trevor King <[email protected]>
stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017
…hortcut

config-linux: Use the implicit link name shortcut
wking added a commit to wking/ocitools-v2 that referenced this pull request Jan 11, 2018
Address a previous TODO.  And now that we are using --bundle, we no
longer need to set cmd.Dir.  The TODO mentions a lack of runc support,
but runc supports --bundle since opencontainers/runc@3fe7d7f3 (Add
create and start command for container lifecycle, 2016-05-13,
opencontainers/runc#827).
wking added a commit to wking/ocitools-v2 that referenced this pull request Jan 11, 2018
Address a previous TODO.  And now that we are using --bundle, we no
longer need to set cmd.Dir.  The TODO mentions a lack of runc support,
but runc supports --bundle since opencontainers/runc@3fe7d7f3 (Add
create and start command for container lifecycle, 2016-05-13,
opencontainers/runc#827).

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/ocitools-v2 that referenced this pull request Jan 18, 2018
Address a previous TODO.  And now that we are using --bundle, we no
longer need to set cmd.Dir.  The TODO mentions a lack of runc support,
but runc supports --bundle since opencontainers/runc@3fe7d7f3 (Add
create and start command for container lifecycle, 2016-05-13,
opencontainers/runc#827).

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/runc that referenced this pull request Feb 20, 2018
This avoids a panic for containers that do not set Process.  And even
if Process was set, there is no reason to require the executable to be
available *at create time* [1].  Subsequent activity could be
scheduled to get a binary in place at the configured location before
'start' is called.

[1]: opencontainers#827 (comment)

Signed-off-by: W. Trevor King <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.