Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run post-stop hooks before the container sandbox is deleted. #395

Closed
wants to merge 1 commit into from

Conversation

vishh
Copy link
Contributor

@vishh vishh commented Apr 22, 2016

One of the main use cases of post-stop is to scrape filesystem contents with a container's mount namespace, before deleting the container.
Running post-stop after the container is deleted is equivalent to deleting the container and running the post-stop hook outside of the spec context.

cc @mrunalp @crosbymichael @philips

@vishh vishh mentioned this pull request Apr 22, 2016
@wking
Copy link
Contributor

wking commented Apr 22, 2016

On Fri, Apr 22, 2016 at 01:43:57PM -0700, Vish Kannan wrote:

One of the main use cases of post-stop is to scrape filesystem
contents with a container's mount namespace, before deleting the
container. Running post-stop after the container is deleted is
equivalent to deleting the container and running the post-stop hook
outside of the spec context.

The post-stop hooks landed in #34, and a comment there explicitly
suggests bind-mounting in a pre-start hook if you want to preserve a
particular namespace 1. Requring the runtime to do something like
that to pin all namespaces for all containers seems like a lot of work
to save a bit of typing for folks who need the pinned workflow.

I agree that this makes the current post-stop hooks pretty useless,
but my preferred solution to that is dropping them from the spec ;).
If sub-container workflows like 2 and #391 are too awkward, I'd
rather make them easier (for folks who want them) than saddle all
users with pinned namespaces.

@vishh
Copy link
Contributor Author

vishh commented Apr 22, 2016

The Spec should enable simple programming patterns. If users/clients are
expected to understand the internals of each runtime, what is the use of
the Spec then?

On Fri, Apr 22, 2016 at 1:54 PM, W. Trevor King [email protected]
wrote:

On Fri, Apr 22, 2016 at 01:43:57PM -0700, Vish Kannan wrote:

One of the main use cases of post-stop is to scrape filesystem
contents with a container's mount namespace, before deleting the
container. Running post-stop after the container is deleted is
equivalent to deleting the container and running the post-stop hook
outside of the spec context.

The post-stop hooks landed in #34, and a comment there explicitly
suggests bind-mounting in a pre-start hook if you want to preserve a
particular namespace 1. Requring the runtime to do something like
that to pin all namespaces for all containers seems like a lot of work
to save a bit of typing for folks who need the pinned workflow.

I agree that this makes the current post-stop hooks pretty useless,
but my preferred solution to that is dropping them from the spec ;).
If sub-container workflows like 2 and #391 are too awkward, I'd
rather make them easier (for folks who want them) than saddle all
users with pinned namespaces.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#395 (comment)

@wking
Copy link
Contributor

wking commented Apr 22, 2016

On Fri, Apr 22, 2016 at 02:47:32PM -0700, Vish Kannan wrote:
“The Spec should enable simple programming patterns. If users/clients are
expected to understand the internals of each runtime, what is the use of
the Spec then?”

The sub-container suggestion is:

If you want to pin part of your sandbox, set it up with a parent
container. Then launch a subcontainer that joins the parent,
optionally adds additional sandboxing, and does the real work.
After the subcontainer dies, take what you want from the parent
container and kill the parent.

That requires zero knowledge of runtime internals, but does call for a
convenient way to launch sub-containers. #391 is my best guess at
joining an existing container, and it's already got some rough
edges. I think we have some work to do before join-and-extend
(e.g. “join container 123 and create a child PID namespace”) is
convenient. But both #391 and join-and-extend may be workflows that
are made easy via higher-level tooling based on a more basic runtime
spec. We dropped ‘exec’ in #388 because it was too hard to agree on
what it meant. I'd like to see #391 or similar get polished to the
point that implementing ‘exec’ via the more generic runtime-spec
tooling is convenient. I expect join-and-extend falls into the same
category, and we want to leave the spec alone but publish some
implementation suggestions that don't look too hideous.

@julz
Copy link
Contributor

julz commented Apr 25, 2016

I think the thing to notice is with sub containers you can get either the behaviour where pid 1 exiting automatically cleans things up or the behaviour where namespaces exist longer than a container - it can express both. Unless I'm missing something if you require a post stop hook run before the namespaces are cleaned up you can't support the pid1-exit-cleans-everything flow (because you'll need to bind mount them). It seems like a shame to give up this flexibility (not to mention the currently most common and understood semantics of what pid 1 exit means) when the alternative approach supports both scenarios..

@vishh
Copy link
Contributor Author

vishh commented Apr 25, 2016

I don't see why invoking a delete after the init process exits is an
issue in reality. Why do we have to avoid that step, other than the fact
that it is technically possible?

I don't like the fact that clients have to deal with bind mounts of
namespaces. To me, the Spec has failed to achieve its goals if it requires
clients to reverse engineer underlying runtime implementations.

An explicit delete step when supported across OSes will let users run
post-stop hooks without OS (or runtime) specific logic.

On Mon, Apr 25, 2016 at 11:51 AM, Julian Friedman [email protected]
wrote:

I think the thing to notice is with sub containers you can get either
the behaviour where pid 1 exiting automatically cleans things up or the
behaviour where namespaces exist longer than a container - it can express
both. Unless I'm missing something if you require a post stop hook run
before the namespaces are cleaned up you can't support the
pid1-exit-cleans-everything flow (because you'll need to bind mount them).
It seems like a shame to give up this flexibility (not to mention the
currently most common and understood semantics of what pid 1 exit means)
when the alternative approach supports both scenarios..


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#395 (comment)

@crosbymichael
Copy link
Member

The change as is LGTM

@julz
Copy link
Contributor

julz commented Apr 25, 2016

Doesn't a user already have to understand pretty much all of this to do exec (now we've dropped that from the spec in favour of explicitly entering namespaces via paths)? If we're OK with exec not being in the spec it seems like we're already ok with users needing to know about namespace paths..

@vishh
Copy link
Contributor Author

vishh commented Apr 25, 2016

An exec re-uses fields in the Spec that the users have to understand
anyways.
Whereas, bind mounting namespaces requires understanding of the linux
kernel which IMHO the Spec should try to avoid.

On Mon, Apr 25, 2016 at 12:08 PM, Julian Friedman [email protected]
wrote:

Doesn't a user already have to understand pretty much all of this to do
exec (now we've dropped that from the spec in favour of explicitly entering
namespaces via paths)? If we're OK with exec not being in the spec it seems
like we're already ok with users needing to know about namespace paths..


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#395 (comment)

@wking
Copy link
Contributor

wking commented Apr 25, 2016

On Mon, Apr 25, 2016 at 01:08:40PM -0700, Vish Kannan wrote:
“An exec re-uses fields in the Spec that the users have to
understand anyways. Whereas, bind mounting namespaces requires
understanding of the linux kernel which IMHO the Spec should try to
avoid.”

Sub-containers does not require out-of-spec knowledge 1.

@julz
Copy link
Contributor

julz commented Apr 25, 2016

Like @wking says, if the test is whether we only use fields that're already in the spec sub-containers work for all the above use cases (not just some) without needing to understand anything other than namespace and cgroup paths, which we're already ok with users needing to understand for exec. (Indeed really exec is just a variant of this flow that enters all namespaces..)

@vishh
Copy link
Contributor Author

vishh commented Apr 25, 2016

May be it will help if someone can clarify how one can run post-stop hooks
using just the Spec today, without modifying the root bundle, in a
reasonably portable manner? Some pseudo code instead of just words will
help facilitate this conversation.

On Mon, Apr 25, 2016 at 1:50 PM, Julian Friedman [email protected]
wrote:

Like @wking https://github.com/wking says, if the test is whether we
only use fields that're already in the spec sub-containers work for all the
above use cases (not just some) without needing to understand anything
other than namespace and cgroup paths, which we're already ok with users
needing to understand for exec. (Indeed really exec is just a variant of
this flow that enters all namespaces..)


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#395 (comment)

@wking
Copy link
Contributor

wking commented Apr 26, 2016

On Mon, Apr 25, 2016 at 02:01:42PM -0700, Vish Kannan wrote:
“May be it will help if someone can clarify how one can run post-stop
hooks using just the Spec today…”

I'm guessing you mean “run shell commands in the container namespaces
after the container exits” 1. #391 (linked above 2) already
explains how to do things like:

  1. Parent creates a new mount namespace:

    {

    "linux": {
    "namespaces": [
    {"type": "mount"}
    ]
    }
    }

  2. Child joins that mount namespace:

    {

    "linux": {
    "namespaces": [
    {"type": "mount", "path": "/proc/1234/ns/mnt"},

    ]
    }
    }

  3. Child does work and dies.

  4. Extract data from the parent mount namespace with ‘nsenter -m
    /proc/1234/ns/mnt …’ or another child along the lines of step 2.

  5. Kill the parent container.

In that workflow, step 4 is where the post-stop hooks (as you envision
them in this PR) would go.

However, instead of just joining and/or creating namespaces, some
users will probably want to join-and-create (e.g. so both the parent
container and child containers can be PID 1 in their own PID
namespace, with the child's PID namespace descending from the parent's
PID namespace). I don't think join-and-extend is well-supported by
the current spec 3. I have fuzzy memories of previous requests for
that sort of functionality getting responses like “just join the
namespaces and call the runtime from there”, translating to something
like ‘nsenter … runc start …’ [4]. However, it makes life more
difficult for the caller if the namespace paths in their config have
to resolve in the nsenter'ed mount namespace, and the nsenter might
break things like state registration [5,6](e.g. after ‘nsenter … runc
start my-container’, ‘nsenter … runc state my-container’ would
probably work, but a bare ‘runc state my-container’ might not).
Couple this with the joining-order issues 7, and I end up wanting
namespace handling closer to this (tweaking 8):

Namespaces are specified as an array of entries inside the
namespaces root field. The following parameters can be specified to
setup namespaces:

  • type …

  • path …

    If ‘path’ is specified, that particular file (resolved in the
    runtime namespace 9) is used to join that type of namespace.
    Namespaces are created or joined (depending on ‘path’) in the listed
    order.

Although the implementation would be non-trivial and probably involve
a mix of setns and clone, see the caveats for PID, user, and mount
namespaces in setns(2) 10.

With that approach, you could us a parent config like:

{

"linux": {
"namespaces": [
{"type": "pid"},
{"type": "mount"}
}
}
}

to create your parent container, and:

{

"linux": {
"namespaces": [
{"type": "pid", "path": "/proc/1234/ns/pid"},
{"type": "pid"},
{"type": "mount", "path": "/proc/1234/ns/mnt"},
}
}
}

to mean “Join the parent's PID namespace, create a new child PID
namespace, and join the parent's mount namespace”, where the parent is
process 1234 in the runtime PID namespace, and /proc in the runtime
mount namespace belongs to the runtime PID namespace.

[4]: I haven't been able to dig up a reference to an earlier exchange like this though, so maybe I'm just making it up.

 Although wording to about that never landed in the spec.

 Subject: [Linux] Specify namespace-joining order?  (can be important for user namespaces)
 Date: Mon, 18 Apr 2016 21:53:00 -0700
 Message-ID: <[email protected]>

@wking
Copy link
Contributor

wking commented May 10, 2016

I think we can close this in light of last week's meeting 1. The
consensus for supporting this workflow seems to be autoremove (landed
in #384 by @duglin around here 2) or similar. I've argued for “or
similar” 3, but regardless of how that shakes out, the use case
described in this PR will have to be addressed by #384 before it
lands.

@wking wking mentioned this pull request Jun 2, 2016
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Aug 17, 2016
…k-lifecycle

Restore hook language removed by create/start split with some tweaks
to adjust to the new environment, for example:

* Put the pre-start hooks after the 'start' call, but before the meat
  of the start call (the container-process exec trigger).  Folks who
  want a post-create hook can add one with that name.  I'd like to
  have renamed poststop to post-delete to avoid confusion like [1].
  But the motivation for keeping hooks was backwards compatibility [2]
  so I've left the name alone.

* Put each "...command is invoked..." lifecycle entry in its own list
  entry, to match the 'create' list entry.

* Move the rules about what happens on hook failure into the
  lifecycle.  This matches pre-split entries like:

    If any prestart hook fails, then the container MUST be stopped and
    the lifecycle continues at step 7.

  and avoids respecifying that information in a second location
  (config.md).

* I added the warning section to try and follow post-split's generic
  "generates an error" approach while respecting the pre-split desire
  to see what failed (we had "then an error including the exit code
  and the stderr is returned to the caller" and "then an error is
  logged").

[1]: opencontainers#395
     Subject: Run post-stop hooks before the container sandbox is deleted.
[2]: opencontainers#483 (comment)
     Subject: *: Remove hooks

* duglin/SplitCreate:
  Split create and start

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/image-spec that referenced this pull request Sep 16, 2016
Most folks will distribute images containing layers, but the specified
behavior applies cleanly to the layer-less case too.  The unpacked
rootfs will just be an empty directory with unspecified attributes.
Folks might want to use that sort of thing for namespace pinning [1],
distributing configs [2], setting up containers that mount the meat
from the host [2], etc.

[1]: opencontainers/runtime-spec#395 (comment)
[2]: opencontainers#313 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/image-spec that referenced this pull request Sep 16, 2016
Most folks will distribute images containing layers, but the specified
behavior applies cleanly to the layer-less case too.  The unpacked
rootfs will just be an empty directory with unspecified attributes.
Folks might want to use that sort of thing for namespace pinning [1],
distributing configs [2], setting up containers that mount the meat
from the host [2], etc.

[1]: opencontainers/runtime-spec#395 (comment)
[2]: opencontainers#313 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/image-spec that referenced this pull request Sep 16, 2016
Most folks will distribute images containing layers, but the specified
behavior applies cleanly to the layer-less case too.  The unpacked
rootfs will just be an empty directory with unspecified attributes.
Folks might want to use that sort of thing for namespace pinning [1],
distributing configs [2], setting up containers that mount the meat
from the host [2], etc.

[1]: opencontainers/runtime-spec#395 (comment)
[2]: opencontainers#313 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/image-spec that referenced this pull request Oct 1, 2016
Most folks will distribute images containing layers, but the specified
behavior applies cleanly to the layer-less case too.  The unpacked
rootfs will just be an empty directory with unspecified attributes.
Folks might want to use that sort of thing for namespace pinning [1],
distributing configs [2], setting up containers that mount the meat
from the host [2], etc.

[1]: opencontainers/runtime-spec#395 (comment)
[2]: opencontainers#313 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/image-spec that referenced this pull request Oct 1, 2016
Most folks will distribute images containing layers, but the specified
behavior applies cleanly to the layer-less case too.  The unpacked
rootfs will just be an empty directory with unspecified attributes.
Folks might want to use that sort of thing for namespace pinning [1],
distributing configs [2], setting up containers that mount the meat
from the host [2], etc.

[1]: opencontainers/runtime-spec#395 (comment)
[2]: opencontainers#313 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/image-spec that referenced this pull request Oct 6, 2016
Most folks will distribute images containing layers, but the specified
behavior applies cleanly to the layer-less case too.  The unpacked
rootfs will just be an empty directory with unspecified attributes.
Folks might want to use that sort of thing for namespace pinning [1],
distributing configs [2], setting up containers that mount the meat
from the host [2], etc.

[1]: opencontainers/runtime-spec#395 (comment)
[2]: opencontainers#313 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/image-spec that referenced this pull request Oct 6, 2016
Most folks will distribute images containing layers, but the specified
behavior applies cleanly to the layer-less case too.  The unpacked
rootfs will just be an empty directory with unspecified attributes.
Folks might want to use that sort of thing for namespace pinning [1],
distributing configs [2], setting up containers that mount the meat
from the host [2], etc.

[1]: opencontainers/runtime-spec#395 (comment)
[2]: opencontainers#313 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/image-spec that referenced this pull request Oct 20, 2016
Most folks will distribute images containing layers, but the specified
behavior applies cleanly to the layer-less case too.  The unpacked
rootfs will just be an empty directory with unspecified attributes.
Folks might want to use that sort of thing for namespace pinning [1],
distributing configs [2], setting up containers that mount the meat
from the host [2], etc.

[1]: opencontainers/runtime-spec#395 (comment)
[2]: opencontainers#313 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/image-spec that referenced this pull request Oct 20, 2016
Most folks will distribute images containing layers, but the specified
behavior applies cleanly to the layer-less case too.  The unpacked
rootfs will just be an empty directory with unspecified attributes.
Folks might want to use that sort of thing for namespace pinning [1],
distributing configs [2], setting up containers that mount the meat
from the host [2], etc.

[1]: opencontainers/runtime-spec#395 (comment)
[2]: opencontainers#313 (comment)

Signed-off-by: W. Trevor King <[email protected]>
@tianon
Copy link
Member

tianon commented Nov 4, 2016

What's the status on this one? It looks like it's got one approval (@crosbymichael), but there was some discussion following. Is this change still something we should be considering?

@wking
Copy link
Contributor

wking commented Nov 4, 2016

On Fri, Nov 04, 2016 at 03:04:59PM -0700, Tianon Gravi wrote:

What's the status on this one?

The autoremove stuff I referenced earlier 1 never landed, so folks
who want to preserve their mount namespaces past container-process
death are currently back to pinning the mount namespace on their own
before calling ‘start’. I'd floated some exec notes in #391, which
would have allowed folks to use wrapper containers to do the pinning
with subcontainers to do the work. #391 had some positive vibes in
June 2, but runC still has neither required-mount support nor
unprivileged-user support as far as I'm aware ;).

@mrunalp mrunalp self-assigned this Jan 18, 2017
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Feb 3, 2017
I expect the lifecycle information was removed accidentally in
be59415 (Split create and start, 2016-04-01, opencontainers#384), because for a
time it seemed like that PR would also be removing hooks.  Putting the
lifecycle information back in, I made some tweaks to adjust to the new
environment, for example:

* Put the pre-start hooks after the 'start' call, but before the meat
  of the start call (the container-process exec trigger).  Folks who
  want a post-create hook can add one with that name.  I'd like to
  have renamed poststop to post-delete to avoid confusion like [1].
  But the motivation for keeping hooks was backwards compatibility [2]
  so I've left the name alone.

* Put each "...command is invoked..." lifecycle entry in its own list
  entry, to match the 'create' list entry.

* Move the rules about what happens on hook failure into the
  lifecycle.  This matches pre-split entries like:

    If any prestart hook fails, then the container MUST be stopped and
    the lifecycle continues at step 7.

  and avoids respecifying that information in a second location
  (config.md).

* I added the warning section to try and follow post-split's generic
  "generates an error" approach while respecting the pre-split desire
  to see what failed (we had "then an error including the exit code
  and the stderr is returned to the caller" and "then an error is
  logged").

* I left the state 'id' context out, since Michael didn't want it [3].

[1]: opencontainers#395
     Subject: Run post-stop hooks before the container sandbox is deleted.
[2]: opencontainers#483 (comment)
     Subject: *: Remove hooks
[3]: opencontainers#532 (comment)
     Subject: Restore hook language removed by create/start split

Signed-off-by: W. Trevor King <[email protected]>
@mrunalp
Copy link
Contributor

mrunalp commented Feb 27, 2017

@vishh Do we need this?

@vishh
Copy link
Contributor Author

vishh commented Feb 27, 2017

@mrunalp This will definitely be helpful/useful if technically feasible.

wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Mar 1, 2017
I expect the lifecycle information was removed accidentally in
be59415 (Split create and start, 2016-04-01, opencontainers#384), because for a
time it seemed like that PR would also be removing hooks.  Putting the
lifecycle information back in, I made some tweaks to adjust to the new
environment, for example:

* Put the pre-start hooks after the 'start' call, but before the meat
  of the start call (the container-process exec trigger).  Folks who
  want a post-create hook can add one with that name.  I'd like to
  have renamed poststop to post-delete to avoid confusion like [1].
  But the motivation for keeping hooks was backwards compatibility [2]
  so I've left the name alone.

* Put each "...command is invoked..." lifecycle entry in its own list
  entry, to match the 'create' list entry.

* Move the rules about what happens on hook failure into the
  lifecycle.  This matches pre-split entries like:

    If any prestart hook fails, then the container MUST be stopped and
    the lifecycle continues at step 7.

  and avoids respecifying that information in a second location
  (config.md).

* I added the warning section to try and follow post-split's generic
  "generates an error" approach while respecting the pre-split desire
  to see what failed (we had "then an error including the exit code
  and the stderr is returned to the caller" and "then an error is
  logged").

* I left the state 'id' context out, since Michael didn't want it [3].

[1]: opencontainers#395
     Subject: Run post-stop hooks before the container sandbox is deleted.
[2]: opencontainers#483 (comment)
     Subject: *: Remove hooks
[3]: opencontainers#532 (comment)
     Subject: Restore hook language removed by create/start split

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Mar 3, 2017
I expect the lifecycle information was removed accidentally in
be59415 (Split create and start, 2016-04-01, opencontainers#384), because for a
time it seemed like that PR would also be removing hooks.  Putting the
lifecycle information back in, I made some tweaks to adjust to the new
environment, for example:

* Put the pre-start hooks after the 'start' call, but before the meat
  of the start call (the container-process exec trigger).  Folks who
  want a post-create hook can add one with that name.  I'd like to
  have renamed poststop to post-delete to avoid confusion like [1].
  But the motivation for keeping hooks was backwards compatibility [2]
  so I've left the name alone.

* Put each "...command is invoked..." lifecycle entry in its own list
  entry, to match the 'create' list entry.

* Move the rules about what happens on hook failure into the
  lifecycle.  This matches pre-split entries like:

    If any prestart hook fails, then the container MUST be stopped and
    the lifecycle continues at step 7.

  and avoids respecifying that information in a second location
  (config.md).

* I added the warning section to try and follow post-split's generic
  "generates an error" approach while respecting the pre-split desire
  to see what failed (we had "then an error including the exit code
  and the stderr is returned to the caller" and "then an error is
  logged").

* I left the state 'id' context out, since Michael didn't want it [3].

* Make runtime.md references to "generate an error" and "log a
  warning" links, so readers have an easier time finding more detail
  on that wording.

Where I reference a section, I'm still using the auto-generated anchor
for that header and not the anchors which were added in 41839d7 (Merge
pull request opencontainers#707 from mrunalp/anchor_tags, 2017-03-03) and similar.
Mrunal suggested that the manually-added anchors were mainly intended
for the validation tooling [4].

[1]: opencontainers#395
     Subject: Run post-stop hooks before the container sandbox is deleted.
[2]: opencontainers#483 (comment)
     Subject: *: Remove hooks
[3]: opencontainers#532 (comment)
     Subject: Restore hook language removed by create/start split
[4]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2017-03-03.log.html#t2017-03-03T18:02:12

Signed-off-by: W. Trevor King <[email protected]>
@vbatts
Copy link
Member

vbatts commented Mar 8, 2017

We're currently okay with the language in the spec. Closing this issue.
Open a new issue for particular updates needed.

@vbatts vbatts closed this Mar 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants