Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add linux.resources.devices and drop linux.devices #98

Closed
wants to merge 2 commits into from

Conversation

wking
Copy link
Contributor

@wking wking commented Aug 6, 2015

Spun off from my suggestions in #94. Details in the committed docs,
commit messages, and #94 discussion.

Copying devices from the runtime host isn't particularly portable, and
it's easy to mount any device nodes you need from the bundle itself
(just like we'd mount any other files needed by the container).

Signed-off-by: W. Trevor King <[email protected]>
@LK4D4
Copy link
Contributor

LK4D4 commented Aug 6, 2015

I'm don't know whether it's bad or good to force people to create devices themselves in bundle. I think all devices could be created by OCI implementation.
ping @crosbymichael @mrunalp @philips

For specifying device cgroups independent of device creation.

I also split the cgroups section into sections for each class (the
earlier docs were very terse).  I'll flesh these sections out in
future commits if the devices addition sounds acceptable.

Signed-off-by: W. Trevor King <[email protected]>
@crosbymichael
Copy link
Member

No I don't think it makes sense to have dev nodes created outside of the runtime

-1

@mrunalp
Copy link
Contributor

mrunalp commented Aug 6, 2015

👎 I think that the runtime should create the device nodes.

@wking
Copy link
Contributor Author

wking commented Aug 6, 2015

On Thu, Aug 06, 2015 at 11:36:04AM -0700, Alexander Morozov wrote:

I'm don't know whether it's bad or good to force people to create
devices themselves in bundle.

I expect most users would be covered by #95 here. Once you're
creating a custom device, it seems like a wash for bundle authors
between “add an entry to the config” and “use mknod or cp”. And it's
easier for implementation authors if they don't have to support a
separate device-creation step.

@wking
Copy link
Contributor Author

wking commented Aug 6, 2015

On Thu, Aug 06, 2015 at 11:40:00AM -0700, Michael Crosby wrote:

No I don't think it makes sense to have dev nodes created outside of
the runtime

Can you explain why?

@crosbymichael
Copy link
Member

because /dev includes much more than just device nodes and we have to set it up correctly. It will also break live migration when we have to migrate and we have fd's leaking in to the container from outside the rootfs.

@wking
Copy link
Contributor Author

wking commented Aug 6, 2015

On Thu, Aug 06, 2015 at 11:47:25AM -0700, Michael Crosby wrote:

because /dev includes much more than just device nodes and we have
to set it up correctly.

So all the stuff under /dev that's not a device node just needs to
show up in the "mounts" array after the mount that supplies /dev.
That doesn't sound like a big deal to me. Or is there some
particularly tricky /dev initialization that can't be handled by
ordering "mounts"?

It will also break live migration when we have to migrate and we
have fd's leaking in to the container from outside the rootfs.

I'm missing this one completely. Is the important distinction rootfs
vs. other mounts? Or are device-node file descriptors particulary
hard to preserve in a way that having runtime-created device nodes
makes easier?

@wking wking mentioned this pull request Aug 6, 2015
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Aug 7, 2015
I'd prefer to handle mknod and device cgroups independently [1,2], to
avoid all this "If path is given..." and "If parameters is given..."
special casing.  But the overloaded approach has landed [3], so this
commit documents the indended semantics [4].  I'm not sure how bundle
authors are supposed to register deny cgroups rules [5].

[1]: opencontainers#98
[2]: opencontainers#99
[3]: opencontainers#94 (comment)
[4]: opencontainers#94 (comment)
[5]: opencontainers#94 (comment)

Signed-off-by: W. Trevor King <[email protected]>
@wking
Copy link
Contributor Author

wking commented Aug 7, 2015

On Thu, Aug 06, 2015 at 11:57:45AM -0700, W. Trevor King wrote:

Thu, Aug 06, 2015 at 11:47:25AM -0700, Michael Crosby:

It will also break live migration when we have to migrate and we
have fd's leaking in to the container from outside the rootfs.

I'm missing this one completely. Is the important distinction
rootfs vs. other mounts? Or are device-node file descriptors
particulary hard to preserve in a way that having runtime-created
device nodes makes easier?

There's a bit more detail on this in IRC today:

10:58 < crosbymichael> most alternate suggestions will work for common
use cases, but when you start throwing user namespace support and
live migration things have to be changed

11:00 < crosbymichael> one bug was that for live migration, a process
had an open reference to /dev/null ( on hte host ) for it's STDIO,
and we have to close that and reopen to the local /dev/null inside
the container's rootfs or live migration does not work because there
are references to things outside the rootfs

11:04 < wking> hmm, I'd missed the host-vs-container /dev/null
distinction. Was it bind-mounted into the container? Why
bind-mount it instead of creating a new device in the container?
11:05 < mrunalp> mknod isn't allowed when user namespaces are enabled.
They have to bind mounted in for user namespace scenario

I don't see why having a rootfs/dev/null in your container wouldn't
work in this “I need /dev/null, but my user namespace doesn't allow me
to call mknod” case. CRIU should be able to handle external mounts
anyway 1, but that seems orthogonal to whether the device is created
by bind-mounting from the host or just already existing in the bundled
filesystem.

The permissions for mknod shouldn't matter, because the bundle author
is calling mknod (or cp) at bundle-author time, which happens before
the runtime is involved at all (and it's the runtime that sets up the
namespaces and control groups).

So I still see no technical reason that would require
config-specified, runtime-initiated mknod calls.

Also on IRC, @LK4D4 pointed out two use cases where config-defined
devices may be more convenient:

a. Running multiple containers on one rootfs, where different
containers may need different devices (e.g. only one container
needs a modem device).
b. Running one config on multiple rootfs.

For, (a) I'd probably put the modem device in the rootfs and just
ignore it (potentially adding a restrictive cgroups rule to deny it)
in the containers that didn't need it.

For (b), I'd just create all the device nodes, which might be
scriptable with something like:

$ find . -name dev -type d -execdir mknod …

But still, these cases would be more conveniently handled by
config-side device creation. However, I think making these use cases
easier isn't worth the complication of overloading mknod and cgroups
handling in the same structure (see all the conditionals in #101).

I'm less sure of the balance between these cases and a complication of
a mknod-specific section in the config. #99 is similar to this PR,
but it keeps the mknod-oriented linux.devices. If @LK4D4's use cases
are worth the spec/implementation trouble, then we probably want to
keep some form of mknod support in the spec, and on that front I'm
agnostic between the “copy from host” semantics we had before #94 and
the “specify everything mknod needs” semantics that landed with #94.

@wking
Copy link
Contributor Author

wking commented Aug 8, 2015

On Fri, Aug 07, 2015 at 12:31:39PM -0700, W. Trevor King wrote:

So I still see no technical reason that would require
config-specified, runtime-initiated mknod calls.

But still, these cases would be more conveniently handled by
config-side device creation… I'm less sure of the balance between
these cases and a complication of a mknod-specific section in the
config.

After thinking this over this afternoon, I think dropping mknod
handling from the config is the right approach (i.e. I prefer #98 to
#99 or #94). The same use cases where @LK4D4 sees the convenience of
a spec-side mknod would also apply to any filesystem-side object, and
I think we can all agree that we don't want things like:

"files": [
{
"path": "/root/.bashrc",
"uid": 0,
"gid": 0,
"fileMode": 0644,
"content": "export HISTCONTROL=ignoreboth\n",
}
]

in the spec. So I'd rather just say “things that can be represented
in bundle filesystems do not belong in the spec”, and leave it at
that. I agree that we need cgroups device support (which was the
initial motivation for #94 1), and the approach taken in this PR
gives us that without the mknod cruft.

wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Jan 12, 2016
With mknod entries in linux.devices and cgroups entries in
linux.resources.devices.  Background discussion in [1].

For specifying device cgroups independent of device creation.  This
makes it easy to distinguish between configs that call for cgroup
adjustments (which have linux.resources entries) from those that
don't.  Without this split, folks interested in making that
distinction would have to parse the device section to determine if it
included cgroup changes.  This will also make it easy to drop either
portion (mknod [2] or cgroups [3]) independently of the other if the
project decides to do so.

Using seperate sections for mknod and cgroups also allows us to avoid
the complicated validation rules needed for the combined format
mknod/cgroup [4].

Now that there is a section specific to supplying devices, I shifted
the default device listing over from config-linux [5].  The /dev/ptmx
entry is a bit awkward, since it's not a device, but it seemed to fit
better over here.  But I would also be fine leaving it with the other
mounts in config-linux.

The reference links are sorted into two blocks, with kernel-doc links
sorted alphabetically followed by man pages sorted alphabetically by
section.

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/y_Fsa2_jJaM
     Subject: Separate config entries for device mknod and cgroups?
     Date: Mon, 5 Oct 2015 12:46:55 -0700
     Message-ID: <[email protected]>
[2]: opencontainers#98
[3]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/qWHoKs8Fsrk
     Subject: removal of cgroups from the OCI Linux spec
     Date: Wed, 28 Oct 2015 17:01:59 +0000
     Message-ID: <CAD2oYtO1RMCcUp52w-xXemzDTs+J6t4hS5Mm4mX+uBnVONGDfA@mail.gmail.com>
[4]: opencontainers#101
[5]: opencontainers#171 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Jan 20, 2016
With mknod entries in linux.devices and cgroups entries in
linux.resources.devices.  Background discussion in [1].

For specifying device cgroups independent of device creation.  This
makes it easy to distinguish between configs that call for cgroup
adjustments (which have linux.resources entries) from those that
don't.  Without this split, folks interested in making that
distinction would have to parse the device section to determine if it
included cgroup changes.  This will also make it easy to drop either
portion (mknod [2] or cgroups [3]) independently of the other if the
project decides to do so.

Using seperate sections for mknod and cgroups also allows us to avoid
the complicated validation rules needed for the combined format
mknod/cgroup [4].

Now that there is a section specific to supplying devices, I shifted
the default device listing over from config-linux [5].  The /dev/ptmx
entry is a bit awkward, since it's not a device, but it seemed to fit
better over here.  But I would also be fine leaving it with the other
mounts in config-linux.

The reference links are sorted into two blocks, with kernel-doc links
sorted alphabetically followed by man pages sorted alphabetically by
section.

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/y_Fsa2_jJaM
     Subject: Separate config entries for device mknod and cgroups?
     Date: Mon, 5 Oct 2015 12:46:55 -0700
     Message-ID: <[email protected]>
[2]: opencontainers#98
[3]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/qWHoKs8Fsrk
     Subject: removal of cgroups from the OCI Linux spec
     Date: Wed, 28 Oct 2015 17:01:59 +0000
     Message-ID: <CAD2oYtO1RMCcUp52w-xXemzDTs+J6t4hS5Mm4mX+uBnVONGDfA@mail.gmail.com>
[4]: opencontainers#101
[5]: opencontainers#171 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Jan 27, 2016
With mknod entries in linux.devices and cgroups entries in
linux.resources.devices.  Background discussion in [1].

For specifying device cgroups independent of device creation.  This
makes it easy to distinguish between configs that call for cgroup
adjustments (which have linux.resources entries) from those that
don't.  Without this split, folks interested in making that
distinction would have to parse the device section to determine if it
included cgroup changes.  This will also make it easy to drop either
portion (mknod [2] or cgroups [3]) independently of the other if the
project decides to do so.

Using seperate sections for mknod and cgroups also allows us to avoid
the complicated validation rules needed for the combined format
mknod/cgroup [4].

Now that there is a section specific to supplying devices, I shifted
the default device listing over from config-linux [5].  The /dev/ptmx
entry is a bit awkward, since it's not a device, but it seemed to fit
better over here.  But I would also be fine leaving it with the other
mounts in config-linux.

The reference links are sorted into two blocks, with kernel-doc links
sorted alphabetically followed by man pages sorted alphabetically by
section.

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/y_Fsa2_jJaM
     Subject: Separate config entries for device mknod and cgroups?
     Date: Mon, 5 Oct 2015 12:46:55 -0700
     Message-ID: <[email protected]>
[2]: opencontainers#98
[3]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/qWHoKs8Fsrk
     Subject: removal of cgroups from the OCI Linux spec
     Date: Wed, 28 Oct 2015 17:01:59 +0000
     Message-ID: <CAD2oYtO1RMCcUp52w-xXemzDTs+J6t4hS5Mm4mX+uBnVONGDfA@mail.gmail.com>
[4]: opencontainers#101
[5]: opencontainers#171 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Jan 27, 2016
With mknod entries in linux.devices and cgroups entries in
linux.resources.devices.  Background discussion in [1].

For specifying device cgroups independent of device creation.  This
makes it easy to distinguish between configs that call for cgroup
adjustments (which have linux.resources entries) from those that
don't.  Without this split, folks interested in making that
distinction would have to parse the device section to determine if it
included cgroup changes.  This will also make it easy to drop either
portion (mknod [2] or cgroups [3]) independently of the other if the
project decides to do so.

Using seperate sections for mknod and cgroups also allows us to avoid
the complicated validation rules needed for the combined format
mknod/cgroup [4].

Now that there is a section specific to supplying devices, I shifted
the default device listing over from config-linux [5].  The /dev/ptmx
entry is a bit awkward, since it's not a device, but it seemed to fit
better over here.  But I would also be fine leaving it with the other
mounts in config-linux.

The reference links are sorted into two blocks, with kernel-doc links
sorted alphabetically followed by man pages sorted alphabetically by
section.

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/y_Fsa2_jJaM
     Subject: Separate config entries for device mknod and cgroups?
     Date: Mon, 5 Oct 2015 12:46:55 -0700
     Message-ID: <[email protected]>
[2]: opencontainers#98
[3]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/qWHoKs8Fsrk
     Subject: removal of cgroups from the OCI Linux spec
     Date: Wed, 28 Oct 2015 17:01:59 +0000
     Message-ID: <CAD2oYtO1RMCcUp52w-xXemzDTs+J6t4hS5Mm4mX+uBnVONGDfA@mail.gmail.com>
[4]: opencontainers#101
[5]: opencontainers#171 (comment)

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Jan 27, 2016
With mknod entries in linux.devices and cgroups entries in
linux.resources.devices.  Background discussion in [1].

For specifying device cgroups independent of device creation.  This
makes it easy to distinguish between configs that call for cgroup
adjustments (which have linux.resources entries) from those that
don't.  Without this split, folks interested in making that
distinction would have to parse the device section to determine if it
included cgroup changes.  This will also make it easy to drop either
portion (mknod [2] or cgroups [3]) independently of the other if the
project decides to do so.

Using seperate sections for mknod and cgroups also allows us to avoid
the complicated validation rules needed for the combined format
mknod/cgroup [4].

Now that there is a section specific to supplying devices, I shifted
the default device listing over from config-linux [5].  The /dev/ptmx
entry is a bit awkward, since it's not a device, but it seemed to fit
better over here.  But I would also be fine leaving it with the other
mounts in config-linux.

The reference links are sorted into two blocks, with kernel-doc links
sorted alphabetically followed by man pages sorted alphabetically by
section.  The cgroup link is new since 2016-01-13 [6].

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/y_Fsa2_jJaM
     Subject: Separate config entries for device mknod and cgroups?
     Date: Mon, 5 Oct 2015 12:46:55 -0700
     Message-ID: <[email protected]>
[2]: opencontainers#98
[3]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/qWHoKs8Fsrk
     Subject: removal of cgroups from the OCI Linux spec
     Date: Wed, 28 Oct 2015 17:01:59 +0000
     Message-ID: <CAD2oYtO1RMCcUp52w-xXemzDTs+J6t4hS5Mm4mX+uBnVONGDfA@mail.gmail.com>
[4]: opencontainers#101
[5]: opencontainers#171 (comment)
[6]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34a9304a96d6351c2d35dcdc9293258378fc0bd8

Signed-off-by: W. Trevor King <[email protected]>
wking added a commit to wking/opencontainer-runtime-spec that referenced this pull request Jan 27, 2016
With mknod entries in linux.devices and cgroups entries in
linux.resources.devices.  Background discussion in [1].

For specifying device cgroups independent of device creation.  This
makes it easy to distinguish between configs that call for cgroup
adjustments (which have linux.resources entries) from those that
don't.  Without this split, folks interested in making that
distinction would have to parse the device section to determine if it
included cgroup changes.  This will also make it easy to drop either
portion (mknod [2] or cgroups [3]) independently of the other if the
project decides to do so.

Using seperate sections for mknod and cgroups also allows us to avoid
the complicated validation rules needed for the combined format
mknod/cgroup [4].

Now that there is a section specific to supplying devices, I shifted
the default device listing over from config-linux [5].  The /dev/ptmx
entry is a bit awkward, since it's not a device, but it seemed to fit
better over here.  But I would also be fine leaving it with the other
mounts in config-linux.

fileMode, uid, and gid are optional, because mknod(2) doesn't need
them and specifies the handling when they aren't set [6,7].
Similarly, major/minor numbers are only required for S_IFCHR and
S_IFBLK [6].  I've left off wording about required runtime behavior
for unset values, because I'd rather address that with a blanket rule
[8].

For the cgroup, access is optional because the kernel docs show an
example that doesn't write an access field to the devices.deny file
[9].  The current kernel docs don't go into much detail on this
behavior (I expect unset and 'rwm' are equivalent), but if the kernel
doesn't need a value written, the spec should get out of the way and
allow users to not specify a value.

The reference links are sorted into two blocks, with kernel-doc links
sorted alphabetically followed by man pages sorted alphabetically by
section.  The cgroup link is new since 2016-01-13 [10].

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/y_Fsa2_jJaM
     Subject: Separate config entries for device mknod and cgroups?
     Date: Mon, 5 Oct 2015 12:46:55 -0700
     Message-ID: <[email protected]>
[2]: opencontainers#98
[3]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/qWHoKs8Fsrk
     Subject: removal of cgroups from the OCI Linux spec
     Date: Wed, 28 Oct 2015 17:01:59 +0000
     Message-ID: <CAD2oYtO1RMCcUp52w-xXemzDTs+J6t4hS5Mm4mX+uBnVONGDfA@mail.gmail.com>
[4]: opencontainers#101
[5]: opencontainers#171 (comment)
[6]: http://man7.org/linux/man-pages/man2/mknod.2.html#DESCRIPTION
[7]: https://github.com/opencontainers/specs/pull/298/files#r51053835
[8]: opencontainers#285 (comment)
[9]: https://kernel.org/doc/Documentation/cgroup-v1/devices.txt
[10]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34a9304a96d6351c2d35dcdc9293258378fc0bd8

Signed-off-by: W. Trevor King <[email protected]>
Mashimiao pushed a commit to Mashimiao/specs that referenced this pull request Aug 19, 2016
With mknod entries in linux.devices and cgroups entries in
linux.resources.devices.  Background discussion in [1].

For specifying device cgroups independent of device creation.  This
makes it easy to distinguish between configs that call for cgroup
adjustments (which have linux.resources entries) from those that
don't.  Without this split, folks interested in making that
distinction would have to parse the device section to determine if it
included cgroup changes.  This will also make it easy to drop either
portion (mknod [2] or cgroups [3]) independently of the other if the
project decides to do so.

Using seperate sections for mknod and cgroups also allows us to avoid
the complicated validation rules needed for the combined format
mknod/cgroup [4].

Now that there is a section specific to supplying devices, I shifted
the default device listing over from config-linux [5].  The /dev/ptmx
entry is a bit awkward, since it's not a device, but it seemed to fit
better over here.  But I would also be fine leaving it with the other
mounts in config-linux.

fileMode, uid, and gid are optional, because mknod(2) doesn't need
them and specifies the handling when they aren't set [6,7].
Similarly, major/minor numbers are only required for S_IFCHR and
S_IFBLK [6].  I've left off wording about required runtime behavior
for unset values, because I'd rather address that with a blanket rule
[8].

For the cgroup, access is optional because the kernel docs show an
example that doesn't write an access field to the devices.deny file
[9].  The current kernel docs don't go into much detail on this
behavior (I expect unset and 'rwm' are equivalent), but if the kernel
doesn't need a value written, the spec should get out of the way and
allow users to not specify a value.

The reference links are sorted into two blocks, with kernel-doc links
sorted alphabetically followed by man pages sorted alphabetically by
section.  The cgroup link is new since 2016-01-13 [10].

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/y_Fsa2_jJaM
     Subject: Separate config entries for device mknod and cgroups?
     Date: Mon, 5 Oct 2015 12:46:55 -0700
     Message-ID: <[email protected]>
[2]: opencontainers#98
[3]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/qWHoKs8Fsrk
     Subject: removal of cgroups from the OCI Linux spec
     Date: Wed, 28 Oct 2015 17:01:59 +0000
     Message-ID: <CAD2oYtO1RMCcUp52w-xXemzDTs+J6t4hS5Mm4mX+uBnVONGDfA@mail.gmail.com>
[4]: opencontainers#101
[5]: opencontainers#171 (comment)
[6]: http://man7.org/linux/man-pages/man2/mknod.2.html#DESCRIPTION
[7]: https://github.com/opencontainers/specs/pull/298/files#r51053835
[8]: opencontainers#285 (comment)
[9]: https://kernel.org/doc/Documentation/cgroup-v1/devices.txt
[10]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34a9304a96d6351c2d35dcdc9293258378fc0bd8

Signed-off-by: W. Trevor King <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants