linux: specify the default devices/filesystems available #95

Closed
philips opened this issue Aug 5, 2015 · 25 comments

@philips
Contributor

philips commented Aug 5, 2015

Linux applications rely on a number of devices and filesystems. Let's define a default set. What do people think of this set, lifted from the appc OS-SPEC?

The following devices and filesystems MUST be made available in each application's filesystem

| Path | Type | Notes |
|---|---|---|
| /proc | procfs | |
| /sys | sysfs | |
| /dev/null | device | |
| /dev/zero | device | |
| /dev/full | device | |
| /dev/random | device | |
| /dev/urandom | device | |
| /dev/tty | device | |
| /dev/console | device | |
| /dev/pts | devpts | |
| /dev/ptmx | device | Bind-mount or symlink of /dev/pts/ptmx |
| /dev/shm | tmpfs | |

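As a rough illustration, the proposed set could be written down as data that a runtime or a validator checks against a container root. This is only a sketch, not code from appc or runC: the `DEFAULTS` name and the `missing_entries` helper are hypothetical, and the check only tests that each path exists under a given root directory, not that the right filesystem or device is there.

```python
import os

# Paths and types copied from the proposed default set above.
DEFAULTS = [
    ("/proc", "procfs"),
    ("/sys", "sysfs"),
    ("/dev/null", "device"),
    ("/dev/zero", "device"),
    ("/dev/full", "device"),
    ("/dev/random", "device"),
    ("/dev/urandom", "device"),
    ("/dev/tty", "device"),
    ("/dev/console", "device"),
    ("/dev/pts", "devpts"),
    ("/dev/ptmx", "device"),
    ("/dev/shm", "tmpfs"),
]


def missing_entries(rootfs):
    """Return the default paths that do not exist under rootfs."""
    missing = []
    for path, _kind in DEFAULTS:
        # lexists() so that a /dev/ptmx symlink counts as present even
        # when its target is not reachable from outside the container.
        if not os.path.lexists(os.path.join(rootfs, path.lstrip("/"))):
            missing.append(path)
    return missing
```

Calling `missing_entries("/")` on a typical Linux host would be expected to return an empty list, though that depends on the host.
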
@wking
Contributor

wking commented Aug 5, 2015

On Wed, Aug 05, 2015 at 03:45:06PM -0700, Brandon Philips wrote:

The following devices and filesystems MUST be made available in each
application's filesystem

I don't have a problem with automatically supplying these, as long as
explicit entries in the mounts or devices arrays can override the
compiled-in defaults. That doesn't leave you with a way to say “I
don't want anything mounted at /dev/shm” (for example), but that's
probably not a big deal.
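The override behaviour described here can be sketched as a merge keyed on the mount destination. This is an assumption about how a runtime might do it, not a description of runC; the `merge_mounts` name and the `destination`/`type`/`source` keys are used for illustration only.

```python
def merge_mounts(defaults, explicit):
    """Combine default and explicit mounts, keyed by destination path.

    An explicit entry replaces the default with the same destination;
    explicit entries for new destinations are appended in order.
    """
    merged = {mount["destination"]: mount for mount in defaults}
    for mount in explicit:
        merged[mount["destination"]] = mount
    return list(merged.values())
```

With this shape an explicit `/dev/shm` entry wins over the compiled-in one, but, as noted above, there is no way to ask for nothing at all to be mounted at `/dev/shm`.
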

@mrunalp
Contributor

mrunalp commented Aug 5, 2015

/dev/shm could be optional I think. Also, it could be bind mounted in.

@philips
Contributor Author

philips commented Aug 6, 2015

@mrunalp We considered making it optional in appc but I don't think it hurts anything to have it.

@LK4D4
Contributor

LK4D4 commented Aug 6, 2015

Makes sense.
Also, we need to specify the order in which bind mounts and devices are created, because it can lead to quite different behavior.

@philips
Contributor Author

philips commented Aug 6, 2015

@LK4D4 I agree, but that is a separate issue, right? Or are you asking what happens if someone decides to bind-mount /dev from the host filesystem?

@LK4D4
Contributor

LK4D4 commented Aug 6, 2015

@philips That is an issue too, but I meant per-device. You're right, it's another issue.

@mrunalp
Contributor

mrunalp commented Aug 6, 2015

@philips Yeah, I see no harm in requiring /dev/shm.

@dqminh
Contributor

dqminh commented Aug 7, 2015

I think having these as default makes sense.
However, should we also allow the bundle author to deny some devices? For example, I may not want tty-related devices in the container. Maybe that's a separate issue.

@wking
Contributor

wking commented Aug 7, 2015

On Fri, Aug 07, 2015 at 03:23:00AM -0700, Daniel, Dao Quang Minh wrote:

However, should we also allow the bundle author to deny some devices?
For example, I may not want tty-related devices in the
container. Maybe that's a separate issue.

Why would you not want tty devices? If a container is not attached to
a terminal, trying to open /dev/tty will raise an error [1]. I don't
see a need to avoid having the /dev/tty device node though.
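A small probe shows the behaviour in question: the device node can exist while opening it still fails. This is a sketch only; `has_controlling_terminal` is a hypothetical helper name, and it treats any `OSError` (missing node, no controlling terminal, permissions) the same way.

```python
import os


def has_controlling_terminal():
    """Return True if /dev/tty can be opened by this process."""
    try:
        # O_NOCTTY avoids acquiring a controlling terminal as a side effect.
        fd = os.open("/dev/tty", os.O_RDWR | os.O_NOCTTY)
    except OSError:
        # Raised when the node is missing or the process has no
        # controlling terminal.
        return False
    os.close(fd)
    return True
```
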

@vbatts
Member

vbatts commented Aug 7, 2015

this looks sane to me


@philips changed the title from "linux: specify the default devices available" to "linux: specify the default devices/filesystems available" on Aug 11, 2015

@mrunalp
Contributor

mrunalp commented Aug 12, 2015

Maybe also include /dev/mqueue?

@wking
Contributor

wking commented Aug 12, 2015

On Wed, Aug 12, 2015 at 10:51:28AM -0700, Mrunal Patel wrote:

Maybe also include /dev/mqueue?

It would be nice if inclusions like this all came with links to specs
like POSIX or the LFH to motivate our including them as part of a
“standard Linux system”.

@stuartpb

Yeah, the way I was thinking of doing this would be as a key like "standard_mounts", with values like "lfh-0.65", to specify which hierarchy you're using (since there are always new ways to do it coming out, and this would let container specs adapt to the new standards).

@philips
Contributor Author

philips commented Aug 14, 2015

@wking @stuartpb Do you have a link to such a spec? When we were doing this for appc we couldn't find anything that was both kept up to date and authoritative. There are just the general best practices, which the above reflects.

@wking
Contributor

wking commented Aug 14, 2015

On Fri, Aug 14, 2015 at 03:42:47PM -0700, Stuart P. Bentley wrote:

Yeah, the way I was thinking of doing this would be as a key like
"standard_mounts", with values like "lfh-0.65", to specify
which hierarchy you're using (since there are always new ways to
do it, and new best practices arising).

Building in a way to gracefully migrate between different “default
system” specs is nice, but for me this tips the balance between “makes
bundle authors' lives easier” (which is what these defaults are about)
and “makes implementers' lives more difficult” ;).

Could we spin this off into a separate bundle-authoring helper?
E.g. “give me an OCI spec devices section for POSIX 2013” (which would
be /dev/console, /dev/null, and /dev/tty [1]) or “give me an OCI spec
devices section like Debian 7.0” (probably a superset of POSIX ;).
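Such a helper might look roughly like the sketch below. Everything about its shape is an assumption: the `posix-2013` profile name and the `devices_section` function are hypothetical, and the `path`/`type`/`major`/`minor` keys only loosely follow the style of a spec devices entry. The major and minor numbers are the usual Linux assignments for these three character devices.

```python
# Usual Linux character device numbers for the three POSIX devices.
KNOWN_DEVICES = {
    "/dev/null": (1, 3),
    "/dev/tty": (5, 0),
    "/dev/console": (5, 1),
}

# Hypothetical profile name mapped to the devices it calls for.
PROFILES = {
    "posix-2013": ["/dev/console", "/dev/null", "/dev/tty"],
}


def devices_section(profile):
    """Build a list of device entries for the named profile."""
    section = []
    for path in PROFILES[profile]:
        major, minor = KNOWN_DEVICES[path]
        section.append(
            {"path": path, "type": "c", "major": major, "minor": minor}
        )
    return section
```

A bundle author would run this once while writing the configuration, so the runtime itself never needs to know about profiles.
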

@philips
Contributor Author

philips commented Aug 14, 2015

These devices and filesystems are part of the Linux ABI. What problems are solved by flexibility and how would this flexibility help runtime implementers?

Also, these aren't parts of the bundle, they are things that should be created or mounted at runtime.

@wking
Contributor

wking commented Aug 14, 2015

On Fri, Aug 14, 2015 at 03:52:36PM -0700, Brandon Philips wrote:

@wking @stuartpb Do you have a link to such a spec?

The POSIX spec I linked above has three devices. I imagine we can
find references of some sort for /dev/shm, /dev/mqueue, etc., but I
don't have them offhand.

As an alternative for Linux, we could just mount a devtmpfs. On Linux
4.1.0, that gives me null, zero, full, random, urandom, tty, console,
pts, ptmx, shm, mqueue, but does not mount a devpts on pts, a tmpfs on
shm, or a mqueue on mqueue. There will probably be a bunch of other
devices as well, and the presence of all entries should reflect
whether or not they're compiled into the host kernel.
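The gap described here, where directories such as pts, shm, and mqueue exist but nothing is mounted on them, can be checked by reading the mount table. A minimal sketch, assuming the usual whitespace-separated layout of /proc/self/mounts; both function names are hypothetical.

```python
def mounted_types(mounts_text):
    """Map mount point to filesystem type from /proc/self/mounts content."""
    types = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Fields: source, mount point, filesystem type, options, ...
            # A later mount on the same point overrides an earlier one.
            types[fields[1]] = fields[2]
    return types


def unmounted(mounts_text, expected):
    """Return the expected (path, fstype) pairs with no matching mount."""
    types = mounted_types(mounts_text)
    return [(path, fstype) for path, fstype in expected
            if types.get(path) != fstype]
```

On a live system the text would come from reading `/proc/self/mounts`, and `expected` would list pairs such as `("/dev/pts", "devpts")`, `("/dev/shm", "tmpfs")`, and `("/dev/mqueue", "mqueue")`.
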

@stuartpb

There will probably be a bunch of other devices as well, and the presence of all entries should reflect whether or not they're compiled into the host kernel.

Uh, making the entries in mounts specific to what's compiled into the host kernel seems to run directly counter to https://github.com/opencontainers/specs#3-infrastructure-agnostic.

@philips
Contributor Author

philips commented Aug 14, 2015

Things we mount or create should not depend on whether the kernel has or doesn't have a particular feature. These should all be default ABI things.

@stuartpb

Building in a way to gracefully migrate between different “default system” specs is nice, but for me this tips the balance between “makes bundle authors' lives easier” (which is what these defaults are about) and “makes implementers' lives more difficult” ;).

If you're being serious here, you're either vastly overestimating the difficulty of maintaining the list of default system specs, or you're vastly underestimating the importance of this to the use cases.

In terms of implementation difficulty, there are going to be, like, two standard mount lists at the start (supporting all of them will be about as difficult as typing mounts.concat(standardMounts[name])), and then one every other year, tops. Considering you're using three-part semantic versioning to track your spec (which asserts that you intend to release non-breaking additions), that's going to be the least of your implementers' troubles in keeping up with the iteration cycle.

In terms of the use improvements, it's not just a "writing this is hard" tooling problem - making this part of the spec rather than a mass of boilerplate each manifest has to carry with it means cutting out over half of the space each config takes up in the filesystem. Considering that a system where manifests correspond to each running container rather than only images appears to be the target use case (which, btw, is its own whole class of mistake, as I'm trying to communicate around opencontainers/runc#200), this is a savings that would add up in a big way in production.
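The `mounts.concat(standardMounts[name])` idea could be sketched as follows. The `standard_mounts` key is the one proposed earlier in the thread; the `example-1` list name, its contents, and the `expand_mounts` function are made up for illustration and are not part of any spec.

```python
STANDARD_MOUNTS = {
    # Hypothetical list name; the contents are illustrative only.
    "example-1": [
        {"destination": "/proc", "type": "proc"},
        {"destination": "/sys", "type": "sysfs"},
    ],
}


def expand_mounts(config):
    """Return the config's own mounts followed by its named standard list."""
    name = config.get("standard_mounts")
    standard = STANDARD_MOUNTS[name] if name is not None else []
    return list(config.get("mounts", [])) + list(standard)
```

A config that names no standard list gets back only its own mounts, so the key stays optional.
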

@wking
Contributor

wking commented Aug 15, 2015

On Fri, Aug 14, 2015 at 04:28:26PM -0700, Brandon Philips wrote:

Things we mount or create should not depend on whether the kernel
has or doesn't have a particular feature. These should all be
default ABI things.

“Linux” isn't specific enough to define a particular ABI. For
example, if I build a kernel without CONFIG_TMPFS, I don't expect runC
to try and mount a tmpfs on /dev/shm. So instead of trying to track
standards that specify devices (since POSIX 2013 is so sparse here), I
think it's more convenient and useful to just mount a devtmpfs and
expose all the devices that the kernel knows about [1].

If we do want to use an external standard for /dev, we have my earlier
POSIX 2013 (/dev/console, /dev/null, and /dev/tty [1]). The LSB 5.0
punts to the FHS [2]. And the FHS 3.0 has /dev/null, /dev/tty, and
/dev/zero [3], which matches the FHS 2.3 [4]. I haven't found
anything like a Linux ABI spec that talks about /dev/random, /dev/shm,
etc.

Hard-coding a list (or lists) of “standard” devices seems like extra
work and risks confusing errors on kernels that aren't compiled to
support some of those devices, but which may be completely capable of
running the bundle in question. So in the absence of a good standard
that requires the devices I need, I'd rather just mount devtmpfs and
leave “including the devices I need” to whoever's configuring the
kernel (since they'll have to do that regardless of what we do in a
runtime implementation).
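One way a runtime could avoid the confusing errors mentioned here is to ask the kernel which filesystems it supports before attempting a mount. A minimal sketch, assuming the usual /proc/filesystems layout (an optional `nodev` column followed by the name); the function names are hypothetical, and a filesystem built as a module that is not yet loaded may not be listed.

```python
def supported_filesystems(text):
    """Parse /proc/filesystems content into a set of filesystem names."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            # The name is the last field; "nodev" may precede it.
            names.add(fields[-1])
    return names


def kernel_supports(fstype, path="/proc/filesystems"):
    """Return True if the running kernel currently lists fstype."""
    with open(path) as handle:
        return fstype in supported_filesystems(handle.read())
```

A runtime could call `kernel_supports("tmpfs")` before mounting /dev/shm and report a clear message when it returns False.
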

@wking
Contributor

wking commented Aug 15, 2015

On Fri, Aug 14, 2015 at 04:41:50PM -0700, Stuart P. Bentley wrote:

In terms of the use improvements, it's not just a "writing this is
hard" tooling problem - making this part of the spec rather than a
mass of boilerplate each manifest has to carry with it means cutting
out over half of the space each config takes up in the filesystem.

Cutting my current spec from 3 KB to 1.5 KB is not going to have a
significant impact on my bundle size ;).

@timthelion
Contributor

I really don't understand what all this fuss is about. Docker has a short list of devices that it either automatically creates in the container or allows the container to create (see https://github.com/docker/libcontainer/blob/ce1f2f1c86cda9ce335c16f3638206ceb97174bd/configs/device_defaults.go ). It is one of the least interesting things about Docker; there have been no arguments or drama whatsoever over whether that list should be extended or reduced, especially since anyone can add to the list at runtime.

However, there is a good reason for such a list to explicitly exist. It makes it very easy to do a security audit of Docker and say "what attack surface am I exposing via device nodes within the container?"
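The kind of audit described here can also be run against a populated /dev tree. The sketch below walks a directory and reports every character or block device node with its major and minor numbers; `list_device_nodes` is a hypothetical name, and this is not how Docker or libcontainer perform any check.

```python
import os
import stat


def list_device_nodes(dev_dir):
    """Return (path, kind, major, minor) for each device node under dev_dir."""
    found = []
    for root, _dirs, files in os.walk(dev_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # lstat() so symlinks are not followed to nodes elsewhere.
            info = os.lstat(path)
            if stat.S_ISCHR(info.st_mode):
                kind = "c"
            elif stat.S_ISBLK(info.st_mode):
                kind = "b"
            else:
                continue
            found.append(
                (path, kind, os.major(info.st_rdev), os.minor(info.st_rdev))
            )
    return found
```

Running it on a container's /dev and comparing the result with the expected default list gives a quick view of the exposed device surface.
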

@vbatts
Member

vbatts commented Aug 27, 2015

@timthelion Subtly, you just said Docker but referenced libcontainer. Libcontainer is what runc now is.
This spec is about driving basic expectations for a container environment. So on Linux, that means some basic kernel features and setup. Since runtime environments will be validated, as runc will provide, that expectation will have to be defined and asserted. It's not fuss, just defining things that have been assumptions in the Docker ecosystem.

@wking
Contributor

wking commented Oct 22, 2016

On Wed, Aug 12, 2015 at 11:27:01AM -0700, W. Trevor King wrote:

It would be nice if inclusions like this all came with links to specs like POSIX or the LFH to motivate our including them as part of a “standard Linux system”.

This issue has been closed for a while, but work on #518 turned up the systemd container interface, which requires everything from the initial post except /dev/pts and /dev/shm. So I'm not sure what the motivation for those was, but “can run systemd” is reasonable motivation for the others.
