Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use /etc/passwd in place of the User struct #191

Closed
wants to merge 1 commit into from

Conversation

laijs
Copy link
Contributor

@laijs laijs commented Sep 16, 2015

The type of user becomes string instead of platform-specific
structure, so the Spec becomes a platform-agnostic structure.

It makes we can parse the config.json as a Spec before knowing the
os. It removes the User structure confliction for different platforms.

For linux-based container, parsing the /etc/passwd gives us a consistent
and more reliable way to assign the uid/gids to the process.

Signed-off-by: Lai Jiangshan [email protected]

The type of `user` becomes `string` instead of platform-specific
structure, so the Spec becomes a platform-agnostic structure.

It makes we can parse the config.json as a Spec before knowing the
os. It removes the User structure confliction for different platforms.

For linux-based container, parsing the /etc/passwd gives us a consistent
and more reliable way to assign the uid/gids to the process.

Signed-off-by: Lai Jiangshan <[email protected]>
@julz
Copy link
Contributor

julz commented Sep 16, 2015

yes, please :-)

see also #38

@laijs
Copy link
Contributor Author

laijs commented Sep 16, 2015

@glyn
Copy link

glyn commented Sep 16, 2015

Seems reasonable to me.

@mrunalp
Copy link
Contributor

mrunalp commented Sep 16, 2015

We can discuss this in today's call.

* **uid** (int, required) specifies the user id.
* **gid** (int, required) specifies the group id.
* **additionalGids** (array of ints, optional) specifies additional group ids to be added to the process.
* **user** (string, required) is the user name for the process. The runtime should resolve the required information for the process by its platform-specific ways. For Linux-based containers, the runtime could parse the /etc/passwd and /etc/group inside the rootfs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“could parse /etc/passwd and /etc/group” is too wishy. A bundle author will need to know exactly how this lookup will be performed, so they can setup their bundle to support it. How about using the appc wording referenced from #38?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getent is really the only correct way to handle this. Perhaps we just say if you want a UID map you can put that into your config.json similar to the mount points? https://github.com/opencontainers/specs/blob/master/config.md#mount-points

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Tue, Dec 22, 2015 at 11:07:42PM -0800, Brandon Philips wrote:

+* user (string, required) is the user name for the
process. The runtime should resolve the required information for
the process by its platform-specific ways. For Linux-based
containers, the runtime could parse the /etc/passwd and /etc/group
inside the rootfs.

getent is really the only correct way to handle this.

I don't buy that ;). For example, I don't see much difference
between:

Perhaps we just say if you want a UID map you can put that into your
config.json similar to the mount points?
https://github.com/opencontainers/specs/blob/master/config.md#mount-points

and defining the mappings in /etc/passwd and /etc/group. I think you
cover a useful subset of NSS implementations by supporting /etc/passwd
and /etc/group, and you allow runtimes to avoid running container-side
code to perform the lookup (for example, they can use a host-side NSS
implementation to parse the bundle's /etc/passwd and /etc/group).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should let random files inside of the container affect behavior of the runtime. And this is what /etc/passwd parsing does. It lets the contents inside of the container control the UID the runtime executes processes as.

We could trivially create a tool that extracts the /etc/passwd into a map and puts it into the JSON schema. Thoughts @jonboulle @vbatts ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Wed, Dec 23, 2015 at 11:19:16AM -0800, Brandon Philips wrote:

+* user (string, required) is the user name for the
process. The runtime should resolve the required information for
the process by its platform-specific ways. For Linux-based
containers, the runtime could parse the /etc/passwd and
/etc/group inside the rootfs.

I don't think we should let random files inside of the container
affect behavior of the runtime. And this is what /etc/passwd parsing
does. It lets the contents inside of the container control the UID
the runtime executes processes as.

I don't see a point in distinguishing between “inside the container”
(just /etc/passwd) and “inside the bundle” (both /etc/passwd and the
OCI configs). What do you gain by making such a distinction?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The /etc/passwd may be modified at runtime by processes executing inside of that filesystem while the manifest is inaccessible. So, if we have operations like "launch another process" the behavior may change based on the new state of that file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Wed, Dec 23, 2015 at 11:42:49AM -0800, Brandon Philips wrote:

+* user (string, required) is the user name for the
process. The runtime should resolve the required information for
the process by its platform-specific ways. For Linux-based
containers, the runtime could parse the /etc/passwd and
/etc/group inside the rootfs.

The /etc/passwd may be modified at runtime by processes executing
inside of that filesystem while the manifest is inaccessible. So, if
we have operations like "launch another process" the behavior may
change based on the new state of that file.

Yeah, this lookup should happen in the container process after joining
the namespace but before exec'ing the user-configured code. So if
/etc/passwd changes and you launch a new process, you get the new UID.
I don't have a problem with that (it seems like what the user intended
when they modified /etc/passwd).

@wking
Copy link
Contributor

wking commented Sep 16, 2015

On Wed, Sep 16, 2015 at 02:29:49AM -0700, Lai Jiangshan wrote:

The type of user becomes string instead of platform-specific
structure, so the Spec becomes a platform-agnostic structure.

appc uses strings but keeps separate fields for user, group, and
additional-gids (although they still use integers for this last one,
see #38). I'm ok with strings, but if we use a single string we'll
want to define the syntax for packing each of these fields in
(‘[:[:…]]’?) and make sure we're
explicit about how we expect the lookup to work (see appc's text
linked from #38 for an example).

@laijs
Copy link
Contributor Author

laijs commented Sep 17, 2015

I just noticed it just reverts the 8af3eb6. @crosbymichael

From the #10, I learned something bad about the pure string name.

@laijs
Copy link
Contributor Author

laijs commented Sep 17, 2015

@wking suggested to use the structured-and-rich-information-encoded name as AppC used.
But a structured-and-rich-information-encoded-string is not better than json.RawMessage IMO. (see the 44b4415 in #135), I don't want to reinvent the wheel.

Could any maintainer directly give an approach?

I consider that mapping the User struct from the username is a good idea. Both the username and the map are only valid in the config file, so there is no security problem. And we can benefit from the User struct as current code.

@wking
Copy link
Contributor

wking commented Sep 17, 2015

On Thu, Sep 17, 2015 at 01:01:25AM -0700, Lai Jiangshan wrote:

@wking suggested use the structured-and-rich-information-encoded
name as AppC used.

I think there are two things going on here:

  1. Do we want to support name → ID lookup?
  2. Do we want structured fields for user, group, and additional-group
    information?

I think most folks are on board with (1). In yesterday's meeting
(minutes in 1, but not yet posted to the wiki), I think the
consensus was that we want to keep explicit JSON fields (2). And we
want to move a step beyond appc's overloaded string (“If no valid
matches are found, then if the string is all numerical, it shall be
converted to an integer and used as the UID/GID” and they have a path
fallback in there too) and have explicit fields for the different
lookup types. So something like:

  • uid (int, optional) specifies the user ID.
  • username (string, optional) specifies the user name.
    After entering the container, the runtime process will parse /etc/passwd to find the associated ID before exec'ing the application.
    If the name is not in /etc/passwd, the runtime process will print a message to its stderr, initiate the teardown procedures, and return a non-zero exit code.
    A configuration is invalid if both uid and username are set.
    If neither are set, the runtime must use UID 0.
  • gid (int, optional) specifies the group ID.
  • groupname (string, optional) specifies the group name.
    The logic is the same as for uid and username, except the groupname lookup will use /etc/group.
  • additionalGids (array of ints, optional) specifies additional group IDs to be added to the process.
  • additionalGroupNames (array of strings, optional) specifies additional group names to be added to the process.
    The logic is the same as for gid and groupname, except that specifying both additionalGids and additionalGroupNames is valid.
    If both are specified, the union of group IDs will be used for the application.

We can add similar fields for userPath and groupPath if we want later
on.

Could any maintainer directly give a approach?

Waiting for approval from a maintainer is a good idea. Sometimes what
I think I heard in a meeting doesn't turn out to be what they heard
;). That's probably unavoidable when we're agreeing without some
explicit text to make sure we're talking about the same thing.

Both the username and the map are only valid in the config file, so
there is no security problem.

The security problem is if you call getent in the container before
dropping privileges 3, since that would execute bundle-supplied
code. Using runtime-supplied code to read /etc/passwd and /etc/groups
to perform the lookup is safe (and after entering the container you've
already setup blkio rate limiting etc., so a huge /etc/passwd won't
even hog your disk bandwidth).

@crosbymichael
Copy link
Member

@philips can you check this out and see if it's inline with what you were thinking on your original issue?

@julz
Copy link
Contributor

julz commented Dec 3, 2015

@philips @crosbymichael @vbatts looks like this issue stalled a few months ago, we're still maintaining a fork so we can run our tests on mac and would love to be able to stop doing that. There's also #166 which seems like a more generic approach to making the spec importable on multiple platforms, but it also hasn't been updated since September. In the meantime just making User a string as in this PR (and documenting the semantics) looks like it solves the problem. Any new thoughts about this?

@laijs
Copy link
Contributor Author

laijs commented Dec 4, 2015

I don't think #166 is feasible in short time.
I didn't expect it has been consuming so much time without any progress.
The runv and the hyper have to use a simple trick (rename) to handle it on mac currently.

https://github.com/hyperhq/runv/blob/master/Makefile.am#L19-L23
https://github.com/hyperhq/hyper/blob/master/Makefile.am#L26-L30

besides this issue, any better trick for the hyper/runv are also welcome.

@hqhq
Copy link
Contributor

hqhq commented Dec 4, 2015

@laijs Haven't read the whole story, but I see the problem is you can still see User struct on MAC which defined in config_linux.go? Is adding a build flag in config_linux.go gonna fix your problem?

Ignore it if I'm missing something big and totally wrong :)

@wking
Copy link
Contributor

wking commented Dec 4, 2015

On Fri, Dec 04, 2015 at 01:35:34AM -0800, Qiang Huang wrote:

@laijs Haven't read the whole story, but I see the problem is you
can still see User struct on MAC which defined in
config_linux.go? Is adding a build flag in config_linux.go gonna
fix your problem?

Hmm, this^ sounds more complicated to me. In the absence of native
containerization (e.g. with the runV approach), there's not a clear
link between the host OS (which sets the build flags) and the launched
container (where process.user will be applied, although how that
information is passed to the guest machine is beyond me). So whatever
structure you build into the runtime will be wrong for some bundles.
I think the only solution is build a runtime that can support all
bundle (e.g. #185, which is in the roadmap from #230, so it's
hopefully not too far out).

@laijs
Copy link
Contributor Author

laijs commented Dec 7, 2015

@hqhq the go build ignores the xxxx_linux.go in the mac. (the build tag in _linux.go is redundant if you add the tag)

the runv supports linux containers on the mac, so it requires the linux specified User structure. so we have to change the file name to xxx-linux.go as a workaround, or put them in the linux/ directory without _linux suffix as a full solution: (string as user name + linux/) or ( #135, json.RawMessage as username + linux/)

@philips
Copy link
Contributor

philips commented Dec 23, 2015

I think we should probably just retire ourselves to the fact that the concept of user is so platform specific that trying to put it into the shared part of the spec is impossible.

More and more I don't think there is anything that can safely be shared between platforms besides environment variables.

@wking
Copy link
Contributor

wking commented Dec 23, 2015

On Tue, Dec 22, 2015 at 11:09:06PM -0800, Brandon Philips wrote:

I think we should probably just retire ourselves to the fact that
the concept of user is so platform specific that trying to put it
into the shared part of the spec is impossible.

I don't see a platform-compatibility problem with the schema outlined
in 1. Platforms that don't support UIDs can error out if ‘uid’ is
set, and we can add platform-specific lookup logic. E.g. for
‘username’ we can require /etc/passwd for Linux lookups, some registry
(or whatever they use) for Windows lookups, etc. Although we'd
probably want separate fields for “host platform” and “container
platform” so the hypervisor folks can figure out what rules to apply.

More and more I don't think there is anything that can safely be
shared between platforms besides environment variables.

That is a pretty pessimistic view, and argues in favor of just
splitting the spec into opencontainers/specs-linux,
opencontainers/specs-windows, … (this is what I did with the
Linux-only ccon 2). Cross-platform specs may be a bit more
complicated (see all the platform caveats here 3), but I think it's
a worthy goal for areas where there is sufficient overlap to avoid it
being too complicated ;).

@philips
Copy link
Contributor

philips commented Dec 23, 2015

On Wed, Dec 23, 2015 at 9:57 AM W. Trevor King [email protected]
wrote:

On Tue, Dec 22, 2015 at 11:09:06PM -0800, Brandon Philips wrote:

I think we should probably just retire ourselves to the fact that
the concept of user is so platform specific that trying to put it
into the shared part of the spec is impossible.

I don't see a platform-compatibility problem with the schema outlined
in [1]. Platforms that don't support UIDs can error out if ‘uid’ is
set, and we can add platform-specific lookup logic. E.g. for
‘username’ we can require /etc/passwd for Linux lookups, some registry
(or whatever they use) for Windows lookups, etc. Although we'd
probably want separate fields for “host platform” and “container
platform” so the hypervisor folks can figure out what rules to apply.

With such a complex set of rules on each platform we might as well just
make different schemas. What is the point of a shared schema with different
semantics based on context?

And FWIW, I am not saying we should give up on shared language and shared
spec. Just that we are deriving very little value from having a shared
schema.

@wking
Copy link
Contributor

wking commented Dec 23, 2015

On Wed, Dec 23, 2015 at 11:17:10AM -0800, Brandon Philips wrote:

On Wed, Dec 23, 2015 at 9:57 AM W. Trevor King wrote:

On Tue, Dec 22, 2015 at 11:09:06PM -0800, Brandon Philips wrote:

I think we should probably just retire ourselves to the fact
that the concept of user is so platform specific that trying to
put it into the shared part of the spec is impossible.

I don't see a platform-compatibility problem with the schema
outlined in [1]. Platforms that don't support UIDs can error out
if ‘uid’ is set, and we can add platform-specific lookup
logic. E.g. for ‘username’ we can require /etc/passwd for Linux
lookups, some registry (or whatever they use) for Windows lookups,
etc. Although we'd probably want separate fields for “host
platform” and “container platform” so the hypervisor folks can
figure out what rules to apply.

With such a complex set of rules on each platform we might as well
just make different schemas. What is the point of a shared schema
with different semantics based on context?

Because it lets you agree where you can. For example, all of our
platforms support the same semantics for fields backed by Go's Cmd. I
don't see a need to duplicate all of that in per-platform schemas
because the platforms divergence on user modeling.

And FWIW, I am not saying we should give up on shared language and
shared spec. Just that we are deriving very little value from having
a shared schema.

Currently the spec doesn't have much normative content outside the
config schema. There's the state schema (which also has
cross-platform issues, e.g. PIDs are non-sensical for hypervisors),
some generic lifecycle wording, and… what? So I'm not sure what value
there is to “shared language” in the absence of at least some schema
overlap.

@philips
Copy link
Contributor

philips commented Dec 23, 2015

On Wed, Dec 23, 2015 at 11:44 AM W. Trevor King [email protected]
wrote:

On Wed, Dec 23, 2015 at 11:17:10AM -0800, Brandon Philips wrote:

On Wed, Dec 23, 2015 at 9:57 AM W. Trevor King wrote:

On Tue, Dec 22, 2015 at 11:09:06PM -0800, Brandon Philips wrote:

I think we should probably just retire ourselves to the fact
that the concept of user is so platform specific that trying to
put it into the shared part of the spec is impossible.

I don't see a platform-compatibility problem with the schema
outlined in [1]. Platforms that don't support UIDs can error out
if ‘uid’ is set, and we can add platform-specific lookup
logic. E.g. for ‘username’ we can require /etc/passwd for Linux
lookups, some registry (or whatever they use) for Windows lookups,
etc. Although we'd probably want separate fields for “host
platform” and “container platform” so the hypervisor folks can
figure out what rules to apply.

With such a complex set of rules on each platform we might as well
just make different schemas. What is the point of a shared schema
with different semantics based on context?

Because it lets you agree where you can. For example, all of our
platforms support the same semantics for fields backed by Go's Cmd. I
don't see a need to duplicate all of that in per-platform schemas
because the platforms divergence on user modeling.

Sure, lets all use the same fields for the Cmd stuff and share that.

But, for User it feels we should just make it explicit and pull it into the
Linux, Windows, etc schemas.

@wking
Copy link
Contributor

wking commented Dec 23, 2015

On Wed, Dec 23, 2015 at 11:49:35AM -0800, Brandon Philips wrote:

Sure, lets all use the same fields for the Cmd stuff and share that.

But, for User it feels we should just make it explicit and pull it
into the Linux, Windows, etc schemas.

I have no problem with that. And once we actually have something
for non-Linux user schemas we can revisit whether the split makes
sense. I'm also fine just rolling ahead with a Linux-centric user
schema, and revisiting whether or not we need a split after a
non-Linux OS speaks up with “this won't work for us, we need
$ALTERNATIVE”.

I don't see the need to namespace this (like we do now with
linux.namespaces, etc.), because the platform entry in config.json is
sufficient to specify which per-platform user schema we're using (and
I'd recommend dropping our existing ‘linux’ namespacing for the same
reasons). But perhaps that's too awkward to handle in Go, and we're
too committed to single-pass-into-Go-types JSON-parsing approach.

@philips
Copy link
Contributor

philips commented Dec 23, 2015

On Wed, Dec 23, 2015 at 11:57 AM W. Trevor King [email protected]
wrote:

On Wed, Dec 23, 2015 at 11:49:35AM -0800, Brandon Philips wrote:

Sure, lets all use the same fields for the Cmd stuff and share that.

But, for User it feels we should just make it explicit and pull it
into the Linux, Windows, etc schemas.

I have no problem with that. And once we actually have something
for non-Linux user schemas we can revisit whether the split makes
sense. I'm also fine just rolling ahead with a Linux-centric user
schema, and revisiting whether or not we need a split after a
non-Linux OS speaks up with “this won't work for us, we need
$ALTERNATIVE”.

Ack.

I don't see the need to namespace this (like we do now with

linux.namespaces, etc.), because the platform entry in config.json is
sufficient to specify which per-platform user schema we're using (and
I'd recommend dropping our existing ‘linux’ namespacing for the same
reasons). But perhaps that's too awkward to handle in Go, and we're
too committed to single-pass-into-Go-types JSON-parsing approach

I don't think we are committed but there just isn't great tooling for an
alternative. We should keep that discussion on
#185 though.

@crosbymichael
Copy link
Member

We are currently working on fixing this for cross platform concerns but as far as linux goes the passwd lookup should be done before config generation. There are many different problems defering this lookup to the runtime and it also removes the ability of using things like nss to looking the uid/gid for the system.

We reviewed this together in the f2f oci meeting and are going to keep uid/gid as required for linux and consumers should do lookups with either nss or etc/passwd before populating the config for the container.

wking referenced this pull request in opencontainers/umoci Nov 10, 2016
Use libcontainer/user to parse the /etc/{passwd,group} files inside the
container rootfs to allow us to handle string versions of user:group
specifications.

This also allows us to fill the AdditionalGids fields.

Signed-off-by: Aleksa Sarai <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants