Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarification about symbolic and hardlink rules #857

Open
vsoch opened this issue Jun 22, 2021 · 22 comments
Open

Clarification about symbolic and hardlink rules #857

vsoch opened this issue Jun 22, 2021 · 22 comments

Comments

@vsoch
Copy link
Contributor

vsoch commented Jun 22, 2021

Hi OCI! We are putting together some topics for clarification about the specs, and want to start sharing them slowly for discussion and possibly updating spec docs to have more clarification. For this point:

https://supercontainers.github.io/containers-wg/ideas/links_B6/

We are suggesting that the spec have more clarification with respect to how to deal with different kinds of links, as what to do in different situations is under-specified. For example:

  1. When should symlinks be followed and when should they not?
  2. Behavior of link replacement in a later layer is explicitly undefined.
  3. Is it permitted to have a link that climbs outside the image? (E.g.: ln -s ../foo /bar). What about other pathological link targets?

One interesting quirk is that symlinks need to be interpreted relative to the image’s root, which makes non-containerized code trickier. For reference, it looks like the relative section is here: https://github.com/opencontainers/image-spec/blob/master/layer.md#hardlinks. Could we talk about a plan to clarify some of these points? Thank you!

cc @reidpr

@cyphar
Copy link
Member

cyphar commented Jun 23, 2021

Yeah some (non-normative) text on this wouldn't be a bad idea, to answer your questions in short:

  1. I think most implementations don't follow symlinks (either existing ones from previous layers, or ones added in the current layer) -- which is the most reasonable policy since most tar generating utilities won't crate such layers by accident. However in umoci we do support this with --keep-dirlinks (a-la rsync) since it is a useful feature in a few specific scenarios.
  2. It's clobbered. Any entry in a new layer implicitly means that the entry at the same path in the old layer should be replaced (since most diff generating utilities don't explicitly have a "deleted" entry for files that change type). (EDIT: https://github.com/opencontainers/image-spec/blob/master/layer.md#changeset-over-existing-files explains this in a round-about way.)
  3. For hardlinks, no (and not rejecting or "sanitising" such links is a security bug -- in umoci we scope them to /). For symlinks, yes -- because symlink contents are just arbitrary text and creating one doesn't do anything for the container guest (they could've made the symlink themselves when running).

Also just I just wanted to point out that the section you point to is actually about how to generate hardlink entries in a layer, not about how to deal with relative links when extracting layers.

@vsoch
Copy link
Contributor Author

vsoch commented Jun 23, 2021

Also just I just wanted to point out that the section you point to is actually about how to generate hardlink entries in a layer, not about how to deal with relative links when extracting layers.

Ah you're right! is there a section about dealing with them (or is it entirely absent?) I will follow up with more comments tomorrow - off to bed!

@cyphar
Copy link
Member

cyphar commented Jun 23, 2021

No I think it's entirely absent. The closest you get is that there's an implication in https://github.com/opencontainers/image-spec/blob/master/layer.md#populate-initial-filesystem that you scope all paths to the root of the container filesystem, but this is not explicitly stated. (I've always felt the tar stuff is the least-defined part of the specification, because I think at the time folks didn't want to specify too strongly what a tar archive actually is and how it should be generated so that we didn't end up encoding language or library specific behaviour into the spec.)

I think that entire document should probably be rewritten to be much clearer (which I can think we can do without break backwards compatibility since the expected behaviour itself won't change).

@reidpr
Copy link

reidpr commented Jun 23, 2021

If it helps, I could enumerate all the link stuff Charliecloud does when unpacking images. E.g. I know we disallow symlinks that climb outside the image, and I know we had to think hard about when to follow and when not to, though I don't recall the details.

@vsoch
Copy link
Contributor Author

vsoch commented Jun 23, 2021

@reidpr that would be useful I think - if you want to put that here, then I can take a first shot at a suggested update to the current spec - I'll start in a Google Doc and then can open a PR here for more fine tuned discussion. If you like I can also make the Google Doc first and then you can add notes to it. Let me know which is easiest for you!

@reidpr
Copy link

reidpr commented Jun 24, 2021

disallow [relative] symlinks that climb outside the image

FWIW the reasoning here is:

  1. Charliecloud stores images as unpacked directories, and symlinks shouldn't point outside the image even when not in a mount namespace (though I don't know what do to about absolute symlinks).
  2. Such symlinks don't make sense, and our principle is "don't accept unreasonable input because the results may be surprising", so they are disallowed.

@cyphar
Copy link
Member

cyphar commented Jun 24, 2021

I'd be happy to hammer out a document with you to better explain how the tar archives are meant to be handled.

Charliecloud stores images as unpacked directories

So does umoci.

and symlinks shouldn't point outside the image even when not in a mount namespace (though I don't know what do to about absolute symlinks).

Well, absolute symlinks are something you definitely need to handle somehow because every container image uses them.

Such symlinks don't make sense, and our principle is "don't accept unreasonable input because the results may be surprising", so they are disallowed.

I'm not sure it's fair to say that they don't make sense -- Unix's behaviour when dealing with ../../../ symlinks in / is very well-defined. The other thing to consider is that even if you block an image from having such symlinks, an unprivileged user can always create a symlink with arbitrary contents (assuming your users have write access to to the filesystem), so your image processing tool will need to handle such symlinks existing anyway.

@vsoch
Copy link
Contributor Author

vsoch commented Jun 24, 2021

here is a document to get us started - I tried to be as neutral as possible with respect to the conversation here! https://docs.google.com/document/d/1uNigsONIndSUbzhtU-2H7zvsgzwrryV3RfJtI_aNxyQ/edit?usp=sharing

@reidpr
Copy link

reidpr commented Jun 29, 2021

I'm not sure it's fair to say that they don't make sense -- Unix's behaviour when dealing with ../../../ symlinks in / is very well-defined.

Well, “well-defined” and “makes sense” are different concepts 😉, though the POSIX solution does seem a reasonable workaround; and thanks for the pointer (it's documented in path_resolution(7), which I was previously unaware of).

And interestingly, looks like we do not validate this for symlinks; I must have been remembering something else.

The other thing to consider is that even if you block an image from having such symlinks, an unprivileged user can always create a symlink with arbitrary contents (assuming your users have write access to to the filesystem), so your image processing tool will need to handle such symlinks existing anyway.

💯 It helps that Charliecloud is fully unprivileged, so a given user can only mess themselves up.

@vsoch
Copy link
Contributor Author

vsoch commented Jun 29, 2021

@reidpr I hope you might get some time to look at the document posted above your last comment too?

@reidpr
Copy link

reidpr commented Jun 29, 2021

Regarding link validation in Charliecloud, this is what I found. To be clear we have not been pursuing OCI compliance, though it would be a nice bonus if we can get it.

  1. Layer tarball member validation rules:
    • accept directories and files
    • accept symlinks regardless of target
    • accept hard links, except reject (i.e., error out on encountering) those with absolute targets or that point outside the image (e.g., with “..”)
    • ignore devices and FIFOs
    • reject other member types
    • ensure directories have permissions at least 0700 and files 0600
    • reject members with absolute paths, up-level components (“..”), or that are outside the image
  2. COPY instruction was tough. Documentation on Charliecloud's behavior. This has hundreds of lines of testing to match Docker's behavior.
    • We do reject symlinks in the source that point outside the context directory, because Charliecloud provides the context directory directly, without tarring it up.
  3. Layers replacing {symlink, regular file, directory} with {symlink, regular file, directory} required extensive testing (input, output). I did not dig into the details on this one.

If anyone wants further details, they could search the Charliecloud repo for “link” and I’d be happy to answer questions.

@reidpr
Copy link

reidpr commented Jun 29, 2021

look at the document posted above

Yes, I'll look now.

@reidpr
Copy link

reidpr commented Jun 30, 2021

I should mention I'm by no means an expert here — I just know symlinks have a number of subtleties and have frequently led to security problems, and the Charliecloud team has fixed a lot of symlink-related bugs.

@vsoch
Copy link
Contributor Author

vsoch commented Jul 15, 2021

hey @cyphar just checking in - is there something I can help with?

@cyphar
Copy link
Member

cyphar commented Jul 16, 2021

Thanks for continually pinging me, I really am quite awful at multi-tasking it seems! I sat down and split out the sections and wrote what I feel is a more accurate (and specific) set of requirements for implementations. The doc I wrote does have two new MUSTs but since it's a clarification of existing requirements, I think it's okay to include in a minor release.

@cyphar
Copy link
Member

cyphar commented Jul 16, 2021

(I'll copy this from the doc for a bit more exposure.)

I just realised the hard-link behaviour is undefined if you modify a file that has hardlinks in a subsequent layer. I believe umoci (and any manifest-based image tool) will just create new hardlink entries for every hardlink if the contents change (because the contents changed all of the hardlinks end up differing from the manifest, triggering hardlink dedup during generation) -- though I’m not sure if overlayfs will do that. In any case, if there aren’t hardlink entries in the top layer, the implementation currently is free to either keep the hardlink intact (meaning that it doesn’t delete the target inode and simply O_WRONLY|O_TRUNC the file -- which is not necessarily safe to do in all cases) or to replace it, invalidating the other hardlinks.

I'll need to look into what actually happens in the wild and consider what is the reasonable thing to do in that situation (though I expect that what umoci is currently doing is to replace the inode, which I would consider incorrect).

@vsoch
Copy link
Contributor Author

vsoch commented Jul 16, 2021

@reidpr it looks like Charliecloud will allow hard links given that they are pointing inside the top level (does that mean the top layer)? https://github.com/hpc/charliecloud/blob/8f250367a98e67a472739e9c2dc12d9cd1a91827/lib/charliecloud.py#L573. What are your thoughts on @cyphar comment above? If it comes down to the last time the file is modified, could we say there is some best practice to do hard links at the end? But then I assume that also applies to multistage builds, and perhaps a container creator can't predict when their container would be used for that (and if a file is modified in a later layer that should be a hard link it would need to be done by the new recipe?)

@reidpr
Copy link

reidpr commented Jul 16, 2021

it looks like Charliecloud will allow hard links given that they are pointing inside the top level (does that mean the top layer)?

“Top level” here means the container filesystem root. I think Charliecloud meets @cyphar's described behavior in the doc.

I just realised the hard-link behaviour is undefined if you modify a file that has hardlinks in a subsequent layer.

I'm not precisely sure what Charliecloud's behavior here is, since we delegate to Python's tarfile library. The only special treatment hard links get is when we examine the tarball contents.

Note Charliecloud is not in a container when it's unpacking a layer tarball, it's running unprivileged, and there is no overlayfs. So I would be sad if implementing the spec required any of these three things or was much more difficult without them.

multistage builds

Charliecloud mostly treats each stage in a multi-stage Dockerfile as an independent build.

@cyphar
Copy link
Member

cyphar commented Jul 17, 2021

“Top level” here means the container filesystem root. I think Charliecloud meets @cyphar's described behavior in the doc.

I would expect it should -- if it didn't you wouldn't be able to correctly extract a valid container image that contained hardlinks.

Note Charliecloud is not in a container when it's unpacking a layer tarball, it's running unprivileged, and there is no overlayfs. So I would be sad if implementing the spec required any of these three things or was much more difficult without them.

The reason I mentioned overlayfs is that that (or something like it) is how Docker/containerd generate their layers, so I was wondering out loud whether this will act differently (specifically in the hardlink case I mentioned above) to tools like umoci which operate on layers without using overlayfs -- it's possible that right now the two tools will have different results which is a little worrying. I have no intention of requiring anything like overlayfs to generate or operate on layers.

@sudo-bmitch
Copy link
Contributor

  1. Is it permitted to have a link that climbs outside the image? (E.g.: ln -s ../foo /bar). What about other pathological link targets?

For hardlinks, no (and not rejecting or "sanitising" such links is a security bug -- in umoci we scope them to /).

I'm confused, are we referring to creating or extracting the layer tar. For creating, hardlink is just another ref to the same inode isn't it? How would any of the build tooling know if an inode pointed to a file outside of the container root? I suspect the tar tooling that creates hard links is just looking for the inode to be reused. For extracting, then I agree, we don't want to create hard links that point to files outside of the image root.

@cyphar
Copy link
Member

cyphar commented Feb 25, 2022

@sudo-bmitch I was referring to extraction, of course you're right there's no real way of checking if a hardlinked file exists outside of the root (and if we disallowed this wholesale it would also disallow certain file deduplication storage optimisations I've been mulling over). And when creating an archive, the only way to find hardlinks is to compare all the files you've found (obviously you know how many there are using nlinks but that doesn't help you find them) -- so you wouldn't be able to find hardlinks outside of the host.

@vsoch
Copy link
Contributor Author

vsoch commented Feb 25, 2022

@cyphar @reidpr thanks for your feedback! Is there a nice succinct final sentence we could put to make all parties happy?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants