Clarification about symbolic and hardlink rules #857
Comments
Yeah some (non-normative) text on this wouldn't be a bad idea, to answer your questions in short:
Also, I just wanted to point out that the section you point to is actually about how to generate hardlink entries in a layer, not about how to deal with relative links when extracting layers. |
Ah you're right! is there a section about dealing with them (or is it entirely absent?) I will follow up with more comments tomorrow - off to bed! |
No, I think it's entirely absent. The closest you get is that there's an implication in https://github.com/opencontainers/image-spec/blob/master/layer.md#populate-initial-filesystem that you scope all paths to the root of the container filesystem, but this is not explicitly stated. (I've always felt the tar stuff is the least-defined part of the specification, because I think at the time folks didn't want to specify too strongly what a tar archive actually is and how it should be generated so that we didn't end up encoding language or library specific behaviour into the spec.) I think that entire document should probably be rewritten to be much clearer (which I think we can do without breaking backwards compatibility since the expected behaviour itself won't change). |
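To make "scope all paths to the root of the container filesystem" concrete, here is a minimal sketch of resolving a path as if the image root were "/". It is not taken from the spec or from any tool mentioned in this thread: the function name `resolve_in_root`, the `max_links` limit, and the chroot-like choice to clamp `..` at the root are all assumptions for illustration. It also ignores races with concurrent filesystem changes.

```python
import os
import tempfile


def _parts(path):
    # Split on "/" and drop empty and "." components.
    return [p for p in path.split("/") if p not in ("", ".")]


def resolve_in_root(root, path, max_links=40):
    """Resolve `path` as if `root` were "/", never stepping outside `root`.

    Hypothetical helper: absolute symlink targets restart from `root`,
    and ".." at the root stays at the root (chroot-like behaviour).
    """
    pending = _parts(path)
    resolved = []  # components below root that are known not to be symlinks
    links = 0
    while pending:
        part = pending.pop(0)
        if part == "..":
            if resolved:
                resolved.pop()
            continue
        candidate = os.path.join(root, *resolved, part)
        if os.path.islink(candidate):
            links += 1
            if links > max_links:
                raise OSError("too many levels of symbolic links")
            target = os.readlink(candidate)
            if target.startswith("/"):
                resolved = []  # absolute target: start again at the image root
            pending = _parts(target) + pending
        else:
            resolved.append(part)
    return os.path.join(root, *resolved)


def demo_resolve():
    # Build a tiny fake rootfs with one absolute and one climbing symlink.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "etc"))
        open(os.path.join(root, "etc", "real"), "w").close()
        os.symlink("/etc/real", os.path.join(root, "abs"))
        os.symlink("../../../etc/real", os.path.join(root, "up"))
        return [os.path.relpath(resolve_in_root(root, p), root)
                for p in ("abs", "up", "../..")]
```

In `demo_resolve()`, both the absolute symlink and the one that tries to climb above the root end up at `etc/real` inside the fake rootfs. Whether an implementation should clamp such links or reject them outright is exactly the kind of question the spec text leaves open.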
If it helps, I could enumerate all the link stuff Charliecloud does when unpacking images. E.g. I know we disallow symlinks that climb outside the image, and I know we had to think hard about when to follow and when not to, though I don't recall the details. |
@reidpr that would be useful I think - if you want to put that here, then I can take a first shot at a suggested update to the current spec - I'll start in a Google Doc and then can open a PR here for more fine tuned discussion. If you like I can also make the Google Doc first and then you can add notes to it. Let me know which is easiest for you! |
FWIW the reasoning here is:
|
I'd be happy to hammer out a document with you to better explain how the tar archives are meant to be handled.
So does umoci.
Well, absolute symlinks are something you definitely need to handle somehow because every container image uses them.
I'm not sure it's fair to say that they don't make sense -- Unix's behaviour when dealing with |
here is a document to get us started - I tried to be as neutral as possible with respect to the conversation here! https://docs.google.com/document/d/1uNigsONIndSUbzhtU-2H7zvsgzwrryV3RfJtI_aNxyQ/edit?usp=sharing |
Well, “well-defined” and “makes sense” are different concepts 😉, though the POSIX solution does seem a reasonable workaround; and thanks for the pointer (it's documented in path_resolution(7), which I was previously unaware of). And interestingly, looks like we do not validate this for symlinks; I must have been remembering something else.
💯 It helps that Charliecloud is fully unprivileged, so a given user can only mess themselves up. |
@reidpr I hope you might get some time to look at the document posted above your last comment too? |
Regarding link validation in Charliecloud, this is what I found. To be clear we have not been pursuing OCI compliance, though it would be a nice bonus if we can get it.
If anyone wants further details, they could search the Charliecloud repo for “link” and I’d be happy to answer questions. |
Yes, I'll look now. |
I should mention I'm by no means an expert here — I just know symlinks have a number of subtleties and have frequently led to security problems, and the Charliecloud team has fixed a lot of symlink-related bugs. |
hey @cyphar just checking in - is there something I can help with? |
Thanks for continually pinging me, I really am quite awful at multi-tasking it seems! I sat down and split out the sections and wrote what I feel is a more accurate (and specific) set of requirements for implementations. The doc I wrote does have two new MUSTs but since it's a clarification of existing requirements, I think it's okay to include in a minor release. |
(I'll copy this from the doc for a bit more exposure.) I just realised the hard-link behaviour is undefined if you modify a file that has hardlinks in a subsequent layer. I believe umoci (and any manifest-based image tool) will just create new hardlink entries for every hardlink if the contents change (because the contents changed, all of the hardlinks end up differing from the manifest, triggering hardlink dedup during generation) -- though I’m not sure if overlayfs will do that. In any case, if there aren’t hardlink entries in the top layer, the implementation is currently free either to keep the hardlink intact (meaning that it doesn’t delete the target inode and simply opens the file with O_WRONLY|O_TRUNC -- which is not necessarily safe to do in all cases) or to replace it, invalidating the other hardlinks. I'll need to look into what actually happens in the wild and consider what the reasonable thing to do in that situation is (though I expect that what umoci is currently doing is to replace the inode, which I would consider incorrect). |
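The two options described above can be seen on an ordinary filesystem. This is a minimal sketch, not code from umoci or any other tool; the helper names `update_in_place` and `update_by_replace` are made up for illustration, and the temporary-file naming is an assumption.

```python
import os
import tempfile


def update_in_place(path, data):
    # Truncate and rewrite the existing inode: every hardlink sees the new data.
    fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def update_by_replace(path, data):
    # Write a fresh inode and rename it over `path`: other hardlinks keep
    # pointing at the old inode and the old contents.
    tmp = path + ".tmp"  # assumed scratch name, for illustration only
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)


def _read(path):
    with open(path, "rb") as f:
        return f.read()


def demo_update():
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "a"), os.path.join(d, "b")
        with open(a, "wb") as f:
            f.write(b"v1")
        os.link(a, b)  # b is a hardlink to a

        update_in_place(a, b"v2")
        link_kept = os.path.samefile(a, b) and _read(b) == b"v2"

        update_by_replace(a, b"v3")
        link_broken = (not os.path.samefile(a, b)) and _read(b) == b"v2"
        return link_kept, link_broken
```

After `update_in_place` the two names still share an inode. After `update_by_replace` they no longer do, and `b` keeps the earlier contents, which is the "invalidating the other hardlinks" outcome.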
@reidpr it looks like Charliecloud will allow hard links given that they are pointing inside the top level (does that mean the top layer)? https://github.com/hpc/charliecloud/blob/8f250367a98e67a472739e9c2dc12d9cd1a91827/lib/charliecloud.py#L573. What are your thoughts on @cyphar's comment above? If it comes down to the last time the file is modified, could we say there is some best practice to do hard links at the end? But then I assume that also applies to multistage builds, and perhaps a container creator can't predict when their container would be used for that (and if a file that should be a hard link is modified in a later layer, would the link need to be recreated by the new recipe?) |
“Top level” here means the container filesystem root. I think Charliecloud meets @cyphar's described behavior in the doc.
I'm not precisely sure what Charliecloud's behavior here is, since we delegate to Python's Note Charliecloud is not in a container when it's unpacking a layer tarball, it's running unprivileged, and there is no overlayfs. So I would be sad if implementing the spec required any of these three things or was much more difficult without them.
Charliecloud mostly treats each stage in a multi-stage Dockerfile as an independent build. |
I would expect it should -- if it didn't you wouldn't be able to correctly extract a valid container image that contained hardlinks.
The reason I mentioned overlayfs is that that (or something like it) is how Docker/containerd generate their layers, so I was wondering out loud whether this will act differently (specifically in the hardlink case I mentioned above) to tools like umoci which operate on layers without using overlayfs -- it's possible that right now the two tools will have different results which is a little worrying. I have no intention of requiring anything like overlayfs to generate or operate on layers. |
I'm confused: are we referring to creating or extracting the layer tar? For creating, a hardlink is just another ref to the same inode, isn't it? How would any of the build tooling know if an inode pointed to a file outside of the container root? I suspect the tar tooling that creates hard links is just looking for the inode to be reused. For extracting, then I agree, we don't want to create hard links that point to files outside of the image root. |
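For the extraction side, a check along these lines is one way to flag hardlink entries whose target would land outside the image root. It is a sketch using Python's standard `tarfile` module, not Charliecloud's or umoci's actual code. The names `escapes_root` and `unsafe_hardlinks` are hypothetical, and treating absolute link names as unsafe is an assumption, not something the spec states. The check is purely lexical, so it does not catch a target that passes through a symlinked directory.

```python
import io
import posixpath
import tarfile


def escapes_root(linkname):
    # Lexical check only: absolute names (an assumption) and names that
    # normalise to something starting with ".." are treated as escaping.
    if posixpath.isabs(linkname):
        return True
    norm = posixpath.normpath(linkname)
    return norm == ".." or norm.startswith("../")


def unsafe_hardlinks(tar):
    # Hardlink entries name their target relative to the archive root.
    return [m.name for m in tar.getmembers()
            if m.islnk() and escapes_root(m.linkname)]


def demo_check():
    # Build an in-memory archive with one harmless and one climbing hardlink.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, target in [("bin/sh", "bin/busybox"),
                             ("bin/bad", "../../etc/passwd")]:
            info = tarfile.TarInfo(name)
            info.type = tarfile.LNKTYPE
            info.linkname = target
            tar.addfile(info)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return unsafe_hardlinks(tar)
```

Here only `bin/bad` is reported. An extractor could refuse such entries or skip them; which of the two is correct is a policy decision this sketch does not make.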
@sudo-bmitch I was referring to extraction, of course you're right there's no real way of checking if a hardlinked file exists outside of the root (and if we disallowed this wholesale it would also disallow certain file deduplication storage optimisations I've been mulling over). And when creating an archive, the only way to find hardlinks is to compare all the files you've found (obviously you know how many there are using nlinks but that doesn't help you find them) -- so you wouldn't be able to find hardlinks outside of the host. |
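The point about finding hardlinks at creation time can be sketched as follows: group regular files under the root by `(st_dev, st_ino)`, and note that a link count above one says nothing about where the other names live. The function name `find_hardlink_groups` and the demo layout are hypothetical and are not drawn from any particular layer-generation tool.

```python
import os
import stat
import tempfile
from collections import defaultdict


def find_hardlink_groups(root):
    # Map each (device, inode) pair to the paths under `root` that share it.
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                groups[key].append(os.path.relpath(path, root))
    # Only groups with two or more names inside the tree are usable as
    # hardlink entries; links living outside `root` are invisible here.
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)


def demo_groups():
    with tempfile.TemporaryDirectory() as top:
        root = os.path.join(top, "rootfs")
        os.makedirs(os.path.join(root, "bin"))
        for name in ("bin/busybox", "solo", "shared"):
            with open(os.path.join(root, name), "w") as f:
                f.write(name)
        # One link inside the tree, one link that lives outside it.
        os.link(os.path.join(root, "bin/busybox"), os.path.join(root, "bin/sh"))
        os.link(os.path.join(root, "shared"), os.path.join(top, "outside"))
        return find_hardlink_groups(root)
```

In the demo, `shared` has a link count of two, but its other name sits outside `rootfs`, so the walk never sees it and only the `bin/busybox` and `bin/sh` pair is reported.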
Hi OCI! We are putting together some topics for clarification about the specs, and want to start sharing them slowly for discussion and possibly updating spec docs to have more clarification. For this point:
https://supercontainers.github.io/containers-wg/ideas/links_B6/
We are suggesting that the spec have more clarification with respect to how to deal with different kinds of links, as what to do in different situations is under-specified. For example:
One interesting quirk is that symlinks need to be interpreted relative to the image’s root, which makes non-containerized code trickier. For reference, it looks like the relevant section is here: https://github.com/opencontainers/image-spec/blob/master/layer.md#hardlinks. Could we talk about a plan to clarify some of these points? Thank you!
cc @reidpr