Is it possible to create an empty scratch layer? #853

Closed
TBBle opened this issue Jul 23, 2020 · 18 comments

Comments

@TBBle
Contributor

TBBle commented Jul 23, 2020

It seems that hcsshim.CreateScratchLayer(.., someDir, "<unused>", []) simply creates the directory, but does not set up a sandbox.vhdx or similar. I tested this with a slightly-modified wclayer to make the parents list for wclayer create optional.

Is there any way to create a functional scratch layer without needing to base it on an existing image? This appears to be a blocker for #750, and the lack of it creates complications in containerd WCOW support since I cannot just treat every snapshot as a WCOW layer, but must juggle plain directories and WCOW layers with sandbox.vhdx and WCOW layers without sandbox.vhdx.

The ability to create an empty scratch layer would simplify this, by letting me use WCOW layers for all use-cases, and never needing to symlink anything, resolving an open question on containerd/containerd#2366

Also, hcsshim.CreateLayer always seems to just create an empty directory, no matter what I pass it. I assume that if it was doing anything behind-the-scenes, it would be safe to import layer data from an OCI tarball into the resulting directory.

I did a quick test, and hcsshim.CreateLayer even with a parents list does not seem to result in something suitable for the parents list of PrepareLayer, so I guess you must import a tarball into that directory?

@TBBle
Contributor Author

TBBle commented Jul 25, 2020

So I've just noticed CreateNTFSVHD, which calls through to HcsFormatWritableLayerVhd, however it seems the current implementation requires Windows 19H1, according to #810.

So I guess this question can be treated as a feature-request to complete this support for ltsc2019 (assuming that's feasible... I'm testing on Windows 10 2004, so it's possible I'm already using too-new features) and either expose the functionality, or make it work for hcsshim.CreateScratchLayer(.., someDir, "<unused>", []), so that an empty scratch layer just happens naturally in the existing API.

@TBBle
Contributor Author

TBBle commented Sep 12, 2020

I decided I'd have an explore myself. It looks like PrepareLayer always fails with "The parameter is incorrect. (0x57)" when called with no parent layers, whether the sandbox.vhdx in the layer directory was created with vmcompute.dll's CreateSandboxLayer (with parents), or computestorage.dll's HcsFormatWritableLayerVhd. The sandbox vhdx created by the latter does seem to happily mount like the former. I'm wondering if one is implemented by calling the other, and that CreateSandboxLayer really doesn't need or do anything with its parents list, apart from checking if it's empty.

I'm guessing this means there's no way to actually do what I want to do here with the current API. I'm not clear if that's something that can be fixed, or if there is some other path I'm overlooking for creating an empty sandbox, mounting it for filesystem operations (BuildKit's use-case for Dockerfiles is this much), and exporting it as a layer, so that more layers can be built upon it (e.g. a Dockerfile like FROM scratch; COPY ...; COPY ...;) and then pushed as an image, e.g. #750's use-case.

@TBBle
Contributor Author

TBBle commented Sep 12, 2020

Hmm. I just saw #407. That has some other likely-looking functions exposed from computestorage.dll. A lot of them look like things that already have implementations in Go, so I suspect the same limitations will apply.

@TBBle
Contributor Author

TBBle commented Nov 16, 2020

For tracking purposes, it seems that the k8s implementation of privileged containers on Windows may require or produce a "slim base image" which could be used as scratch for purposes of FROM scratch, and also as a root parent for "parentless" scratch layers.

Or even just filling in the missing details needed to create a parentless/empty scratch layer.

@TBBle
Contributor Author

TBBle commented Nov 23, 2020

TODO: Look closely at #881 (replaced #407, but without a couple of legacywriteablelayer functions); the newly-exposed APIs there might help with this, although I suspect they might require LTSC 2019 (RS5) or newer, and per #556 (comment), LTSC 2016 (RS1) and even non-LTSC releases are still out there and potentially relevant.

@TBBle
Contributor Author

TBBle commented Nov 29, 2020

I had a quick experiment replacing hcsshim.PrepareLayer with computestorage.AttachLayerStorageFilter in cmd/wclayer/mount.go, but when passing in no parent layers, this changes the "The parameter is incorrect. (0x57)" error I was getting with the former into an exception from the relevant syscall:

Exception 0xc0000005 0x1 0xfffffffffffffff0 0x7ffe3c5fcb4c
PC=0x7ffe3c5fcb4c

(Being an exception, this bypasses the deferred computestorage.DetachLayerStorageFilter, and the layer is now attached but not prepared, until I reboot.)

I don't see any existing uses of computestorage.AttachLayerStorageFilter, so it's possible I'm using it wrong, or completely misunderstanding it as a replacement for hcsshim.PrepareLayer. It seems more likely that I actually want computestorage.SetupBaseOSLayer during the layer creation, but I couldn't work out how to actually use that successfully, as it also has no existing users that I could see in this tree.

@micahyoung

Hey @TBBle. I stumbled across your comment and I'm not sure if it's helpful but we did manage to create a WCOW scratch-equivalent layer containing the few files that were referenced in hcsshim, moby, and containerd:

https://github.com/buildpacks/imgutil/blob/main/layer/windows_baselayer.go

The only complex file in there is BCD which is a registry hive-formatted file, which we generate and memoize using libhivex with go bindings:

https://github.com/buildpacks/imgutil/blob/main/tools/bcdhive_generator/bcdhive_hivex.go

Both moby and containerd (WCOW with process isolation) can pull it and create containers with it, but it can't be used to run anything (as far as I can tell). There's unfortunately no API clearly supporting this so we are using this experimentally for now.

Here's an image containing solely one of these layers, if you want to play around:

index.docker.io/micahyoung/scratch-windows

@TBBle
Contributor Author

TBBle commented Nov 29, 2020

Oh, that's brilliant timing. I had just worked out part of this in containerd/containerd#4419 (comment) using ProcessBaseLayer, but I hadn't picked up the need for BCD, and it's nice to see it confirmed.

Since I don't want to create containers from this (it's mostly used for storage, e.g. BuildKit for build contexts, and #750 above), it seems in brief testing that I can just create dummy files in Windows\System32\Config, as that's what is copied to the Hives folder (as _...Base) by hcsshim.ProcessBaseLayer. That was enough to let me create a child scratch layer, mount it, edit files in it, export it to a tarball, and then import that tarball as a new layer with the same parent, which is the test I've been failing up until now.

And having looked at all that, I realise that internal/wclayer/baselayer.go is the implementation of the code that receives the tarball you're generating in your script, and it just extracts it in-place and calls ProcessBaseLayer and optionally ProcessUtilityVMImage.

Interestingly, the parent export failed; I'm not sure if that's the missing UtilityVM data or something else. It might be that wclayer export cannot export base layers using the HCS exportLayer API, but it can be done with a trivial walk of the relevant directories to produce the same tar stream you're generating in your code, except I only need to pull it off the disk, which is nice.

I suspect that could be added to hcsshim the way base-layer import is handled, just directly walking the three relevant directories. That would also be easy to test in hcsshim, as it should be able to round-trip a WCOW parent layer, and get the exact same thing back the second time.

Edit: Missing BCD did not affect me because it looks like the UtilityVM is only used during the stream-from-tar-to-backup-and-write-to-disk stage, which I don't have when creating a new base layer from on-disk files. It might matter once I am able to export and reimport such layers.

Edit for the above: After implementing #901, I realise that the reason I wasn't noticing lack of BCD was that I wasn't calling ProcessUtilityVMImage in my experiments, which happens when importing a layer tarball that contains UtilityVM/Files. If that latter directory does not exist, then hcsshim's base layer import does not call ProcessUtilityVMImage, and you don't have a Utility VM image. At this stage, I'm not sure when a Utility VM is needed, so I haven't tried to enforce its presence.

@TBBle
Contributor Author

TBBle commented Feb 22, 2021

A quick status update: #901 is ready for final review, I think, which will give us base-layer creation for "scratch" (i.e. not bootable!) WCOW layers on RS5 (Windows Server 2019).

However, Windows 20H2 (and presumably back to some earlier version after RS5) will refuse to import layers that depend on so-created base layers. That will be fixed in a follow-up PR based on @micahyoung's #853 (comment); minimal testing has been done on a hack version of this fix, see #901 (comment).

@riverar

riverar commented Jun 10, 2021

@TBBle When you say "not bootable", what do you mean here? I'd expect a scratch image to be usable in Windows Server Container (Argon) scenarios. Is that not the case here?

@TBBle
Contributor Author

TBBle commented Jun 11, 2021

I mean "not bootable" in that it won't contain an OS so you can't use it as-is with, e.g., docker run. Well, unless you actually add those to the layer yourself, of course. I suspect you'll have to provide your own UtilityVM for most booting scenarios, but have never tried to verify this.

License restrictions mean it won't be practical for anyone to provide their own OS from Windows. I guess in theory, one could provide their own-developed OS layer that can talk to the WCOW components, but I honestly have no idea how complex that would be -- I assume "very" -- and it's probably not documented anywhere public.

I'd have to check, it might also be against MS licensing to do that, but I'm not sure if they have anything that restricts what can be inside a WCOW container...

So the use-cases are for data distribution, e.g. #750, and creating layer-based snapshots for attaching to containers, i.e. the way BuildKit uses Containerd's snapshotter to capture and manage the Dockerfile and image build context.

@roozbehid

@TBBle I have a feeling this case is not possible: having an image built from scratch plus your application, and then using docker run to run it in-process.
If that is the case, then why? Aren't containers designed to use the host kernel? Why is there a need for more files in the layer?

@TBBle
Contributor Author

TBBle commented Oct 7, 2021

See the last paragraph of my previous comment for use-cases for non-bootable container images.

It's definitely possible to create a container image "from scratch" and then add files to make it bootable as a WCOW container. That's actually how all containers work (there's no difference in a "base" layer structurally), it's just that the code to do so for Windows Server containers is not public, and it's probably not allowed for anyone except MS to do it under the Windows EULA.

So while possible, I'd say it's not usefully feasible, and hence why I'm keeping it out-of-scope for this work.

@roozbehid

@TBBle Thanks for your detailed answer. I keep reading your comments everywhere; they seem to be the most documentation I can get on any of this.

Obviously a stupid question, but is it possible, for example on a Windows Server 2019 host, to have a minimal non-bootable image (with some of that magic, or UtilityVM) but with the container's C:\Windows\System32 mapped to the host's? Why was that route not taken to minimize the size of Windows Server images, at least on Windows Server hosts?

@TBBle
Contributor Author

TBBle commented Oct 10, 2021

Generally, container images should be as self-contained as possible, and not depend on the userspace of the host. It would be similarly unexpected if a Linux base layer worked by mapping /usr from the host OS.

There's not actually a way in the OCI standard for container images for a layer to rely on host files, only the end-user can do that by mapping the host filesystem into a container at run-time.

@gerneio

gerneio commented Aug 28, 2022

@TBBle any progress on this by chance? Finding the need to have a FROM scratch or similar within a WCOW exactly as you've detailed above (small packaged file system distribution w/o the bloat of a full Windows OS and whatever binding licensing agreement that would involve).

@micahyoung tried using your index.docker.io/micahyoung/scratch-windows image, but just building against it would cause an error, even if I had other valid windows image layers specified in the dockerfile (tested before and after the scratch layer):

PS C:\TestEmpty> docker build -t test-empty-image -f testempty.Dockerfile .

Sending build context to Docker daemon  520.9MB
Step 1/5 : FROM micahyoung/scratch-windows as build
 ---> 3c57ba00bc31
Step 2/5 : COPY content/ .
re-exec error: exit status 1: output: hcsshim::ImportLayer - failed failed in Win32: 
The configuration registry database is corrupt. (0x3f1)

testempty.Dockerfile:

FROM micahyoung/scratch-windows as build
COPY content/ .

FROM mcr.microsoft.com/windows/servercore:ltsc2019

WORKDIR /temp

COPY --from=build . .

@TBBle
Contributor Author

TBBle commented Aug 29, 2022

The error you're seeing from index.docker.io/micahyoung/scratch-windows is probably because of a change in Windows sometime around 2020. I initially saw it on a Windows 10 20H1 machine (or so... may have been 19H2) but have recently noticed the same behaviour appeared in Windows Server 2019, so presumably it was backported for some reason or other. It's the same thing #853 (comment) was talking about for the BCD; this now also needs to be done for at least one of the registry hives, so my change was to do it for all of them. See #901 (comment).

For my own status, #901 is pending some rework from me, mostly unit tests and reimplementing the "hives" feature due to silent changes in Go module tooling (and also just to be less-awful); however, I'm in the middle of moving countries so I don't expect to have any time to put into this until late October at best.

After #901 is done, then the containerd side of this work can be completed, and then we can finish teaching BuildKit about native WCOW container builds. And then we can teach BuildKit to understand FROM scratch on Windows in that context.

Right now, you could use my branch's wclayer binary (you'd have to build it yourself, and my last CI run on my WIP branch failed for unrelated reasons) to generate the layer tarball you'd need for FROM scratch, and then finagle that into a container image, to produce the equivalent of index.docker.io/micahyoung/scratch-windows; although I've never tested this flow, and I'm not sure if there are any good tools for doing the layer->container conversion or if you just need to write some JSON yourself and push it to a registry.

Perhaps docker has a command-line for importing a single layer as a container image?

That said, you might find that BuildKit's cross-build support can do FROM scratch targeting Windows from the Linux BuildKit daemon. I suspect the resulting layer will fail the same way as index.docker.io/micahyoung/scratch-windows is failing when you import it on Windows, if this support in BuildKit (which I recall just manually adds in the empty Files/ directory and then has magic to work with that subdirectory in subsequent layers) hasn't been updated for the registry issue.

@TBBle
Contributor Author

TBBle commented Mar 2, 2023

#1637 (successor for #901 mentioned earlier) has merged, so the hcsshim-side of this feature is complete, and we can now create Base layers (i.e. with no parents) from arbitrary files (including no files, although you end up with registry hives no matter what due to image format requirements) and use them as the base for further layers.

To actually use this functionality outside the wclayer tool, there's still work to be landed in containerd and BuildKit. But in theory wclayer makebaselayer plus some other tooling could be used to fulfill #750, for example.

@TBBle TBBle closed this as completed Mar 2, 2023