Proposal: Multi-stage builds #31067
Comments
Have you considered combining multiple e.g.
# docker-build.yaml (just example)
build:
  img1:
    context: .
    dockerfile: Dockerfile.img1
  img2:
    context:
      image: img1
      path: /src/build
    dockerfile: Dockerfile.img2
It would be much more flexible (both good and bad) |
Would it be fair to say this is a more constrained version of a |
I believe something along these lines is what was meant by "it would be best to solve that problem in a higher level [tool]", so it's still being considered. |
This is very interesting. IMO several limitations related to multiple build steps (or branches) can be overcome with a slight change to the original idea. The context should be "composable" as opposed to being completely replaced each time. Basically |
@tonistiigi Another benefit of the proposed approach and others that separate build and run time concerns is the elimination of the squash mechanism #22641. Instead, this approach would allow a developer to exactly define the layering desired for a resultant image. It also eliminates the anti-pattern code, manually written in a Dockerfile, to eliminate build-time artifacts, which, to me, is the sole purpose behind Once CONTEXT becomes composable, BUILD's other Dockerfile internal semantics are unnecessary, as the resultant image file system is represented within the build context and can be simply copied to create the image. Decoupling the current notion of a "build context" from its straitjacketed "physical" form to project it as a logical/virtual file system is not a new notion, as it's a core feature of the Unix file system (predecessor to the 'L' version) as implemented by the Unix If interested in a more detailed explanation of what's suggested above:
|
I'm fine with any name if more people prefer it. Setting just context is a bit of a simplification though as it also resets all metadata.
In some ways yes, but they are quite different for users. I'd prefer to keep it as simple as possible at first. Copying back would mean that it can't be considered a completely separate build action, which makes it unsuitable for higher-level formats that have their own way of defining dependencies. It also requires some changes to the caching logic. An extra |
I'm not sure I understand what you mean here.
Yes, but you can say goodbye to the cache. This is an example of a typical multi-stage build (pseudo-syntax):
FROM foo
COPY file1 ./
RUN command which generates artifact1 from file1
COPY file2 ./
RUN command which generates artifact2 from file2 and artifact1
Using the "limited"
FROM foo
COPY file1 ./
RUN command which generates artifact1 from file1
COPY file2 ./
CONTEXT /dir/which/contains/both/artifact1/and/file2
FROM bar
RUN command which generates artifact2 from file2 and artifact1
Giving to
FROM foo
COPY file1 ./
RUN command which generates artifact1 from file1
CONTEXT artifact1 /tmp
FROM bar
COPY tmp/artifact1 ./
COPY file2 ./
RUN command which generates artifact2 from file2 and artifact1
[EDIT] The syntax would be |
Have you considered an approach that doesn't require a new Dockerfile command? In particular, the
So, for example: Dockerfile:
Would do two "builds": one to generate the myapp exe and then one to copy it into an empty image. This is similar to what @philtay suggested (I think; I just noticed it), but w/o requiring an explicit CONTEXT command, and it helps the non-recursive build cases too. |
@duglin I actually don't like that since it's not exactly clear what's happening there. |
@cpuguy83 I'm not too worried about old parsers since multiple FROMs in Dockerfiles are pretty useless today and it's been suggested we remove them. This idea would make multiple FROMs useful. |
e.g. accessing a secret during the build w/o worrying about copying it into a layer. |
Mounting to For the second example I don't get as if I'm not completely against changing the context to additive instead. But I would need to get more feedback from other maintainers about this being critical. I do think it simplifies some use cases for the user. With the current implementation, we could do further optimizations like directly copying data from the image without copying it to a temporary directory, potentially also saving the cost of cache hashing. Copying data from one image to another is a good lower-level component that is easy to build upon for many complicated cases. By adding an additive context we provide a solution for chained builds in a Dockerfile but do not really address the overall problem of defining more complex builds (unless this turns into generic reusable cache folders later). Also, it is possible to update from switch-context to additive-context but not the other way around. Not to complicate things, but one way to solve the problem you pointed out without making the context mutable would be:
The initial feedback I have got from some maintainers testing #31257 is that this case is easy to work around. The cache invalidation issue that you pointed out only appears if data isn't copied after the first artifact has been built. |
@tonistiigi The |
@tonistiigi sorry if I wasn't clear. Looking at the example I wrote:
The 2nd part of the build would have the image from the first part of the build mounted r/o at /.context. The pattern I'm trying to follow is that any build container will always have the input context r/o mounted at The COPY is needed to copy the build results ("myapp") from the first image into the 2nd one. W/o that the 2nd image is empty. Or did I not follow your question? |
btw, in your example:
I think that's the same thing I was proposing, except instead of "$0" I was using "/.context" - a minor diff IMO. |
@duglin Ah, ok. I didn't get that |
Oh I now see why my COPY was confusing, sorry, it should be this:
The advantage of using |
Maybe I'm wrong, but there is a difference. With |
if all we care about is COPY then yea that might work, but I think allowing RUN to see (at least) the previous context is useful - especially if you want to give access to data that you don't want saved in the image. |
For example, let's ignore that ADD has some magic :-), I could see this:
|
Giving to Anyway, in theory, you could do that with the Your example would be:
And if I need to compile the tarball:
|
yea but the problem with putting $0 into RUN is that we then have to process it and look for $0, which we don't do today. Today we just let the shell deal with env vars in RUN. And then, of course, that limits you to specifying it on the RUN cmd itself. It would be nice if a bash file could access /.context from its logic/code. |
[offtopic] |
I have no idea if that's super cool or super weird ;-) |
I agree that having the context accessible without copying it would be nice, but it's unrelated to the nested build problem. Another solution could be to have several directories, all of them rooted in |
not sure why you keep saying it's not related to the nested build issue, I think my example shows how I personally would use it to solve my nested build issue of wanting 1 build to build my exe and a 2nd build to put just that exe into a scratch image. |
It's because
In the first step IMO
As you can see you don't even need a multi-step build. The second |
Ah, I didn't realize you were so focused on wanting access to more than just the previous context. Then yea, we could do what you suggested in a previous comment and make them all available at some well known locations in the filesystem. I like your tweaking of my example :-) I think having the files available during RUN could open up lots of nice options for people. Although, in that case I'm not sure it would work since 'go' isn't available in 'scratch'. |
Yep, in a "true" multi-step build you can't lose access to the initial context or to the previous build steps. Otherwise you're basically forced to invalidate the cache.
Ok, make it |
I gathered more feedback from @dmcgowan @icecrime @dnephin @justincormack @simonferquel Everyone seems to be OK with the new proposal. Instead of Also, we need to provide naming for the build blocks, in addition to the currently proposed incrementing numbers. Feel free to discuss syntaxes for that, but it shouldn't block any work on getting the initial implementation merged. |
I propose the
FROM alpine AS mycontext
RUN ...
FROM alpine AS anothercontext
COPY --context=mycontext /foo /bar/
RUN ...
FROM scratch
COPY --context=mycontext /foo /bar/
COPY --context=anothercontext /zoo /moo/
RUN ...
EDIT Example:
...
...
FROM scratch
COPY --context=mycontext /foo /bar/
COPY --context=anothercontext /zoo /moo/
COPY /aaa /bbb/ # /aaa is in the initial context
RUN ... |
How do you currently copy from a multi-stage build? |
I agree with the AS syntax, but I think it should be on its own line like all the other commands.
# build
#############################################
FROM ubuntu:latest
AS build
WORKDIR /build/
RUN apt-get update && apt-get install -y nodejs && \
npm install -g gulp
COPY ./package.json /build/package.json
RUN npm install
COPY ./ /build/
RUN gulp
# final
##############################################
FROM alpine:latest
WORKDIR /app/
RUN apk add --no-cache nginx
COPY --context=build /build/dist/* /app/
COPY ./nginx.conf /etc/nginx/conf.d/default.conf
RUN chown -R nobody:nobody /app/
ENTRYPOINT ["/usr/sbin/nginx", "-g", "daemon off;"] |
Here's my take on @AkihiroSuda's suggestion, a proof of concept https://github.com/Perlence/docker-multi-build |
For my 2c, I think I prefer the Main reason I don't like it being on its own line is that theoretically you could then make something like this (
|
Maybe it can be useful: in rocker (~1.5 years ago) for that purpose we create
FROM google/golang:1.4
ADD . /src
WORKDIR /src
RUN CGO_ENABLED=0 go build -a -installsuffix cgo -v -o rocker.o rocker.go
EXPORT rocker.o #1
FROM busybox
IMPORT rocker.o /bin/rocker #2
CMD ["/bin/rocker"]
Maybe @ybogdanov can add more info. |
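For comparison, here is a hedged sketch of how the rocker example above might look with the FROM ... AS naming and the COPY --from flag discussed in this thread (the flag is called --context in some of the comments). The stage name "build" is an assumption made for illustration. The images, paths, and build command are taken from the rocker example, and the build context is assumed to contain rocker.go.

```dockerfile
# Stage "build" (the name is hypothetical): compile a static binary.
FROM google/golang:1.4 AS build
ADD . /src
WORKDIR /src
RUN CGO_ENABLED=0 go build -a -installsuffix cgo -v -o rocker.o rocker.go

# Final stage: only the compiled binary is copied in, so the Go toolchain
# and the sources from the first stage do not end up in the resulting image.
FROM busybox
COPY --from=build /src/rocker.o /bin/rocker
CMD ["/bin/rocker"]
```

Here COPY --from plays the role of the EXPORT/IMPORT pair: the artifact is read directly from the rootfs of the named stage.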
Is there any chance to specify the stage you want to build, à la Makefile targets? |
@jmarcos-cano |
@jmarcos-cano you can use |
#resurrects #7149
We've been going back and forth among some maintainers on a way to let users produce sleek images without the cruft of intermediate build artifacts.
We see a lot of requests from the community for this feature, and many different ways people try to work around its absence, most commonly with docker cp and re-tarring a new context, or by trying to combine the whole build into a single RUN instruction.
Among the things we discussed were rebasing to a different rootfs path, mounting or copying data from other images, using cache storage between images, squashing, subblocks inside the Dockerfile, invoking the builder inside of the Dockerfile, etc.
Eventually, we ended up on the #7149 proposal that allows switching the context of a build to a directory from an existing image. The benefit of this proposal is that it conflicts least with the current design principles of the Dockerfile, like self-consistency, build cache, returning a single target etc., while elegantly solving the small images problem.
While this proposal can be considered a "chained build" and has some limitations for describing complicated build graphs with multiple branches, we have concluded that it would be best to solve that problem at a higher level, and we continue to investigate possible improvements.
The proposal:
edit: this has been updated to new syntax
edit2: s/--context/--from/
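As a rough illustration of the updated syntax, a two-block Dockerfile might look like the sketch below. This is not the example from the proposal: the image tag, file names, and commands are assumptions chosen only to keep the sketch self-contained. Block 0 produces an artifact from file1. Block 1 pulls that artifact in with COPY --from=0 and takes file2 straight from the user context, so editing file2 should not invalidate the cache for block 0.

```dockerfile
# Block 0: builds an artifact from file1 only.
FROM alpine:3.19
WORKDIR /work
COPY file1 ./
# Hypothetical build step: upper-case file1 into artifact1.
RUN tr 'a-z' 'A-Z' < file1 > artifact1

# Block 1: the final image. Only what is copied or created here ends up in it.
FROM alpine:3.19
WORKDIR /app
# Copy the artifact from the rootfs of block 0 (blocks are indexed from 0).
COPY --from=0 /work/artifact1 ./
# file2 comes from the user build context and is never copied into block 0.
COPY file2 ./
RUN cat artifact1 file2 > artifact2
```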
The --from=n flag allows access to files from the rootfs of a previous build block. Every build block starts with a FROM instruction (multiple FROM instructions already work in Docker today). n specifies an incrementing index for every block. In the future we want to extend it to human-readable labels.
The benefit of this syntax is that when files from the user context are required both for building some artifact and for the final image, they don't need to be copied to the first environment. That also means that it doesn't invalidate the cache for the first environment if the file is not used there. This syntax can also be used for including content from other images with just an extra FROM command.
old proposal:
The proposal:
BUILD /path/to/context instruction in the Dockerfile that switches the current build context to /path/to/context from the current image's rootfs.
docker build docker://image-reference[::/subdir] that invokes a new build using the data from a specified image as a build context.
Notes:
BUILD. The next instruction after this command needs to be FROM.
BUILD instruction is as SETCONTEXT
BUILD instruction end up in the final image.
docker build -t would tag the last image defined at the end of the Dockerfile
Example:
@icecrime @vikstrous @fermayo