Proposal: Multi-stage builds #31067

Closed
tonistiigi opened this issue Feb 16, 2017 · 39 comments · Fixed by #31257
Labels
area/builder kind/feature Functionality or other elements that the project doesn't currently have. Features are new and shiny

Comments

@tonistiigi
Member

tonistiigi commented Feb 16, 2017

#resurrects #7149

Several maintainers have been going back and forth on how to let users produce sleek images without the cruft of intermediate build artifacts.

We see a lot of requests from the community for this feature, and many different workarounds, most commonly using docker cp and re-tarring a new context, or trying to combine the whole build into a single RUN instruction.

Among the things we discussed were rebasing to a different rootfs path, mounting or copying data from other images, using cache storage between images, squashing, sub-blocks inside a Dockerfile, invoking the builder from inside a Dockerfile, etc.

Eventually, we settled on the #7149 proposal, which allows switching the context of a build to a directory from an existing image. The benefit of this proposal is that it conflicts least with the current design principles of the Dockerfile, such as self-consistency, build cache, and returning a single target, while elegantly solving the small-images problem.

While this proposal can be considered a "chained build" and has some limitations for describing complicated build graphs with multiple branches, we have concluded that it would be best to solve that problem at a higher level, and we continue to investigate possible improvements.

The proposal:

edit: this has been updated to new syntax
edit2: s/--context/--from/

The --from=n flag on COPY gives access to files from the rootfs of a previous build block. Every build block starts with a FROM instruction (multiple FROM instructions already work in Docker today). n is the index of the block, starting at 0 and incrementing with each FROM. In the future we want to extend it to human-readable labels.

FROM ubuntu

RUN apt-get update && apt-get install -y build-essential
ADD . /src
RUN cd /src && make

FROM busybox
COPY --from=0 app /usr/local/bin/app
EXPOSE 80
ENTRYPOINT /usr/local/bin/app

The benefit of this syntax is that when the user context holds both files needed to build an artifact and files needed only in the final image, the latter don't need to be copied into the first environment. That also means that changing such a file doesn't invalidate the cache for the first environment, since the file is not used there. This syntax can also be used to include content from other images with just an extra FROM command.
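
A minimal sketch of the cache behaviour described above. The src directory, the app.conf file, and the /src/app output path are hypothetical names chosen for illustration, and the sketch assumes make produces a statically linked binary that can run on busybox.

```dockerfile
# Block 0: build environment. Only the sources are added here, so editing
# app.conf (hypothetical, needed only by the final image) leaves the cached
# layers of this block untouched.
FROM ubuntu
RUN apt-get update && apt-get install -y build-essential
ADD src /src
RUN cd /src && make

# Block 1: the image that is actually produced.
FROM busybox
# Taken from the rootfs of block 0 (assumes make wrote /src/app).
COPY --from=0 /src/app /usr/local/bin/app
# No --from flag, so this file comes from the user's build context.
COPY app.conf /etc/app.conf
ENTRYPOINT ["/usr/local/bin/app"]
```

Because app.conf is only referenced in the second block, a change to it should rebuild just the last COPY layer rather than re-running make.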

old proposal:

The proposal:

BUILD /path/to/context instruction in the Dockerfile that switches the current build context to /path/to/context from the current image's rootfs.

docker build docker://image-reference[::/subdir] that invokes a new build using the data from a specified image as a build context.

Notes:

  • No previous metadata carries over to the new image after BUILD. The next instruction after this command needs to be FROM.
  • The other way to think about the BUILD instruction is as SETCONTEXT
  • The build from docker reference syntax is useful when the build is described by multiple Dockerfiles and dependencies are controlled by a Makefile-like utility.
  • Only the layers after the last BUILD instruction end up in the final image.
  • docker build -t would tag the last image defined at the end of the Dockerfile
  • Some features like auto-tagging and specifying/loading a Dockerfile from new context directory have been left out and can be considered as future additions.

Example:

FROM ubuntu

RUN apt-get update && apt-get install -y build-essential
ADD . /src
RUN cd /src && make

BUILD /src/build
FROM busybox
COPY app /usr/local/bin/app
EXPOSE 80
ENTRYPOINT /usr/local/bin/app

@icecrime @vikstrous @fermayo

@AkihiroSuda
Member

Have you considered combining multiple Dockerfiles using some external file?

e.g.

# docker-build.yaml (just example)
build:
  img1:
    context: .
    dockerfile: Dockerfile.img1
  img2:
    context:
      image: img1
      path: /src/build
    dockerfile: Dockerfile.img2

It would be much more flexible (both good and bad)

@AkihiroSuda added the kind/feature label Feb 16, 2017
@dnephin
Member

dnephin commented Feb 16, 2017

BUILD doesn't seem like an intuitive name. Since you used SETCONTEXT as an example to explain the concept, maybe CONTEXT is a better name for the directive?

Would it be fair to say this is a more constrained version of a BEGIN/COMMIT set of directives?

@dnephin
Member

dnephin commented Feb 16, 2017

Have you considered combining multiple Dockerfiles using some external file?

I believe something along these lines is what was meant by "it would be best to solve that problem in a higher level [tool]", so it's still being considered.

@philtay

philtay commented Feb 16, 2017

This is very interesting. IMO several limitations related to multiple build steps (or branches) can be overcome with a slight change to the original idea. The context should be "composable" rather than completely replaced each time. Basically, BUILD should work like COPY but the other way around: it copies from the image to the context. The initial context is the content of the Dockerfile's parent directory. After some build steps there is a command like BUILD /foo /path/to/my/context. The content of /foo in the image is copied to /path/to/my/context. The initial context, along with the new directory, is the context of the next FROM instruction.

@WhisperingChaos
Contributor

WhisperingChaos commented Feb 17, 2017

@tonistiigi
I support the concept of a "CONTEXT" introduced by this BUILD operator but would expand it with @philtay's recommendation that it be "composable". In addition to being composable, I would also introduce a visibility mechanism, analogous to an SQL "view", that allows a developer to define a logical file system view (a.k.a. interface) to a given FROM. This mechanism not only decouples the file system's shape in a multi-FROM (multi-step) Dockerfile, but also limits the data presented to a particular build step to what it needs to know. A visibility mechanism is sorely needed to properly incorporate secret management into the build process.

Another benefit of the proposed approach, and of others that separate build-time and run-time concerns, is the elimination of the squash mechanism #22641. Instead, this approach would allow a developer to define exactly the layering desired for a resultant image. It also eliminates the anti-pattern code, written manually in a Dockerfile to remove build-time artifacts, which to me is the sole purpose behind --squash (see 22641#issuecomment).

Once CONTEXT becomes composable, BUILD's other Dockerfile internal semantics are unnecessary, as the resultant image file system is represented within the build context and can be simply copied to create the image.

Decoupling the current notion of a "build context" from its straitjacketed "physical" form to project it as a logical/virtual file system is not a new notion, as it's a core feature of the Unix file system (predecessor to the 'L' version) as implemented by the Unix mount command. It enabled the composition of an expansive logical file system by cobbling together file systems from, for example, many smaller physical ones and allowed an administrator to define its shape and the visibility of the contributed components.

If interested, in a more detailed explanation of what's suggested above:

@tonistiigi
Member Author

BUILD doesn't seem like an intuitive name. Since you used SETCONTEXT as an example to explain the concept, maybe CONTEXT is a better name for the directive?

I'm fine with any name if more people prefer it. Setting just context is a bit of a simplification though as it also resets all metadata.

Would it be fair to say this is a more constrained version of a BEGIN/COMMIT set of directives?

In some ways yes, but they are quite different for users. BEGIN/COMMIT provide users a way to clean up after themselves while this is meant for switching to a completely new context and bringing some artifacts from previous build.

@philtay

I'd prefer to keep it as simple as possible at first. Copying back will mean that it can't be considered a completely separate build action, which makes it unsuitable for higher-level formats that have their own way of defining dependencies. It also requires some changes to the caching logic. An extra ADD could always have the same behaviour that you are describing. @dnephin wdyt?

@philtay

philtay commented Feb 22, 2017

@tonistiigi

Copying back will mean that it can't be considered a completely separate build action, which makes it unsuitable for higher-level formats that have their own way of defining dependencies.

I'm not sure I understand what you mean here.

An extra ADD could always have the same behaviour that you are describing.

Yes, but you can say goodbye to the cache.

This is an example of a typical multi-stage build (pseudo-syntax):

FROM foo
COPY file1 ./
RUN command which generates artifact1 from file1
COPY file2 ./
RUN command which generates artifact2 from file2 and artifact1

Using the "limited" CONTEXT (or BUILD) command:

FROM foo
COPY file1 ./
RUN command which generates artifact1 from file1
COPY file2 ./
CONTEXT /dir/which/contains/both/artifact1/and/file2
FROM bar
RUN command which generates artifact2 from file2 and artifact1

Giving to CONTEXT the ability to modify the context:

FROM foo
COPY file1 ./
RUN command which generates artifact1 from file1
CONTEXT artifact1 /tmp
FROM bar
COPY tmp/artifact1 ./
COPY file2 ./
RUN command which generates artifact2 from file2 and artifact1

[EDIT]
The point is that having to run all of the COPY commands during the first step invalidates the cache for the subsequent ones. Imagine a real multi-step build: you have to copy during the first step a file you'll need only during the fourth.
[/EDIT]

The syntax would be CONTEXT <src>... <dest>, with <dest> optional. If missing, CONTEXT replaces entirely (instead of modifying) the current context (i.e. this solution gracefully "degrades" to the original idea). I'm sure this is harder to implement, but it would solve the "nested build problem".

@duglin
Contributor

duglin commented Mar 7, 2017

Have you considered an approach that doesn't require a new Dockerfile command? In particular, the FROM command could be used. Here's my current thinking...

  • if the build context were mounted into the build container then the entire build context would be available during RUN cmds w/o the need to do a COPY/ADD first. This avoids any unnecessary layers in the build process.
  • the build context could be available to the RUN cmds at a well-defined location, e.g. /.context. A little hidden so it doesn't interfere with stuff.
  • when a second FROM command is seen it will just mount the previously built image into the new build container at /.context as well.

So, for example:

Dockerfile:

FROM ubuntu
RUN go build -o /bin/myapp /.context/myapp/src/*.go
FROM scratch
COPY /.context/bin/myapp /
CMD /myapp

This would do two "builds": one to generate the myapp exe and then one to copy it into an empty image. This is similar to what @philtay suggested (I think; I just noticed it), but w/o requiring an explicit CONTEXT command, and it helps the non-recursive build cases too.

@cpuguy83
Member

cpuguy83 commented Mar 8, 2017

@duglin I actually don't like that since it's not exactly clear what's happening there.
It's also going to be confusing if you pass a new Dockerfile to an older parser... whereas BUILD would just error out.

@duglin
Contributor

duglin commented Mar 8, 2017

@cpuguy83 I'm not too worried about old parsers since multiple FROMs in Dockerfiles are pretty useless today and it's been suggested we remove them. This idea would make multiple FROMs useful.
And the ability to access files from the build context w/o having to copy them into the container seems really useful to me - but perhaps it's just me.

@duglin
Contributor

duglin commented Mar 8, 2017

e.g. accessing a secret during the build w/o worrying about copying it into a layer.

@tonistiigi
Member Author

@duglin

Mounting to /.context on RUN is a different use case. In principle, I'm not against it if it is read-only, but it doesn't really relate to chained builds.

For the second example, I don't get it: if /.context is mounted with RUN, why would you need to reference /.context in COPY afterward? It is almost as if you had mounted the context as root before. Making the mounted context read-write means that it would need to be rescanned for the cache after every RUN invocation.

@philtay

I'm not completely against changing the context to additive instead. But I would need to get more feedback from other maintainers about this being critical. I do think it simplifies some use cases for the user.

With the current implementation, we could do further optimizations like directly copying data from the image without copying it to a temporary directory, potentially also saving the cost for cache hashing as well.

Copying data from one image to another is a good lower-level component that is easy to build upon for many complicated cases. By adding an additive context we provide a solution for chained builds in a Dockerfile but do not really improve the overall problem of defining more complex builds (unless this turns into generic reusable cache folders later). Also, it is possible to move from switch-context to additive-context later, but not the other way around.

Not to complicate things, but one way to solve the problem you pointed out without making the context mutable would be:

from debian
add src .
run make
from scratch
copy foo . # from initial context
copy $0/binary . # $0 means copy from rootfs of first build section

The initial feedback I have got from some maintainers testing #31257 is that this case is easy to work around. The cache invalidation issue that you pointed out only appears if data isn't copied after the first artifact has been built.

@philtay

philtay commented Mar 14, 2017

@tonistiigi The $0 solution is a good one. In this way the initial context is always available and you gain access to previous images in the build chain. +1

@duglin
Contributor

duglin commented Mar 14, 2017

@tonistiigi sorry if I wasn't clear. Looking at the example I wrote:

FROM ubuntu
RUN go build -o /bin/myapp /.context/myapp/src/*.go

FROM scratch
COPY /.context/bin/myapp /
CMD /myapp

The 2nd part of the build would have the image from the first part of the build mounted r/o at /.context. The pattern I'm trying to follow is that any build container will always have the input context r/o mounted at /.context - whether the input context is the user's build context or the previous build's filesystem in the case of a multi-build Dockerfile.

The COPY is needed to copy the build results ("myapp") from the first image into the 2nd one. W/o that the 2nd image is empty.

Or did I not follow your question?

@duglin
Contributor

duglin commented Mar 14, 2017

btw, in your example:

from debian
add src .
run make
from scratch
copy foo . # from initial context
copy $0/binary . # $0 means copy from rootfs of first build section

I think that's the same thing I was proposing except instead of "$0" was using "/.context" - a minor diff IMO.

@tonistiigi
Member Author

@duglin Ah, ok. I didn't get that /.context switches between builds. Yes, that looks the same, except for the direct mounting into RUN, which is a separate concern.

@duglin
Contributor

duglin commented Mar 14, 2017

Oh I now see why my COPY was confusing, sorry, it should be this:

FROM ubuntu
RUN go build -o /bin/myapp /.context/myapp/src/*.go

FROM scratch
RUN cp /.context/bin/myapp /
CMD /myapp

The advantage of using /.context instead of $0 is that any RUN cmd can get to the data w/o special processing of $0

@philtay

philtay commented Mar 14, 2017

Maybe I'm wrong, but there is a difference. With $n I can copy from any previous image (e.g. $0, $1, $2, ...), on the other hand with /.context I can copy only from the previous one. And what about the initial context? Does it get lost after the first build step?

@duglin
Contributor

duglin commented Mar 14, 2017

If all we care about is COPY then yeah, that might work, but I think allowing RUN to see (at least) the previous context is useful - especially if you want to give access to data that you don't want saved in the image.

@duglin
Contributor

duglin commented Mar 14, 2017

For example, let's ignore that ADD has some magic :-), I could see this:

FROM ubuntu
RUN wget http://..../myapp.tgz

FROM ubuntu
RUN tar -xf /.context/myapp.tgz

@philtay

philtay commented Mar 15, 2017

Giving to RUN the ability to see into the context is nice, but not directly related to the nested build problem.

Anyway, in theory, you could do that with the $n syntax as well. $0 is the initial context, $1 the first build step, etc.

Your example would be:

FROM ubuntu
RUN wget http://..../myapp.tgz

FROM ubuntu
RUN tar -xf /$0/myapp.tgz

And if I need to compile the tarball:

FROM ubuntu
RUN wget http://..../myapp.tgz

FROM ubuntu
RUN tar -xf /$0/myapp.tgz

FROM ubuntu
RUN make -C $1/myapp

@duglin
Contributor

duglin commented Mar 15, 2017

Yeah, but the problem with putting $0 into RUN is that we then have to process it and look for $0, which we don't do today. Today we just let the shell deal with env vars in RUN. And then, of course, that limits you to specifying it only on the RUN cmd itself. It would be nice if a bash file could access /.context from its logic/code.

@tonistiigi
Member Author

[offtopic] RUN --mount $0/foo:/bar tar -xf /bar/t.tgz [/offtopic]

@duglin
Contributor

duglin commented Mar 15, 2017

I have no idea if that's super cool or super weird ;-)

@philtay

philtay commented Mar 15, 2017

I agree that having the context accessible without copying it would be nice, but it's unrelated to the nested build problem. Another solution could be to have several directories, all of them rooted in /.context. i.e. /.context/0, /.context/1, /.context/2, etc. One for each build step. We can go on for days with things like that...

@duglin
Contributor

duglin commented Mar 15, 2017

Not sure why you keep saying it's not related to the nested build issue. I think my example shows how I personally would use it to solve my nested build issue of wanting one build to build my exe and a second build to put just that exe into a scratch image.

@philtay

philtay commented Mar 15, 2017

It's because /.context, as you proposed it, basically gives access only to the current context. This is your example:

FROM ubuntu
RUN go build -o /bin/myapp /.context/myapp/src/*.go

FROM scratch
RUN cp /.context/bin/myapp /
CMD /myapp

In the first step /.context refers to the initial context (the .go files). During the second step the initial context is gone and /.context refers only to the previous image. It's not going to work in a multi-step build process. You must be able to access any previous context (step) consistently and at will.

IMO /.context should be tied only to the current context, just to avoid having to copy from it. In some simple cases it could even help to avoid a multi-step build. Your example can be rewritten as:

FROM scratch
RUN go build -o /myapp /.context/myapp/src/*.go
CMD /myapp

As you can see, you don't even need a multi-step build. The second FROM is gone. But that's material for another ticket. Not sure if we are able to kill two birds with one stone.

@duglin
Contributor

duglin commented Mar 15, 2017

Ah, I didn't realize you were so focused on wanting access to more than just the previous context. Then yea, we could do what you suggested in a previous comment and make them all available at some well known locations in the filesystem.

I like your tweaking of my example :-) I think having the files available during RUN could open up lots of nice options for people. Although, in that case I'm not sure it would work since 'go' isn't available in 'scratch'.

@philtay

philtay commented Mar 15, 2017

wanting access to more than just the previous context

Yep, in a "true" multi-step build you can't lose access to the initial context or to the previous build steps. Otherwise you're basically forced to invalidate the cache.

since 'go' isn't available in 'scratch'

Ok, make it FROM alpine and RUN wget golang && tar golang && build /.context/*.go && rm -rf golang. And don't tell me you prefer scratch. The resulting compiled Go binary is going to be larger than the entire alpine image. Ahahah :)

@tonistiigi
Member Author

I gathered more feedback from @dmcgowan @icecrime @dnephin @justincormack @simonferquel

Everyone seems to be OK with the new proposal. Instead of COPY $0/foo, more people wanted an explicit flag: COPY --context=0.

We also need to provide naming for the build blocks, in addition to the currently proposed incrementing numbers. Feel free to discuss syntaxes for that, but it shouldn't block any work on getting the initial implementation merged.

@philtay

philtay commented Mar 17, 2017

I propose the AS syntax.

FROM alpine AS mycontext
RUN ...
FROM alpine AS anothercontext
COPY --context=mycontext /foo /bar/
RUN ...
FROM scratch
COPY --context=mycontext /foo /bar/
COPY --context=anothercontext /zoo /moo/
RUN ...

EDIT
You can access the initial context in any build step simply by omitting the --context flag in a COPY instruction (i.e. you don't need a name for the initial context).

Example:

...
...
FROM scratch
COPY --context=mycontext /foo /bar/
COPY --context=anothercontext /zoo /moo/
COPY /aaa /bbb/ # /aaa is in the initial context
RUN ...
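
For comparison, a rough sketch of the same layout written with the --from flag name that the proposal at the top of this thread was later edited to use, combined with the AS naming suggested here. It assumes the builder accepts a stage name in --from. The file names, the /aaa directory in the build context, and the RUN commands are placeholders, and the last stage uses alpine rather than scratch so that nothing depends on a shell being absent.

```dockerfile
FROM alpine AS mycontext
# Placeholder step that produces /foo in this stage.
RUN mkdir -p /foo && echo one > /foo/a.txt

FROM alpine AS anothercontext
# Pull /foo out of the stage named "mycontext".
COPY --from=mycontext /foo /bar/
# Placeholder step that produces /zoo from the copied data.
RUN mkdir -p /zoo && cp /bar/a.txt /zoo/b.txt

FROM alpine
COPY --from=mycontext /foo /bar/
COPY --from=anothercontext /zoo /moo/
# No --from flag: /aaa is read from the initial build context
# (assumes a directory named aaa exists next to the Dockerfile).
COPY /aaa /bbb/
```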

@clayrisser

How do you currently copy from a multi-stage build?

@clayrisser

clayrisser commented Mar 21, 2017

I agree with the AS syntax, but I think it should be on its own line like all the other commands.

# build
#############################################
FROM ubuntu:latest
AS build
WORKDIR /build/
RUN apt-get update && apt-get install -y nodejs && \
    npm install -g gulp
COPY ./package.json /build/package.json
RUN npm install
COPY ./ /build/
RUN gulp

# final
##############################################
FROM alpine:latest
WORKDIR /app/
RUN apk add --no-cache nginx
COPY --context=build /build/dist/* /app/
COPY ./nginx.conf /etc/nginx/conf.d/default.conf
RUN chown -R nobody:nobody /app/
ENTRYPOINT ["/usr/sbin/nginx", "-g", "daemon off;"]

@Perlence

Here's my take on @AkihiroSuda's suggestion, a proof of concept https://github.com/Perlence/docker-multi-build

@0xdevalias

0xdevalias commented Mar 21, 2017

For my /2c, I think I prefer the AS syntax inline.. but perhaps it could be made to work both ways and leave it up to user style?

The main reason I don't like it being on its own line is that you could then theoretically write something like this (AS hidden halfway down the file), which would be hard to follow:

# build
#############################################
FROM ubuntu:latest

WORKDIR /build/
RUN apt-get update && apt-get install -y nodejs && \
    npm install -g gulp
COPY ./package.json /build/package.json
RUN npm install
COPY ./ /build/
AS build
RUN gulp

..etc..

@ctrlok

ctrlok commented Mar 24, 2017

Maybe this can be useful: in rocker (~1.5 years ago) we created IMPORT and EXPORT commands for that purpose, and they were pretty intuitive for all users.

FROM google/golang:1.4
ADD . /src
WORKDIR /src
RUN CGO_ENABLED=0 go build -a -installsuffix cgo -v -o rocker.o rocker.go
EXPORT rocker.o                    #1

FROM busybox
IMPORT rocker.o /bin/rocker        #2
CMD ["/bin/rocker"]

Maybe @ybogdanov can add more info.

@jmarcos-cano

Is there any chance to specify the stage you want to build, à la Makefile targets?

@AkihiroSuda
Member

@jmarcos-cano docker build --target foo

@mkobit

mkobit commented Aug 15, 2017

@jmarcos-cano you can use --target option for docker build
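
A small sketch of how --target could be used with a named stage. The stage name "build", the image tags, and the assumption that the build context is a Go module with a main package at its root are all illustrative.

```dockerfile
# Stage "build": compiles the binary (assumes the context is a Go module).
FROM golang AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/myapp .

# Final stage: only the compiled binary ends up in the image.
FROM scratch
COPY --from=build /bin/myapp /myapp
CMD ["/myapp"]

# Build and tag only the "build" stage, stopping before the scratch stage:
#   docker build --target build -t myapp:build .
# Build the whole file; the last stage is the resulting image:
#   docker build -t myapp .
```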
