Add support for named build stages #32063

Merged: 4 commits merged on Apr 4, 2017

@tonistiigi (Member) commented Mar 24, 2017:

Follow-up to #31257.

#31257 added support for copying data from other build stages by an incrementing numeric ID. While this works fine when you want to make the fewest changes to existing Dockerfiles, it has the following problems:

  • A numeric ID gives the reader no context about what is actually being copied.
  • When new build stages are added, all the numbers may need to change.

This PR lets the user optionally give a build stage a name. This name can then be used in COPY --from=name src dest and FROM name. If a build stage with that name is defined, it takes precedence in these commands; if it is not found, an image with that name is used instead. So there is no need to write FROM foo AS foo when data from an image is needed directly.
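The lookup order described above can be sketched in Go. This is a minimal illustration, assuming the builder keeps a map from lowercase stage names to stage indexes and treats any other value as an image reference; `resolveFrom` and `stageSource` are hypothetical names, not the builder's actual code, and reporting an error for an out-of-range numeric index is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// stageSource is a hypothetical result type: either a build stage defined
// earlier in the Dockerfile, or an image reference to use (pulling if needed).
type stageSource struct {
	IsStage bool
	Index   int    // stage index, valid when IsStage is true
	Image   string // image reference, valid when IsStage is false
}

// resolveFrom sketches the precedence for COPY --from=<value> and FROM <value>:
//  1. a build stage with that name (names are compared in lowercase),
//  2. a numeric stage index, the form introduced for COPY --from in #31257,
//  3. otherwise, an image with that name.
func resolveFrom(value string, stageNames map[string]int, numStages int) (stageSource, error) {
	if idx, ok := stageNames[strings.ToLower(value)]; ok {
		return stageSource{IsStage: true, Index: idx}, nil
	}
	if n, err := strconv.Atoi(value); err == nil {
		if n < 0 || n >= numStages {
			return stageSource{}, fmt.Errorf("invalid stage index %d", n)
		}
		return stageSource{IsStage: true, Index: n}, nil
	}
	return stageSource{Image: value}, nil
}

func main() {
	// Stage 0 plays the role of "FROM ubuntu AS build-env" from the first example.
	names := map[string]int{"build-env": 0}
	for _, v := range []string{"build-env", "Build-Env", "0", "busybox"} {
		src, err := resolveFrom(v, names, 1)
		fmt.Printf("%-10s -> %+v err=%v\n", v, src, err)
	}
}
```

Because a valid stage name has to start with a letter (see the regular expression in the review thread), a stage name can never be confused with a numeric index.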

Examples:

FROM ubuntu AS build-env
RUN apt-get update && apt-get install -y make
ADD . /src
RUN cd /src && make

FROM busybox
COPY --from=build-env /src/build/app /usr/local/bin/app
EXPOSE 80
ENTRYPOINT /usr/local/bin/app

from debian as build-essential
arg APT_MIRROR
run apt-get update
run apt-get install -y make gcc
workdir /src

from build-essential as foo
copy src1 .
run make

from build-essential as bar
copy src2 .
run make

from alpine
copy --from=foo bin1 .
copy --from=bar bin2 .
cmd ...

@tiborvass @dnephin @dmcgowan @simonferquel @philtay

if len(args) != 1 {
ctxName := ""
if len(args) == 3 && strings.EqualFold(args[1], "as") {
ctxName = strings.ToLower(args[2])
@tiborvass (Contributor) commented Mar 24, 2017:

@tonistiigi why not keep case sensitivity for the context name?

@tonistiigi (Member, author):

Isn't it confusing when foo and FOO can mean different things? Dockerfile commands, for example, are not case-sensitive.

if len(args) == 3 && strings.EqualFold(args[1], "as") {
ctxName = strings.ToLower(args[2])
if ok, _ := regexp.MatchString("^[a-z][a-z0-9-_\\.]*$", ctxName); !ok {
return errors.Errorf("invalid name for FROM %s", ctxName)
@tiborvass (Contributor) commented Mar 24, 2017:

We should explain what a valid name is. Also, shouldn't we mention the word "context", since it's a context name?

@tonistiigi (Member, author):

I think we should call it "build stage name". Regular users don't know what a context is.
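Following the suggestion to explain what a valid name is, the parsing and validation from the diff above might be sketched as below. `parseFrom` is a hypothetical helper; the pattern is the one from the diff, written as a Go raw string, and the error wording (using "build stage name", as proposed) is an assumption, not the message the builder actually prints.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// validStageName is the pattern from the diff: a lowercase letter followed by
// lowercase letters, digits, '-', '_' or '.'. The name is lowercased first.
var validStageName = regexp.MustCompile(`^[a-z][a-z0-9-_\.]*$`)

// parseFrom splits FROM arguments into the base image and an optional
// build stage name ("FROM image" or "FROM image AS name", AS in any case).
func parseFrom(args []string) (image, stage string, err error) {
	switch {
	case len(args) == 1:
		return args[0], "", nil
	case len(args) == 3 && strings.EqualFold(args[1], "as"):
		stage = strings.ToLower(args[2])
		if !validStageName.MatchString(stage) {
			return "", "", fmt.Errorf("invalid build stage name %q: must start with a letter and contain only letters, digits, '-', '_' or '.'", args[2])
		}
		return args[0], stage, nil
	default:
		return "", "", errors.New("FROM requires either one argument, or three: FROM <image> AS <name>")
	}
}

func main() {
	for _, args := range [][]string{
		{"ubuntu", "AS", "build-env"},
		{"debian", "as", "Build-Essential"},
		{"alpine"},
		{"ubuntu", "AS", "1st-stage"},
	} {
		image, stage, err := parseFrom(args)
		fmt.Printf("%v -> image=%q stage=%q err=%v\n", args, image, stage, err)
	}
}
```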

// mountByRef creates an imageMount from a reference, pulling the image if needed.
func mountByRef(b *Builder, name string) (*imageMount, error) {
var image builder.Image
if !b.options.PullParent {

dockerfile := `
FROM busybox
COPY --from=busybox /etc/passwd /mypasswd
RUN cmp /etc/passwd /mypasswd`
Review comment from a Member:

testRequires(c, DaemonIsLinux)

@tonistiigi (Member, author):

I added a case with a different file for Windows.

Review comment from a Contributor:

@tonistiigi I made certain restrictions for Windows in #32084 that we had not thought of previously. Basically, I made it mandatory for source paths to be both absolute and drive-qualified.
Thus COPY --from=0 /test test won't work, but COPY --from=0 c:/test test will.
I think I can soften the rule a little (allow relative paths if they contain at least one path segment and this segment is not windows). That would make it easier to write cross-platform tests.

Follow-up from the same Contributor:

nvm, I changed the restrictions to be a bit smarter about this

@alexellis (Contributor) commented Mar 24, 2017:

@tonistiigi seeing as COPY --from is not case-sensitive, is it worth covering that in a test to further document/prove it? I read the PR but didn't see one.

Edit: maybe a mixed-case "as" with a lowercase copy --from, and a lowercase as with a mixed-case copy --from.
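The case combinations suggested above could be written as a table-driven check. This sketch models only the name normalization (lowercasing the stage name and the --from value, and matching AS with `strings.EqualFold`, as in the diff earlier in the review); it does not run an actual `docker build`, and `caseCheck` and `matches` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// caseCheck is one hypothetical test row: "FROM base <AsKeyword> <AsName>"
// followed later by "COPY --from=<FromValue> ...".
type caseCheck struct {
	AsKeyword, AsName, FromValue string
	WantMatch                    bool
}

// matches reports whether the COPY --from value would find the stage,
// assuming both names are lowercased and AS is matched case-insensitively.
func matches(c caseCheck) bool {
	return strings.EqualFold(c.AsKeyword, "as") &&
		strings.ToLower(c.AsName) == strings.ToLower(c.FromValue)
}

func main() {
	cases := []caseCheck{
		{"As", "BuildEnv", "buildenv", true},    // mixed-case AS, lowercase --from
		{"as", "buildenv", "BuildEnv", true},    // lowercase as, mixed-case --from
		{"AS", "build-env", "build_env", false}, // different names never match
	}
	for _, c := range cases {
		got := matches(c)
		fmt.Printf("%+v got=%v ok=%v\n", c, got, got == c.WantMatch)
	}
}
```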

@simonferquel (Contributor):

LGTM

@dmcgowan (Member):

design LGTM

@vdemeester (Member) left a review:

Design LGTM 🦁

@alexellis (Contributor):

What happens to the layers in the intermediate or final image if you need to copy --from several locations, like /dir1 and /dir2? Does that result in one layer per copy --from?

@tonistiigi (Member, author):

@alexellis Every COPY instruction creates a single layer (you can have multiple source paths). Nothing happens to the source/intermediate image; it keeps its original layers for the build cache. If the files selected as source paths were in separate layers in the original image, they will get squashed into that single new layer.
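A toy model of this behavior, assuming a layer is just a map from path to contents (no whiteouts, metadata, or destination-path rewriting): the source image is read through a flattened view of its layers, one COPY instruction produces exactly one new layer however many source paths it lists, and the source layers are left untouched. `layer`, `flatten`, and `copyFrom` are hypothetical names for illustration only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// layer maps file paths to contents. A toy model, not the real layer format.
type layer map[string]string

// flatten overlays layers in order, so later layers win (like a rootfs view).
func flatten(layers []layer) layer {
	fs := layer{}
	for _, l := range layers {
		for p, c := range l {
			fs[p] = c
		}
	}
	return fs
}

// copyFrom models one COPY --from instruction with several source paths:
// it reads the source image's flattened filesystem and returns exactly one
// new layer. Files keep their paths here to keep the sketch small.
func copyFrom(src []layer, sources []string) layer {
	fs := flatten(src)
	out := layer{}
	for p, c := range fs {
		for _, s := range sources {
			if p == s || strings.HasPrefix(p, strings.TrimSuffix(s, "/")+"/") {
				out[p] = c
			}
		}
	}
	return out
}

func main() {
	intermediate := []layer{
		{"/dir1/a": "from layer 1"},
		{"/dir2/b": "from layer 2"},
		{"/dir3/c": "not copied"},
	}
	newLayer := copyFrom(intermediate, []string{"/dir1", "/dir2"})
	paths := make([]string, 0, len(newLayer))
	for p := range newLayer {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	fmt.Println(len(intermediate), "source layers ->", paths, "in one new layer")
}
```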

@alexellis (Contributor):

Thanks for the explanation 👍

@tiborvass (Contributor):

I'm +1 for this. @duglin, thoughts?

@duglin (Contributor) commented Mar 24, 2017:

Can we please see the larger design doc/plans for all of these Dockerfile changes?

I find it a little odd that such big changes are being proposed, and adopted relatively quickly, and yet something as trivial as INCLUDE has been blocked for ages.

I can't help but feel like this is all leading up to something but we're not being told what. And yes, I'm having flashbacks to the v1.12 days when, leading up to DockerCon, swarmkit was added in a similar way.

I'd just like to understand the bigger picture. In isolation, some of these changes feel like they're being rushed through with minimal consideration (at least when compared with far smaller changes), and it's not clear to me how they all relate to one another. It would be a lot easier to evaluate each one if we knew the end game.

@dnephin (Member) commented Mar 24, 2017:

In general I like this, with one exception:

if it is not found, an image with that name is attempted to be used instead.

I don't like this behaviour. I don't think it's asking too much to have to define images up-front in a FROM line before they get used in a COPY --from. It makes the behaviour easier to explain, and it doesn't mix namespaces (context names with images).

If images always have to be defined on a FROM line it becomes really easy to determine which images will be used by a build. If COPY --from can be an image or a context name it becomes more involved.

@tiborvass (Contributor):

@duglin

Can we please see the larger design doc/plans for all of these Dockerfile changes?

This was the plan: #31067
You participated in it and we used one of your ideas: #31067 (comment)

This PR simply makes it more user-friendly by using names instead of numbers.

And there's a follow-up proposal in #32100; it would be great if you could participate in it.

I find it a little odd that such big changes are being proposed, and adopted relatively quickly, and yet something as trivial as INCLUDE has been blocked for ages.

The INCLUDE PR is only trivial in its implementation; it has a lot of design issues that have been mentioned in the past on the PR itself (mainly that it encourages broken snippets). I agree with you that we didn't process it quickly enough, and I apologize for that. I do want us to address all the use cases there. I also hope that some of them will be resolved by #31067.

In fact, I pinged you on this PR because I value your opinion given your involvement.

I can't help but feel like this is all leading up to something but we're not being told what. And yes, I'm having flashbacks to the v1.12 days where leading up to DockerCon swarmkit was added in a similar way.

I understand why you feel that way, but I guarantee you that DockerCon timing is coincidental. The chained/nested builds PR was something many people wanted and it seemed to me there was design agreement. If you strongly disagree with the PR merged, we're happy to discuss the points. Again, it was my understanding you were in agreement.

@tonistiigi (Member, author):

@dnephin I think it is quite nice that --from and FROM have the same semantics now. In both cases the local name takes precedence and a pulled image is used as a fallback. I also think sharing content with images is something we should encourage and not make unnecessarily hard. You mention mixing context names and images, but I think calling them context names is confusing; these are not the terms most users think in. AS defines a name for the image that is built by the following instructions, so it is still an image name; it's just that no such reference exists after the builder has finished.

@duglin (Contributor) commented Mar 25, 2017:

@tiborvass it's not really a question of whether I agree or disagree with the direction of these PRs. To be honest, I've been so busy recently that I feel bad I haven't had a chance to really dig into them to know whether I like them. Yes, I do agree that having nested builds in general is good, but I can't say whether I agree with the current approach.

However, notice that as soon as we merged that PR, another was immediately opened to "fix" its UX, which shows it really should have been vetted more before it was merged. And then we have some follow-on PRs adding even more in this space (like INSTALL/EXPORT), which would have been good to discuss together to fully explore the implications of using them together. I would really prefer more time to analyze these PRs, and in particular to have them put under the "experimental" flag, since there are very large fundamental design discussions/reviews and "playing with" that should happen before we claim it's GA-ready.

Net (and yes, a bit of venting, sorry): IMO some of the frustration coming from the community is related to situations like this. The INCLUDE PR is just a good example to pick on, and not the only one, but there appears to be what I would call "selective complexity syndrome". Something as trivial as a syntactic INCLUDE (which is far from a new concept) has "complexity", while others that are clearly more complex in every way seem to sail through, especially when in the past those very same concepts were rejected for some mysterious reason. But I'm sure people can speculate why. :-)

But I digress, and sorry for that, I'll try to find time to review these (and the merged PRs too) soon.

@duglin (Contributor) commented Mar 25, 2017:

@tonistiigi I agree with your comments about --from and FROM having the same semantics; treating them both as image names is a nice consistency. One additional thought: people have asked for the ability to name images from within the Dockerfile. Right now people can do so with AS, but its scope is limited to just this build. Perhaps we could consider allowing these AS names to be real and live beyond the scope of the build.

Whether we do it via the AS flag or introduce a new TAG (or NAME) type of command doesn't matter much. With TAG, though, people could specify more than one, which could be handy, especially when they use variables in the name: they could give it multiple tags (e.g. "latest" and a specific version number). Actually, with TAG, AS becomes almost unnecessary and we could revert FROM to its old, less complex :-) syntax. Just a thought...

@tonistiigi (Member, author):

@duglin Yes, the build definition should not attempt to set a reference for an image, and that's why TAG has been rejected from the Dockerfile. Ideally, a build definition should be a function from immutable sources to an opaque result without any side effects. Tagging a result is an extra, optional step that can happen after that. The name in AS defines a name that is local to the Dockerfile. Every file can define its own build stage names, and they will never leak outside of the build process or be committed to any of the images. You can use the name build-env in any image definition that needs the simple "build and distribute with a small image" model, and the names will never collide. A tag (reference) is a pointer to a single instance of an object that defines the current discovery or signing authority for that instance. I do think there may be a use case for tagging intermediate images, and the build stage name is a good way to define that connection; more on that in #32104.

@tonistiigi (Member, author):

A test case was added for the trusted build case.

@vdemeester (Member) left a review:

LGTM 👼

@vieux merged commit 2e96c17 into moby:master on Apr 4, 2017.
@GordonTheTurtle added this to the 17.05.0 milestone on Apr 4, 2017.
@0xdevalias mentioned this pull request on Apr 9, 2017.

@sheerun commented Apr 9, 2017:

@tonistiigi Is it possible to build a specific image with a CLI argument? Something like:

docker build -t sheerun/app_runtime:latest -i foo .

where -i foo is the name given to an image in the Dockerfile, as in from build-essential as foo.

Use case: exporting only the (non-final) development image from the Dockerfile in order to test it.

@tonistiigi (Member, author):

@sheerun Not yet, there is a proposal for it in #32104

@WhisperingChaos (Contributor):

My negative vote for stage names concerns the harmful coupling they encourage when used with COPY --from. However, their ability to create an inline image definition is a worthwhile feature.

I have a few questions regarding this feature's implementation:

  1. Can more than one AS be specified in a Dockerfile with the same name? Since AS implements a defined precedence mechanism, favoring stage names local to the current Dockerfile over image names visible to the Engine, does similar behavior apply to duplicate stage names within a Dockerfile?
    One reason for supporting multiple stages with the same name is the ability to "dynamically" rebind behavior within a Dockerfile, which is exactly the same reason for preferring Dockerfile stage names over Engine visible image names.

  2. Can stage names include a tag or be specified as an image ID?

  3. Is there a documentation effort to inform developers regarding the effects of stage name precedence and the ability to purposely/unintentionally craft a Dockerfile to "replace" an existing image via the --target option?

@tonistiigi (Member, author):

Can more than one AS be specified in a Dockerfile with the same name?

No, you can't have two stages with the same name. The order of the stages will become irrelevant in the future; the names only define which stage depends on which other stage. The execution order will be based on these dependencies, not on the instruction order in the Dockerfile.

Can stage names include a tag or be specified as an image ID?

In --from you can use tags and image IDs. Image IDs are discouraged as that means the target can't be pulled. For build stage names you can only use regular names. These names are only local identifiers while the builder runs, not to be confused with tags.

Is there a documentation effort to inform developers regarding the effects of stage name precedence and the ability to purposely/unintentionally craft a Dockerfile to "replace" an existing image via the --target option?

Indeed, it seems that we forgot to document --target (#33143). Not sure I understand the "replace" part, though.

@WhisperingChaos (Contributor):

For build stage names you can only use regular names. These names are only local identifiers while the builder runs, not to be confused with tags.

Therefore, the algorithm resolving --from name references assumes the tag :latest when a matching stage name cannot be found within the Dockerfile and the reference is a simple name (not an ID or name:tag).

Ex:

FROM scratch
# since 'image_name' isn't a stage name, then 'image_name:latest'
# is used to search for an Engine visible image.
COPY --from=image_name /bin /bin

If the above understanding is correct, then it seems the resolution algorithm limits itself to overriding only images assigned the :latest tag, but not other tag values. In other words, if "image_name" above existed as a stage and "image_name:latest" existed before running the Dockerfile, the resolution algorithm would select the image definition associated with the stage.

Ex:

# image called "image_name:latest" exists in the registry local to the Engine.
FROM alpine AS image_name
FROM scratch
# since 'image_name' stage name exists, then the alpine image definition
# is used instead of Engine visible image called 'image_name:latest'
COPY --from=image_name /bin /bin

Again, if the above correctly describes how names are resolved, then why limit the ability to override an image with an inline definition to only :latest?

Not sure I understand the "replace" part though.

From what I could glean from its description and the associated thread, I thought

docker build --target=my_stage_name .

would produce an image named "my_stage_name:latest" if "my_stage_name" existed as a stage name in the Dockerfile. If so, then renaming "my_stage_name" to "alpine" would essentially replace the Engine's copy of "alpine:latest" with the image generated from the definition in the Dockerfile associated with the "alpine" stage name.

I'm neutral on --target because, if the above understanding is correct, it can induce chaos in the build system, as encapsulated build processes can escape their Dockerfile isolation and affect other, unrelated builds.

@dnephin (Member) commented May 10, 2017:

It is best practice to always use a tag with an image name, even if you want latest: COPY --from=image_name:latest works. The default-to-latest behavior has to remain for backwards compatibility, but anyone concerned about it can be explicit about the tag. : is not valid in a stage name, so there isn't any ambiguity.
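A simplified sketch of the default-tag rule and of why `:` removes any ambiguity. `withDefaultTag` and `isStageCandidate` are hypothetical; real reference parsing (registry hosts, `docker.io/library/` normalization, validation) is more involved than this.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validStageName repeats the stage-name pattern from the diff in this PR;
// ':', '@' and '/' can never appear in a stage name.
var validStageName = regexp.MustCompile(`^[a-z][a-z0-9-_\.]*$`)

// isStageCandidate reports whether a --from value could name a build stage.
func isStageCandidate(v string) bool {
	return validStageName.MatchString(strings.ToLower(v))
}

// withDefaultTag appends ":latest" when a reference has neither a tag nor a
// digest. Only the part after the last '/' is checked for ':' so a registry
// port (localhost:5000/app) is not mistaken for a tag.
func withDefaultTag(ref string) string {
	if strings.Contains(ref, "@") {
		return ref // digest reference such as name@sha256:<hex>
	}
	last := ref[strings.LastIndex(ref, "/")+1:]
	if strings.Contains(last, ":") {
		return ref
	}
	return ref + ":latest"
}

func main() {
	for _, v := range []string{"image_name", "image_name:latest", "alpine:3.5", "localhost:5000/app"} {
		fmt.Printf("%-20s stage candidate=%-5v image=%s\n", v, isStageCandidate(v), withDefaultTag(v))
	}
}
```

So COPY --from=image_name is first looked up as a stage and otherwise treated as image_name:latest, while COPY --from=image_name:latest can only ever mean the image.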

would produce an image with the name "my_stage_name:latest"

Nope, stage names are not image tags. They only exist within the context of a Dockerfile and are not persisted anywhere.

You still have to use -t with --target:

docker build --target=my_stage_name -t my_image_name:andtag .

@WhisperingChaos (Contributor):

@dnephin

Thanks for clarifying --target via your example.

Concerning my question:

why limit the ability to override an image with an inline definition to only :latest?

your reply:

It is best practice to always use a tag with an image name, ...

For me, the reason boils down to this: a stage can only override an image tagged with :latest because of the Docker convention that assumes :latest when an image name has no tag. Therefore, the ability to override, within a Dockerfile, an external image definition whose name is tagged with :latest is an artifact of convention and not a generally desired behavior.

I find this ability to override an existing image definition external to the Dockerfile potentially useful, as one could test the behavior of a new version of an external image by coding it within the Dockerfile and giving it a stage name that matches the external image name. It may still prove useful even when limited to external images tagged with :latest.

@dnephin (Member) commented May 10, 2017:

Build stages are not about overriding image names. They are about providing a local name for a stage.

If you really want to override an image with a tag, you can always create a "single instruction build stage" for it:

FROM alpine:3.5 as my_stage_name

FROM somethingelse
...
COPY --from=my_stage_name
