Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find the name of an image in an archive using a pattern #1208

Merged
merged 1 commit into from
Apr 21, 2019

Conversation

wrosenuance
Copy link
Contributor

This is a work-in-progress to address #1207, which has the motivation for this PR. I'm opening a PR early to get feedback on the proposal and initial implementation before finalising and updating the documentation.

Basic Idea

Adding <loadNamePattern>my-company/my-image:*</loadNamePattern> to a <build> section that specifies a <dockerArchive> will cause the build mojo to read the archive before sending it to the Docker server and to find the manifest.json that describes the contents. Then it will try to parse the JSON, figure out the different images within the archive (there can be several) and their associated tags (also may be multiple per image), and from there will attempt to match the pattern against repo tags, stopping with the first image it finds.

Having found a match, the mojo will continue with the load as before, but will follow up the load with a tag operation that applies the name configured in the image against the name that was matched from the archive. For instance, with a configuration snippet like:

<image>
  <name>my-company/base-container:${base.container.version}</name>
  <build>
    <dockerArchive>${project.build.directory}/dependencies/base-container-attachment.tgz</dockerArchive>
    <loadNamePattern>**/my-company/base-container:*</loadNamePattern>
  </build>
</image>

If the project containing this configuration had base.container.version=2.3.0-SNAPSHOT, and the archive contained image tag registry.company.com/my-company/base-container:2.3.0-b6b6bdd, then after loading the plugin would execute the equivalent of:

docker tag registry.company.com/my-company/base-container:2.3.0-b6b6bdd my-company/base-container:2.3.0-SNAPSHOT

This would then allow simpler subsequent references as now the configured image name would match something on the Docker server.

Pattern Matching

I initially added two parameters - one with an exact string match for the name, and another for a Java-style regex. The idea was that literal matches created partly from variables are hard to quote into valid regexes, so having a parameter for literal matches and another for regexes seemed like a good idea.

After using it for a few days, it already seemed awkward and un-Maven-ish, since it is more common to use Ant-style glob patterns that can . However, using as-is Ant patterns, e.g. as in Maven SelectorUtils, there are a couple of problems:

  • it assumes you want to split at File.separator, not always forward slash
  • it doesn't understand that the colon matters for tag, but not in a registry:port at the start

Hence for this PR I have come up with a different convention that I hope is similar enough to be intuitive. The wild cards are:

  • ? -- matches a single character
  • * -- matches one component, which is registry:port, or something between slashes, or something between the last slash and the tag, or the tag, or part thereof
  • ** -- matches multiple components, but will not cross from repository to tag (i.e. stops before the last colon)
  • **/ -- matches multiple components, and makes sure that whatever comes after is either at the start of the string or follows a slash.

So the pattern **/my-group/my-image:2.0.0* would match:

  • registry.company.com:8080/my-division/my-group/my-image:2.0.0-20190411-1808-b6b6bdd
  • registry.company.com:8080/my-group/my-image:2.0.0-20190411-1808
  • my-group/my-image:2.0.0

Interestingly I found that Maven's SelectorUtils let you swap to a pure Java regex by wrapping it in %regex[pattern], or force the Ant-style by wrapping it in %ant[], as described for test case matching. I have held onto that convention here: you can wrap a Java regex in %regex[] and it will get used verbatim, which helps because this lets you create patterns that are not anchored, for instance.

I put the code for this pattern matching along with other code in a new class ImageArchiveUtil, but as I was doing it, I thought it might be more generally applicable, and possibly belonged in the ImageName class. I see now that there have been other proposals for pattern matching (e.g. PR #900), and there is also some pattern matching for container names, but I'm not sure that there's an existing scheme for image names like this. Please let me know if this approach fits in okay!

Failures and warnings

It is possible to create an archive with the docker save on the command line that contains multiple images, and therefore possible to match multiple distinct image ids with a single pattern (**:*, for instance). However, since the docker:save mojo only creates archives with a single image, the current code doesn't fail if there are multiple images, it just uses the tag from the one it finds first in the archive, and logs a warning that there are multiple matches. This could be turned into an actual failure in all cases, or possibly via configuration, but that seems a bit excessive for what's probably a corner case.

It is possible that none of the images in the archive match the pattern. In this case the mojo does fail, although it could potentially ignore the failure and proceed. The downside to proceeding is that subsequent operations might work because the server already had images loaded that didn't come from the archive, but which match the name in the image configuration, so then the build will mysteriously work sometimes and not other times. I think failing based on the result of scanning the archive makes sense, the only reason not to would be to try to guard against future format changes in the image archive manifest that might cause parsing bugs.

Forced tagging

When the code applies the tag after loading the image, it is hardcoded to force the tags. I think this is a more useful default, but again this could be made configurable somehow.

Scan before send

For simplicity, the code that reads the archive reads from the local file first, makes a decision, and only then sends the archive to the server. In theory it could try to watch the stream being sent to the server to avoid reading the archive twice, but as far as I can tell, reading archives locally is pretty fast (~2 seconds for a 500MB image), and given we may want to abort if the scan doesn't find the expected tags, it seems unhelpful to realise that midway through POSTing to the Docker server.

Scanning doesn't happen if the <loadNamePattern> is not set, so there is no penalty by default.

System property overrides

I haven't added any system property for supplying the pattern, so this has to be enabled in the POM.

Documentation

Given the above, should I just search for mentions of <dockerArchive> in the documentation and add descriptions there? Or would you prefer it go somewhere else?

@codecov
Copy link

codecov bot commented Apr 12, 2019

Codecov Report

Merging #1208 into master will increase coverage by 0.25%.
The diff coverage is 64.13%.

@@             Coverage Diff              @@
##             master    #1208      +/-   ##
============================================
+ Coverage     53.08%   53.33%   +0.25%     
- Complexity     1532     1595      +63     
============================================
  Files           150      153       +3     
  Lines          7985     8165     +180     
  Branches       1202     1249      +47     
============================================
+ Hits           4239     4355     +116     
- Misses         3328     3378      +50     
- Partials        418      432      +14
Impacted Files Coverage Δ Complexity Δ
...ven/docker/access/hc/DockerAccessWithHcClient.java 14.23% <0%> (ø) 13 <0> (ø) ⬇️
...rc/main/java/io/fabric8/maven/docker/SaveMojo.java 24.24% <0%> (-0.76%) 12 <0> (ø)
...abric8/maven/docker/config/ArchiveCompression.java 61.9% <0%> (-10.32%) 8 <0> (ø)
...aven/docker/config/handler/property/ConfigKey.java 100% <100%> (ø) 10 <0> (ø) ⬇️
...config/handler/property/PropertyConfigHandler.java 75% <100%> (+0.1%) 120 <0> (+1) ⬆️
...8/maven/docker/config/BuildImageConfiguration.java 84.69% <100%> (+0.25%) 72 <1> (+1) ⬆️
.../io/fabric8/maven/docker/service/BuildService.java 47.39% <13.04%> (-11.39%) 20 <1> (+1)
...docker/model/ImageArchiveManifestEntryAdapter.java 75% <75%> (ø) 12 <12> (?)
...aven/docker/model/ImageArchiveManifestAdapter.java 83.33% <83.33%> (ø) 5 <5> (?)
...io/fabric8/maven/docker/util/ImageArchiveUtil.java 87.23% <87.23%> (ø) 43 <43> (?)
... and 3 more

2 similar comments
@codecov
Copy link

codecov bot commented Apr 12, 2019

Codecov Report

Merging #1208 into master will increase coverage by 0.25%.
The diff coverage is 64.13%.

@@             Coverage Diff              @@
##             master    #1208      +/-   ##
============================================
+ Coverage     53.08%   53.33%   +0.25%     
- Complexity     1532     1595      +63     
============================================
  Files           150      153       +3     
  Lines          7985     8165     +180     
  Branches       1202     1249      +47     
============================================
+ Hits           4239     4355     +116     
- Misses         3328     3378      +50     
- Partials        418      432      +14
Impacted Files Coverage Δ Complexity Δ
...ven/docker/access/hc/DockerAccessWithHcClient.java 14.23% <0%> (ø) 13 <0> (ø) ⬇️
...rc/main/java/io/fabric8/maven/docker/SaveMojo.java 24.24% <0%> (-0.76%) 12 <0> (ø)
...abric8/maven/docker/config/ArchiveCompression.java 61.9% <0%> (-10.32%) 8 <0> (ø)
...aven/docker/config/handler/property/ConfigKey.java 100% <100%> (ø) 10 <0> (ø) ⬇️
...config/handler/property/PropertyConfigHandler.java 75% <100%> (+0.1%) 120 <0> (+1) ⬆️
...8/maven/docker/config/BuildImageConfiguration.java 84.69% <100%> (+0.25%) 72 <1> (+1) ⬆️
.../io/fabric8/maven/docker/service/BuildService.java 47.39% <13.04%> (-11.39%) 20 <1> (+1)
...docker/model/ImageArchiveManifestEntryAdapter.java 75% <75%> (ø) 12 <12> (?)
...aven/docker/model/ImageArchiveManifestAdapter.java 83.33% <83.33%> (ø) 5 <5> (?)
...io/fabric8/maven/docker/util/ImageArchiveUtil.java 87.23% <87.23%> (ø) 43 <43> (?)
... and 3 more

@codecov
Copy link

codecov bot commented Apr 12, 2019

Codecov Report

Merging #1208 into master will increase coverage by 0.25%.
The diff coverage is 64.13%.

@@             Coverage Diff              @@
##             master    #1208      +/-   ##
============================================
+ Coverage     53.08%   53.33%   +0.25%     
- Complexity     1532     1595      +63     
============================================
  Files           150      153       +3     
  Lines          7985     8165     +180     
  Branches       1202     1249      +47     
============================================
+ Hits           4239     4355     +116     
- Misses         3328     3378      +50     
- Partials        418      432      +14
Impacted Files Coverage Δ Complexity Δ
...ven/docker/access/hc/DockerAccessWithHcClient.java 14.23% <0%> (ø) 13 <0> (ø) ⬇️
...rc/main/java/io/fabric8/maven/docker/SaveMojo.java 24.24% <0%> (-0.76%) 12 <0> (ø)
...abric8/maven/docker/config/ArchiveCompression.java 61.9% <0%> (-10.32%) 8 <0> (ø)
...aven/docker/config/handler/property/ConfigKey.java 100% <100%> (ø) 10 <0> (ø) ⬇️
...config/handler/property/PropertyConfigHandler.java 75% <100%> (+0.1%) 120 <0> (+1) ⬆️
...8/maven/docker/config/BuildImageConfiguration.java 84.69% <100%> (+0.25%) 72 <1> (+1) ⬆️
.../io/fabric8/maven/docker/service/BuildService.java 47.39% <13.04%> (-11.39%) 20 <1> (+1)
...docker/model/ImageArchiveManifestEntryAdapter.java 75% <75%> (ø) 12 <12> (?)
...aven/docker/model/ImageArchiveManifestAdapter.java 83.33% <83.33%> (ø) 5 <5> (?)
...io/fabric8/maven/docker/util/ImageArchiveUtil.java 87.23% <87.23%> (ø) 43 <43> (?)
... and 3 more

@codecov
Copy link

codecov bot commented Apr 12, 2019

Codecov Report

Merging #1208 into master will increase coverage by 0.27%.
The diff coverage is 65.19%.

@@             Coverage Diff              @@
##             master    #1208      +/-   ##
============================================
+ Coverage     53.05%   53.33%   +0.27%     
- Complexity     1532     1595      +63     
============================================
  Files           150      154       +4     
  Lines          7990     8166     +176     
  Branches       1202     1249      +47     
============================================
+ Hits           4239     4355     +116     
- Misses         3333     3379      +46     
- Partials        418      432      +14
Impacted Files Coverage Δ Complexity Δ
...ven/docker/access/hc/DockerAccessWithHcClient.java 14.23% <0%> (ø) 13 <0> (ø) ⬇️
...abric8/maven/docker/config/ArchiveCompression.java 61.9% <0%> (ø) 8 <0> (ø) ⬇️
...config/handler/property/PropertyConfigHandler.java 75% <100%> (+0.1%) 120 <0> (+1) ⬆️
...aven/docker/config/handler/property/ConfigKey.java 100% <100%> (ø) 10 <0> (ø) ⬇️
...8/maven/docker/config/BuildImageConfiguration.java 84.69% <100%> (+0.25%) 72 <1> (+1) ⬆️
.../io/fabric8/maven/docker/service/BuildService.java 47.39% <13.04%> (-11.39%) 20 <1> (+1)
...docker/model/ImageArchiveManifestEntryAdapter.java 75% <75%> (ø) 12 <12> (?)
...aven/docker/model/ImageArchiveManifestAdapter.java 83.33% <83.33%> (ø) 5 <5> (?)
.../io/fabric8/maven/docker/util/NamePatternUtil.java 84.61% <84.61%> (ø) 12 <12> (?)
...io/fabric8/maven/docker/util/ImageArchiveUtil.java 86.95% <86.95%> (ø) 31 <31> (?)
... and 4 more

@rhuss
Copy link
Collaborator

rhuss commented Apr 12, 2019

Thanks a lot for the PR and the very detailed explanation. Awesome work !

Let me try to answer your questions:

I see now that there have been other proposals for pattern matching (e.g. PR #900), and there is also some pattern matching for container names, but I'm not sure that there's an existing scheme for image names like this. Please let me know if this approach fits in okay!

We should use the same kind of pattern matching for image and container names so that we can document that in a dedicated section in the documentation to which we then can refer to. Speaking of #900 this PR needs some rework anyway and should be adapted to the matching introduced here. This can happen in a later step so no worried. However, we should identify the contianer pattern matching spots and change them. So maybe we put than in a general NameUtil or so ? Depends of course whether we really need to deal with conainer id matching, otherwise I thing either ImageName itself or an ImageNameUtil would make more sense.

I think failing based on the result of scanning the archive makes sense

Absolutely agreed. I don't think we should think about future backwards compatibility right now as we can take care of if this should become actual.

For simplicity, the code that reads the archive reads from the local file first, makes a decision, and only then sends the archive to the server.

Sounds good to me. It does not extract the whole tar, right ? Just looking for a certain file and extract that (haven't got time yet to jump in the code).

I haven't added any system property for supplying the pattern, so this has to be enabled in the POM.

For consistencies sake we should add this to the PropertyConfigurationHandler like the other configuration options


I have not fully understood the use of this PR together with docker:save: I thought its a about loading and re-tagging an existing image archive and sending it to a Docker daemon. For docker:save its about storing to a local file where the name is taken from the image configuratin. The only use case I see here is that docker:save picks up a local file archive, picks a matching image from there and directly stores it again in the local filesystem, with the pom-configured name. So essentially a kind of retagging within tar-archives. Is this a valid use case ?

@rhuss
Copy link
Collaborator

rhuss commented Apr 12, 2019

@rohanKanojia If you have time, could you please help in reviewing this PR ? I'm a bit time contstrained (again :(, so could need some second eyes for a look.

@wrosenuance
Copy link
Contributor Author

Thanks for the quick feedback! I have some answers and more questions 😅

I have not fully understood the use of this PR together with docker:save: I thought its a about loading and re-tagging an existing image archive and sending it to a Docker daemon. For docker:save its about storing to a local file where the name is taken from the image configuratin.

Sorry, the title of the issue was misleading - it mentioned docker:save, but you're absolutely right - really this PR is directed at docker:build, just those instances of docker:build that load from an archive such as is produced by docker:save. I changed the issue title to match.

So maybe we put than in a general NameUtil or so ? Depends of course whether we really need to deal with conainer id matching

I think the pattern wildcarding proposed above would work for the kinds of container id expressions that are supported, and that the same kind of pattern would work well in docker:remove and docker:stop as a way of clearing out previously built images or stopping containers. I will move the matching routine to NameUtil or maybe NamePatternUtil.

It does not extract the whole tar, right ? Just looking for a certain file and extract that

I open a stream and scan through it in memory (TarArchiveInputStream getNextTarEntry() in a loop), until one named "manifest.json" shows up. I think this ought to skip over parts of the stream that are not needed, but probably since it is based on an InputStream it is still technically reading and decompressing the whole thing, just not all into memory at once or all into files.

There don't seem to be routines for random access to tar archive entries, and at any rate the archives are probably compressed and need to be streamed through a decompressor anyway, making it hard to avoid streaming. The manifest.json tends to be one of the last entries in the archive, presumably because of how they are constructed by Docker.

For consistencies sake we should add this to the PropertyConfigurationHandler like the other configuration options

Ah! So, without really knowing what I was doing, I did add some things that looked in keeping with other things to the PropertyConfigurationHandler and ConfigKey files. I should test those out and see if they actually work.

@rhuss
Copy link
Collaborator

rhuss commented Apr 12, 2019

Ah! So, without really knowing what I was doing, I did add some things that looked in keeping with other things to the PropertyConfigurationHandler and ConfigKey files. I should test those out and see if they actually work.

No worries. There is actually some test for the PropertyConfigurationHandler and I think it's good enough to just add that property like the others and check whether the expected BuildConfiguration is created.

All your suggestions look good to me. Thx!

@rohanKanojia rohanKanojia added the WIP Work in Progress label Apr 12, 2019
Copy link
Member

@rohanKanojia rohanKanojia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, minor comments from my side

@@ -465,7 +465,7 @@ public void saveImage(String image, String filename, ArchiveCompression compress
public Object handleResponse(HttpResponse response) throws IOException {
try (InputStream stream = response.getEntity().getContent();
OutputStream out = compression.wrapOutputStream(new FileOutputStream(filename))) {
IOUtils.copy(stream, out);
IOUtils.copy(stream, out, 65536);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we refactor these buffersizes as constants?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused import ..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops 😜

@rohanKanojia rohanKanojia requested a review from rhuss April 12, 2019 11:02
@wrosenuance wrosenuance force-pushed the issue-1207 branch 2 times, most recently from 3d13878 to 4c9a334 Compare April 17, 2019 23:56
@wrosenuance
Copy link
Contributor Author

I tried out the external property configuration and it seems to work fine. Using the following POM:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>io.fabric8</groupId>
    <artifactId>docker-maven-plugin-issue</artifactId>
    <version>PR-1207</version>

    <packaging>pom</packaging>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    
    
    <build>
        <plugins>
            <plugin>
                <groupId>io.fabric8</groupId>
                <artifactId>docker-maven-plugin</artifactId>
                <version>0.29-SNAPSHOT</version>
                <configuration>
                    <images>
                        <image>
                            <external>
                                <type>properties</type>
                            </external>
                        </image>
                    </images>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

I can run:

mvn docker:build -Ddocker.dockerArchive=/path/to/docker-archive.tar.gz -Ddocker.name=my-project/image-name:project-tag -Ddocker.loadNamePattern=\*\*:\*

and the loadNamePattern is correctly used, triggering the archive scan just like it does if configured in the POM.

I also moved the name pattern logic into its own class and made some updates to the documentation.

I have not put in a dedicated section about the name patterns yet, as I wasn't sure where it made sense to add that in the documentation. It could go in the description of the loadNamePattern element under the docker:build Configuration section, or in the External Dockerfile or Docker archive section in the docker:build preamble, or even in section 4.1 along with the Image Names.

Given this PR only applies the pattern logic to this one option, though, it seems appropriate to just document it for this option. When the other places that can use wildcards switch over, it will make sense to create a dedicated section in the docs.

Please let me know if this is okay - I'm happy to add the content where you think it belongs.

@rhuss
Copy link
Collaborator

rhuss commented Apr 18, 2019

Given this PR only applies the pattern logic to this one option, though, it seems appropriate to just document it for this option. When the other places that can use wildcards switch over, it will make sense to create a dedicated section in the docs.

Sounds good to me. I'm happy to document that there first and I agree that when we have multiple usage of wildcards we can collect them in a dedicated section to refer to from the various places.

btw, you can expect a next release (0.29.1 or 0.30.0) right after eastern (where I reserved some time to work on dmp).

wrosenuance added a commit to wrosenuance/docker-maven-plugin that referenced this pull request Apr 18, 2019
…hives.

This PR implements the proposed pattern matching that scans the archive
prior to loading and then creates a tag from the image name loaded from
the archive to the image name configured in the Maven project.

Signed-off-by: William Rose <[email protected]>
@wrosenuance
Copy link
Contributor Author

I expanded the documentation for the loadNamePattern attribute - I hope this can be merged!

There will be a conflict with PR #1210 if you merge them one after the other - they are both based on master, and conflict in the changelog. I can fix whichever is going to go in second.

Copy link
Collaborator

@rhuss rhuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot ! Looks good to me, I'm going to merge this now as I want to cut a release today.

However, for future work I would suggest to move all load functionality (dockerArchiver, loadImagePattern) to a dedicated "LoadMojo" with goal 'docker:load' as the current 'docker:build' feels to be quite overloaded with different functionalities.

Also, a real use case example (like in the description of this PR) in the documentation would be awesome. That would be another argument for a dedicated 'docker:load' goal as this is reflected directly in the documentation.

Wdyt ?

The work for this would not be huge:

  • Introduce LoadMojo
  • Rearrange documentation
  • Move loading parts from BuildService to a LoadService

* `**` matches multiple components, stopping at the final colon
* `**/` matches multiple components, but must stop at a slash, or the final colon

When matching multiple components, `$$**/$$` is likely to be more useful than `$$**$$`. The pattern `$$**image-name:*$$` will match `my-group/my-image-name:some-tag`, whereas `$$**/image-name:*$$` will not, because the wildcard has to stop at a slash. Note that `$$**/image-name:*$$` will also match 'image-name:some-tag', since the `$$**/$$` wildcard can be empty.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

happy with it for now, but would favor to move it to an extra section (maybe right after the table) to which we add only a reference here. Otherwise it will blow up the table quite a bit.

Also, I think the explanation is a not 100% clear, so would love to see some more elaborate examples explainng why the one is more useful than the other. E.g. describing the use case. In the example above its not clear to me why **/ is preferable (maybe because I don't understand what you want to achieve ;).

@rhuss
Copy link
Collaborator

rhuss commented Apr 21, 2019

BTW, I like that you don't reuse Maven intrinsics, as we should take care to add as less coupling as possible to Maven because one of the long term goals is to extract a Maven independent core which can b e reused in other contexts, too. See https://github.com/fabric8io/fabric8-kit for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants