-
Notifications
You must be signed in to change notification settings - Fork 645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Find the name of an image in an archive using a pattern #1208
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1208 +/- ##
============================================
+ Coverage 53.08% 53.33% +0.25%
- Complexity 1532 1595 +63
============================================
Files 150 153 +3
Lines 7985 8165 +180
Branches 1202 1249 +47
============================================
+ Hits 4239 4355 +116
- Misses 3328 3378 +50
- Partials 418 432 +14
|
2 similar comments
Codecov Report
@@ Coverage Diff @@
## master #1208 +/- ##
============================================
+ Coverage 53.08% 53.33% +0.25%
- Complexity 1532 1595 +63
============================================
Files 150 153 +3
Lines 7985 8165 +180
Branches 1202 1249 +47
============================================
+ Hits 4239 4355 +116
- Misses 3328 3378 +50
- Partials 418 432 +14
|
Codecov Report
@@ Coverage Diff @@
## master #1208 +/- ##
============================================
+ Coverage 53.08% 53.33% +0.25%
- Complexity 1532 1595 +63
============================================
Files 150 153 +3
Lines 7985 8165 +180
Branches 1202 1249 +47
============================================
+ Hits 4239 4355 +116
- Misses 3328 3378 +50
- Partials 418 432 +14
|
Codecov Report
@@ Coverage Diff @@
## master #1208 +/- ##
============================================
+ Coverage 53.05% 53.33% +0.27%
- Complexity 1532 1595 +63
============================================
Files 150 154 +4
Lines 7990 8166 +176
Branches 1202 1249 +47
============================================
+ Hits 4239 4355 +116
- Misses 3333 3379 +46
- Partials 418 432 +14
|
Thanks a lot for the PR and the very detailed explanation. Awesome work ! Let me try to answer your questions:
We should use the same kind of pattern matching for image and container names so that we can document that in a dedicated section in the documentation to which we then can refer to. Speaking of #900 this PR needs some rework anyway and should be adapted to the matching introduced here. This can happen in a later step so no worried. However, we should identify the contianer pattern matching spots and change them. So maybe we put than in a general
Absolutely agreed. I don't think we should think about future backwards compatibility right now as we can take care of if this should become actual.
Sounds good to me. It does not extract the whole tar, right ? Just looking for a certain file and extract that (haven't got time yet to jump in the code).
For consistencies sake we should add this to the PropertyConfigurationHandler like the other configuration options I have not fully understood the use of this PR together with |
@rohanKanojia If you have time, could you please help in reviewing this PR ? I'm a bit time contstrained (again :(, so could need some second eyes for a look. |
Thanks for the quick feedback! I have some answers and more questions 😅
Sorry, the title of the issue was misleading - it mentioned docker:save, but you're absolutely right - really this PR is directed at docker:build, just those instances of docker:build that load from an archive such as is produced by docker:save. I changed the issue title to match.
I think the pattern wildcarding proposed above would work for the kinds of container id expressions that are supported, and that the same kind of pattern would work well in docker:remove and docker:stop as a way of clearing out previously built images or stopping containers. I will move the matching routine to NameUtil or maybe NamePatternUtil.
I open a stream and scan through it in memory (TarArchiveInputStream getNextTarEntry() in a loop), until one named "manifest.json" shows up. I think this ought to skip over parts of the stream that are not needed, but probably since it is based on an InputStream it is still technically reading and decompressing the whole thing, just not all into memory at once or all into files. There don't seem to be routines for random access to tar archive entries, and at any rate the archives are probably compressed and need to be streamed through a decompressor anyway, making it hard to avoid streaming. The manifest.json tends to be one of the last entries in the archive, presumably because of how they are constructed by Docker.
Ah! So, without really knowing what I was doing, I did add some things that looked in keeping with other things to the PropertyConfigurationHandler and ConfigKey files. I should test those out and see if they actually work. |
No worries. There is actually some test for the PropertyConfigurationHandler and I think it's good enough to just add that property like the others and check whether the expected BuildConfiguration is created. All your suggestions look good to me. Thx! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, minor comments from my side
@@ -465,7 +465,7 @@ public void saveImage(String image, String filename, ArchiveCompression compress | |||
public Object handleResponse(HttpResponse response) throws IOException { | |||
try (InputStream stream = response.getEntity().getContent(); | |||
OutputStream out = compression.wrapOutputStream(new FileOutputStream(filename))) { | |||
IOUtils.copy(stream, out); | |||
IOUtils.copy(stream, out, 65536); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we refactor these buffersizes as constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
import java.nio.charset.StandardCharsets; | ||
import java.util.LinkedHashMap; | ||
import java.util.Map; | ||
import java.util.StringTokenizer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused import ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops 😜
3d13878
to
4c9a334
Compare
I tried out the external property configuration and it seems to work fine. Using the following POM: <?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>io.fabric8</groupId>
<artifactId>docker-maven-plugin-issue</artifactId>
<version>PR-1207</version>
<packaging>pom</packaging>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<build>
<plugins>
<plugin>
<groupId>io.fabric8</groupId>
<artifactId>docker-maven-plugin</artifactId>
<version>0.29-SNAPSHOT</version>
<configuration>
<images>
<image>
<external>
<type>properties</type>
</external>
</image>
</images>
</configuration>
</plugin>
</plugins>
</build>
</project> I can run:
and the loadNamePattern is correctly used, triggering the archive scan just like it does if configured in the POM. I also moved the name pattern logic into its own class and made some updates to the documentation. I have not put in a dedicated section about the name patterns yet, as I wasn't sure where it made sense to add that in the documentation. It could go in the description of the loadNamePattern element under the docker:build Configuration section, or in the External Dockerfile or Docker archive section in the docker:build preamble, or even in section 4.1 along with the Image Names. Given this PR only applies the pattern logic to this one option, though, it seems appropriate to just document it for this option. When the other places that can use wildcards switch over, it will make sense to create a dedicated section in the docs. Please let me know if this is okay - I'm happy to add the content where you think it belongs. |
Sounds good to me. I'm happy to document that there first and I agree that when we have multiple usage of wildcards we can collect them in a dedicated section to refer to from the various places. btw, you can expect a next release (0.29.1 or 0.30.0) right after eastern (where I reserved some time to work on dmp). |
Signed-off-by: William Rose <[email protected]>
…hives. This PR implements the proposed pattern matching that scans the archive prior to loading and then creates a tag from the image name loaded from the archive to the image name configured in the Maven project. Signed-off-by: William Rose <[email protected]>
I expanded the documentation for the loadNamePattern attribute - I hope this can be merged! There will be a conflict with PR #1210 if you merge them one after the other - they are both based on master, and conflict in the changelog. I can fix whichever is going to go in second. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot ! Looks good to me, I'm going to merge this now as I want to cut a release today.
However, for future work I would suggest to move all load functionality (dockerArchiver, loadImagePattern) to a dedicated "LoadMojo" with goal 'docker:load' as the current 'docker:build' feels to be quite overloaded with different functionalities.
Also, a real use case example (like in the description of this PR) in the documentation would be awesome. That would be another argument for a dedicated 'docker:load' goal as this is reflected directly in the documentation.
Wdyt ?
The work for this would not be huge:
- Introduce LoadMojo
- Rearrange documentation
- Move loading parts from BuildService to a LoadService
* `**` matches multiple components, stopping at the final colon | ||
* `**/` matches multiple components, but must stop at a slash, or the final colon | ||
|
||
When matching multiple components, `$$**/$$` is likely to be more useful than `$$**$$`. The pattern `$$**image-name:*$$` will match `my-group/my-image-name:some-tag`, whereas `$$**/image-name:*$$` will not, because the wildcard has to stop at a slash. Note that `$$**/image-name:*$$` will also match 'image-name:some-tag', since the `$$**/$$` wildcard can be empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happy with it for now, but would favor to move it to an extra section (maybe right after the table) to which we add only a reference here. Otherwise it will blow up the table quite a bit.
Also, I think the explanation is a not 100% clear, so would love to see some more elaborate examples explainng why the one is more useful than the other. E.g. describing the use case. In the example above its not clear to me why **/ is preferable (maybe because I don't understand what you want to achieve ;).
BTW, I like that you don't reuse Maven intrinsics, as we should take care to add as less coupling as possible to Maven because one of the long term goals is to extract a Maven independent core which can b e reused in other contexts, too. See https://github.com/fabric8io/fabric8-kit for details. |
This is a work-in-progress to address #1207, which has the motivation for this PR. I'm opening a PR early to get feedback on the proposal and initial implementation before finalising and updating the documentation.
Basic Idea
Adding
<loadNamePattern>my-company/my-image:*</loadNamePattern>
to a<build>
section that specifies a<dockerArchive>
will cause the build mojo to read the archive before sending it to the Docker server and to find the manifest.json that describes the contents. Then it will try to parse the JSON, figure out the different images within the archive (there can be several) and their associated tags (also may be multiple per image), and from there will attempt to match the pattern against repo tags, stopping with the first image it finds.Having found a match, the mojo will continue with the load as before, but will follow up the load with a tag operation that applies the name configured in the image against the name that was matched from the archive. For instance, with a configuration snippet like:
If the project containing this configuration had
base.container.version=2.3.0-SNAPSHOT
, and the archive contained image tagregistry.company.com/my-company/base-container:2.3.0-b6b6bdd
, then after loading the plugin would execute the equivalent of:This would then allow simpler subsequent references as now the configured image name would match something on the Docker server.
Pattern Matching
I initially added two parameters - one with an exact string match for the name, and another for a Java-style regex. The idea was that literal matches created partly from variables are hard to quote into valid regexes, so having a parameter for literal matches and another for regexes seemed like a good idea.
After using it for a few days, it already seemed awkward and un-Maven-ish, since it is more common to use Ant-style glob patterns that can . However, using as-is Ant patterns, e.g. as in Maven SelectorUtils, there are a couple of problems:
Hence for this PR I have come up with a different convention that I hope is similar enough to be intuitive. The wild cards are:
So the pattern **/my-group/my-image:2.0.0* would match:
Interestingly I found that Maven's SelectorUtils let you swap to a pure Java regex by wrapping it in %regex[pattern], or force the Ant-style by wrapping it in %ant[], as described for test case matching. I have held onto that convention here: you can wrap a Java regex in %regex[] and it will get used verbatim, which helps because this lets you create patterns that are not anchored, for instance.
I put the code for this pattern matching along with other code in a new class ImageArchiveUtil, but as I was doing it, I thought it might be more generally applicable, and possibly belonged in the ImageName class. I see now that there have been other proposals for pattern matching (e.g. PR #900), and there is also some pattern matching for container names, but I'm not sure that there's an existing scheme for image names like this. Please let me know if this approach fits in okay!
Failures and warnings
It is possible to create an archive with the
docker save
on the command line that contains multiple images, and therefore possible to match multiple distinct image ids with a single pattern (**:*, for instance). However, since the docker:save mojo only creates archives with a single image, the current code doesn't fail if there are multiple images, it just uses the tag from the one it finds first in the archive, and logs a warning that there are multiple matches. This could be turned into an actual failure in all cases, or possibly via configuration, but that seems a bit excessive for what's probably a corner case.It is possible that none of the images in the archive match the pattern. In this case the mojo does fail, although it could potentially ignore the failure and proceed. The downside to proceeding is that subsequent operations might work because the server already had images loaded that didn't come from the archive, but which match the name in the image configuration, so then the build will mysteriously work sometimes and not other times. I think failing based on the result of scanning the archive makes sense, the only reason not to would be to try to guard against future format changes in the image archive manifest that might cause parsing bugs.
Forced tagging
When the code applies the tag after loading the image, it is hardcoded to force the tags. I think this is a more useful default, but again this could be made configurable somehow.
Scan before send
For simplicity, the code that reads the archive reads from the local file first, makes a decision, and only then sends the archive to the server. In theory it could try to watch the stream being sent to the server to avoid reading the archive twice, but as far as I can tell, reading archives locally is pretty fast (~2 seconds for a 500MB image), and given we may want to abort if the scan doesn't find the expected tags, it seems unhelpful to realise that midway through POSTing to the Docker server.
Scanning doesn't happen if the
<loadNamePattern>
is not set, so there is no penalty by default.System property overrides
I haven't added any system property for supplying the pattern, so this has to be enabled in the POM.
Documentation
Given the above, should I just search for mentions of
<dockerArchive>
in the documentation and add descriptions there? Or would you prefer it go somewhere else?