Add skopeo-based adapter for working with OCI images #136
Conversation
Codecov Report
@@ Coverage Diff @@
## master #136 +/- ##
==========================================
- Coverage 86.54% 78.79% -7.76%
==========================================
Files 17 22 +5
Lines 929 1193 +264
==========================================
+ Hits 804 940 +136
- Misses 125 253 +128
Just a note (sorry if I am repeating what is already known): installing "yet another tool" (which is not automagically installed from PyPI) could challenge some users. I wondered if we could somehow simplify deployment of skopeo, but it seems to rely on a handful of libraries:

$> ldd /usr/bin/skopeo
linux-vdso.so.1 (0x00007fff1295c000)
libgpgme.so.11 => /lib/x86_64-linux-gnu/libgpgme.so.11 (0x00007fd9aa6c6000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd9aa6a4000)
libdevmapper.so.1.02.1 => /lib/x86_64-linux-gnu/libdevmapper.so.1.02.1 (0x00007fd9aa637000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd9aa472000)
libassuan.so.0 => /lib/x86_64-linux-gnu/libassuan.so.0 (0x00007fd9aa45c000)
libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007fd9aa436000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd9aa754000)
libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007fd9aa408000)
libudev.so.1 => /lib/x86_64-linux-gnu/libudev.so.1 (0x00007fd9aa3e1000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd9aa29d000)
libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007fd9aa20d000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd9aa207000)

The good news is that skopeo is available from Debian and conda-forge (https://anaconda.org/conda-forge/skopeo), so at least in those packaging/deployment scenarios we could streamline things (add it as a dependency, or a Recommends: in Debian).

Edit: as was just mentioned, there is no Windows build on conda-forge, only macOS and Linux :-/ Filed conda-forge/skopeo-feedstock#9.
Force-pushed from 290743a to c45cef6.
I've pushed an update to avoid duplicating logic across the docker and skopeo adapters. The first eight commits of the series do that, with the first commit being a temporary commit with @adswa's Windows-related docker adapter fixes from gh-137 and gh-140.
Force-pushed from 94099ab to 34fec2e.
These parts will be useful for the upcoming skopeo adapter as well.
"get" is probably clearer than "list", and tacking on "_ids" makes it clearer what the return value is. Also, drop the leading underscore, which is a holdover from the function being in the adapters.docker module.
The minimum Python version of DataLad is new enough that we can assume subprocess.run() is available. It's recommended by the docs, and I like it more, so switch to it. Note that we might want to eventually switch to using WitlessRunner here. The original idea with using the subprocess module directly was that it'd be nice for the docker adapter to be standalone, as nothing in the adapter depended on datalad at the time. That's not the case anymore after the adapters.utils split and the use of datalad.utils within it. (And the upcoming skopeo adapter will make heavier use of datalad for adding URLs to the layers.)
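As a minimal, hypothetical sketch (not the adapter's actual helper) of what a subprocess.run()-based call can look like:

```python
import subprocess

def call_cli(cmd):
    # check=True raises CalledProcessError on a non-zero exit; PIPE keeps
    # stdout/stderr available to the caller (capture_output is avoided here
    # to stay compatible with older Python 3 versions).
    res = subprocess.run(cmd,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                         universal_newlines=True, check=True)
    return res.stdout
```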
This logic will get a bit more involved in the next commit, and it will be needed by the skopeo adapter too.
When the adapter is called from the command line (as containers-run does) and datalad gets imported, the level set via the --verbose argument doesn't have an effect, and logging happens twice: once through datalad's handler and once through the adapter's.

Before 313c4f0 (WIN/Workaround: don't pass gid and uid to docker run call, 2020-11-10), the above was the case when docker.main() was triggered with the documented `python -m datalad_container.adapters ...` invocation, but not when the script path was passed to python. Following that commit, the adapter imports datalad, so datalad's logger is always configured.

Adjust setup_logger() to set the log level of loggers under the datalad.containers.adapters namespace so that the adapter's logging level is in effect for command-line calls to the adapter. As mentioned above, datalad is now loaded in all cases, so a handler is always configured, but, in case this changes in the future, add a simple handler if one isn't already configured.
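A rough sketch of the approach, assuming the setup_logger() name and the datalad.containers.adapters namespace from the commit message (details beyond that are guesses):

```python
import logging

def setup_logger(level):
    # Set the level on the adapters' namespace so --verbose takes effect even
    # when datalad has already configured its own top-level logger.
    lgr = logging.getLogger("datalad.containers.adapters")
    lgr.setLevel(level)
    # datalad normally installs a handler on its top-level logger; add a
    # fallback handler only if nothing is configured, so messages are neither
    # dropped nor emitted twice.
    if not logging.getLogger("datalad").handlers and not lgr.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(message)s"))
        lgr.addHandler(handler)
    return lgr
```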
The same handling will be needed in the skopeo adapter. Avoid repeating it.
Some of the subprocess calls capture stderr. Show it to the caller on failure.
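A hedged illustration of the idea (the helper name is made up): check the return code explicitly and include the captured stderr in the error raised to the caller.

```python
import subprocess

def run_and_report(cmd):
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    if proc.returncode != 0:
        # Surface the captured stderr instead of swallowing it.
        raise RuntimeError("{} failed with exit code {}:\n{}"
                           .format(cmd[0], proc.returncode, proc.stderr))
    return proc.stdout
```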
In order to be able to track Docker containers in a dataset, we introduced the docker-save-based docker adapter in 68a1462 (Add prototype of a Docker adapter, 2018-05-18). It's not clear how much this has been used, but at least conceptually it seems to be viable. One problem, however, is that ideally we'd be able to assign Docker registry URLs to the image files stored in the dataset (particularly the large non-configuration files). There doesn't seem to be a way to do this with the docker-save archives.

Another option for storing the image in a dataset is the Open Container Initiative image format. Skopeo can be used to copy images from Docker registries (and some other locations) to an OCI-compliant directory. When Docker Hub is used as the source, the resulting layer blobs can be re-obtained via GET /v2/NAME/blobs/ID. Using skopeo/OCI also has the advantage of making it easier to execute via podman in the future.

Add an initial skopeo-based OCI adapter. At this point, it has the same functionality as the docker adapter.
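As a hedged illustration of the registry link-up idea (not the adapter's code, and Docker Hub's token-based authentication is ignored), a layer blob's registry URL could be built from its digest:

```python
def hub_blob_url(name, digest, registry="https://registry-1.docker.io"):
    # e.g. name="library/debian", digest="sha256:98345e41..."
    # (actually fetching this from Docker Hub also requires a bearer token)
    return "{}/v2/{}/blobs/{}".format(registry, name, digest)
```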
After running `skopeo copy docker://docker.io/... oci:<dir>`, we can link up the layers to the Docker registry. However, other digests aren't preserved. One notable mismatch is between the image ID if you run

docker pull x

versus

skopeo copy docker://x oci:x && skopeo copy oci:x docker-daemon:x

I haven't really wrapped my head around all the different digests and when they can change. However, skopeo's issue tracker has a good deal of discussion about this, and it looks complicated (e.g., issues 11, 469, 949, 1046, and 1097). The adapter docstring should probably note this, though at this point I'm not sure I could say something coherent. Anyway, add a to-do note...
I _think_ containers-storage: is what we'd use for podman-run, but I haven't attempted it.
Prevent skopeo-copy output from being shown, since it's probably confusing to see output under run's "Command start (output follows)" tag for a command that the user didn't explicitly call. However, for large images, this has the downside that the user might want some signs of life, so this may need to be revisited.
We'll need this information in order to add a tag to the oci: destination and to make the entry copied to docker-daemon: more informative. I've tried to base the rules on the containers/image implementation, which is what skopeo uses underneath.
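For illustration only, a very rough approximation of the splitting rules (the real normalization lives in containers/image; the function name and edge-case handling here are assumptions):

```python
def split_docker_reference(ref):
    # "docker://docker.io/library/debian:buster-slim"
    #   -> ("docker.io/library/debian", "buster-slim", None)
    if ref.startswith("docker://"):
        ref = ref[len("docker://"):]
    # A digest, if present, follows "@".
    name, digest = (ref.split("@", 1) + [None])[:2]
    # A colon in the last path component separates the tag from the repository.
    repo, _, maybe_tag = name.rpartition(":")
    if repo and "/" not in maybe_tag:
        return repo, maybe_tag, digest
    return name, None, digest
```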
An image stored as an OCI directory can have a tag. If the source has a tag specified, copy it over to the destination. Note that upcoming commits will store the full source specification as an image annotation, so we won't rely on this when copying the image to docker-daemon:, but it still seems nice to have (e.g., when looking at the directory with skopeo-inspect).
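A minimal sketch of building the destination string, assuming skopeo's oci:<path>[:<tag>] form (the helper name is hypothetical):

```python
def oci_destination(path, tag=None):
    # "oci:/path/to/image:buster-slim" when a tag is known, else "oci:/path/to/image"
    return "oci:{}:{}".format(path, tag) if tag else "oci:{}".format(path)
```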
These will be used to store the value of the skopeo-copy source and then retrieve it at load time to make the docker-daemon: entry more informative.
The OCI format allows annotations. Add one with the source value (which will be determined by what the caller gives to containers-add) so that we can use this information when copying the image to a docker-daemon: destination.
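A sketch under assumptions (the annotation key and helper name are made up, not the PR's actual ones) of recording the source in the OCI directory's index.json:

```python
import json
import os

ANNOTATION_KEY = "org.example.datalad.source"  # hypothetical key

def annotate_source(oci_dir, source):
    # The OCI layout keeps its manifest list in index.json; attach the source
    # as an annotation on each manifest entry so it can be read back later.
    index_path = os.path.join(oci_dir, "index.json")
    with open(index_path) as f:
        index = json.load(f)
    for manifest in index.get("manifests", []):
        manifest.setdefault("annotations", {})[ANNOTATION_KEY] = source
    with open(index_path, "w") as f:
        json.dump(index, f)
```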
The images copied to the daemon look like this:

$ docker images
REPOSITORY                TAG              IMAGE ID       CREATED       SIZE
datalad-container/bb      sha256-98345e4   98345e418eb7   3 weeks ago   69.2MB

That tag isn't useful because it just repeats the image ID. And the name after "datalad-container/" is the name of the directory, so with the default containers-add location it would be an uninformative "image". With the last commit, we store the source specification as an annotation in the OCI directory. Parse it and reuse the original repository name and tag:

REPOSITORY                 TAG           IMAGE ID       CREATED       SIZE
datalad-container/debian   buster-slim   98345e418eb7   3 weeks ago   69.2MB

If the source has a digest instead of the tag, construct the daemon tag from that.
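A rough sketch of how such a docker-daemon: reference might be derived from the recorded source (the prefix and fallback behavior here are assumptions):

```python
def daemon_reference(source_repo, tag=None, digest=None,
                     prefix="datalad-container"):
    # Keep the last component of the source repository, e.g.
    # "docker.io/library/debian" -> "debian".
    name = source_repo.rstrip("/").split("/")[-1]
    if not tag and digest:
        # e.g. "sha256:98345e41..." -> "sha256-98345e41..." as a usable tag
        tag = digest.replace(":", "-")
    return "{}/{}:{}".format(prefix, name, tag or "latest")
```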
Add a new oci: scheme. The stacking of the schemes isn't ideal (oci:docker://, oci:docker-daemon:), but it allows for any skopeo transport to be used. Note: I'm not avoiding appending "//" for a conceptual reason (although there might be a valid one), but because I find "oci://docker://" to be ugly. Perhaps the consistency with "shub://" and "dhub://" outweighs that though.
The next commit will use this logic in the oci adapter as well, and it'd be nice (though not strictly necessary) to avoid oci and containers_add importing each other.
TODO: Finalize approach in Datalad for Docker Registry URLs.
I am closing this draft after some years. It is evident that nobody is working on it, and #106 exists as a reminder that this effort was made.
This series adds an adapter that, like the existing docker-archive-based adapter, can be used to track a Docker image in a dataset. The disadvantage is that skopeo needs to be installed, while the main advantage is that the layers downloaded from docker:// sources can be linked to registry URLs.
This depends on support for registry URLs in datalad. Although the approach will likely change, there is a draft PR at datalad/datalad#5129.
Related:
#98
#135
Closes #106.
Replace the ensure_datalad_remote variant from this PR with the one added to core in 434e3f9a8a (ENH: customremotes: Add ensure_datalad_remote(), 2021-05-07), which will be included in the release following 0.14.4.