provider/docker: Data source for docker image #7000
Great you came up with this PR, this was the next on my list of data sources I wanted to see in Terraform! 👍 😄 I feel there are two use-cases for this data source:
The obvious question re 2) is how to determine which tag/hash is the latest - hopefully the API returns these in a particular order, if not we'd probably need to do some sorting. Also having the computed tag can be useful when you want to use that as human-readable output - e.g.
data "docker_image" "nomad" {
  name = "kyhavlov/nomad"
}
resource "docker_image" "nomad" {
  name     = "${data.docker_image.nomad.name}"
  image_id = "${data.docker_image.nomad.latest_tag}"
}
output "version_deployed" {
  value = "${data.docker_image.nomad.latest_tag}"
}
Also, a diff like this is much easier for humans to read:
rather than
7957b3d to 64e7378
@radeksimko With that said, is wanting to detect the newest tag a common thing? Docker already has the convention of latest being an alias for the most up-to-date tag, and I'd have thought that people would want to be explicit with the tags they give when deploying, rather than letting terraform implicitly find a tag for them. It also runs counter to the docker behavior for tags, which is to default to latest when a tag isn't specified (instead of discovering the most recent).
bdfdcb2 to b5f3529
This should be ready for review now; I added a section in the docs and a couple acceptance tests for the new data source.
Can anyone give feedback on this? @stack72
Hi @kyhavlov - thanks for your work on this so far! I think @radeksimko is the best person to pull in for a review now that you've got a full working draft up here. One concern I have is about the additional library dependency load this implementation uses. Importing ~70 new files seems hefty for what amounts to a single API call. Is there a bunch of work that the docker-registry-client library is doing, or can we inline an HTTP request + JSON parse to simplify this? I'll let Radek comment more on the modeling questions, and we'll keep moving this forward! 🚀
@phinze I'd like to do this without adding the dependency but I'm not sure how. The problem this is fixing stems from the fact that terraform needs to know whether it will have to update an image. Calling pull just to do read() doesn't work, because then it affects the resource, so I don't see any other way except checking the registry. Getting the image manifest from the registry involves a couple API calls and dealing with oauth endpoints, even on the public docker hub, so it's doing a decent amount of work that we'd have to replicate even for just the basic case.
This is awesome, can't wait for this. 👍
@phinze @radeksimko any suggestions about the pull request/dependency?
Hi @kyhavlov
Data source VS resource
The Your new data source uses the Docker Registry API (which is btw. what makes it very thin and quick 👍 ) to read the image metadata. It IMO makes total sense to keep providing both functionalities as long as the separation is clear for the user and makes sense from maintenance perspective.
Provider separation
I feel like your data source and anything else that doesn't need Docker Remote API access deserves a separate provider - called e.g.
As @xsellier rightly mentioned in #6507 (comment) there are some cases where multiple different registries can be used to deploy a set of containers onto the same docker host. That weakens my arguments for In that sense I don't mind keeping it under
Ok, so just to summarize, I'll replace
    Optional: true,
},

"id": &schema.Schema{
I believe that id is in the schema by default, settable via d.SetId() and readable via d.Id() - I'm actually surprised the internal schema validation isn't erroring out. It should probably be catching such cases.
In case you have any issues with using d.SetId() for an actual image hash I would suggest adding a new field called digest or sha256_digest, as that's what it's called in the latest Docker API schema.
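For reference, a minimal Go sketch of what a value for such a digest field looks like, assuming the usual registry convention that a content digest is "sha256:" followed by the hex-encoded SHA-256 of the manifest bytes. The function name contentDigest and the sample payload are hypothetical and are not part of this PR.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// contentDigest formats the SHA-256 of a manifest payload as "sha256:<hex>".
// Hypothetical helper for illustration; not taken from the PR.
func contentDigest(manifest []byte) string {
	return fmt.Sprintf("sha256:%x", sha256.Sum256(manifest))
}

func main() {
	// Sample payload only; a real manifest is the exact byte stream
	// served by the registry.
	fmt.Println(contentDigest([]byte(`{"schemaVersion": 2}`)))
}
```

A string in this form could be stored in a digest or sha256_digest field and compared between runs to detect a changed image.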
@kyhavlov Sounds 👌 Besides the comments about provider-centric auth - assuming we're keeping the data source inside
Any suggestions on how to make the separation more obvious are welcomed.
3cdd480 to 7619a53
@radeksimko I'm mostly done making changes to address PR comments, just need to figure out how we want to add acceptance tests for the registry auth stuff.
@kyhavlov for the purpose of a code review it would be very helpful if you could either submit the dependency (
Acceptance tests discussed over IRC. 😃
8b4ef42 to 194f7ee
@radeksimko I squashed the non-vendor commits and added acceptance tests for pulling/reading an image from a private registry.
Thanks for squashing commits. To share what we discussed w/ @kyhavlov over IRC in the spirit of transparency: I started testing this PR with some private registry providers to see how well it works with those.
and I achieved different levels of success per registry provider for different reasons. I reckon either Heroku doesn't stick to the Registry API strictly or those registry providers don't (maybe both, or maybe the API definition isn't as strict as it should be). I would be happy to go ahead with limited support (e.g. only Docker Hub, which currently just works), but I'm afraid we will need to rethink the auth mechanisms completely. I would be especially worried about the docker config parser & the association of auth w/ normalised hostnames. All of the above makes both of us think it would be best to focus on getting the data source merged for public images only for now - i.e. take baby 👶 steps 👣 . Thinking about supporting public registries only in this iteration, we also discussed the possibility of removing the
@kyhavlov Feel free to add anything I may have missed from our conversation. I will only review the part of the PR that is not related to authentication.
I removed the auth portions from the PR and replaced the heroku docker-registry-client lib with some code to get an image sha256 digest from a given registry endpoint (including the flow for oauth). So far I've tested it with a private registry running v.2.4.0, the Docker Hub, Google GCR, and Quay.io.
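For readers following along, the digest lookup with the token flow can be sketched roughly as below. This is not the code from the PR. It assumes the standard Registry v2 behaviour: the manifest endpoint answers 401 with a Www-Authenticate: Bearer challenge, a token is fetched from the advertised realm, and the retried request carries the digest in the Docker-Content-Digest header. The names parseBearerChallenge, fetchToken and manifestDigest are hypothetical, and the challenge in main is only an example value.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// parseBearerChallenge splits a `Bearer k="v",k2="v2"` header into a map.
// Simplification: a scope that itself contains commas is not handled.
func parseBearerChallenge(header string) map[string]string {
	params := map[string]string{}
	rest := strings.TrimPrefix(header, "Bearer ")
	for _, part := range strings.Split(rest, ",") {
		kv := strings.SplitN(strings.TrimSpace(part), "=", 2)
		if len(kv) == 2 {
			params[kv[0]] = strings.Trim(kv[1], `"`)
		}
	}
	return params
}

// fetchToken asks the realm from the challenge for an anonymous pull token.
func fetchToken(client *http.Client, ch map[string]string) (string, error) {
	q := url.Values{}
	q.Set("service", ch["service"])
	q.Set("scope", ch["scope"])
	resp, err := client.Get(ch["realm"] + "?" + q.Encode())
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("token request failed: %s", resp.Status)
	}
	var body struct {
		Token string `json:"token"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", err
	}
	return body.Token, nil
}

// manifestDigest requests the manifest and retries once with a bearer token
// if the registry answers 401. registry is a base URL such as "https://host".
func manifestDigest(client *http.Client, registry, repo, tag string) (string, error) {
	do := func(token string) (*http.Response, error) {
		req, err := http.NewRequest(http.MethodGet,
			fmt.Sprintf("%s/v2/%s/manifests/%s", registry, repo, tag), nil)
		if err != nil {
			return nil, err
		}
		req.Header.Set("Accept", "application/vnd.docker.distribution.manifest.v2+json")
		if token != "" {
			req.Header.Set("Authorization", "Bearer "+token)
		}
		return client.Do(req)
	}
	resp, err := do("")
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	if resp.StatusCode == http.StatusUnauthorized {
		token, err := fetchToken(client, parseBearerChallenge(resp.Header.Get("Www-Authenticate")))
		if err != nil {
			return "", err
		}
		if resp, err = do(token); err != nil {
			return "", err
		}
		resp.Body.Close()
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("manifest request failed: %s", resp.Status)
	}
	return resp.Header.Get("Docker-Content-Digest"), nil
}

func main() {
	// Offline demo of the challenge parsing only; calling manifestDigest
	// needs network access to a real registry.
	ch := parseBearerChallenge(`Bearer realm="https://auth.example.com/token",service="registry.example.com",scope="repository:library/alpine:pull"`)
	fmt.Println(ch["realm"], ch["service"], ch["scope"])
}
```

Private registries that expect basic auth or per-host credentials are outside this sketch, which matches the public-images-only scope agreed above.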
It only works for Docker Hub unless the URL explicitly specifies a port number, but I'm ok to call this 1st iteration as we discussed earlier. The map below describes current behaviour of
cases := map[string]dc.PullImageOptions{
"alpine": dc.PullImageOptions{Repository: "alpine", Registry: "", Tag: ""},
"alpine:latest": dc.PullImageOptions{Repository: "alpine", Registry: "", Tag: "latest"},
"gcr.io/google_containers/busybox": dc.PullImageOptions{Repository: "gcr.io/google_containers/busybox", Registry: "", Tag: ""},
"gcr.io/google_containers/pause:0.8.0": dc.PullImageOptions{Repository: "gcr.io/google_containers/pause", Registry: "", Tag: "0.8.0"},
"gcr.io:443/google_containers/pause:0.8.0": dc.PullImageOptions{Repository: "gcr.io:443/google_containers/pause", Registry: "gcr.io:443", Tag: "0.8.0"},
"https://gcr.io/google_containers/busybox": dc.PullImageOptions{Repository: "https://gcr.io/google_containers/busybox", Registry: "https:", Tag: "latest"},
"library/alpine": dc.PullImageOptions{Repository: "library/alpine", Registry: "", Tag: ""},
"library/alpine:latest": dc.PullImageOptions{Repository: "library/alpine", Registry: "", Tag: "latest"},
"registry.hub.docker.io/library/alpine:latest": dc.PullImageOptions{Repository: "registry.hub.docker.io/library/alpine", Registry: "", Tag: "latest"},
}
I will merge this + document the limitations. Thank you for your patience and persistence! 😄
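To make the limitation easier to see, here is a small Go sketch that reproduces the rows of the map above by splitting the image name on ":". It is a reconstruction from the table, not necessarily the provider's actual parser. pullOptions is a hypothetical stand-in for dc.PullImageOptions and parseImage is a made-up name.

```go
package main

import (
	"fmt"
	"strings"
)

// pullOptions is a hypothetical stand-in for dc.PullImageOptions.
type pullOptions struct {
	Repository, Registry, Tag string
}

// parseImage splits an image name on ":" and infers registry and tag
// from the number of pieces, which is why a registry is only detected
// when a port is present.
func parseImage(image string) pullOptions {
	var opts pullOptions
	parts := strings.Split(image, ":")
	switch len(parts) {
	case 3:
		// registry:port/repo:tag
		portRepo := strings.Split(parts[1], "/")
		opts.Registry = parts[0] + ":" + portRepo[0]
		opts.Repository = opts.Registry + "/" + strings.Join(portRepo[1:], "/")
		opts.Tag = parts[2]
	case 2:
		portRepo := strings.Split(parts[1], "/")
		if len(portRepo) == 1 {
			// repo:tag (no slash after the colon)
			opts.Repository = parts[0]
			opts.Tag = parts[1]
		} else {
			// registry:port/repo; a "https://..." name also lands here
			opts.Registry = parts[0] + ":" + portRepo[0]
			opts.Repository = opts.Registry + "/" + strings.Join(portRepo[1:], "/")
			opts.Tag = "latest"
		}
	default:
		// plain repo or user/repo without a tag
		opts.Repository = image
	}
	return opts
}

func main() {
	for _, name := range []string{
		"alpine:latest",
		"gcr.io/google_containers/pause:0.8.0",
		"gcr.io:443/google_containers/pause:0.8.0",
		"https://gcr.io/google_containers/busybox",
	} {
		fmt.Printf("%-42s %+v\n", name, parseImage(name))
	}
}
```

Because the colon is the only separator considered, a hostname without a port stays inside Repository with an empty Registry, and a scheme prefix is mistaken for a registry host.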
This PR is to add a data source for fetching the version of a docker image from a registry, so that terraform can correctly show the update of the image and its dependent containers in the same run, as discussed in #3639.
I've left the existing docker_image resource unchanged except for adding a "registry_id" field to it with ForceNew = true that can be passed in from this data source to determine whether the image needs to be re-pulled.
An example use of this data source:
I've still got to add an acceptance test for this (new to Go and haven't had experience with terraform acceptance tests so any input here is welcome) and update the docs, but the code itself is finished.