
provider/docker: Data source for docker image #7000

Merged (1 commit) on Jul 26, 2016

Conversation

@kyhavlov (Contributor) commented Jun 3, 2016

This PR is to add a data source for fetching the version of a docker image from a registry, so that terraform can correctly show the update of the image and its dependent containers in the same run, as discussed in #3639.

I've left the existing docker_image resource unchanged except for adding a "registry_id" field to it with ForceNew = true that can be passed in from this data source to determine whether the image needs to be re-pulled.

An example use of this data source:

data "docker_image" "nomad" {
  name = "kyhavlov/nomad:latest"
}

resource "docker_image" "nomad" {
  name = "${data.docker_image.nomad.name}"
  registry_id = "${data.docker_image.nomad.id}"
}

resource "docker_container" "nomad-server" {
  image        = "${docker_image.nomad.latest}"
  name         = "nomad-server"
}

I've still got to add an acceptance test for this (I'm new to Go and haven't had experience with Terraform acceptance tests, so any input here is welcome) and update the docs, but the code itself is finished.

@radeksimko (Member)

Great that you came up with this PR; this was the next on my list of data sources I wanted to see in Terraform! 👍 😄

I feel there are two use-cases for this data source:

  1. Define a tag, typically latest or v1 (generally tags we expect to be repointed to different hashes), and translate that into a hash.
  2. Just define the repo URL (library/nginx) and translate that into either a hash or a tag which we don't expect to move.

The obvious question re 2) is how to determine which tag/hash is the latest. Hopefully the API returns these in a particular order; if not, we'd probably need to do some sorting.

Also having the computed tag can be useful when you want to use that as human-readable output - e.g.

data "docker_image" "nomad" {
  name = "kyhavlov/nomad"
}

resource "docker_image" "nomad" {
  name = "${data.docker_image.nomad.name}"
  image_id = "${data.docker_image.nomad.latest_tag}"
}
output "version_deployed" {
  value = "${data.docker_image.nomad.latest_tag}"
}
Outputs:

version_deployed: v0.4.5

also diff like this is much easier to read by humans:

~ docker_image.nomad
    image_id: "0.3.0" => "0.3.1"

rather than

~ docker_image.nomad
    image_id: "308e519aac60" => "493d82594c15"

@kyhavlov kyhavlov force-pushed the master branch 9 times, most recently from 7957b3d to 64e7378 Compare June 3, 2016 20:15
@kyhavlov (Contributor, Author) commented Jun 4, 2016

@radeksimko
For 2), the API doesn't return them in any order, so you have to loop through them, check the manifest for each one, and compare dates.

With that said, is wanting to detect the newest tag a common thing? Docker already has the convention of latest being an alias to the most up to date tag, and I'd have thought that people would want to be explicit with the tags they give when deploying, rather than letting terraform implicitly find a tag for them. It's also counter to the docker behavior for tags, which is to default to latest when a tag isn't specified (instead of discovering the most recent).

@kyhavlov kyhavlov force-pushed the master branch 2 times, most recently from bdfdcb2 to b5f3529 Compare June 4, 2016 04:30
@kyhavlov kyhavlov changed the title [WIP] provider/docker: Data source for docker image provider/docker: Data source for docker image Jun 4, 2016
@kyhavlov (Contributor, Author) commented Jun 4, 2016

This should be ready for review now; I added a section in the docs and a couple acceptance tests for the new data source.

@kyhavlov (Contributor, Author)

Can anyone give feedback on this? @stack72

@phinze (Contributor) commented Jun 29, 2016

Hi @kyhavlov - thanks for your work on this so far!

I think @radeksimko is the best person to pull in for a review now that you've got a full working draft up here.

One concern I have is about the additional library dependency load this implementation uses. Importing ~70 new files seems hefty for what amounts to a single API call. Is there a bunch of work that the docker-registry-client library is doing, or can we inline an HTTP request + JSON parse to simplify this?

I'll let Radek comment more on the modeling questions, and we'll keep moving this forward! 🚀

@radeksimko radeksimko self-assigned this Jun 29, 2016
@kyhavlov (Contributor, Author) commented Jun 29, 2016

@phinze I'd like to do this without adding the dependency but I'm not sure how. The problem this is fixing stems from the fact that terraform needs to know whether it will have to update an image. Calling pull just to do read() doesn't work, because then it affects the resource, so I don't see any other way except checking the registry.

Getting the image manifest from the registry involves a couple API calls and dealing with oauth endpoints, even on the public docker hub, so it's doing a decent amount of work that we'd have to replicate even for just the basic case.
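For a sense of what those calls look like, here is a sketch of the anonymous-pull flow against Docker Hub (endpoints are Docker Hub's public token service and registry; other registries advertise theirs via a WWW-Authenticate challenge, which is part of the complexity being discussed):

```go
package main

import "fmt"

// Two-step flow for an anonymous manifest fetch from Docker Hub:
//
// Step 1: GET tokenURL(repo) and read the "token" field from the JSON body
//         (even anonymous pulls require a token).
// Step 2: GET manifestURL(repo, tag) with "Authorization: Bearer <token>";
//         the image digest comes back in the Docker-Content-Digest header.

// tokenURL builds the Docker Hub token-service URL granting pull scope.
func tokenURL(repo string) string {
	return fmt.Sprintf(
		"https://auth.docker.io/token?service=registry.docker.io&scope=repository:%s:pull",
		repo)
}

// manifestURL builds the Registry API v2 manifest URL for a repo and tag.
func manifestURL(repo, tag string) string {
	return fmt.Sprintf("https://registry-1.docker.io/v2/%s/manifests/%s", repo, tag)
}

func main() {
	fmt.Println(tokenURL("library/nginx"))
	fmt.Println(manifestURL("library/nginx", "latest"))
}
```

This only covers Docker Hub's happy path; the library handles the general challenge/response negotiation that a hand-rolled version would have to replicate.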

@hmcgonig (Contributor) commented Jul 1, 2016

This is awesome, can't wait for this. 👍

@kyhavlov (Contributor, Author)

@phinze @radeksimko any suggestions about the pull request/dependency?

@radeksimko (Member) commented Jul 10, 2016

Hi @kyhavlov
sorry for the delay. I took some time to read the current implementation of docker_image resource to understand the history & relationship to this new data source and also differences between Docker Remote API & Docker Registry API.

Data source vs. resource

The docker_image resource reads the metadata about the image; mostly translates latest to hash by pulling the latest image/layer and then it also removes the old image/layer - all of that using docker provider client connection (Docker Remote API). Pulling the image can be useful as it is then cached as part of the terraform apply run even before the new image replaces the old one.

Your new data source uses the Docker Registry API (which is btw. what makes it very thin and quick 👍 ) to read the image metadata.

It IMO makes total sense to keep providing both functionalities as long as the separation is clear for the user and makes sense from maintenance perspective.

Provider separation

I feel like your data source and anything else that doesn't need Docker Remote API access deserves a separate provider - called e.g. docker_registry. Basically #6507 turned into a provider. This would make the data source useful for people who don't schedule containers via docker API directly (e.g. ECS, K8S) too.

docker_image resource

Instead of registry_id, I suggest we follow the concept of triggers we have in null_resource (see https://www.terraform.io/docs/provisioners/null_resource.html and https://github.com/hashicorp/terraform/blob/master/builtin/providers/null/resource.go#L22-L26), and/or just expect the user to make name a reference. I think both use cases make sense, e.g. if you want to trigger a pull of related containers when one is updated.

In addition to that, we should make it very clear in the documentation that this resource doesn't pull new layers by default and encourage the user to either interpolate data.docker_registry_image in the name field or use trigger_pull.

Example

Basically I'm thinking about it like this:

provider "docker_registry" {
  host = "https://registry.hub.docker.com"
  username = ""
  password = ""
}

data "docker_registry_image" "nomad" {
  name = "nomad"
  tag = "latest"
}

provider "docker" {
  host = "/my/lovely/docker/unix/socket"
}

# First use case
resource "docker_image" "nomad" {
  name = "${data.docker_registry_image.nomad.name}:${data.docker_registry_image.nomad.digest}"
}

# Second use case
resource "docker_image" "nomad" {
  name = "my_special_app:latest"
  trigger_pull = ["${data.docker_registry_image.nomad}"]
}

# Third use case
resource "aws_ecs_task_definition" "nomad" {
  family = "nomad"
  container_definitions = <<DEFINITION
[
  {
    "name": "jenkins",
    "image": "${data.docker_registry_image.nomad.name}:${data.docker_registry_image.nomad.digest}",
    "cpu": 10,
    "memory": 500
  }
]
DEFINITION
}

I think we could remove the keep_updated field as it just causes the misbehaviour w/ double-plan/apply described in #3639 and has no further value.

@phinze I also feel the pain of dependency accumulation but I think it's ok to pull that library in given the complexity of docker registry auth - we need auth even for anonymous access.
I'm actually thinking that more of the logic could be ported into heroku/docker-registry-client in the future - e.g. construction of the registry URL and logic for maintaining schema v1/v2 compatibility.

@radeksimko radeksimko added the waiting-response and new-provider labels Jul 10, 2016
@radeksimko (Member)

As @xsellier rightly mentioned in #6507 (comment) there are some cases where multiple different registries can be used to deploy a set of containers onto the same docker host. That weakens my arguments for provider separation as you would need N * 2 provider blocks (N being number of unique registries) if you wanted to use that data source + docker_image resource to pull each image.

In that sense I don't mind keeping it under docker provider as long as we make host field Optional, but the rest of my arguments remain valid.

@kyhavlov (Contributor, Author) commented Jul 10, 2016

Ok, so just to summarize, I'll replace registry_id and keep_updated with a trigger_pull list, change the docker provider to make the host field optional for people who just want the data source, and update the docs to explain how to keep the image updated with the data source rather than the keep_updated field.

Optional: true,
},

"id": &schema.Schema{
@radeksimko (Member) commented on the diff:

I believe that id is in the schema by default, settable via d.SetId() and readable via d.Id(). I'm actually surprised the internal schema validation isn't erroring out; it should probably be catching such cases.

@radeksimko (Member):

In case you'll have any issues with using d.SetId() for an actual image hash I would suggest adding a new field called digest or sha256_digest as that's what it's called in the latest Docker API schema.

@radeksimko (Member) commented Jul 10, 2016

@kyhavlov Sounds 👌 Aside from the comments about provider-centric auth: assuming we're keeping the data source inside the docker provider, shall we name the new data source something like docker_registry_image to make it clear that no image is being pulled and it's only metadata from the Registry API?

Any suggestions on how to make the separation more obvious are welcomed.
Feel free to ping me if you want to discuss the solution in more detail or review updated PR.

@kyhavlov kyhavlov force-pushed the master branch 3 times, most recently from 3cdd480 to 7619a53 Compare July 17, 2016 03:53
@kyhavlov (Contributor, Author)

@radeksimko I'm mostly done making changes to address PR comments, just need to figure out how we want to add acceptance tests for the registry auth stuff.

@radeksimko (Member)

@kyhavlov for the purpose of a code review it would be very helpful if you could either submit the dependency (0989ee66bab1e0309b34d1bb389f1e28da3fdd47) as a separate PR - I can merge that straight away - or squash the other three commits into a single one.

Acceptance tests discussed over IRC. 😃

@kyhavlov kyhavlov force-pushed the master branch 2 times, most recently from 8b4ef42 to 194f7ee Compare July 18, 2016 03:03
@kyhavlov (Contributor, Author)

@radeksimko I squashed the non-vendor commits and added acceptance tests for pulling/reading an image from a private registry.

@radeksimko (Member) commented Jul 24, 2016

Thanks for squashing commits.

To share what we discussed w/ @kyhavlov over IRC in the spirit of transparency:

I started testing this PR with some private registry providers to see how well it works with them.

  • Docker Hub (registry.hub.docker.com / index.docker.io / registry-1.docker.io)
  • AWS ECR (<ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com)
  • Google GCR (gcr.io/project-id/image)
  • Artifactory (custom-org.artifactoryonline.com/custom-repo / custom-repo.jfrog.io/custom-repo)
  • Quay.io (quay.io/repo/image)

and I achieved different levels of success per registry provider, for different reasons. I reckon either Heroku doesn't stick to the Registry API strictly or those registry providers don't (maybe both, or maybe the API definition isn't as strict as it should be).

I would be happy to go ahead with a limited support (e.g. only Docker Hub, which currently just works), but I'm afraid we will need to rethink the auth mechanisms completely. I would be especially worried about docker config parser & association of auth w/ normalised hostnames.

All of the above makes both of us think it would be best to focus on getting data source merged for public images only for now - i.e. take baby 👶 steps 👣 .

Since we're supporting only the public registry in this iteration, we also discussed the possibility of removing the heroku library as a dependency completely and relying on raw net/http & encoding/json for now.

@kyhavlov Feel free to add anything I may have missed from our conversation.


I will only review the part of PR that is not related to authentication.

@kyhavlov (Contributor, Author)

I removed the auth portions from the PR and replaced the heroku docker-registry-client lib with some code to get an image sha256 digest from a given registry endpoint (including the flow for oauth). So far I've tested it with a private registry running v2.4.0, the Docker Hub, Google GCR, and Quay.io.

@radeksimko radeksimko removed the waiting-response label Jul 26, 2016
@radeksimko (Member)

It only works for Docker Hub unless the URL explicitly specifies a port number, but I'm ok to call this the 1st iteration as we discussed earlier.

The map below describes current behaviour of parseImageOptions:

cases := map[string]dc.PullImageOptions{
        "alpine":                                       dc.PullImageOptions{Repository: "alpine", Registry: "", Tag: ""},
        "alpine:latest":                                dc.PullImageOptions{Repository: "alpine", Registry: "", Tag: "latest"},
        "gcr.io/google_containers/busybox":             dc.PullImageOptions{Repository: "gcr.io/google_containers/busybox", Registry: "", Tag: ""},
        "gcr.io/google_containers/pause:0.8.0":         dc.PullImageOptions{Repository: "gcr.io/google_containers/pause", Registry: "", Tag: "0.8.0"},
        "gcr.io:443/google_containers/pause:0.8.0":     dc.PullImageOptions{Repository: "gcr.io:443/google_containers/pause", Registry: "gcr.io:443", Tag: "0.8.0"},
        "https://gcr.io/google_containers/busybox":     dc.PullImageOptions{Repository: "https://gcr.io/google_containers/busybox", Registry: "https:", Tag: "latest"},
        "library/alpine":                               dc.PullImageOptions{Repository: "library/alpine", Registry: "", Tag: ""},
        "library/alpine:latest":                        dc.PullImageOptions{Repository: "library/alpine", Registry: "", Tag: "latest"},
        "registry.hub.docker.io/library/alpine:latest": dc.PullImageOptions{Repository: "registry.hub.docker.io/library/alpine", Registry: "", Tag: "latest"},
    }
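The splitting behaviour those cases document can be sketched as follows (a hypothetical reimplementation for illustration, not the PR's actual code; it reproduces only the well-behaved cases, not the broken https:// one, and imageOptions stands in for dc.PullImageOptions):

```go
package main

import (
	"fmt"
	"strings"
)

// imageOptions mirrors the fields of dc.PullImageOptions used above.
type imageOptions struct {
	Repository, Registry, Tag string
}

// parseImageName splits "registry/repo:tag" the way the cases above imply:
// a registry is only detected when the first path segment carries an
// explicit port (e.g. "gcr.io:443/..." but not "gcr.io/..."), and the tag
// is whatever follows a colon in the final path segment.
func parseImageName(name string) imageOptions {
	opts := imageOptions{Repository: name}

	// Registry: only recognised when the host segment contains a colon.
	if first, _, found := strings.Cut(name, "/"); found && strings.Contains(first, ":") {
		opts.Registry = first
	}

	// Tag: the suffix after a colon in the last path segment, if any.
	lastSlash := strings.LastIndex(name, "/")
	if colon := strings.LastIndex(name, ":"); colon > lastSlash {
		opts.Repository = name[:colon]
		opts.Tag = name[colon+1:]
	}
	return opts
}

func main() {
	for _, n := range []string{"alpine:latest", "gcr.io:443/google_containers/pause:0.8.0"} {
		fmt.Printf("%s => %+v\n", n, parseImageName(n))
	}
}
```

The port-only registry detection is what makes "It only works for Docker Hub unless the URL explicitly specifies a port number" true: a portless custom registry host is folded into the repository name.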

I will merge this + document the limitations.

Thank you for your patience and persistence! 😄

@ghost commented Apr 24, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 24, 2020