Keep LATEST aws_ecs_task_definition container_definition image revision #20121

Open
dtiziani opened this issue Jul 9, 2021 · 29 comments
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/ecs Issues and PRs that pertain to the ecs service.

Comments

@dtiziani

dtiziani commented Jul 9, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

I'd like to keep a reference to the latest image in use for the task definition (revision) when changing other container definitions.
My images deployed to ECR are tagged with the git commit SHA (like image-name:72937423940das44).
A single image is deployed on a service for staging, and after approval, to the production ECS service.

The issue I'm facing is that, when I make changes to the infrastructure, Terraform loses the reference to the currently deployed image revision, so I have to apply the infrastructure change and then re-run the latest deployment on CI to get back to the latest image revision.

I did not find any way to get the current image revision, keep it in the "container_definitions -> image" field, and apply the change only to the other fields.

If there was a datasource that could retrieve the latest revision from the current task definition I could manually check for it on the image field and use the default ECR url otherwise.

I've tried with aws_ecs_task_definition datasource, but it only outputs the revision, and the aws_ecs_container_definition requires the id of the task. Tried other workarounds to set the image to the current used one, but it ends in circular dependency.


New or Affected Resource(s)

  • aws_ecs_task_definition
  • aws_ecs_service

Potential Terraform Configuration

resource "aws_ecs_task_definition" "ecs_task" {
  family = "${var.application_name}-${var.environment}"

  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"

  memory             = var.ecs_task_memory
  cpu                = var.ecs_task_cpu
  execution_role_arn = aws_iam_role.ecs_task_execution_role.arn

  task_role_arn = var.task_role_arn

  tags = local.tags

  depends_on = [
    aws_cloudwatch_log_group.log_group
  ]

  container_definitions = jsonencode(
    [
      {
        name      = "${var.application_name}-${var.environment}"
        image     = data.aws_ecr_repository.ecr_repo.repository_url
        essential = true,
        portMappings = [
          {
            containerPort = var.ecs_task_alb_port
            hostPort      = var.ecs_task_host_port
          }
        ]
        environment = var.environment_variables

        logConfiguration = {
          logDriver = "awslogs"
          options = {
            awslogs-group : aws_cloudwatch_log_group.log_group.name
            awslogs-region : var.region
            awslogs-stream-prefix : "${var.application_name_short}"
          }
        }
      }
  ])
}

@dtiziani dtiziani added the enhancement Requests to existing resources that expand the functionality or scope. label Jul 9, 2021
@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. service/ecs Issues and PRs that pertain to the ecs service. labels Jul 9, 2021
@MarZab

MarZab commented Jul 22, 2021

There's an old issue for this: #632

the solution could be:

# get image name from the current (previous) definition
data "aws_ecs_task_definition" "previous" {
  count = var.first_run ? 0 : 1
  task_definition = "${var.application_name}-${var.environment}"
}
data "aws_ecs_container_definition" "previous" {
  count = var.first_run ? 0 : 1
  task_definition = data.aws_ecs_task_definition.previous[0].family
  container_name = "${var.application_name}-${var.environment}"
}

...snip..

  container_definitions = [
    {
      image = var.first_run ? var.task_definition_image : data.aws_ecs_container_definition.previous[0].image
      

Using first_run prevents errors when the previous image does not exist yet.

@donaldpiret

donaldpiret commented Jul 29, 2021

We're using an SSM parameter for this. The resource sets up an initial value (eg. latest) when created, but value changes to this parameter are then ignored.
Our CICD pipeline updates the SSM parameter with the deployed image as part of the deployment process.
Finally we have an additional data source on this same SSM parameter that is used during the generation of the container definitions to insert back the appropriate latest-deployed image tag.
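
A minimal sketch of the SSM approach described above. All names here (the parameter path, `var.image_repo`, the `example-app` family) are illustrative assumptions and do not come from the original comment:

```hcl
# Hypothetical input: repository URL without a tag.
variable "image_repo" {
  type = string
}

# Seeded once by Terraform; the CI/CD pipeline overwrites the value on each deploy.
resource "aws_ssm_parameter" "deployed_image_tag" {
  name  = "/example/app/image-tag" # assumed parameter path
  type  = "String"
  value = "latest" # initial placeholder only

  lifecycle {
    # Later writes by the pipeline must not show up as drift.
    ignore_changes = [value]
  }
}

# Reads back whatever the pipeline last wrote.
data "aws_ssm_parameter" "deployed_image_tag" {
  name = aws_ssm_parameter.deployed_image_tag.name
}

locals {
  # The data source marks the value as sensitive; nonsensitive() keeps the
  # plan readable. Only do this if the tag is not a secret.
  image = "${var.image_repo}:${nonsensitive(data.aws_ssm_parameter.deployed_image_tag.value)}"
}

resource "aws_ecs_task_definition" "app" {
  family = "example-app"

  container_definitions = jsonencode([
    {
      name      = "app"
      image     = local.image
      essential = true
      memory    = 512
    }
  ])
}
```

On the pipeline side, the deploy step would write the new tag to the same parameter, for example with `aws ssm put-parameter --overwrite`, before or after registering the new task definition revision.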

@breathingdust breathingdust removed the needs-triage Waiting for first response or review from a maintainer. label Aug 31, 2021
@marcmarcet

We're using an SSM parameter for this. The resource sets up an initial value (eg. latest) when created, but value changes to this parameter are then ignored. Our CICD pipeline updates the SSM parameter with the deployed image as part of the deployment process. Finally we have an additional data source on this same SSM parameter that is used during the generation of the container definitions to insert back the appropriate latest-deployed image tag.

This is a great solution. Thank you.

@mightyguava

I'm trying to set up an ECS deployment pipeline and running into similar issues. IIUC, the workarounds so far and those from #632, with the exception of @WhyNotHugo's template idea, would all still create a diff in the task definition, because the aws_ecs_task_definition resource tracks a specific revision. If an external tool (the CI/CD pipeline) creates an identical task definition with a new image, and Terraform is able to fetch this new image using an external source (SSM, external tool, data source lookup), the image now differs from the one on the task definition resource, which is still tracking an old revision.

The change from @GerardSoleCa in #30154 feels like a really great solution to this. It looks like you'd be able to do something like this:

locals {
  task_definition_family = var.service_name
  container_name         = var.service_name
  is_lookup              = var.image_tag == null
}

// If a var.image_tag is passed in, use it for first-run
// If a var.image_tag is not passed in, look up the deployed container definition
data "aws_ecs_service" "service" {
  count        = local.is_lookup ? 1 : 0
  cluster_arn  = var.cluster_arn
  service_name = var.service_name
}

data "aws_ecs_container_definition" "container" {
  count           = local.is_lookup ? 1 : 0
  task_definition = data.aws_ecs_service.service[0].task_definition
  container_name  = local.container_name
}

locals {
  deployed_image_tag = local.is_lookup ? try(split(":", data.aws_ecs_container_definition.container[0].image)[1], null) : null
  wanted_image_tag   = coalesce(var.image_tag, local.deployed_image_tag)
  // Neither the resource nor the data source below uses count, so no [0] index is needed
  max_task_def_revision = max(aws_ecs_task_definition.ecs_task.revision, data.aws_ecs_task_definition.ecs_task.revision)
}

// Get the existing task revision, need to depend on the task.
data "aws_ecs_task_definition" "ecs_task" {
  task_definition = aws_ecs_task_definition.ecs_task.arn_without_revision
  depends_on = [
    aws_ecs_task_definition.ecs_task
  ]
}

resource "aws_ecs_task_definition" "ecs_task" {
  family       = local.task_definition_family // must match the family used in the service's task_definition below
  track_latest = true // new from the above PR

  // ... omitted for brevity ...

  container_definitions = jsonencode(
    [
      {
        name  = local.container_name
        image = "${var.image_repo}:${local.wanted_image_tag}" // falls back to the deployed tag when var.image_tag is null
      }
    ]
  )
}

resource "aws_ecs_service" "ecs_service" {
  name = var.service_name
  cluster = var.cluster_arn
  task_definition = "${local.task_definition_family}:${local.max_task_def_revision}"
}

This provides a path for creating the service on first run. Then on subsequent external CI changes to the image_tag, terraform would be able to pick up the image. Since the task definition is now tracking LATEST, instead of the original revision it created, terraform would not detect a difference. If changes were made to environment variables or other terraform code, terraform would be able to fetch the deployed image.

Without #30154, CI deploys would cause Terraform to detect a diff, and create a new task definition that matches the currently active one. It doesn't cause issues to the service but would trigger a deploy and produce a long diff.

@bryantbiggs
Contributor

Use the aws_ecs_task_definition data source to re-construct the task definition ARN like this

But you'll need to synchronize the changes made by the two parties (Terraform and whatever else is making task def revisions) - usually this is the image tag which you can store in an SSM parameter. So if you deploy your service like this, Terraform creates a new task def at rev 0

Then, your CI process kicks off and creates a new image version, updates the SSM parameter then creates a new task def version and deploys the changes.

If you re-run Terraform without task def changes, no changes are detected for the task def, since it's pulling the latest task def ARN from the data source and parsing trick

If you re-run Terraform, with changes to the task definition - it will create a new task def revision but USING the image tag set in the SSM parameter

It's not ideal, but it works. One other drawback: you'll need to supply a placeholder image on the initial deploy, since your pipeline may not have created an image yet (i.e., there is no value in the SSM parameter on first deployment)
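
The ARN reconstruction described above can be sketched roughly as follows. This is not the module's actual code; the resource names, the `example-app` family, and both variables are assumptions made for illustration:

```hcl
# Hypothetical inputs.
variable "cluster_arn" {
  type = string
}

variable "image" {
  type = string # e.g. "repo:tag", with the tag read from a shared location such as SSM
}

resource "aws_ecs_task_definition" "this" {
  family = "example-app"

  container_definitions = jsonencode([
    { name = "app", image = var.image, essential = true, memory = 512 }
  ])
}

# Looks up the newest ACTIVE revision in the family, including
# revisions registered outside Terraform by the CI process.
data "aws_ecs_task_definition" "latest" {
  task_definition = aws_ecs_task_definition.this.family
}

resource "aws_ecs_service" "this" {
  name          = "example-app"
  cluster       = var.cluster_arn
  desired_count = 1

  # Point the service at whichever revision is newer: the one Terraform
  # last registered, or the one found in AWS.
  task_definition = "${aws_ecs_task_definition.this.family}:${max(aws_ecs_task_definition.this.revision, data.aws_ecs_task_definition.latest.revision)}"
}
```

Taking the `max()` of the two revisions means the service is never rolled back to an older Terraform-managed revision when the pipeline has already registered a newer one.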

@mightyguava

mightyguava commented May 30, 2023

Oh hey @bryantbiggs, I was actually just opening an issue in https://github.com/terraform-aws-modules/terraform-aws-ecs to ask you about this. I've read your design doc multiple times now 😅

I don't quite understand this part:

If you re-run Terraform, without task def changes - no changes are detected for the task def since its pulling the latest task def arn from the data source and parsing trick

and similarly from your doc

As an alternative, this module does provide a work around that would support an external party making changes to something like the image of the container definition. In a scenario where there the service, task definition, and container definition are all managed by Terraform, the following configuration could be used to allow an external party to change the image of the container definition without conflicting with Terraform, provided that the external party is also updating the image tag in a shared location that can be retrieved by Terraform (here, we are using SSM Parameter Store)

I'm parsing the image by fetching the actively deployed task definition from data "aws_ecs_container_definition" "container" instead of SSM, but it should be effectively the same.

IIUC, with the SSM approach, the SSM parameter is updated on deploy. Since your container definition is constructed using this SSM parameter, re-running terraform would construct a new container definition with the updated SSM parameter. This would cause resource "aws_ecs_task_definition" "this" https://github.com/terraform-aws-modules/terraform-aws-ecs/blob/dee59b733b805f9c16495bf65cc193260f537e47/modules/service/main.tf#LL608C1-L608C1 to have changes that need to be applied. It doesn't matter that the reconstructed task def matches the active one. Terraform still attempts to create a new task definition.

Or a hypothetical sequence of events:

  1. Initial bootstrap with a placeholder image, let's say image_tag is tag0, SSM parameter is set to tag0, and the task definition revision is 1
  2. CI system deploys tag1, sets the ssm parameter to tag1, task definition revision is 2
  3. Re-run Terraform. resource "aws_ecs_task_definition" "this" is still pointing to revision 1 with tag0, and terraform reconstructs the container definition using the image tag set in the SSM parameter. It will try to create a new task definition with image tag tag1 and rev 3.

Am I misunderstanding something here?

@GerardSoleCa
Contributor

Hi all,

There is one important thing to state, @bryantbiggs: the resource always points to the fixed task definition revision that was created along with it. If you don't remove old task definitions in your process you might be fine, but you will always be comparing against that same old revision.

If you do remove old revisions (sometimes you don't want to keep old stuff around), the resource in Terraform points to a revision that no longer exists, so Terraform will always create a new one.

A second issue I see with not using the latest revision is that, even if you only check for container changes, for example, the revision that the resource 'considers/handles' is much older than the task definition you have running in production.

And finally, with the module you are showing, don't you have the chicken-egg issue when starting from 0? I see some kind of circular dependency between the data and the resource, but maybe I'm missing something.

I'll try to put a couple of examples:

A. Deleting old Task Definitions

  1. We create the task definition from Terraform. State points to revision 1
  2. We update (create a new) task definition with an updated docker image (creates revision 2) and we delete (set to inactive) revision 1.
  3. On Terraform plan/apply we will see that the state reference is no longer resolved and plans a new resource creation. Revision 3 would be created

B. Keeping Task Definitions

  1. We create the task definition from Terraform. State points to revision 1
  2. We update (create a new) task definition with an updated docker image (creates revision 2) and we keep revision 1
  3. On Terraform plan/apply no changes will be seen, but not because it checks the latest revision: changes are not seen because we are comparing against revision 1, which still exists.
  4. We update (create a new) task definition with an updated docker image (creates revision 3 from revision 2) and we keep revision 1 and 2
  5. On Terraform plan/apply we change one ENV variable; the new Task Definition will be based on revision 1, not revision 2 or 3

And the worst-case scenario is that in step 5 we would be deploying an old docker image, if we are not properly synchronising the docker image between step 4 and step 5.

So, in any case, the approach I'm trying to cover in this pull request would solve both issues: we could always track the latest active Task Definition.

Cheers!

@sausti

sausti commented Aug 23, 2023

👋 @GerardSoleCa We're excited to see #30154 and appreciate you making the change! Is there anything preventing it from being merged?

@GerardSoleCa
Contributor

👋 @GerardSoleCa We're excited to see #30154 and appreciate you making the change! Is there anything preventing it from being merged?

Still waiting for someone from HashiCorp to jump in and review the code, and maybe also to help me write, or guide me through, the unit tests for that part.

If you can upvote the PR, I'd appreciate it!!

@rsorelli-hedgepoint

PR has just been approved ! lol

@ivan-sukhomlyn

This functionality has been released in v5.37.0 of the Terraform AWS Provider.

@ivan-sukhomlyn

ivan-sukhomlyn commented Feb 20, 2024

Hi @GerardSoleCa,
Have you tried using the 5.37 AWS provider version to track outside changes?
It looks like the behavior does not fully match what was expected, which is to track the latest revision and not trigger a re-deploy by Terraform when, for example, the Docker image tag is updated by the application CI/CD.

Could you please take a look at the example terraform-aws-modules/terraform-aws-ecs#171 (comment)?
Maybe you could advise on how to use the changes introduced in #30154 correctly.
Thanks in advance!

@pccowboy

This introduced a bug in the state mv command. I was on provider 5.36, and attempted to execute a resource rename, here is what happened:

% terraform state mv module.atlas_cloud_staging.aws_lb.default 'module.atlas_cloud_staging.module.atlas_api_alb.aws_lb.this[0]'
Acquiring state lock. This may take a few moments...
Move "module.atlas_cloud_staging.aws_lb.default" to "module.atlas_cloud_staging.module.atlas_api_alb.aws_lb.this[0]"
Error saving the state: unsupported attribute "track_latest"

The state was not saved. No items were removed from the persisted
state. No backup was created since no modification occurred. Please
resolve the issue above and try again.

Seems like since I was renaming an ALB that it should not have stopped the move, especially since I was not on provider 5.37.

@amh-mw

amh-mw commented Feb 21, 2024

Seems like since I was renaming an ALB that it should not have stopped the move, especially since I was not on provider 5.37.

Is it possible that someone else with access to your state is using 5.37 and updated it? Do you commit your .terraform.lock.hcl file?

% grep hashicorp/aws -A1 .terraform.lock.hcl
provider "registry.terraform.io/hashicorp/aws" {
  version     = "5.37.0"

% tf init
Initializing modules...
Initializing the backend...
Initializing provider plugins...
- terraform.io/builtin/terraform is built in to Terraform
- Reusing previous version of hashicorp/aws from the dependency lock file
- Using previously-installed hashicorp/aws v5.37.0

@pccowboy

nope, I was the only one operating on that repo at that time.

@ivan-sukhomlyn

ivan-sukhomlyn commented Feb 22, 2024

In my opinion, one option would be a separate Terraform resource, such as aws_ecs_container_definition, that makes it possible to ignore changes to the image only. That would let Terraform track a container definition that is updated outside of Terraform by CI/CD while ignoring changes to the Docker image tag, for example. Tolerating such changes in the aws_ecs_task_definition resource is not possible, because the container definitions are a single JSON attribute.

A related issue: #17988

Or use a hack with the aws_ecs_task_definition Terraform data source, as in
https://github.com/terraform-aws-modules/terraform-aws-ecs/blob/45f532c06488d84f140af36241d164facb5e05f5/modules/service/main.tf#L593-L609

@bolshakoff

I think the recently merged #30154 does solve the problem. Here's how it works for me:

track_latest = true
image = data.aws_ecs_container_definition.this.image

Terraform is now smart enough to compare the resource to the latest task revision, even if it was deployed outside terraform, i.e. even if it is not tracked by terraform state yet.
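
A fuller sketch of how those two lines might fit together. The family and container names are hypothetical, and the sketch assumes the data source accepts a bare family name (as the earlier example in this thread does) and that at least one revision of the family already exists:

```hcl
# Hypothetical names. This lookup fails on a completely fresh environment,
# because no task definition has been registered yet.
data "aws_ecs_container_definition" "this" {
  task_definition = "example-app" # a literal family name, so there is no cycle with the resource below
  container_name  = "app"
}

resource "aws_ecs_task_definition" "this" {
  family       = "example-app"
  track_latest = true # available from hashicorp/aws v5.37.0

  container_definitions = jsonencode([
    {
      name = "app"
      # Reuse whatever image is on the latest registered revision,
      # including one deployed outside Terraform.
      image     = data.aws_ecs_container_definition.this.image
      essential = true
      memory    = 512
    }
  ])
}
```

Using a literal family string in the data source, rather than a reference to the resource, is a deliberate choice here: it avoids the circular dependency mentioned earlier in the thread, at the cost of needing a separate bootstrap path for the very first deploy.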

So, I think this issue can be closed. 🤷‍♀️
@dtiziani - did you have a chance to try it out? 🙏

@bolshakoff

Btw, out of curiosity, another workaround I was employing for this was to run:

terraform apply -refresh-only

Terraform was smart enough to only update task definition revision in its state, and so subsequent terraform apply displayed zero diff. 👌

@marcaurele

To make it work on our side we needed to change 2 things:

resource "aws_ecs_task_definition" "task_definition" {
  track_latest = true
}

and also to use the aws_ecr_repository data source to select the latest tag pushed:

data "aws_ecr_repository" "ecr" {
  name        = "ecr-name"
  registry_id = "registry-id-if-needed"
}

locals {
  image_tag = coalesce(setsubtract(data.aws_ecr_repository.ecr.most_recent_image_tags, ["latest"])...)
}

and use the image_tag in the task definition content. This way, Terraform can verify that the latest task definition has the same tag as the latest one pushed to the registry.

@GerardSoleCa
Contributor

Finally I had some time to add this to my configs.
It's working, although in some cases you might still need to synchronise the image tag, or whatever else changes externally. Some people use a data source (that does not work for me because of the chicken-and-egg issue). I use a script to sync the image between the local config and what is deployed remotely.

But that is all I need; the rest is working properly in my case. No more re-creations, we only see in-place updates. Also, diffs are somewhat better.

@alexgoddity

Please show a working code example for aws_ecs_service and aws_ecs_task_definition[container_definitions] with track_latest = true.
Thanks

@marcaurele

@alexgoddity here is an example based on the sample aws_ecs_task_definition from the provider docs:

locals {
  registry_id = "1234567890"
  # Most recent image is pushed with 2 tags: `latest` and the `git-sha1` value, and we want to use the `git-sha1` to be explicit
  image = "${data.aws_ecr_repository.ecr.repository_url}:${coalesce(setsubtract(data.aws_ecr_repository.ecr.most_recent_image_tags, ["latest"])...)}"
  # If there's only a single tag pushed, it can be simpler
  # image = "${data.aws_ecr_repository.ecr.repository_url}:${coalesce(data.aws_ecr_repository.ecr.most_recent_image_tags...)}"
}

data "aws_ecr_repository" "ecr" {
  name        = "ecr-name"
  registry_id = local.registry_id
}

resource "aws_ecs_task_definition" "service" {
  family       = "service"
  track_latest = true

  container_definitions = jsonencode([
    {
      name      = "first"
      image     = local.image
      cpu       = 10
      memory    = 512
      essential = true
      portMappings = [
        {
          containerPort = 80
          hostPort      = 80
        }
      ]
    },
    {
      name      = "second"
      image     = "service-second"
      cpu       = 10
      memory    = 256
      essential = true
      portMappings = [
        {
          containerPort = 443
          hostPort      = 443
        }
      ]
    }
  ])

  volume {
    name      = "service-storage"
    host_path = "/ecs/service-storage"
  }

  placement_constraints {
    type       = "memberOf"
    expression = "attribute:ecs.availability-zone in [us-west-2a, us-west-2b]"
  }
}

@alexgoddity

alexgoddity commented Mar 7, 2024

Thanks. I am also looking for a solution to manage tasks from multiple sources,
with both Terraform and a CI pipeline (AWS CLI) creating new tasks based on the latest one rather than replacing the existing one.

@marcaurele

marcaurele commented Mar 8, 2024

We are using https://github.com/silinternational/ecs-deploy in the CI which creates a new task on CI iterations based on the last one.

@fextr

fextr commented Mar 19, 2024

I managed to solve the same problem with the help of track_latest = true and SSM parameter (hashicorp/aws v5.37.0+):

resource "aws_ssm_parameter" "image_tag" {
  name  = "image-tag-name"
  type  = "String"
  value = "latest"

  lifecycle {
    ignore_changes = [value]
  }
}

data "aws_ssm_parameter" "image_tag" {
  name = "image-tag-name"

  depends_on = [
    aws_ssm_parameter.image_tag
  ]
}

data "aws_ecr_repository" "ecr" {
  name        = "ecr-name"
}

resource "aws_ecs_task_definition" "app_task" {
  family                   = "task-family"
  track_latest             = true
  container_definitions    = <<DEFINITION
  [
    {
      "name": "app-name",
      "image": "${data.aws_ecr_repository.ecr.repository_url}:${data.aws_ssm_parameter.image_tag.value}"
    }
  ]
  DEFINITION
  
  ...
}

To deploy the application, I leverage GitHub Actions to update both the task definition (image) and SSM parameter (image_tag).

@scott-doyland-burrows
Contributor

scott-doyland-burrows commented Apr 23, 2024

I don't understand how track_latest is helping.

I deployed via terraform, and then altered my image name outside of terraform. I assumed track_latest = true would hide the diff and terraform would not want to replace my image with the one that terraform has in its code - but it did show a diff and did want to replace it.

So it made no difference, so am I misunderstanding how this is supposed to work?

@GerardSoleCa
Contributor

I don't understand how track_latest is helping.

I deployed via terraform, and then altered my image name outside of terraform. I assumed track_latest = true would hide the diff and terraform would not want to replace my image with the one that terraform has in its code - but it did show a diff and did want to replace it.

So it made no difference, so am I misunderstanding how this is supposed to work?

What track_latest does is, during the plan, fetch the latest task definition found in AWS instead of using the revision recorded in the tfstate.

It then compares that latest revision in AWS with what you have as code in your TF.

So it won't auto-update the image for you; that is something you need to do yourself. Some of us fetch the image with a data source or a script, others fetch the docker image tag from SSM parameters.

What we get with track_latest is that a new task definition is no longer created every time, only when there are actual updates. But it's up to you to sync the docker image tag.

Cheers!

@scott-doyland-burrows
Contributor

With track_latest = false I can create a new version of task definition outside of terraform, and there is no diff on the aws_ecs_task_definition resource.

However, I do get a diff on aws_ecs_service where I use:

task_definition = aws_ecs_task_definition.ecs_task_definition[each.key].arn_without_revision

but that doesn't actually matter as such. It is annoying but it doesn't alter anything in AWS when I apply.

If I instead specify task_definition = aws_ecs_task_definition.ecs_task_definition[each.key].arn then it sets my service back to using the task that was last defined in terraform (i.e. not the task last defined outside of terraform).
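
The two ways of wiring the service to the task definition can be put side by side as a sketch. The resource names, `example-app` family, and variables are hypothetical; `arn` and `arn_without_revision` are attributes of the aws_ecs_task_definition resource:

```hcl
# Hypothetical inputs.
variable "cluster_arn" {
  type = string
}

variable "image" {
  type = string
}

resource "aws_ecs_task_definition" "example" {
  family = "example-app"

  container_definitions = jsonencode([
    { name = "app", image = var.image, essential = true, memory = 512 }
  ])
}

resource "aws_ecs_service" "example" {
  name          = "example-app"
  cluster       = var.cluster_arn
  desired_count = 1

  # Option 1: no revision in the ARN. ECS resolves this to the latest ACTIVE
  # revision, so revisions registered outside Terraform are picked up. The API
  # reports back a revisioned ARN, which can surface as a diff on the service.
  task_definition = aws_ecs_task_definition.example.arn_without_revision

  # Option 2: pin the service to the exact revision Terraform registered.
  # An apply then moves the service back to that revision, even if a newer
  # one was registered outside Terraform.
  # task_definition = aws_ecs_task_definition.example.arn
}
```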

@trixobird

What I did was

data "aws_ecs_task_definition" "blockchain_listener_task_definition" {
  task_definition = aws_ecs_task_definition.blockchain_listener_task_definition.family

  depends_on = [
    # Needs to exist first on first deployment
    aws_ecs_task_definition.blockchain_listener_task_definition
  ]
}

resource "aws_ecs_task_definition" "blockchain_listener_task_definition" {
...
}

resource "aws_ecs_service" "blockchain_listener_ecs_service" {
...
  task_definition        = "${regex("^(.+):\\d+$", aws_ecs_task_definition.blockchain_listener_task_definition.arn)[0]}:${max(aws_ecs_task_definition.blockchain_listener_task_definition.revision, data.aws_ecs_task_definition.blockchain_listener_task_definition.revision)}"

}

There was no need for track_latest
