Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow image_identifier to be lifecycle ignored #13

Closed
booi opened this issue May 11, 2024 · 13 comments
Closed

Allow image_identifier to be lifecycle ignored #13

booi opened this issue May 11, 2024 · 13 comments

Comments

@booi
Copy link

booi commented May 11, 2024

Is your request related to a new offering from AWS?

No

Is your request related to a problem? Please describe.

Yes. When using apprunner in image mode you supply the image_identifier which includes the tag. Typically docs will just write someorg/someproject:latest.

However, in production you usually deploy a specific tag (in our case a semver or git sha). someorg/someproject:v1.5. This is to prevent accidentally deploying the wrong version due to a race condition in whoever sets the latest tag last.

But if you deploy a specific tag and then rerun terraform, terraform picks up that the image_identifier has changed and wants to reset it back to latest.

Describe the solution you'd like.

An option that would effectively add a lifecycle ignore_changes to the image_identifier field. That way terraform can do the initial provision but subsequent deployments can change the image_identifier without bringing terraform out of sync.

Describe alternatives you've considered.

We switched our deploys to tag latest right before deployment. This doesn't eliminate the issue but reduces the window that it could be an issue.

Additional context

@bryantbiggs
Copy link
Member

this sounds like a bug for the upstream AWS provider

@booi
Copy link
Author

booi commented May 12, 2024

this sounds like a bug for the upstream AWS provider

Hi @bryantbiggs, could you explain how it might be an upstream AWS provider issue? The solution I mentioned does work in the AWS provider, it just needs to be configured in this module.

@bryantbiggs
Copy link
Member

because of this statement (if its true):

But if you deploy a specific tag and then rerun terraform, terraform picks up that the image_identifier has changed and wants to reset it back to latest

Using the ignore changes lifecycle is a very heavy handed solution and I suspect nearly everyone using this module would not want that behavior

@booi
Copy link
Author

booi commented May 14, 2024

because of this statement (if its true):

But if you deploy a specific tag and then rerun terraform, terraform picks up that the image_identifier has changed and wants to reset it back to latest

Using the ignore changes lifecycle is a very heavy handed solution and I suspect nearly everyone using this module would not want that behavior

Most people do not use terraform for deployment so I'm not sure how that can be true. Is there an alternative solution?

@bryantbiggs
Copy link
Member

How do you know that?

@booi
Copy link
Author

booi commented May 14, 2024

Well professionally I've worked for a number of companies from small to large (3000+) that all use terraform to provision and configure cloud resources but generally hand off deployment to a CI/CD like github actions, jenkins etc.

Terraform is inherently more risky than a typical deployment as it affects cloud infrastructure and certainly isn't the recommended way to do deployments with ECS or k8s. It would also require you to commit the new image tag to git from a terraform action which isn't that common and doesn't play well in larger multiuser environment.

Have you seen different?

@bryantbiggs
Copy link
Member

Disclaimer: I work at AWS as a container specialist with our container service teams (EKS, ECS, Lambda, etc.)

Have you seen different?

I have seen quite a bit 😬

Terraform is inherently more risky than a typical deployment as it affects cloud infrastructure and certainly isn't the recommended way to do deployments with ECS or k8s. It would also require you to commit the new image tag to git from a terraform action which isn't that common and doesn't play well in larger multiuser environment.

These are all very subjective statements. However, there are various parts where static configurations meet dynamically changing values (artifact versions and autoscaling values are the two most common culprits). You can see some of these pains here and here

In short, to support these we'd have to do a hacky thing by supporting dual resources - one with a lifecycle ignore block, and another without. Its a pattern that I abhor and am reluctant to support if at all possible. Instead, I would encourage users to explore alternative avenues, such as using an ECR data source and pulling the image value from there as the input to your AppRunner module configuration, or sometimes SSM parameter store works just as well

@booi
Copy link
Author

booi commented May 14, 2024

I would encourage users to explore alternative avenues, such as using an ECR data source and pulling the image value from there as the input to your AppRunner module configuration

OHH I’m not sure why I didn’t think of that. That’s definitely a better approach. Let me see if that’s doable and if there’s a “first-run” issue. Thanks!

@booi
Copy link
Author

booi commented May 30, 2024

@bryantbiggs I did a bunch of digging into this as it did come up again and I'm kind of at a loss. I tried to use an aws_ecr_image data query to insert the correct image into the image configuration which seems to work 'ok'-ish with 2 fairly major problems.

  1. On first run, the ECR repo is empty (terraform just created it). So the data query actually fails the entire terraform plan. It requires a lot of manual work to get it past this point.
  2. It's not easy to select the correct image. The 2 best candidates are (1) the latest tag or (2) the most recent image. Both have their shortcomings. (1) AppRunner only actually deploys if the deployed image tag is different than the one currently running. So the actual apprunner service will not be using the latest tag. Even if we also tag the current image with latest, that means terraform will force apprunner to redeploy to the latest tag even if it's the same causing terraform to run for at least 2-5 minutes waiting for the service to redeploy. (2) the most recent image is not necessarily the "current" image. CI/CD will build and push a container even if it doesn't intend on running it (e.g. a dev or test container).

I'm not sure how this module fits into a production environment with these shortcomings. Let me know if you think there's a different workaround. I do feel like an optional lifecycle policy on the image configuration would solve all these problems.

@turingbeing
Copy link

turingbeing commented Jun 7, 2024

(2) the most recent image is not necessarily the "current" image. CI/CD will build and push a container even if it doesn't intend on running it (e.g. a dev or test container).

We segregate our ECR infrastructure to facilitate this, Dev / QA / UAT images are pushed to a different ECR, and only Approved Releases that are merged to main are built and pushed to the "Live / Production" ECR.

We also employ a fix-forward strategy, so generally don't rollback to previous container versions, most of the time those are caught by staging before making it to live.

We don't use the latest tag, but instead the short hash from the repo, which also corresponds to "app version". In our case most_recent works nicely.

Deployments are handled by Buildkite.

Copy link

github-actions bot commented Jul 8, 2024

This issue has been automatically marked as stale because it has been open 30 days
with no activity. Remove stale label or comment or this issue will be closed in 10 days

@github-actions github-actions bot added the stale label Jul 8, 2024
Copy link

This issue was automatically closed because of stale in 10 days

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 19, 2024
Copy link

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants