Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stale/Old Plans Issue #1122

Closed
srlightbody opened this issue Jul 21, 2020 · 9 comments
Closed

Stale/Old Plans Issue #1122

srlightbody opened this issue Jul 21, 2020 · 9 comments
Labels
bug Something isn't working waiting-on-review Waiting for a review from a maintainer

Comments

@srlightbody
Copy link

Running into an interesting case based on our current configuration and would like some input on how to resolve it. Atlantis version is 0.14.0 . We have repos set up with the following files in the root of the repo -
variables_daily.tf
variables_staging.tf
variables_production.tf

These correspond to distinct Terraform workspaces, and is how we delineate the environments. For those repositories our atlantis.yaml looks like this -

version: 3
projects:
- dir: ./
  autoplan:
    when_modified: ["./variables_daily.tf","/modules/*/*.tf"]
  workspace: daily
- dir: ./
  autoplan:
    when_modified: ["./variables_staging.tf"]
  workspace: staging
- dir: ./
  autoplan:
    when_modified: ["./variables_production.tf"]
  workspace: production

The issue we're seeing is this:

  1. A developer creates a PR that has changes to variables_production.tf and variables_staging.tf
  2. Atlantis then plans both staging and production as expected
  3. The developer realizes they did not mean to include variables_production.tf and adds a commit to the PR that removes that change, so the PR now only includes a change to variables_staging.tf
  4. Atlantis re-plans due to the commit, but since variables_production.tf is not in the PR, it only re-plans staging.
  5. The developer then runs atlantis apply thinking they have safely removed production from the PR
  6. The initial plan that was created for production is still valid, and is applied, even though the PR no longer contains a production change

One way to mitigate this might be to flip the 'disable appply all' option to true, which would force them to be more explicit about it. There is some resistance to this, as developers like being able to bulk apply to daily and staging together.

The other good option seems to be forcing atlantis to destroy existing plans on any new commit to the PR. At the moment I do not see a direct way to do that, I'm thinking of using a custom workflow to always run atlantis unlock before any plan. Would that do the trick? And, if not, would a "destroy-on-plan" flag that set atlantis's behavior to trash all existing plans on a re-plan be a useful feature?

@grimm26
Copy link
Contributor

grimm26 commented Jul 24, 2020

that setup sounds super confusing, but I've hit the same issue in a different manner so I'm curious as to a solution.

@srlightbody
Copy link
Author

We're using Atlantis/Terraform to control the entire stack for a bunch of services, including the k8s deployments, which is kind of how we ended up here. So far it has worked really well other than running into some weird cases like this one. It lets us be pretty specific about required approvals for different environments using codeowners, and also lets developers do basic infrastructure updates for their apps without having to get too in depth into the terraform code (mostly k8s image tag bumps).

It doesn't look like this is getting much attention, so I'm guessing it's not an issue a ton of people are running into. It's definitely a pretty specific use case I'd say.

@ghostsquad
Copy link

I just stumbled onto this issue, and I'm also concerned. This is definitely a bug that could cause major issues.

@angeloskaltsikis
Copy link

angeloskaltsikis commented Nov 26, 2020

This happened to us as well in several occasions.
It can be VERY dangerous as it seems that a mistake which has been deleted/fix during the development flow of a PR will run and can cause unexpected problems.

I will try to investigate exactly how Atlantis works under the hood and try to find a proper solution.
One quick solution that comes to mind is to delete all previously existing plan files after each commit.

@lkysow are you aware of this bug?


UPDATE:
According to this line it uses this funtion which does exactly what the comment below states

		// Any generated plans should be untracked by git since Atlantis created
		// them.

In order to fix this bug we need to run .tfplans files that exist inside the folders that have been modified similar to what we are using to find the modifiedFiles (like here)

@Symbianx
Copy link

Symbianx commented Feb 4, 2021

@lkysow Is this something that can be fixed?
We would like to see this fixed and can help with a PR if you provide some guidance.

@angeloskaltsikis
Copy link

We would like also this to be solved and we are willing to help with the PR as well.
@Symbianx what do you think to design the solution together?

@grimm26
Copy link
Contributor

grimm26 commented Feb 4, 2021

I like the suggestion of the OP of "The other good option seems to be forcing atlantis to destroy existing plans on any new commit to the PR." Adding at least an option for an implied unlock when a new commit is pushed would solve this.

angeloskaltsikis added a commit to angeloskaltsikis/atlantis that referenced this issue Feb 12, 2021
There is an edge case in which if the plan files for modules (1),(2) have
been already created for a commit (a) in a PR and then the user understands
that he made a mistake and he didn't want to change module (2) and drops
the terraform code of it in commit (b), then when the user will try to
run the apply command Atlantis will apply the (2) module plan file as well
as the `PendingPlanFinder.Find` finds all the differences between the upstream
branch and what Atlantis contains and filters out the `.tfplan` files.
There is an issue connected to this bug runatlantis#1122

In order to mitigate the above bug after some careful consideration it seems
that the best solution is to drop all the plan files in case the PR has been
locked anytime in the past, which translates that it most likely have created
`.tfplan` files.

This commit is not expected to create any side-effects or introduce any
new problems.
angeloskaltsikis added a commit to angeloskaltsikis/atlantis that referenced this issue Feb 12, 2021
There is an edge case in which if the plan files for modules (1),(2) have
been already created for a commit (a) in a PR and then the user understands
that he made a mistake and he didn't want to change module (2) and drops
the terraform code of it in commit (b), then when the user will try to
run the apply command Atlantis will apply the (2) module plan file as well
as the `PendingPlanFinder.Find` finds all the differences between the upstream
branch and what Atlantis contains and filters out the `.tfplan` files.
There is an issue connected to this bug runatlantis#1122

In order to mitigate the above bug after some careful consideration it seems
that the best solution is to drop all the plan files in case the PR has been
locked anytime in the past, which translates that it most likely have created
`.tfplan` files.

This commit is not expected to create any side-effects or introduce any
new problems.
@angeloskaltsikis
Copy link

Hey guys,
Just for awareness that I have created a PR that should fix this bug.
In case anyone wants to take a look and make any suggestion you are more than welcome.

@jamengual jamengual added bug Something isn't working waiting-on-review Waiting for a review from a maintainer labels Aug 26, 2022
@jamengual
Copy link
Contributor

Closed due to inactivity, if this still needed comment and we will reopen but check the latest docs/features first.
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working waiting-on-review Waiting for a review from a maintainer
Projects
None yet
Development

No branches or pull requests

6 participants