
Implement an automerge feature. #389

Closed
brndnmtthws wants to merge 1 commit into master from the automerge branch

Conversation

brndnmtthws

Similar to autoplan, automerge is a feature whereby Atlantis will merge
a PR/MR once all plans have successfully been applied.

This addresses issue #186.

@brndnmtthws
Author

I've tested this with GitHub, but not GitLab or Bitbucket.

@codecov

codecov bot commented Dec 13, 2018

Codecov Report

Merging #389 into master will decrease coverage by 0.71%.
The diff coverage is 32.43%.


@@            Coverage Diff             @@
##           master     #389      +/-   ##
==========================================
- Coverage   69.77%   69.05%   -0.72%     
==========================================
  Files          62       62              
  Lines        3801     3875      +74     
==========================================
+ Hits         2652     2676      +24     
- Misses        960     1004      +44     
- Partials      189      195       +6
Impacted Files Coverage Δ
server/events/project_command_builder.go 82.9% <ø> (ø) ⬆️
server/events/yaml/valid/valid.go 92% <ø> (ø) ⬆️
cmd/server.go 79.16% <ø> (ø) ⬆️
server/events/vcs/proxy.go 28.57% <0%> (-2.2%) ⬇️
server/events/vcs/bitbucketcloud/client.go 39.02% <0%> (-2.01%) ⬇️
server/events/vcs/github_client.go 60.22% <0%> (-7.73%) ⬇️
server/events/vcs/gitlab_client.go 18.58% <0%> (-1.23%) ⬇️
server/events/vcs/bitbucketserver/client.go 30.18% <0%> (-1.82%) ⬇️
server/events/vcs/not_configured_vcs_client.go 0% <0%> (ø) ⬆️
server/server.go 66.4% <100%> (+0.13%) ⬆️
... and 3 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 858dbe4...fc45f10.

@codecov

codecov bot commented Dec 13, 2018

Codecov Report

Merging #389 into master will decrease coverage by 0.85%.
The diff coverage is 52.75%.


@@            Coverage Diff             @@
##           master     #389      +/-   ##
==========================================
- Coverage   70.78%   69.92%   -0.86%     
==========================================
  Files          63       63              
  Lines        3987     4067      +80     
==========================================
+ Hits         2822     2844      +22     
- Misses        970     1016      +46     
- Partials      195      207      +12
Impacted Files Coverage Δ
server/events/yaml/valid/valid.go 93.54% <ø> (ø) ⬆️
server/events/project_command_builder.go 84.47% <ø> (ø) ⬆️
cmd/server.go 78.23% <ø> (-0.72%) ⬇️
server/user_config.go 100% <ø> (ø) ⬆️
server/events/vcs/bitbucketcloud/client.go 39.02% <0%> (-2.01%) ⬇️
server/events/vcs/not_configured_vcs_client.go 0% <0%> (ø) ⬆️
server/events/vcs/gitlab_client.go 18.26% <0%> (-1.19%) ⬇️
server/events/vcs/bitbucketserver/client.go 29.81% <0%> (-1.77%) ⬇️
server/events/vcs/proxy.go 28.57% <0%> (-2.2%) ⬇️
server/events/yaml/raw/config.go 100% <100%> (ø) ⬆️
... and 18 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 56b09b6...88909dd.

@brndnmtthws force-pushed the automerge branch 2 times, most recently from 96fa097 to 317cf83, on December 13, 2018 at 02:33
@brndnmtthws
Author

The most recent CI failure seems unrelated to any code changes. I think it just needs to be re-run.

@ajbouh

ajbouh commented Dec 20, 2018

@brndnmtthws Excited about this PR, looks awesome!

@brndnmtthws
Author

Just rebased on master to resolve the conflicts.

@brndnmtthws
Author

Looks like some tests broke because of changes elsewhere, so I've updated them.

@lkysow
Member

lkysow commented Jan 4, 2019

Hey Brenden, I'm just getting back into things after the holidays. I'll review this asap.

@brndnmtthws
Author

Sweet, thanks for the update! Hope you had a nice break.

@@ -0,0 +1,17 @@
# Automerging
Contributor

Wouldn't the service account used to post comments etc. require write permissions for this to work?

Author

The account would indeed require write permissions. Is there something in the doc which contradicts that?

Contributor

Nothing that contradicts it, but it doesn't mention that requirement explicitly anywhere :)

Author
@brndnmtthws Jan 4, 2019

It's required for Atlantis to work correctly (I think?), so I don't see why it would be explicitly called out in this section of the doc.

Contributor

Ah, cool. Nvm then 👍

Member

It actually does require increased permissions depending on your setup. For example, my user token can comment on public pull requests but it can't merge unless I'm a collaborator.

The token needs to be from a user with those permissions.

@lkysow
Member

lkysow commented Jan 8, 2019

@brndnmtthws don't worry about rebasing or any of the conflicts. I'm pulling this into my own branch to make some finishing touches.

Can you tell me which Git hosts you've tested this with?

@lkysow
Member

lkysow commented Jan 8, 2019

Hmm, I've found something concerning. If you generate a plan for multiple projects and one of the plans fails, then if you apply the other project, Atlantis will automerge because there are no outstanding plans. The only reason it thinks that, though, is that there are no plan files.

Instead, we really should keep track of which projects we expect plans for in a PR. I've run into something similar in another bug where just using the planfiles on disk isn't enough. Instead I think we might need to keep track of this stuff in the on-disk database. I'm not asking you to do it, I will take it on, but what do you think?

edit: see https://github.com/lkysow/atlantis-example/pull/5

@brndnmtthws
Author

brndnmtthws commented Jan 8, 2019

Can you tell me which Git hosts you've tested this with?

Only tested with GitHub.

Hmm, I've found something concerning. If you generate a plan for multiple projects and one of the plans fails, then if you apply the other project, Atlantis will automerge because there are no outstanding plans. The only reason it thinks that, though, is that there are no plan files.

Instead, we really should keep track of which projects we expect plans for in a PR. I've run into something similar in another bug where just using the planfiles on disk isn't enough. Instead I think we might need to keep track of this stuff in the on-disk database. I'm not asking you to do it, I will take it on, but what do you think?

Ah, I never considered that case. I sort of assumed that if the plan fails, it would not allow you to apply. Perhaps that's the real issue: maybe it shouldn't allow you to run apply until all the plans have succeeded? You could set a trap where it removes any pending plans if any of the plans in the repo fail for any reason.

Edit: to elaborate a little more on that, the scary thing for me would be to end up in an inconsistent state where part of the changes are applied and the rest are not. Seems wise to me to err on the side of caution and back out.

@lkysow
Member

lkysow commented Jan 9, 2019

I'm not sure I like deleting successful plans if any other projects fail because I like that Atlantis gives the users a choice. Automatically deleting the successful plans feels like Atlantis trying to be "too smart". There's all sorts of weird edge cases people might get in, and I like that Atlantis gives you a way to get out of them (for example via a manual atlantis plan -d). I know that this is also giving people the ability to shoot themselves in the foot but personally I like that tradeoff.

@gordonbondon @Vlaaaaaaaad @pratikmallya @majormoses @sstarcher how do you feel about if there are >1 plans in a pull request, and one of them fails, that the other plans are deleted and you can't apply any of them until you have all of the plans being generated successfully?

@brndnmtthws
Author

I'm not sure I like deleting successful plans if any other projects fail because I like that Atlantis gives the users a choice. Automatically deleting the successful plans feels like Atlantis trying to be "too smart". There's all sorts of weird edge cases people might get in, and I like that Atlantis gives you a way to get out of them (for example via a manual atlantis plan -d). I know that this is also giving people the ability to shoot themselves in the foot but personally I like that tradeoff.

Yep this makes sense. For my particular use case, it's preferred to avoid any sort of traps, so it's better to fail all plans if any one plan fails.

Would be good to see what other people think.

Another option is to provide a config parameter for this, but it's probably better to take the convention over configuration approach. More knobs, more problems.

@ajbouh

ajbouh commented Jan 9, 2019

Really excited about this feature!

The only argument I can think of against deleting successful plans if others fail is to fight flakiness.

If you have an enormous number of projects and there's a decent chance one of them will fail to generate a plan, you'd rather "lock in" the ones that work and just try to regenerate the ones that didn't.

In spite of that, I would actually assume that the behavior is all or nothing (just like plan generation for an individual project).

@sstarcher

I think the main benefit of requiring all of the plans to pass would be in ensuring all plans are applied before a merge happens. I would not want a user applying a single plan, ignoring the others, and merging the PR. If, while fixing the other plan, they decide not to merge the PR after all, they now have changes applied to one set of Terraform states that will never be on master and have to be backed out by a user.

@Vlaaaaaaad

Yup, I agree with failing everything if a single plan fails. I don't want to end up in weird states with weird plans. And if a previous plan was successful, it should be successful the second time it's run too, so deleting all of them works for me.

I think Atlantis should fail fast and give me the chance to fix the issue.

@kipkoan
Contributor

kipkoan commented Jan 10, 2019 via email

@brndnmtthws
Author

This is a breaking change so it needs to be a configurable option. Currently Atlantis allows us to apply a single project/workspace. We need that since we have multiple environments in a repo, and need to be able to plan/apply changes in test without also applying in prod.

I agree, this would be a breaking change if you depend on the current behaviour. So the question is, what should the behaviour be? Is this the behaviour we want? My preference is to emulate terraform itself, and fail when anything goes wrong.

Could you please elaborate a little bit more on the way you're using Atlantis?

The idea of "disallow apply when any plan fails" is to prevent any PR with interdependent changes across projects/workspaces from resulting in inconsistent state. It's the same way terraform itself handles terraform plan: if any individual change fails, the entire process fails (unless you explicitly state which target to apply with terraform plan -target=...).

There isn't an easy way to handle separate workspaces in Atlantis yet. Maybe we should add an abstraction which lets you configure the failure handling on a per-workspace basis? The mapping of project to workspace is currently 1:1, but maybe we need to be able to specify workspaces and projects separately.

Here's how it currently looks:

version: 2
projects:
- dir: project1
  workspace: staging
- dir: project1
  workspace: production
- dir: project2
  workspace: staging
- dir: project2
  workspace: production
- dir: project3
  workspace: staging
- dir: project3
  workspace: production

If it is going to be configurable, I think it would make a lot more sense to do it on a per-project/workspace level. Having another global config flag isn't desirable IMO. Something like this would be a lot better:

version: 3
workspaces:
- name: staging
  allow_failures: yes
- name: production
  allow_failures: no
projects:
- dir: project1
- dir: project2
- dir: project3
  allow_failures: yes

Thoughts?

@sstarcher

@kipkoan I don't know if we are talking about the same thing. We are not talking about requiring you to apply all changes; we are only saying that all changes must plan successfully. If they do, then you will still be able to apply individual changes.

@kipkoan
Contributor

kipkoan commented Jan 10, 2019

Our repo directory structure is exactly like this: https://www.terraform.io/docs/enterprise/workspaces/repo-structure.html#directories

When we do a PR, Atlantis runs a plan in each environment directory.

Even if one of the environment plans fails, we need to be able to apply any plan that succeeded.
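
(For reference, the linked layout is roughly one directory per environment at the root of the repo, each with its own state and its own plan. The names below are only illustrative, not taken from this PR:)

modules/
  compute/
  networking/
dev/
  main.tf
stage/
  main.tf
prod/
  main.tf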

@Vlaaaaaaad

@kipkoan no separate workspaces for prod vs stage vs dev? Do you modify all of them in a single pull request and do specific applies?

@kipkoan
Contributor

kipkoan commented Jan 10, 2019

The idea of "disallow apply when any plan fails" is to prevent any PR with interdependent changes across projects/workspaces from resulting in inconsistent state.

Our environment directories (and thus terraform plans) are not interdependent.

If you want multiple plans to be interdependent, and thus need to be applied as a group or not at all, then you can combine them into the same plan.

Atlantis currently behaves the way Terraform behaves -- plans are independent. You can apply one plan even if a different plan is failing. This should be the default.

I'm fine with having Atlantis be able to group plans to make them interdependent (and only apply if all plans succeed), but that should not be the default.

@kipkoan
Contributor

kipkoan commented Jan 10, 2019

no separate workspaces for prod vs stage vs dev?

We do not use workspaces(*) for separate environments (and have verified that this is HashiCorp's recommendation).

Do you modify all of them in a single pull request and do specific applies?

A single PR could modify a single environment (and thus only one environment would have changes to apply), or a single PR could modify a module that is used across multiple environments. When multiple environments are changed, we almost always apply only in the non-prod environments (dev/test/stage) and will open another PR later to apply in prod once testing/QA has been done.

(*) OSS Terraform uses the term "workspaces" differently than TFE does.

@sstarcher

In your specific use-case I would recommend not enabling this automerge feature. Merging in unapplied PRs seems to be a very strong anti-pattern that we should lean away from supporting.

A PR should be considered a group. Yes terraform treats things separately, but Atlantis should not.

@kipkoan
Contributor

kipkoan commented Jan 10, 2019

In your specific use-case I would recommend not enabling this automerge feature.

Agreed, we don't currently have a use case for automerging. I just wanted to make sure that existing functionality is maintained.

Merging in unapplied PRs seems to be a very strong anti-pattern that we should lean away from supporting.

How do you promote changes through non-prod -> prod environments?

A PR should be considered a group. Yes terraform treats things separately, but Atlantis should not.

I think until there is a way for Atlantis to do environment promotions, Atlantis needs to treat plans separately.

@kipkoan
Contributor

kipkoan commented Jan 10, 2019

@sstarcher and @Vlaaaaaaad -

  1. What is your repo directory structure?
  2. How do you promote tf changes through non-prod environments to prod?
  3. What is the use case for having multiple plans in a single PR, while at the same time not wanting to combine those into a single plan?

@sstarcher

Atlantis should have nothing to do with environment promotion.

For me I version my terraform modules. I create a PR for dev. When I'm ready for staging I create a PR for staging etc

@kipkoan
Contributor

kipkoan commented Jan 10, 2019

For me I version my terraform modules. I create a PR for dev. When I'm ready for staging I create a PR for staging etc

How do you guarantee that the PR for prod has already been successfully run in staging?

@kipkoan
Contributor

kipkoan commented Jan 10, 2019

For me I version my terraform modules. I create a PR for dev. When I'm ready for staging I create a PR for staging etc

How do you version your modules? I'm curious about your workflow, to see if it could work for us too. Do you keep your modules in a separate repo from your environments?

@sstarcher

Each Terraform module is in its own GitHub repo, and each repo has tests. In development we use the module SHA to select what to use, for fast iteration. Once it's ready to go to staging, we tag that module and update staging to point to the tag. Once we're happy with that, we update prod to use that version.

We treat terraform like we treat any other piece of software. We test, version, tag and pin our versions.

@kipkoan
Contributor

kipkoan commented Jan 11, 2019

[moved to Slack]

@lkysow
Member

lkysow commented Jan 11, 2019

I think if automerge is on, then we require all plans to be successful. Otherwise we keep the original behaviour where some plans can fail but you can still apply successful plans.
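
A minimal sketch of that rule in Go; the type and field names here are invented for illustration and are not the PR's actual code:

package automerge

// ProjectResult is a stand-in for Atlantis's per-project result type;
// the real struct differs.
type ProjectResult struct {
	PlanSucceeded  bool
	ApplySucceeded bool
}

// automergeAllowed expresses the rule discussed above: with automerge on,
// every project in the pull request must have planned and applied
// successfully before Atlantis merges; with it off, Atlantis never merges
// automatically and the original behaviour is unchanged.
func automergeAllowed(automergeEnabled bool, results []ProjectResult) bool {
	if !automergeEnabled {
		return false
	}
	for _, r := range results {
		if !r.PlanSucceeded || !r.ApplySucceeded {
			return false
		}
	}
	return len(results) > 0
}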

@ajbouh

ajbouh commented Jan 11, 2019

Is this an easy thing to do in terms of code?

@brndnmtthws
Author

I have updated the PR with the following changes:

  • Atlantis will write errors to a file when either terraform init or terraform plan fails.
  • If any error files are found from any previous steps, automerge will not proceed.
  • Added a few more test cases.

I may have missed some things, but I'd like to collect feedback in the meantime. Everything appears to be working as intended.

@brndnmtthws
Author

Instead, we really should keep track of which projects we expect plans for in a PR. I've run into something similar in another bug where just using the planfiles on disk isn't enough. Instead I think we might need to keep track of this stuff in the on-disk database. I'm not asking you to do it, I will take it on, but what do you think?

I looked at the on-disk DB stuff, and I wasn't sure of a good way to handle this, so I went a different route: I'm simply writing a file into the working dir with the TF output when something fails. I think this is nice because it's easy to inspect externally, and doesn't require anything other than the filesystem.
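
Roughly, the approach as described might look like the sketch below; the marker file name and helper functions are assumptions for illustration, not the PR's actual code:

package automerge

import (
	"os"
	"path/filepath"
)

// planErrorFile is an assumed marker name; the PR may use something else.
const planErrorFile = "plan-error.txt"

// recordFailure writes the terraform init/plan output into the project's
// working directory so later steps can see that this project failed.
func recordFailure(workDir, tfOutput string) error {
	return os.WriteFile(filepath.Join(workDir, planErrorFile), []byte(tfOutput), 0o600)
}

// anyFailures reports whether any working directory contains a failure
// marker; automerge is skipped when it returns true.
func anyFailures(workDirs []string) bool {
	for _, dir := range workDirs {
		if _, err := os.Stat(filepath.Join(dir, planErrorFile)); err == nil {
			return true
		}
	}
	return false
}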

Also, the circleci website_link_check job is failing, but I think that's because I'm pushing from an external repo.

Similar to autoplan, automerge is a feature whereby Atlantis will merge
a PR/MR once all plans have successfully been applied.

This addresses issue runatlantis#186.
@brndnmtthws
Author

I just rebased this on master again, fixed the conflicts, and removed an extra println() that I forgot to remove.

@pratikmallya
Contributor

Can this be merged? This is a feature that is often asked for : )

@lkysow
Member

lkysow commented Feb 4, 2019

I am currently working on this in a different branch. I don't like the solution of writing out files to the file system to mimic a database. Instead I will be using the boltDB database we already have.
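
For context, a minimal sketch of what tracking plan status in BoltDB could look like; the bucket, key, and status names are assumptions rather than the actual Atlantis schema, and the import uses the maintained bbolt fork:

package automerge

import (
	"fmt"

	bolt "go.etcd.io/bbolt"
)

// recordProjectStatus stores a status string ("planned", "plan_failed",
// "applied") for one project of one pull request in the on-disk database.
func recordProjectStatus(db *bolt.DB, pullNum int, project, status string) error {
	return db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("pulls"))
		if err != nil {
			return err
		}
		key := fmt.Sprintf("%d/%s", pullNum, project)
		return b.Put([]byte(key), []byte(status))
	})
}

A matching read path would then check that every expected project for the pull has the "applied" status before merging, instead of relying on plan files left on disk.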

@lkysow mentioned this pull request on Feb 6, 2019
@lkysow closed this in #457 on Feb 7, 2019