How to handle state migrations as code during module upgrades ? #1101

Closed
1 of 4 tasks
barryib opened this issue Nov 14, 2020 · 11 comments

Comments

@barryib
Member

barryib commented Nov 14, 2020

I have issues

I'm submitting a...

  • bug report
  • feature request
  • support request - read the FAQ first!
  • kudos, thank you, warm fuzzy

Discussion

I'm opening this issue to start a discussion about migrations and state manipulation during upgrades, both for #858 and for future migrations.

Today there is no easy way to upgrade. As this module grows more complex, we decided to split it into smaller submodules. We discussed this in #635 and #774.

We now support Fargate and Managed Node Groups as submodules. The next step will be:

These moves will introduce a lot of breaking changes and will break all clusters and workloads, including your production environment.

To avoid that, we're currently looking for a way to ease the migration.

One solution I have in mind is to use https://github.com/minamijoyo/tfmigrate and create a submodule that generates a tfmigrate.hcl file to help users with their migrations.
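
To make the idea of a generated migration file more concrete, here is a minimal Python sketch that renders a tfmigrate state migration from a mapping of old to new resource addresses. It assumes tfmigrate's `migration "state"` block with `dir` and `actions` attributes; the function name `render_tfmigrate`, the migration name, and the example addresses are hypothetical and not taken from this module.

```python
from typing import Mapping


def render_tfmigrate(name: str, directory: str, moves: Mapping[str, str]) -> str:
    """Render a tfmigrate state migration with one `mv` action per move."""
    lines = [
        f'migration "state" "{name}" {{',
        f'  dir = "{directory}"',
        "  actions = [",
    ]
    for old, new in moves.items():
        # Addresses containing double quotes (for_each keys) would need
        # escaping inside the HCL string; this sketch simply rejects them.
        if '"' in old or '"' in new:
            raise ValueError("addresses with quotes are not handled here")
        lines.append(f'    "mv {old} {new}",')
    lines += ["  ]", "}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    example_moves = {
        "module.eks.aws_iam_role.workers":
            "module.eks.module.worker_groups.aws_iam_role.workers",
    }
    print(render_tfmigrate("eks_submodules", ".", example_moves))
```

A submodule would presumably build the mapping from the module's own inputs; the sketch only shows the shape of the generated file.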

The workflow should be:

  1. Run terraform apply before upgrading to ensure that your code is in sync with your state
  2. Back up your state
  3. Run terraform apply with var.eks_tfmigrate=true to generate the tfmigrate.hcl file
  4. Run tfmigrate plan
  5. Run tfmigrate apply
  6. Upgrade your module version and run terraform plan to check whether there are any changes
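
The steps above can be sketched as an ordered list of commands. This is only an illustration under stated assumptions: `eks_tfmigrate` is the variable proposed in this issue and does not exist yet, passing it with `-var` assumes it is exposed as a root-level variable, and the backup file name and the `migration_steps` helper are hypothetical. The script only prints the commands; it does not run them.

```python
import shlex
from typing import List, Tuple


def migration_steps(backup_path: str = "pre-upgrade.tfstate") -> List[Tuple[str, str]]:
    """Return (description, command) pairs for the proposed upgrade workflow."""
    return [
        ("Sync code and state before upgrading", "terraform apply"),
        # `terraform state pull` writes the current state to stdout.
        ("Back up the state", f"terraform state pull > {shlex.quote(backup_path)}"),
        # Assumption: eks_tfmigrate is the variable proposed in this issue.
        ("Generate tfmigrate.hcl", "terraform apply -var eks_tfmigrate=true"),
        ("Preview the state migration", "tfmigrate plan"),
        ("Apply the state migration", "tfmigrate apply"),
        # The module version constraint is edited by hand before this step.
        ("Upgrade the module and check for changes",
         "terraform init -upgrade && terraform plan"),
    ]


if __name__ == "__main__":
    for number, (description, command) in enumerate(migration_steps(), start=1):
        print(f"{number}. {description}: {command}")
```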

cc @terraform-aws-modules/triage-supporters @grzegorzlisowski @sc250024 @js-timbirkett

Additional infos

@barryib barryib pinned this issue Nov 14, 2020
@barryib barryib changed the title How to handle module migrations as code How to handle module migrations as code ? Nov 14, 2020
@barryib barryib changed the title How to handle module migrations as code ? How to handle state migrations as code during module upgrades ? Nov 15, 2020
@daroga0002
Contributor

daroga0002 commented Nov 16, 2020

In general I like your plan. Questions/comments are:

  • Should the new code be 0.12 compatible? (This is mostly about TF 0.13 for_each for modules, which would be very beneficial.)
  • Maybe the current monolithic code should be placed in a dedicated additional branch?
  • I think that besides tfmigrate, we should document how to do the Terraform state manipulation manually, as not all users will want to use tfmigrate on production environments (or they simply don't trust it).

On the other hand, since these refactors will be truly breaking changes, it may be worth establishing an additional dev branch for some time (say 2-3 months) to cover those changes and refactors (replacing counts, dropping TF 0.12 support in favor of new TF 0.13 features) without preserving compatibility.

After some clarification, and once the dev branch is stable, we would write the migration path (as you proposed) for the new code and then replace the master branch with the new code. This approach allows higher velocity when introducing changes that are currently hard to make iteratively while guaranteeing compatibility with previously created clusters.

Of course, the second approach only makes sense if the community is engaged enough to work on it in a reasonable time (for my part, I can dedicate 10-20% of my time to this initiative).

@jjhidalgar
Contributor

jjhidalgar commented Nov 16, 2020

In relation to:
"Move worker groups as submodules and drop Launch Configuration support #858"

I think you should add the new worker groups as a submodule, in parallel with any form of older worker groups, in at least one version of the module.

This way, we can replicate the same worker groups using submodules and slowly scale the old workers (i.e. worker_groups_launch_template=) down to zero, before upgrading to a version where old worker groups are not supported.

Or at least that is what I had in mind. If the state migration works well, it might be the better option. I don't have experience with it, but it looks complex in the case of worker groups.

@grzegorzlisowski

In relation to:
"Move worker groups as submodules and drop Launch Configuration support #858"

I think you should add the new worker groups as a submodule, in parallel with any form of older worker groups, in at least one version of the module.

This way, we can replicate the same worker groups using submodules and slowly scale the old workers (i.e. worker_groups_launch_template=) down to zero, before upgrading to a version where old worker groups are not supported.

Or at least that is what I had in mind. If the state migration works well, it might be the better option. I don't have experience with it, but it looks complex in the case of worker groups.

That was my original idea, but I dropped it after some suggestions that it might not be needed:

#858 (comment)

@barryib
Member Author

barryib commented Nov 17, 2020

@daroga0002

Should the new code be 0.12 compatible? (This is mostly about TF 0.13 for_each for modules, which would be very beneficial.)
Maybe the current monolithic code should be placed in a dedicated additional branch?

I don't know precisely. Even though for_each for modules is great, I don't yet see the real benefit of using it over for_each in resources (simply replacing the current count with for_each). I'm still open to discussion. That said, we recently decided in a terraform-aws-modules office hours session that we can now use TF 0.13 features if we need them.

I think that besides tfmigrate, we should document how to do the Terraform state manipulation manually, as not all users will want to use tfmigrate on production environments (or they simply don't trust it).

Good point. In any case, we should provide good docs for that. Maybe we can just generate terraform state mv commands and not use tfmigrate at all?
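
As a rough illustration of generating terraform state mv commands, the Python sketch below turns a mapping of old to new addresses into shell-quoted commands. The example models a count-indexed resource moving to a for_each key, as discussed above; the helper name `state_mv_commands` and the addresses are hypothetical, and a real generator would need the module's actual resource names.

```python
import shlex
from typing import List, Mapping


def state_mv_commands(moves: Mapping[str, str]) -> List[str]:
    """Build one `terraform state mv` command per (old, new) address pair."""
    return [
        # shlex.quote protects brackets and double quotes from the shell.
        f"terraform state mv {shlex.quote(old)} {shlex.quote(new)}"
        for old, new in moves.items()
    ]


if __name__ == "__main__":
    # Hypothetical addresses: a count index replaced by a for_each key.
    example_moves = {
        "module.eks.aws_autoscaling_group.workers[0]":
            'module.eks.aws_autoscaling_group.workers["blue"]',
    }
    for command in state_mv_commands(example_moves):
        print(command)
```

Printing the commands rather than executing them lets users review each move and run it by hand, which addresses the concern about trusting an extra tool in production.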

On the other hand, since these refactors will be truly breaking changes, it may be worth establishing an additional dev branch for some time (say 2-3 months) to cover those changes and refactors (replacing counts, dropping TF 0.12 support in favor of new TF 0.13 features) without preserving compatibility.

The main point is that we don't want to support multiple branches. It would introduce a lot of work for maintainers (backporting features, resolving conflicts, cleaning up code, etc.). So we can create a temporary branch (for example dev-submodules) to let users test those features easily, and let them know that code in that branch is subject to breaking changes.

Of course, the second approach only makes sense if the community is engaged enough to work on it in a reasonable time (for my part, I can dedicate 10-20% of my time to this initiative).

Thanks, we'll keep that in mind.

@jaimehrubiks @grzegorzlisowski

In relation to:
"Move worker groups as submodules and drop Launch Configuration support #858"
I think you should add the new worker groups as a submodule, in parallel with any form of older worker groups, in at least one version of the module.

That was my original idea, but I dropped it after some suggestions that it might not be needed:

As I mentioned in #858 (comment) we'll still need to move other resources (cluster, configmap, iam policies, etc.), so I think we can move them all at once and document how to do it correctly.

Generally speaking, state migration is a Terraform Core issue. In my view, Terraform itself should provide something to help with migrations. But until we get something from HashiCorp (maybe with hashicorp/terraform#19354), Terraform users and module developers will have to handle state manipulation themselves. If we're not ready to do that, I think we should stop using Terraform (at least until there is an elegant way to tackle this kind of issue).

@stale

stale bot commented Feb 15, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 15, 2021
@stale

stale bot commented Mar 18, 2021

This issue has been automatically closed because it has not had recent activity since being marked as stale.

@stale

stale bot commented Aug 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 2, 2021
@daroga0002
Contributor

This seems to be the future solution: hashicorp/terraform#29126 is worth watching.

@stale stale bot removed the stale label Aug 26, 2021
@stale

stale bot commented Sep 25, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 25, 2021
@stale

stale bot commented Oct 3, 2021

This issue has been automatically closed because it has not had recent activity since being marked as stale.

@stale stale bot closed this as completed Oct 3, 2021
@antonbabenko antonbabenko unpinned this issue Apr 7, 2022
@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 18, 2022