Update Terraform Provider RKE to 1.3.11 #342

Closed
a-blender opened this issue May 19, 2022 · 24 comments
Labels: enhancement (New feature or request), team/area2

Comments

@a-blender
Contributor

Bump RKE version to 1.3.11

This version uses etcd v3.5.3, which includes the fix for etcd-io/etcd#13766. Because of this etcd bug, version 1.3.9 was not recommended for production use.

I'm far from being a golang guru, so I based my changes mainly on the PRs already opened by @VIKMSTR and @fe-ax. Since 1.3.10 ships with the etcd fix, I think we should concentrate on this PR, if that's OK with you @VIKMSTR and @fe-ax. I successfully built the provider and used it to build a test cluster with 3 control plane nodes and 2 workers. Still, I may have missed some key changes that should be made, so double-checking would be wise ^^.

Changes include

  • Update Go version to 1.17.9 (1.17 is the version used by RKE)
  • Update dependencies in go.mod. I updated all deps to their latest available version except for github.com/hashicorp/terraform-plugin-sdk. I didn't upgrade it to v2 because that seems to require many changes in the provider code, so it should be part of a dedicated PR IMO. Therefore, I simply updated this dep to the last v1.x available
  • Force github.com/spf13/afero to stick to v1.2.2 required by the version of github.com/hashicorp/terraform-plugin-sdk mentioned above
@tsde

tsde commented May 19, 2022

@annablender Following your comment, I ran an acceptance test with make testacc. The TestAccResourceRKECluster test fails on my branch but also on the master branch with the following error:

Failed running cluster err:Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system
--- FAIL: TestAccResourceRKECluster (214.07s)

Complete log file: test.log

@a-blender
Contributor Author

a-blender commented May 20, 2022

I also tested make testacc in my local terraform RKE branch and ran into errors. Not sure why the network plugin failed to run on worker nodes but if it works on your test cluster that's fine. This just needs to be tested to make sure an RKE cluster with all roles (etcd nodes as well) runs correctly and enabling the dockerd CRI for kubernetes 1.24 works as expected. Please verify that works after #337 (review) has been merged. Thanks!

@tsde

tsde commented May 23, 2022

@annablender I locally applied PR #337 and tested provisioning a one-node v1.23.6 cluster (with etcd/control plane/worker roles). I managed to get a cluster up and running with enable_cri_dockerd set to either false or true.
Of course, I can't test 1.24 as it is not supported by RKE yet, but at least PR #337 seems to do the job.

As a side note, enabling the option led me to the behavior described in rancher/rke#2938, so I added a comment with my findings. This is not related to this terraform provider in any way, but it may be a blocker in the near future for supporting 1.24.

@a-blender
Contributor Author

Great, thanks for testing this!

@a-blender
Contributor Author

a-blender commented Jun 6, 2022

Testing template

Root cause

Update Terraform RKE provider.

What was fixed, or what changes have occurred

Two major changes have been made by community PRs (thank you open source!)

Areas or cases that should be tested

  • Run make test in the provider repo and verify structure tests pass
  • Test provisioning a one node RKE cluster with kubernetes version v1.23.6 (with etcd/control plane/worker roles). Test that cluster is healthy when enable_cri_dockerd is both false and true
  • Test provisioning a three node RKE cluster (with etcd/control plane/worker nodes)
  • Test provisioning with kubernetes version 1.24 (when supported by RKE)
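The single-node test case above could be expressed roughly as the following Terraform configuration. This is a minimal sketch, not the configuration actually used in this thread: the node address, SSH user, and key path are placeholders, and the version string follows the `v1.23.6-rancher1-1` form used elsewhere in this issue. The `enable_cri_dockerd` argument is the one added by PR #337, so it assumes a provider build that includes that change.

```hcl
# Sketch only: address, user, and ssh_key are hypothetical placeholders.
terraform {
  required_providers {
    rke = {
      source = "rancher/rke"
    }
  }
}

resource "rke_cluster" "single_node" {
  kubernetes_version = "v1.23.6-rancher1-1"

  # Apply once with true and once with false to cover both test cases.
  enable_cri_dockerd = true

  nodes {
    address = "192.0.2.10" # placeholder address (documentation range)
    user    = "ubuntu"     # placeholder SSH user
    role    = ["controlplane", "etcd", "worker"]
    ssh_key = file("~/.ssh/id_rsa") # placeholder key path
  }
}
```

Toggling the single `enable_cri_dockerd` value between runs keeps the two test cases otherwise identical.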

What areas could experience regressions ?

Provisioning of standalone RKE clusters where enable_cri_dockerd: true.

Are the repro steps accurate/minimal ?

Yes.

@tmsdce

tmsdce commented Jun 24, 2022

Thanks for merging the related PRs
@annablender @Josh-Diamond any ETA for cutting a new release integrating these changes ?

@Josh-Diamond

@tmsdce Undergoing testing now 👍

@Josh-Diamond

Ticket #342 - Test Results [pt. 1]

Pending test:

  • Test provisioning with kubernetes version 1.24 - [currently unsupported]

With Docker on a single-node instance:

Verified on rancher v2.6.6:

  1. Run make test in the provider repo and verify structure tests pass
  2. Fresh install of rancher v2.6.6
  3. With Terraform, provision a single-node, all-roles RKE cluster w/ k8s v1.23.6-rancher1-1, and setting RKE config enable_cri_dockerd to true
  4. With Terraform, provision a single-node, all-roles RKE cluster w/ k8s v1.23.6-rancher1-1, and setting RKE config enable_cri_dockerd to false
  5. With Terraform, provision a 3 node [1 etcd, 1 cp, 1 wkr] RKE cluster w/ k8s v1.23.6-rancher1-1, and setting RKE config enable_cri_dockerd to true
  6. Verified - All tests successful
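For the three-node case in step 5, one way to declare a node per role is a `dynamic` block over a local map. This is a hedged sketch under assumptions: the addresses, SSH user, key path, and the `rke_nodes` local name are all hypothetical, and it assumes the `rancher/rke` provider source is already declared in `required_providers`.

```hcl
# Sketch only: every address, the SSH user, and the key path are placeholders.
locals {
  # Hypothetical node inventory: one node per role [1 etcd, 1 cp, 1 wkr].
  rke_nodes = {
    etcd = { address = "192.0.2.11", role = ["etcd"] }
    cp   = { address = "192.0.2.12", role = ["controlplane"] }
    wkr  = { address = "192.0.2.13", role = ["worker"] }
  }
}

resource "rke_cluster" "three_node" {
  kubernetes_version = "v1.23.6-rancher1-1"
  enable_cri_dockerd = true

  # Generates one nested nodes block per entry in local.rke_nodes.
  dynamic "nodes" {
    for_each = local.rke_nodes
    content {
      address = nodes.value.address
      user    = "ubuntu" # placeholder SSH user
      role    = nodes.value.role
      ssh_key = file("~/.ssh/id_rsa") # placeholder key path
    }
  }
}
```

Keeping the inventory in a map makes it easy to add nodes or change roles without duplicating the `nodes` block.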

Screenshots:

Step 1
Screen Shot 2022-07-05 at 9 56 23 AM

Step 3
Screen Shot 2022-07-05 at 10 28 48 AM

Step 4
Screen Shot 2022-07-05 at 10 40 15 AM

Step 5
Screen Shot 2022-07-05 at 11 09 05 AM

Additional Information:

  • moving to blocked until k8s v1.24 is supported by RKE;
  • will resume testing then

@tsde

tsde commented Jul 6, 2022

Hi @Josh-Diamond,
Thanks for all the testing.

Does the blocked status mean that no release will be cut until 1.24 is supported? Is there any chance of cutting a release that at least includes acb5b7c (uses RKE v1.3.11)?

It would allow users to deploy Kubernetes 1.22 and 1.23 safely with etcd v3.5.3, which is not subject to this issue. Waiting for 1.24 can take a significant amount of time and I'd like to be able to upgrade my clusters to 1.23. This was the primary objective of this issue. Maybe I misunderstand the blocked status, so correct me if I'm wrong ^^

@zube zube bot added the [zube]: Blocked label Jul 20, 2022
@Sartigan

Sartigan commented Jul 21, 2022

When can we expect an update for this? The provider has been locked to RKE Version 1.3.3 for 7-8 months now... we are currently locked to kubernetes v1.22.4-rancher1-1 and we can not use important etcd fixes because of the error above.

@a-blender
Contributor Author

a-blender commented Jul 22, 2022

We are releasing terraform provider RKE v1.3.2 today which uses RKE v1.3.11 (has kubernetes default version 1.23.6 and important etcd fixes that were mentioned). It should be published by Hashicorp by Monday. Once that is done, this issue can be retested @Josh-Diamond.

@Patricol

I was too excited to wait after seeing the release, so I used dev_overrides to install it. I wish I had done so much sooner, because this update fixed multiple issues that I've spent >40 hours debugging and dozens more hours working around over the last half a year.

Thank you so much for this release!
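For readers who want to try an unpublished build the same way, `dev_overrides` is set in the Terraform CLI configuration file rather than in the module itself. The sketch below assumes the provider binary was built or downloaded into a local directory; the path shown is hypothetical.

```hcl
# ~/.terraformrc (Terraform CLI configuration file)
provider_installation {
  dev_overrides {
    # Hypothetical directory containing the terraform-provider-rke binary.
    "rancher/rke" = "/home/user/terraform-providers/rke"
  }

  # Install every other provider from its registry as usual.
  direct {}
}
```

With an override in place, Terraform prints a warning that development overrides are in effect and uses the local binary directly, so the override should be removed once the release is available from the registry.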

@a-blender
Contributor Author

@Patricol wonderful! It's my pleasure to help.

@hutm

hutm commented Jul 24, 2022

@annablender , will this release be pushed to terraform provider registry? https://registry.terraform.io/providers/rancher/rke/latest points to 1.3.1. Thanks!

@Sartigan

> @annablender , will this release be pushed to terraform provider registry? https://registry.terraform.io/providers/rancher/rke/latest points to 1.3.1. Thanks!

The comment above says it will be pushed on Monday.

@hutm

hutm commented Jul 26, 2022

@Sartigan , @annablender it seems that did not happen on Monday. Are there plans to update terraform registry release? Thanks!

@jiaqiluo
Member

jiaqiluo commented Jul 28, 2022

FYI, 1.3.2 is available now. See https://registry.terraform.io/providers/rancher/rke/1.3.2

cc @Sartigan @hutm
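Once the release is in the registry, a configuration can pin it explicitly. A minimal sketch, assuming provider version 1.3.2 as given in the registry URL above:

```hcl
terraform {
  required_providers {
    rke = {
      source  = "rancher/rke"
      version = "1.3.2" # provider release that embeds RKE v1.3.11
    }
  }
}
```

Running `terraform init -upgrade` after changing the constraint makes Terraform select the newly pinned version.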

@Josh-Diamond

Josh-Diamond commented Jul 28, 2022

Ticket #342 - Test Results [cont.]

Pending test:

  • Test provisioning with kubernetes version 1.24

Verified with terraform-provider-rke v1.3.2:

  1. Provision a single-node RKE cluster w/ k8s v1.23.6 and enable_cri_dockerd = true
  2. Verified - Successfully provisions; healthy and ready cluster
  3. Provision a single-node RKE cluster w/ k8s v1.23.6 and enable_cri_dockerd = false
  4. Verified - Successfully provisions; healthy and ready cluster
  5. Test provisioning a three-node RKE cluster [individual roles] w/ k8s v1.23.6
  6. Verified - Successfully provisions; healthy and ready cluster

Screenshots:

Step 2 - Status
enabled_ready

Step 2 - Performance
enabled_top

Step 4 - Status
disabled_ready

Step 4 - Performance
disabled_top

Step 6 - Status
3_node_ready


Note:
Provisioning w/ k8s 1.24 will be tracked here, because terraform-provider-rke 1.3.2 embeds rke v1.3.11, which does not support 1.24. Once RKE 1.3.13 is released, a new version of terraform-provider-rke will be released to support this test case.

@jiaqiluo
Member

jiaqiluo commented Jul 28, 2022

It looks like the failure to create a 1.24 cluster is expected, because terraform-provider-rke 1.3.2 embeds rke 1.3.11, which does not support 1.24 (the upcoming rke 1.3.13 will support 1.24).

Per the offline conversation with @Josh-Diamond, a new issue will be opened to track support for 1.24, and this issue can be closed.

@Josh-Diamond

Provisioning w/ k8s 1.24 will be tracked in #354

@zube zube bot closed this as completed Jul 28, 2022
@zube zube bot removed the [zube]: Done label Oct 27, 2022