Unable to bump cluster version while on the channel #675

Open
redbaron opened this issue Sep 16, 2020 · 17 comments
Labels
enhancement New feature or request P4 low priority issues triaged Scoped and ready for work

Comments

@redbaron
Contributor

redbaron commented Sep 16, 2020

I am using the beta-private-cluster module, version 11.1.0, and configured my clusters with release_channel = "REGULAR". A new security bulletin (https://cloud.google.com/kubernetes-engine/docs/security-bulletins#gcp-2020-012) recommends updating the cluster version.

I tried setting the kubernetes_version variable, but the module seems to ignore it when a release channel is set, so I am forced to upgrade the cluster through the console or gcloud and can't use the TF module for this.

Shouldn't the module pass kubernetes_version to min_master_version even when a release channel is set?
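For concreteness, a minimal sketch of the setup being described. The project, network and secondary-range names are hypothetical placeholders, and the version string is borrowed from the tests later in this thread. As reported here, module v11.1.0 does not forward kubernetes_version to min_master_version while release_channel is set.

```hcl
module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
  version = "11.1.0"

  project_id        = "my-project"     # hypothetical
  name              = "my-cluster"     # hypothetical
  region            = "us-central1"
  network           = "my-network"     # hypothetical
  subnetwork        = "my-subnet"      # hypothetical
  ip_range_pods     = "pods-range"     # hypothetical secondary range name
  ip_range_services = "services-range" # hypothetical secondary range name

  release_channel = "REGULAR"

  # Intended to request a newer master version; reported in this issue
  # as ignored by the module while release_channel is set.
  kubernetes_version = "1.17.9-gke.1504"
}
```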

@bharathkkb
Member

@redbaron The docs don't explicitly say that min_master_version and release_channel can be used together. I will test it out. Also, I believe the REGULAR channel already has the fix as of R30.

@ivankorn
Contributor

@redbaron There is a catch: kubernetes_version only sets the master version. The version key in the node_pools list of maps sets the worker node version, and only when auto_upgrade is false.
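A fragment illustrating that split, assuming the module's node_pools list-of-maps input. The other required module inputs are omitted, and the version string is the one used in the tests in this thread.

```hcl
module "gke" {
  # ... source, version and other required inputs omitted ...

  # Sets the master (control plane) version only.
  kubernetes_version = "1.15.12-gke.20"

  node_pools = [
    {
      name = "default-node-pool"
      # Per-pool node version; per the comment above, only honoured
      # when auto_upgrade is false.
      version      = "1.15.12-gke.20"
      auto_upgrade = false
    },
  ]
}
```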

@ivankorn
Contributor

ivankorn commented Sep 17, 2020

By API design, we can't use both the channel and the version variables.
If a channel is specified, then Google manages updates itself for both the master and the nodes, and auto_upgrade is forced to be true.

https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#Cluster.ReleaseChannel

But the question for the Terraform module is still valid: after migrating to TF 0.13, which supports more flexible variable validation, we may want to restrict how the release_channel and auto_upgrade variables can be combined, since auto_upgrade=false is only valid when release_channel is UNSPECIFIED or null. @aaron-lane

Before the migration, we can either try to handle it here or leave it as is, since the API error is self-descriptive enough.
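A hypothetical sketch of what TF 0.13 validation allows. In Terraform 0.13 a validation condition may only reference the variable it is declared on, so at that version the module could reject unknown channel names but could not yet check release_channel against auto_upgrade. The description and default below are illustrative, not copied from the module.

```hcl
variable "release_channel" {
  type        = string
  description = "Release channel of the cluster (illustrative)."
  default     = null

  validation {
    # coalesce() skips null, so an unset channel is treated as UNSPECIFIED.
    condition = contains(
      ["UNSPECIFIED", "RAPID", "REGULAR", "STABLE"],
      coalesce(var.release_channel, "UNSPECIFIED")
    )
    error_message = "The release_channel must be null, UNSPECIFIED, RAPID, REGULAR or STABLE."
  }
}
```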

I just confirmed this with the following tests: an upgrade to 1.15.12-gke.20 on the UNSPECIFIED (default) channel with tag v8.1.0, and the same UNSPECIFIED upgrade to 1.15.12-gke.20 with tag v11.1.0.

Both work as expected:
kubernetes_version updated the master, and version together with auto_upgrade=false updated the nodes.

Then, on v11.1.0, I experimented with release_channel: REGULAR with auto_upgrade=false, using both 1.16.13-gke.401 and the newer 1.17.9-gke.1504. The node pool update fails:

module.gke.google_container_node_pool.pools["default-node-pool"]: Modifying... [id=projects/****/locations/us-central1/clusters/*****/nodePools/default-node-pool]

Error: googleapi: Error 400: Auto_upgrade cannot be false when release_channel REGULAR is set., badRequest

  on .terraform/modules/gke/modules/beta-private-cluster/cluster.tf line 278, in resource "google_container_node_pool" "pools":
 278: resource "google_container_node_pool" "pools" {

The same error is reproducible on the STABLE channel with its lower 1.15 default, 1.15.12-gke.20:

Error: googleapi: Error 400: Auto_upgrade cannot be false when release_channel STABLE is set., badRequest

  on .terraform/modules/gke/modules/beta-private-cluster/cluster.tf line 278, in resource "google_container_node_pool" "pools":
 278: resource "google_container_node_pool" "pools" {

So we either use release_channel for fully automatic upgrades, or we set the master version, the node version, or both for the "manual" way.

@ivankorn added and then removed the wontfix label ("This will not be worked on") on Sep 17, 2020
@redbaron
Contributor Author

So we either use the release_channel for fully automatic upgrades or we use master/node or both version for the "manual" way.

@ivankorn, I was able to update the versions of both the master and the nodes via the console (and I presume it is possible to do so via gcloud as well) without disabling node auto-upgrade or changing the release channel. That contradicts your findings. Does it mean the TF provider imposes extra limitations that are not present in the GCP API?

@ivankorn
Contributor

@redbaron It looks like the GCP API docs explicitly say that channel clusters are not supposed to be upgraded manually, so as far as I understand you should never really touch the version. Google does the upgrade work under the hood. If the web interface lets you upgrade a channel-enabled cluster, then that is indeed a little confusing, and we may want to ask Google engineers to comment on it when they get a chance @aaron-lane @morgante

@redbaron, have you observed automatic updates on your channel-enabled clusters yet, btw? They should work, so what was your motivation for forcing the upgrade?

On the TF side, we can only restrict the variable definitions, as I proposed in #677, so that users get a clear TF error when they try to use auto_upgrade=false or version on channel-enabled clusters.
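A cross-variable check like this cannot be written as a TF 0.13 validation block, because such a block cannot reference a second variable. Later Terraform releases added lifecycle preconditions (1.2) and the built-in terraform_data resource (1.4), which make it possible. The sketch below is hypothetical, is not the change proposed in #677, and assumes node_pools is a list of string maps shaped like the module's input.

```hcl
terraform {
  required_version = ">= 1.4" # terraform_data + lifecycle preconditions
}

variable "release_channel" {
  type    = string
  default = null
}

variable "node_pools" {
  # Values are strings, so `auto_upgrade = false` arrives as "false".
  type    = list(map(string))
  default = []
}

locals {
  # True when any channel other than UNSPECIFIED (or null) is selected.
  channel_enabled = coalesce(var.release_channel, "UNSPECIFIED") != "UNSPECIFIED"

  # Names of pools that opt out of auto-upgrade or pin a node version.
  manual_pools = [
    for p in var.node_pools : p.name
    if lookup(p, "auto_upgrade", "true") == "false" || lookup(p, "version", "") != ""
  ]
}

# Hypothetical guard resource: planning fails with a readable error
# instead of the 400 returned by the GKE API.
resource "terraform_data" "release_channel_guard" {
  lifecycle {
    precondition {
      condition     = !(local.channel_enabled && length(local.manual_pools) > 0)
      error_message = "Node pools cannot set auto_upgrade = false or version while a release channel is set."
    }
  }
}
```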

@bharathkkb
Member

@redbaron I tried to reproduce this via the Cloud Console, but the drop-down only gives me one option. Did it allow multiple versions for you? @ivankorn's findings seem accurate to me, and that is my understanding as well: if GKE manages the release channel, we cannot explicitly set a version.

@redbaron
Contributor Author

@bharathkkb,

It all depends on timing. At the time of the R29.1 release, the cluster was on the default version, but another version was available as an upgrade.

I can see that @ivankorn didn't explicitly test the case where the channel is set, auto-upgrade is true, and min_master_version is also given a value. I suspect that might be the key to updating to non-default versions.

@morgante
Contributor

I'm of the opinion that we should explicitly discourage manually setting a version when using release channels. The intent of release channels is to let Google manage the cluster version on your behalf (at a particular cadence), and accepting a version undermines that.

@redbaron
Contributor Author

@morgante, in general I'd agree with you, but, as was the case with R29.1, a new version with a security or bug fix that matters to you can be added to the channel as an optional upgrade rather than a mandatory one. There should be a way to get it without going to the console, don't you think?

@redbaron
Contributor Author

redbaron commented Oct 2, 2020

@ivankorn, @bharathkkb,

This situation happened again with another cluster. It was created using this TF module on the Regular channel with 1.16.13-gke.1, the latest version at the time. Since then, new versions have been added to the channel, but GKE does not upgrade the master itself; it just offers upgrades. How do I upgrade the master using TF?

The console offers me the upgrade, but the TF module ignores kubernetes_version altogether; it doesn't seem to pass it as min_master_version to the google_container_cluster resource.


@morgante
Contributor

morgante commented Oct 2, 2020

Just to be clear, is your desire to upgrade prior to when GKE does it on your behalf?

If you want this level of control, I'm wondering why you use release channels at all. If you want to be in charge of upgrading your clusters yourself, I'm not sure release channels are really suited to you.

@redbaron
Contributor Author

redbaron commented Oct 3, 2020

One purpose of a channel is to limit the versions you can use to those matching the channel policy: Rapid, Regular, or Stable. Additionally, it performs mandatory upgrades.

What I am asking for is the ability to perform optional upgrades to the new versions available in the channel. It should be possible by passing the min_master_version parameter to the TF resource. This matches what one can do with the console today.
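At the resource level, the request amounts to setting both arguments on google_container_cluster. This is a sketch only: the cluster name and version string are hypothetical, and whether GKE accepts a pinned minimum version on a channel cluster is exactly what this thread is debating.

```hcl
resource "google_container_cluster" "example" {
  provider = google-beta # the beta-* submodules use the google-beta provider

  name               = "example-cluster" # hypothetical
  location           = "us-central1"
  initial_node_count = 1

  release_channel {
    channel = "REGULAR"
  }

  # Hypothetical: a newer version already offered by the channel,
  # requested ahead of GKE's own auto-upgrade.
  min_master_version = "1.17.9-gke.1504"
}
```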

@tzoratto

tzoratto commented Oct 7, 2020

Hi,

Actually, the official GKE documentation for channel clusters states the following:

GKE automatically upgrades clusters to the default version gradually. If more control over the upgrade process is necessary, we recommend upgrading ahead of time to an available version. The GKE auto-upgrade process does not modify cluster resources when those resources have a version that is equal to or greater than the auto-upgrade target.

https://cloud.google.com/kubernetes-engine/docs/concepts/release-channels?hl=en#what_versions_are_available_in_a_channel

So they recommend manually upgrading channel clusters when needed, for whatever reason (and I think applying a security fix ASAP is a valid one).

But of course, if the Google API doesn't allow a channel and a minimum version to be used together, we're screwed!

@github-actions

github-actions bot commented Jan 5, 2021

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days

@github-actions github-actions bot added the Stale label Jan 5, 2021
@yashbhutwala
Contributor

Still relevant and needs to be addressed.

@kriswuollett

Not having this feature costs me an hour of extra wait time when I'm testing cluster creation. If this is still relevant to anyone, please comment/vote on the issue I created at https://issuetracker.google.com/issues/235121270, as it may affect more than just the Terraform module as pointed out by @tzoratto.

@snigdhasambitak

snigdhasambitak commented May 16, 2023

We recently bumped into this issue while using the private-cluster submodule. We wanted to disable auto-upgrade for the node pools, and our release_channel was previously STABLE.

Disabling auto-upgrade for the node pools of an existing cluster basically takes three steps:

  1. Set release_channel to either null or "UNSPECIFIED"; we went with "UNSPECIFIED". STABLE or REGULAR won't work if you want to disable auto-upgrade for node pools.
release_channel = "UNSPECIFIED"
  2. Make sure you set kubernetes_version in the module. We didn't want to bump the control plane's Kubernetes version, so we kept the existing one. If you leave it unset, it defaults to latest, which might be two minor versions beyond what you are currently running and will prevent the update. We set:
kubernetes_version               = ""

This ensures the module doesn't jump to the latest Kubernetes version and keeps your existing one.

  3. Finally, disable auto-upgrade for your node pool with auto_upgrade = false, as follows:
node_pools = [
    {
      name               = "default-e2-standard-4"
      machine_type       = "e2-standard-4"
      node_locations     = "europe-west1-b,europe-west1-c,europe-west1-d"
      min_count          = 0
      max_count          = 30
      disk_size_gb       = 100
      disk_type          = "pd-standard"
      image_type         = "COS_CONTAINERD"
      auto_upgrade       = false
      initial_node_count = 1
      enable_secure_boot = true
    },
]

This works with the latest version of the Terraform module. Hopefully this helps!


8 participants