
Non-backwards compatible change: CP using NLB #90

Open
bellis-ai opened this issue Aug 8, 2023 · 9 comments

Comments

@bellis-ai

In a recent update, the control plane was changed to use an NLB instead of a classic load balancer. Those upgrading the module to the latest version will hit the following failure sequence:

  • CP Loadbalancer is upgraded to NLB
  • Target groups are created for the NLB
  • The autoscaling group has "ignore_changes" set on its "load_balancer" and "target_groups" properties, so it ignores the change to the new load balancer.
  • Because the autoscaling group never picks up the new load balancer settings, instances are not registered with the NLB's target groups and the control plane fails.

Not sure how to fix.
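For context, the ignore_changes behaviour described above corresponds to a Terraform lifecycle block roughly like this (a sketch only; the module's actual resource and attribute names may differ):

```hcl
resource "aws_autoscaling_group" "control_plane" {
  # ... name, launch template, subnets, etc. ...

  lifecycle {
    # Drift in these attributes is deliberately ignored, which is why
    # Terraform never attaches the ASG to the new NLB's target groups.
    ignore_changes = [load_balancers, target_group_arns]
  }
}
```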

@bellis-ai
Author

Looks like it's a matter of just changing the autoscaling group to use the new NLB, importing it back into state, and then adding the security group (name ends in -cp) to each control plane instance.
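A sketch of that sequence with the Terraform and AWS CLIs; the resource address, ASG name, and target group ARN below are placeholders, not the module's real identifiers:

```shell
# Re-read the real ASG into state after pointing its configuration at
# the new NLB target groups.
terraform state rm aws_autoscaling_group.control_plane
terraform import aws_autoscaling_group.control_plane my-cluster-cp-asg

# Alternatively, attach the target group out-of-band with the AWS CLI:
aws autoscaling attach-load-balancer-target-groups \
  --auto-scaling-group-name my-cluster-cp-asg \
  --target-group-arns arn:aws:elasticloadbalancing:...:targetgroup/cp/abc123
```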

@adamacosta
Collaborator

adamacosta commented Aug 11, 2023

I haven't yet worked out how to handle a graceful migration to the NLB. Beware that if you just update in place, the new load balancer will have a different DNS name from the old one, which will invalidate the server certificate served by the Kube api-server: that certificate has the old load balancer's DNS name in its SAN list, placed there automatically by our module.
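A quick way to see which names are in the SAN list of the certificate the api-server currently serves (the host below is a placeholder for your load balancer's DNS name):

```shell
# Print the Subject Alternative Name list of the served certificate.
openssl s_client -connect old-clb.example.com:6443 </dev/null 2>/dev/null \
  | openssl x509 -noout -text \
  | grep -A1 'Subject Alternative Name'
```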

I believe, but have not yet tested, that what you have to do is:

  • Create the new NLB outside of Terraform first
  • Grab the DNS name from AWS and add it to the tls-san list in /etc/rke2/config.yaml on each of the control plane nodes
  • Cycle rke2 on the control plane nodes to generate a new certificate that will include this
  • Import the load balancer into Terraform state
  • Then do the rest of what you're saying above

Alternatively, if you have a custom URL and DNS record for the api-server and have already included that in the tls-san list, none of this will matter.
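For step 2 above, the edit is just appending the NLB's DNS name under tls-san in /etc/rke2/config.yaml; a sketch with placeholder DNS names:

```yaml
# /etc/rke2/config.yaml (fragment; both DNS names are placeholders)
tls-san:
  - old-clb-1234567890.us-east-1.elb.amazonaws.com        # existing entry
  - new-nlb-0123456789abcdef.elb.us-east-1.amazonaws.com  # new NLB's name
```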

@bellis-ai
Author

Thank you so much! I was just encountering this problem when trying to cycle out the old master nodes -- none were joining the cluster! I'll try this now.

@bellis-ai
Author

@adamacosta When you say

> Cycle rke2 on the control plane nodes to generate a new certificate that will include this

what do you mean exactly? Restart the systemd service? How do I cycle rke2? I'm not very experienced with manual deployment of RKE2, so I'd like to know exactly what I need to restart.

@adamacosta
Collaborator

adamacosta commented Aug 11, 2023

Yes, run systemctl restart rke2-server on each control plane node after editing the config.yaml file. That should generate a new certificate that includes the added tls-san entry for the new load balancer. New nodes should be able to join after that.
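On each node that looks like the following; the verification step assumes the default RKE2 data directory:

```shell
# After editing /etc/rke2/config.yaml on this node:
sudo systemctl restart rke2-server

# Confirm the regenerated serving certificate includes the new SAN:
sudo openssl x509 -noout -text \
  -in /var/lib/rancher/rke2/server/tls/serving-kube-apiserver.crt \
  | grep -A1 'Subject Alternative Name'
```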

@bellis-ai
Author

I feel like something's missing. Any connection to port 9345 after the config change and rke2-server restart results in a TLS error ("SSL23_GET_SERVER_HELLO"); connections to 6443 still go through. I feel like there's a cert I'm missing here...

@bellis-ai
Author

So it looks like the changes are indeed propagated to serving-kube-apiserver.crt, but whatever certificate the supervisor is serving does not change. I have no idea how to force it to.
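One way to confirm this is to compare the certificate served on each port (the host below is a placeholder for a control plane node):

```shell
# Compare the SANs served by the api-server (6443) and supervisor (9345).
for port in 6443 9345; do
  echo "== port $port =="
  openssl s_client -connect cp-node.example.com:$port </dev/null 2>/dev/null \
    | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
done
```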

@bellis-ai
Author

Figured it out. You have to invalidate the cached certificate data by deleting /var/lib/rancher/rke2/server/tls/dynamic-cert.json. No idea why this isn't done automatically when the certificate data is different.
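So the complete per-node fix, putting this thread together (paths assume the default RKE2 data directory):

```shell
# 1. Add the new NLB's DNS name to tls-san in /etc/rke2/config.yaml.
# 2. Drop the cached dynamic listener certificate so it is regenerated:
sudo rm /var/lib/rancher/rke2/server/tls/dynamic-cert.json
# 3. Restart so both serving and supervisor certs pick up the new SAN:
sudo systemctl restart rke2-server
```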

@adamacosta
Collaborator

Hey, thanks for figuring that out. Apologies for not following this more closely. I did get around to trying it, and it worked fine in terms of hitting the api-server, but I only ran it on a single host, so the supervisor process would have been unused anyway. I'm not going to close this right away, because we should capture this in a real migration doc.
