Token CA hash does not match the Cluster CA certificate hash for Quick Start Guide.. #3214
What is in your config yaml?
config yaml:
This is pointing directly to the main server node.
And also to note: over the entirety of the week, across multiple attempts, either node no. 2 or node no. 3 would fail to join with errors such as this one. Node no. 3 joined perfectly fine this time.
Not sure if this will help, but here is the output of curl -kv https://192.168.100.39:9345 (the server node) from worker 1 (failing to join):
From worker node 2 (successful join):
Do you have the exact same token on both agents? Did you copy it directly from the server node? Does the token match the contents of the node-token file on the server, and what do you get when you compare it on each agent?
Are these VMs deployed to different environments or something? Are you sure 192.168.100.39 reaches the same host from both nodes? Do you have an HTTP proxy or something else that might be interfering?
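For reference, a minimal sketch of how one might compare the CA hash embedded in the join token with the certificate the server actually serves; the file paths and the /cacerts endpoint are the RKE2/k3s defaults and should be treated as assumptions here, not as something taken from the original comments:

```sh
# On the server: the full join token. The hex string after "K10" is a hash of
# the cluster CA that agents validate against.
sudo cat /var/lib/rancher/rke2/server/node-token

# Hash of the cluster CA certificate on the server (default data dir assumed).
sudo sha256sum /var/lib/rancher/rke2/server/tls/server-ca.crt

# From each agent: fetch the CA bundle the supervisor serves and hash it.
# Both agents should see identical output if they really reach the same host.
curl -ks https://192.168.100.39:9345/cacerts | sha256sum
```

If the two agents get different hashes from the same URL, they are not talking to the same server.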
These are VMs, yes. What way would you like me to check to verify that they're hitting the same host? Willing to try whatever you need me to try in order to troubleshoot. We have an external-facing proxy that handles basic web traffic, but this is just a local-to-local connection, so the proxy is avoided at this point.
That's a bit beyond what I can help you troubleshoot; it seems pretty clear that something is wrong with your environment - but you might try just the basics:
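A sketch of the kind of basic checks this refers to, using the addresses from this thread (illustrative commands, not an exact list):

```sh
# Basic reachability of the server's supervisor port from each worker:
ping -c 3 192.168.100.39
curl -kv https://192.168.100.39:9345

# Check that both workers resolve 192.168.100.39 to the same MAC address
# (rules out a duplicate IP or a stale ARP entry on the LAN):
ip neigh show | grep 192.168.100.39

# Confirm no proxy settings are silently intercepting local traffic:
env | grep -i proxy
```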
I destroyed the 1st worker node and am going to try to rebuild the VM real quick. I'll report back in 5-10 minutes.
So, surprisingly enough, after the double-digit number of times I deleted the Proxmox VMs and restarted the process over again, this time it worked. However, the proxy won't stay open :( why meeeee lol. Not sure if the join was actually correct if it's showing this now.
Should I open a separate issue for this next issue I'm running into?
No, it looks like another aspect of the same thing. Is something going on with the network between the VMs?
@jordan-lumley did you figure this out? I'm seeing much the same. The only difference I can tell is that whenever I use node-ip in the server config, pods like coredns and nginx never finish starting; leaving out node-ip, my cluster starts but uses the wrong adapter/IP for communication.
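For context, the setting in question would be placed in /etc/rancher/rke2/config.yaml roughly as below; the address is just an example, not taken from this report:

```yaml
# /etc/rancher/rke2/config.yaml
node-ip: 192.168.100.41   # example: pin the node to this adapter's address
```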
This one helped me:
Hope it helps someone :)
I think there is a clarification problem in the documentation. If you add a token to your initial server's /etc/rancher/rke2/config.yaml, that token is not picked up by RKE2. In fact, another token will be generated and added to /var/lib/rancher/rke2/server/node-token. My setup is 172.20.10.100 as a load balancer pointing towards 172.20.10.15. Example from my initial master:
And this is from my second master:
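Purely as an illustration of this topology (the values are examples, not the actual files from the comment), the two configs might look roughly like this:

```yaml
# /etc/rancher/rke2/config.yaml on the initial master (172.20.10.15)
token: my-shared-passphrase        # hypothetical value
tls-san:
  - 172.20.10.100                  # the load balancer address
```

```yaml
# /etc/rancher/rke2/config.yaml on the second master
server: https://172.20.10.100:9345
token: my-shared-passphrase        # must match the first server's token
tls-san:
  - 172.20.10.100
```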
When using curl to compare cacerts like above, you need to remember that even an error output will be piped to sha256sum. The example below looks like a faulty token, but it's in fact an empty reply from a non-existent service:
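To illustrate the pitfall (endpoint and address are just examples): if nothing is listening behind the URL, curl prints nothing and sha256sum still produces output, which is easy to mistake for a real but mismatched CA hash.

```sh
# An unreachable service returns an empty body; sha256sum happily hashes it,
# and the result is simply the well-known hash of empty input.
curl -ks https://172.20.10.100:9345/cacerts | sha256sum
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  -
```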
I'd like to know how to force the token on my initial master, because I'm doing Ansible templating with config.yml and deploying multiple clusters. I've been doing the same as jordan-lumley, reinstalling VMs, but more like triple-digit rollbacks on VM snapshots, in order to figure out what's wrong with the installation documentation. Even after figuring out the issue with the token, I still don't get the same ca-cert on my 3 masters:
And the last one is different:
Looking at the tokens again on the failing master:
Why is /var/lib/rancher/rke2/server/node-token populated with another token? I am 100% certain I initiated config.yml with the proper token before starting anything on the master. This is how I initialized the last master:
Step 5 is where I cut and paste content from a working server. After the failed step, journalctl tells me:
I'm not using NetworkManager, so that is what causes the complaint about nm-cloud-setup. But the complaint about cacerts is probably a reason why my certificate doesn't match. After a while, the installation starts anyway and I see the third master initialize and join the cluster of masters.
This error still exists in rke2_version: v1.25.3+rke2r1 on an RPM-based OS. @brandond, could you give me some insight into why the token from the configuration file is not picked up by the initial server?
You can't change the passphrase portion of the token after the fact. I can't really follow the sequence that you described above, but if you have a short (passphrase-only) token set in the config file at the initial startup of the cluster, that token will be honored. Note that it only makes sense to set the short token on the first server, not a full token: the full format embeds a hash of the cluster CA certificate, and that CA does not exist yet when the first server starts up.
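As an illustration of the two token forms being discussed (the values are made up; the K10 layout shown is the format used in RKE2/k3s node-token files):

```
# Short (passphrase-only) token, suitable in the first server's config.yaml:
token: my-shared-passphrase

# Full token, as written to /var/lib/rancher/rke2/server/node-token once the
# cluster is up; the hex string after K10 is tied to the cluster CA:
K10<hash-of-cluster-CA>::server:my-shared-passphrase
```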
So after using a passphrase instead of a full token for the token parameter, things work. Thanks for the clarification about the difference between a K10-format token and a passphrase for the token parameter!
Closing as this does not appear to be a bug with rke2 |
Environmental Info:
RKE2 Version: rke2 version v1.23.9+rke2r1 (2d206eb)
go version go1.17.5b7
Node(s) CPU architecture, OS, and Version: 3 VM nodes under a Proxmox controller.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal
Linux pos01.prd 5.13.0-40-generic #45~20.04.1-Ubuntu SMP Mon Apr 4 09:38:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration: 1 server and 2 worker nodes
Describe the bug:
Following the quick start guide step by step, I keep running into CA cert errors. I have spent 4 days destroying the VMs and starting back from scratch time and time again. Sometimes worker node 1 will join but node 2 will fail with these errors. At the time of writing this ticket, worker node 1 is getting the error but node 2 joined successfully.
Steps To Reproduce:
Just following the quick start guide.
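For context, the quick start flow being followed is roughly the following; the server IP is the one from this report, and the commands are a sketch of the standard RKE2 quick start rather than a copy of the guide:

```sh
# Server node (192.168.100.39):
curl -sfL https://get.rke2.io | sh -
systemctl enable --now rke2-server.service
cat /var/lib/rancher/rke2/server/node-token   # token to hand to the agents

# Each worker node:
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
mkdir -p /etc/rancher/rke2
cat <<'EOF' > /etc/rancher/rke2/config.yaml
server: https://192.168.100.39:9345
token: <node-token value copied from the server>
EOF
systemctl enable --now rke2-agent.service
```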
Expected behavior:
The nodes to join successfully.
Actual behavior:
Additional context / logs:
The logs above are taken from the beginning of the journalctl output on the 1st worker node ONLY. The full error message is:
Aug 04 18:47:11 pos02.prd rke2[48435]: time="2022-08-04T18:47:11Z" level=error msg="token CA hash does not match the Cluster CA certificate hash: 537e7956a582a6ac14a6e0c5eb7961030efd8c9590550b5b580230e213237868 != 5dc21fa3143eea7ca48f468cdfb5c36553b8b0e184c7fbb9db9d1777d66bfed1"