
# [release-1.25] Simultaneously started K3s servers may race to create CA certificates when using external SQL #7223

Closed
brandond opened this issue Apr 5, 2023 · 1 comment

brandond (Member) commented Apr 5, 2023

@brandond brandond self-assigned this Apr 5, 2023
@brandond brandond added this to the v1.25.9+k3s1 milestone Apr 5, 2023
@brandond brandond moved this to Peer Review in K3s Development Apr 5, 2023
@brandond brandond moved this from Peer Review to To Test in K3s Development Apr 5, 2023
VestigeJ commented Apr 14, 2023

I tried really hard to reproduce this with five server nodes; I know we managed to trigger it with nine server nodes when Shy and I were working on it together.

I could not reproduce it functionally, nor could I find the same error in the logs, with a Postgres database.
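For anyone retrying this, widening the race window means starting every server as close to simultaneously as possible. A minimal sketch, assuming `ssh` access; the host names and the remote start command are placeholders, not from this report:

```shell
# Sketch: kick off k3s on all servers at once so they race to create the
# cluster CA in the external datastore. HOSTS and the remote command are
# placeholders -- adapt to your environment.
HOSTS="${HOSTS:-server0 server1 server2 server3 server4}"

start_all_servers() {
  for h in $HOSTS; do
    # Background each start so no server waits on the previous one.
    ssh "$h" 'sudo systemctl restart k3s' &
  done
  wait  # return once every background ssh has finished
}
```

Backgrounding with `&` and a single `wait` keeps the start times within milliseconds of each other, which is the tightest window a simple script can get without orchestration tooling.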

## Environment Details

Attempted reproductions using VERSION=v1.25.8+k3s1 and VERSION=v1.24.12+k3s1.
Attempted best-effort validation using RC VERSION=v1.25.9-rc1+k3s1.

Infrastructure

  • Cloud

Node(s) CPU architecture, OS, and version:

```
Linux 5.14.21-150400.24.11-default x86_64 GNU/Linux
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP4"
```

Cluster Configuration:

```
NAME               STATUS   ROLES                  AGE     VERSION
ip-SERVER0         Ready    control-plane,master   3m37s   v1.25.9-rc1+k3s1
ip-SERVER1         Ready    control-plane,master   3m40s   v1.25.9-rc1+k3s1
ip-SERVER2         Ready    control-plane,master   3m40s   v1.25.9-rc1+k3s1
ip-SERVER3         Ready    control-plane,master   3m40s   v1.25.9-rc1+k3s1
ip-SERVER4         Ready    control-plane,master   3m35s   v1.25.9-rc1+k3s1
```

Config.yaml:

```yaml
write-kubeconfig-mode: 644
debug: true
token: calcifiedlies
selinux: true
protect-kernel-defaults: true
datastore-endpoint: postgres://k3s:k3s@yolo-monsters:5432/kubernetes
```

Reproduction attempts:

```
$ curl https://get.k3s.io --output install-k3s.sh
$ sudo chmod +x install-k3s.sh
$ sudo groupadd --system etcd && sudo useradd -s /sbin/nologin --system -g etcd etcd
$ sudo modprobe ip_vs_rr
$ sudo modprobe ip_vs_wrr
$ sudo modprobe ip_vs_sh
$ printf "vm.panic_on_oom=0\nvm.overcommit_memory=1\nkernel.panic=10\nkernel.panic_on_oops=1\n" > ~/90-kubelet.conf
$ sudo cp ~/90-kubelet.conf /etc/sysctl.d/
$ sudo systemctl restart systemd-sysctl
.
.
.
. # failures galore to reproduce
$ sudo INSTALL_K3S_VERSION=v1.25.8+k3s1 INSTALL_K3S_EXEC=server ./install-k3s.sh
$ sudo INSTALL_K3S_COMMIT=d9f40d4f5b4776164322035499fabedea77f5f52 INSTALL_K3S_EXEC=server ./install-k3s.sh
$ sudo INSTALL_K3S_COMMIT=59e573d111f8863916c37fdac92e2412485371b9 INSTALL_K3S_EXEC=server ./install-k3s.sh
$ sudo INSTALL_K3S_VERSION=v1.25.8+k3s1 INSTALL_K3S_EXEC=server ./install-k3s.sh
$ sudo INSTALL_K3S_VERSION=v1.25.8+k3s1 INSTALL_K3S_EXEC=server ./install-k3s.sh
$ sudo INSTALL_K3S_VERSION=v1.25.8+k3s1 INSTALL_K3S_EXEC=server ./install-k3s.sh
$ sudo INSTALL_K3S_VERSION=v1.25.8+k3s1 INSTALL_K3S_EXEC=server ./install-k3s.sh
$ sudo INSTALL_K3S_VERSION=v1.25.8+k3s1 INSTALL_K3S_EXEC=server ./install-k3s.sh
$ sudo INSTALL_K3S_VERSION=v1.25.8+k3s1 INSTALL_K3S_EXEC=server ./install-k3s.sh
$ sudo INSTALL_K3S_VERSION=v1.25.8+k3s1 INSTALL_K3S_EXEC=server ./install-k3s.sh
$ sudo INSTALL_K3S_VERSION=v1.25.8+k3s1 INSTALL_K3S_EXEC=server ./install-k3s.sh
$ sudo INSTALL_K3S_VERSION=v1.25.8+k3s1 INSTALL_K3S_EXEC=server ./install-k3s.sh
$ sudo INSTALL_K3S_VERSION=v1.25.8+k3s1 INSTALL_K3S_EXEC=server ./install-k3s.sh
.
.
$ sudo journalctl -u k3s | grep -i "ecdsa" | grep -i "unable"  # run after each attempt to catch the race-condition error in the logs
$ void; clean_logs; go_replay; clear;  # local aliases: killall, uninstall, vacuum logs, restore config.yaml, clear screen, re-install attempt
.
.
.
$ sudo INSTALL_K3S_VERSION=v1.25.9-rc1+k3s1 INSTALL_K3S_EXEC=server ./install-k3s.sh
$ set_kubefig
$ kgn
$ sudo journalctl -u k3s | grep -i "ecdsa" | grep -i "unable"
$ get_report  # generate this template
```
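That manual install/check/uninstall cycle could also be scripted. A hedged sketch: the grep pattern matches the journalctl check above, the uninstall path is the standard k3s one, and everything else (attempt count, sleep interval) is an assumption:

```shell
# Sketch: automate the install -> check logs -> uninstall cycle above.
# saw_race_error reads journal text on stdin and matches the same
# pattern as the journalctl | grep check in this report.
saw_race_error() {
  grep -i 'ecdsa' | grep -qi 'unable'
}

# Guarded so the function can be sourced without side effects; the loop
# only runs where the installer from the steps above actually exists.
if [ -x ./install-k3s.sh ]; then
  for i in $(seq 1 10); do
    sudo INSTALL_K3S_VERSION=v1.25.9-rc1+k3s1 INSTALL_K3S_EXEC=server ./install-k3s.sh
    sleep 60
    if sudo journalctl -u k3s | saw_race_error; then
      echo "race reproduced on attempt $i"
      break
    fi
    sudo /usr/local/bin/k3s-uninstall.sh
  done
fi
```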

Results:

Not able to reproduce with five servers started simultaneously against a small Postgres server running on Ubuntu 22.04 (t2.micro).
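A quick way to confirm a run did not hit the race is to compare the cluster CA across all servers. A sketch, assuming the default k3s server data dir; host names are placeholders:

```shell
# Sketch: all servers should share one CA when the race did not occur.
# ca_hashes_match takes "hash host" lines and succeeds only if every
# hash is identical.
ca_hashes_match() {
  [ "$(printf '%s\n' "$@" | awk '{print $1}' | sort -u | wc -l)" -eq 1 ]
}

# Gather fingerprints (placeholder hosts; default k3s server data dir):
# for h in server0 server1 server2; do
#   ssh "$h" sudo sha256sum /var/lib/rancher/k3s/server/tls/server-ca.crt
# done
```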

In case you end up wanting visible steps to follow for wiping the k3s data out of the Postgres database (do this only after performing a killall/uninstall on all server nodes, or they'll keep writing to the DB):

```
postgres=# \c kubernetes
You are now connected to database "kubernetes" as user "postgres".
kubernetes=# \dt
       List of relations
 Schema | Name | Type  | Owner
--------+------+-------+-------
 public | kine | table | k3s
(1 row)

kubernetes=# TRUNCATE kine;
TRUNCATE TABLE
kubernetes=# DELETE FROM kine;
DELETE 0
```
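The interactive session above collapses to a single non-interactive command. A sketch; the connection URL mirrors the datastore-endpoint in config.yaml, and the same caveat applies about stopping every server first:

```shell
# Sketch: wipe the kine table without an interactive psql session.
# reset_kine takes the Postgres connection URL as its only argument.
reset_kine() {
  psql "$1" -c 'TRUNCATE kine;'
}

# e.g. reset_kine 'postgres://k3s:k3s@yolo-monsters:5432/kubernetes'
```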

@github-project-automation github-project-automation bot moved this from To Test to Done Issue in K3s Development Apr 14, 2023