Etcd join does not expose MemberList errors when node is joining the cluster #7370

Closed
brandond opened this issue Apr 27, 2023 · 1 comment
brandond commented Apr 27, 2023

If a member joining the cluster cannot reach an existing cluster member on the etcd client port to list the current cluster members, it logs the error "Failed to get member list from etcd cluster. Will assume this member is already added". The underlying error is not reported, so the problem cannot be diagnosed. The observed behavior suggests that MemberList is failing both on the remote node (in the infoHandler function) and on the joining node, and the underlying error is not logged in either location.

  • On RKE2, this can be reproduced by joining a new server to the cluster before etcd has started on the first server. Etcd will be unavailable because it hasn't started yet.
  • On K3s, this can be roughly reproduced by creating a two-server etcd-only cluster, then stopping one of the servers, and trying to join a 3rd. Etcd will be unavailable because the cluster is unhealthy (1 of 2 nodes available).
@brandond brandond self-assigned this Apr 27, 2023
@brandond brandond moved this from New to Working in K3s Development Apr 27, 2023
@brandond brandond added this to the v1.27.2+k3s1 milestone Apr 27, 2023
@brandond brandond changed the title from "Etcd join does not expose errors when" to "Etcd join does not expose MemberList errors when node is joining the cluster" Apr 27, 2023
@brandond brandond moved this from Working to To Test in K3s Development May 2, 2023
@bguzman-3pillar

Validated on with /

$ k3s -v
k3s version v1.27.1+k3s-a736b4b1 (a736b4b1)
go version go1.20.3

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Cluster Configuration:

3 etcd-only servers

Testing Steps

  1. Create etcd-only node
  2. Join second etcd-only node
  3. Stop one server
  4. Try to join 3rd etcd-only node
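Steps 3 and 4 leave a two-member etcd cluster with only one member running. As a rough illustration of why that cluster cannot accept a new member, the sketch below computes the usual Raft majority for a given member count. The function names `quorum` and `hasQuorum` are hypothetical, and this shows only the majority arithmetic, not the actual health check etcd performs before a membership change.

```go
package main

import "fmt"

// quorum returns the number of members that make up a majority
// of a cluster with the given total member count.
func quorum(members int) int {
	return members/2 + 1
}

// hasQuorum reports whether the healthy members form a majority.
func hasQuorum(members, healthy int) bool {
	return healthy >= quorum(members)
}

func main() {
	// Two members with one stopped: a majority of 2 is required, only 1 is up.
	fmt.Println(hasQuorum(2, 1)) // false
	// Three members with one stopped: a majority of 2 is required, 2 are up.
	fmt.Println(hasQuorum(3, 2)) // true
}
```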

Validation Results:

  • k3s version used for validation:
## Server 1: 

ubuntu@ip-172-31-35-177:~$ curl -sfL https://get.k3s.io | INSTALL_K3S_COMMIT=a736b4b1b932a2f513077959a931f8f2faae6329 sh -s - server --cluster-init --token secret --disable-apiserver --disable-controller-manager --disable-scheduler
[INFO]  Using commit a736b4b1b932a2f513077959a931f8f2faae6329 as release
[INFO]  Downloading hash https://k3s-ci-builds.s3.amazonaws.com/k3s-a736b4b1b932a2f513077959a931f8f2faae6329.sha256sum
[INFO]  Downloading binary https://k3s-ci-builds.s3.amazonaws.com/k3s-a736b4b1b932a2f513077959a931f8f2faae6329
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s
ubuntu@ip-172-31-35-177:~$ 
## Server 2:

ubuntu@ip-172-31-41-20:~$ curl -sfL https://get.k3s.io | INSTALL_K3S_COMMIT=a736b4b1b932a2f513077959a931f8f2faae6329 sh -s - server --cluster-init --token secret --server https://172.31.35.177:6443 --disable-apiserver --disable-controller-manager --disable-scheduler
[INFO]  Using commit a736b4b1b932a2f513077959a931f8f2faae6329 as release
[INFO]  Downloading hash https://k3s-ci-builds.s3.amazonaws.com/k3s-a736b4b1b932a2f513077959a931f8f2faae6329.sha256sum
[INFO]  Downloading binary https://k3s-ci-builds.s3.amazonaws.com/k3s-a736b4b1b932a2f513077959a931f8f2faae6329
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s
ubuntu@ip-172-31-41-20:~$ 
ubuntu@ip-172-31-41-20:~$ 
ubuntu@ip-172-31-41-20:~$ sudo systemctl stop k3s
ubuntu@ip-172-31-41-20:~$ 
ubuntu@ip-172-31-41-20:~$ 
## Server 3: 
Trying to join the 3rd etcd-only server
ubuntu@ip-172-31-41-235:~$ curl -sfL https://get.k3s.io | INSTALL_K3S_COMMIT=a736b4b1b932a2f513077959a931f8f2faae6329 sh -s - server --cluster-init --token secret --server https://172.31.35.177:6443 --disable-apiserver --disable-controller-manager --disable-scheduler
[INFO]  Using commit a736b4b1b932a2f513077959a931f8f2faae6329 as release
[INFO]  Downloading hash https://k3s-ci-builds.s3.amazonaws.com/k3s-a736b4b1b932a2f513077959a931f8f2faae6329.sha256sum
[INFO]  Downloading binary https://k3s-ci-builds.s3.amazonaws.com/k3s-a736b4b1b932a2f513077959a931f8f2faae6329
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s



  • Error handling:
.
.
.

ubuntu@ip-172-31-35-177:~$ sudo journalctl -u k3s | grep "unhealthy cluster"
May 04 20:31:29 ip-172-31-35-177 k3s[2984]: {"level":"warn","ts":"2023-05-04T20:31:29.715Z","caller":"etcdserver/server.go:1614","msg":"rejecting member add request; local member has not been connected to all peers, reconfigure breaks active quorum","local-member-id":"551ca5942950f5f5","requested-member-add":"{ID:c2e614806512614c RaftAttributes:{PeerURLs:[https://172.31.41.235:2380] IsLearner:true} Attributes:{Name: ClientURLs:[]}}","error":"etcdserver: unhealthy cluster"}

ubuntu@ip-172-31-41-235:~$ sudo journalctl -u k3s | grep "unhealthy cluster"
May 04 20:31:29 ip-172-31-41-235 k3s[2204]: {"level":"warn","ts":"2023-05-04T20:31:29.715Z","logger":"etcd-client","caller":"[email protected]/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000dbc000/172.31.35.177:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: unhealthy cluster"}

ubuntu@ip-172-31-41-235:~$ journalctl -xeu k3s.service | grep "get member"
May 04 21:03:47 ip-172-31-41-235 k3s[3149]: time="2023-05-04T21:03:47Z" level=error msg="Failed to get member list from etcd cluster. Will assume this member is already added"
May 04 21:04:14 ip-172-31-41-235 k3s[3176]: time="2023-05-04T21:04:14Z" level=error msg="Failed to get member list from etcd cluster. Will assume this member is already added"
May 04 21:04:41 ip-172-31-41-235 k3s[3202]: time="2023-05-04T21:04:41Z" level=error msg="Failed to get member list from etcd cluster. Will assume this member is already added"
May 04 21:05:07 ip-172-31-41-235 k3s[3230]: time="2023-05-04T21:05:07Z" level=error msg="Failed to get member list from etcd cluster. Will assume this member is already added"

Additional context / logs:

@github-project-automation github-project-automation bot moved this from To Test to Done Issue in K3s Development May 4, 2023