
Add validation test for cluster-reset with joining existing nodes back #6060

Closed
Tracked by #6070
rancher-max opened this issue Aug 30, 2022 · 2 comments

@rancher-max (Contributor) commented:

We currently have integration tests for cluster-reset with restore-path. We should also have a validation test for a baseline cluster-reset command. This test should use at least 3 server nodes and, after running the cluster-reset command and initializing the new single-node cluster, join the other two server nodes back to the cluster. The full steps, when done manually, are below; a scripted sketch follows the list.

  1. Create an HA cluster and make sure it's up and ready.
  2. Deploy some workloads.
  3. Stop two of the server nodes by running "sudo k3s-killall.sh" on each.
  4. Make sure the cluster is no longer accessible.
  5. Shut down the k3s server on the remaining node by running "sudo systemctl stop k3s".
  6. Run "sudo k3s server --cluster-reset".
  7. After the command completes, restart the k3s server process: "sudo systemctl start k3s".
  8. Check that the cluster is working again; after a few minutes the other nodes will show as NotReady.
  9. Remove the db directories from the other servers by running "sudo rm -rf /var/lib/rancher/k3s/server/db".
  10. Restart the k3s server process on the other servers by running "sudo systemctl start k3s". It's best to do this one node at a time; otherwise an error about "too many learner members in cluster" may occur. That error usually reconciles on its own, but doing one node at a time should avoid it altogether.
  11. Run kubectl commands and deploy workloads on all nodes to validate that everything is up and ready and all nodes are still part of the same cluster.
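A minimal shell sketch of steps 3–11, driven from a workstation with SSH access to the nodes and a kubeconfig pointing at the surviving server. The hostnames server-1/server-2/server-3 are hypothetical, server-1 is the node kept for the reset, and k3s is assumed to be installed as a systemd service:

```bash
#!/usr/bin/env bash
# Sketch of the manual cluster-reset procedure above, under the
# assumptions in the lead-in (3 servers, server-1 survives, SSH access,
# kubeconfig on this machine pointing at server-1).
set -euo pipefail

# Steps 3-4: stop k3s on the two other servers and confirm the cluster
# has lost quorum (kubectl should fail once two etcd members are down).
for node in server-2 server-3; do
  ssh "$node" 'sudo k3s-killall.sh'
done
! kubectl --request-timeout=10s get nodes || echo "cluster unexpectedly reachable"

# Steps 5-7: on the surviving node, stop the service, reset etcd to a
# single-member cluster, then start k3s again.
ssh server-1 'sudo systemctl stop k3s'
ssh server-1 'sudo k3s server --cluster-reset'
ssh server-1 'sudo systemctl start k3s'

# Step 8: wait for the surviving node to report Ready; the other
# server nodes will eventually show NotReady.
kubectl wait --for=condition=Ready node/server-1 --timeout=300s

# Steps 9-10: wipe the old etcd data on the other servers and rejoin
# them one at a time, avoiding "too many learner members in cluster".
for node in server-2 server-3; do
  ssh "$node" 'sudo rm -rf /var/lib/rancher/k3s/server/db'
  ssh "$node" 'sudo systemctl start k3s'
  kubectl wait --for=condition=Ready "node/$node" --timeout=300s
done

# Step 11: validate scheduling works across all nodes.
kubectl get nodes
kubectl create deployment reset-check --image=nginx --replicas=3
kubectl rollout status deployment/reset-check --timeout=120s
```

Rejoining one node at a time mirrors the note in step 10: each new member joins etcd as a learner and must be promoted before the next one joins.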
rancher-max added the kind/task and kind/test labels on Aug 30, 2022
cwayne18 added this to the v1.24.5+k3s1 milestone on Aug 30, 2022
brooksn self-assigned this on Sep 1, 2022
brooksn removed their assignment on Sep 20, 2022
ShylajaDevadiga self-assigned this on Sep 20, 2022
@VestigeJ commented:

We need to append the process for performing the cluster-reset from secondary server nodes (i.e., any node that wasn't the target node at cluster creation); see the sketch below.
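A hedged sketch of that variant, run on the secondary server itself (server-2, a hypothetical hostname). It assumes the node was joined with a server: URL in /etc/rancher/k3s/config.yaml; both the file location and the need to drop the join URL before restarting are assumptions, not something this issue confirms:

```bash
# Run on server-2 after k3s has been stopped on all other servers.
sudo systemctl stop k3s

# Assumption: drop the join URL (e.g. "server: https://server-1:6443")
# so the reset node starts standalone instead of trying to rejoin.
sudo sed -i '/^server:/d' /etc/rancher/k3s/config.yaml || true

# Reset etcd to a single-member cluster on this node, then restart.
sudo k3s server --cluster-reset
sudo systemctl start k3s

# The remaining servers (including the original init node) then wipe
# /var/lib/rancher/k3s/server/db and rejoin one at a time, pointing
# their server: URL at server-2, as in steps 9-10 above.
```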

@rancher-max (Contributor, Author) commented:

I'm going to close this, since the test has been added and there's nothing left to officially test here. See the linked issue above, which will be worked on to address the comments here.
