DQLite with a 2 node cluster #1407
Dqlite uses the Raft consensus algorithm, under which a single node in a 2-node cluster does not have quorum. Consul also uses Raft, and their docs on this are good, so I'll link them:
You'll find similar guidance for any distributed multi-master system. You need to have an odd number of nodes, and a majority of them need to be online and participating in the cluster, in order to function.
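To make the quorum arithmetic concrete, here is a small sketch (plain shell, not tied to any k3s or dqlite tooling) of how Raft majority math works out for different cluster sizes:

```shell
#!/bin/sh
# Raft needs a strict majority of voting members to elect a leader.
quorum()    { echo $(( $1 / 2 + 1 )); }          # votes required
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }   # failures survivable

for n in 1 2 3 4 5; do
  echo "nodes=$n quorum=$(quorum $n) tolerated_failures=$(tolerated $n)"
done
# nodes=2 quorum=2 tolerated_failures=0  -> losing either node loses quorum
# nodes=3 quorum=2 tolerated_failures=1  -> one node can fail
```

Note that 2 nodes tolerate zero failures, while 3 tolerate one; even node counts never buy you an extra tolerated failure over the odd count below them.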
@brandond thanks for the response. The K3s HA documentation mentions that we can have an HA setup with 2 or more nodes. What would be your recommendation on how to get HA with 2 nodes (both being master + worker nodes)? To provide some more context, the application that I'm working on can be a single-node deployment (which is easy and done) or a 2-node deployment (which needs HA to work). We are open to any option, like dqlite, etcd, or anything that is not too heavy on resources. Any help/recommendations here would really be appreciated.
The k3s docs say:
Embedded dqlite for HA is still experimental, but given that it runs Raft, it will always need an odd number of nodes for quorum. If you want to run exactly 2 k3s nodes, using an external database (with its own HA mechanism) is probably your best bet.
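For the two-node case with an external database, k3s supports pointing both servers at the same datastore via `--datastore-endpoint`. A minimal sketch; the hostname, credentials, and token here are placeholders:

```shell
# On each of the two server nodes, point k3s at the external Postgres
# instance instead of the embedded datastore. The connection string
# and token below are illustrative placeholders.
k3s server \
  --token=SECRET \
  --datastore-endpoint="postgres://k3s:password@db.example.com:5432/kubernetes"
```

With this layout, cluster availability depends on the database's own HA mechanism rather than on a Raft quorum among the k3s servers themselves.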
I will also note that you could use a lightweight 3rd node without an agent to act as nothing but a 3rd voting member in the dqlite cluster. Just deploy k3s server without the agent, or add NoSchedule taints to the node. Just because your app only wants 2 nodes doesn't mean your k3s cluster can't have more.
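To sketch the "third voting member only" idea: either start the server without its local agent, or taint the node so nothing schedules there. The flag and node/taint names below are assumptions to verify against your k3s version:

```shell
# Option A: run a server with no local agent (the --disable-agent flag
# is hidden/experimental in some k3s releases -- verify it exists in yours).
k3s server --disable-agent --server https://first-server:6443 --token=SECRET

# Option B: run a normal server but keep workloads off it with a taint
# ("third-node" and the taint key are placeholder names).
kubectl taint nodes third-node dedicated=quorum-only:NoSchedule
```

Either way, the node still counts as a Raft voter, so the 2 app-carrying nodes plus this one give you a quorum that survives one failure.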
Thanks @brandond - I think we're heading to the same conclusion of using an external DB like Postgres, which has its own HA mechanism. Currently we were deploying Postgres as a pod in our app, but it looks like we'll have to externalize it and set it up in HA mode.
Hello. What @brandond mentioned in #1407 (comment) is correct: dqlite uses the Raft consensus algorithm. May we close this issue, or are there any remaining questions? Thanks!
@brandond and @davidnuzik feel free to close this issue, since my question has been answered. Thanks again for the quick responses.
So, an interesting question then: can a three-node cluster recover after being degraded to a 2-node cluster for some period of time? Or would there be issues with quorum?
My understanding is that a 3-node cluster can function with only 2 nodes. It might be worth reading through the dqlite docs to better understand failover behavior. K3s doesn't expose any of the dqlite logs or metrics either, which doesn't help.
If a 3-master cluster can't lose a master, how can that be considered HA? 🤔 It would be worse than having just 1 master, because now you have 3x the chances for your cluster to go down...
I'm of the same opinion, and am seeing this exact behavior wreak havoc on my cluster when attempting to use an HA 3-node multi-master setup, especially if you tear one down.
Started with a functional 4-node, 3-master cluster. Rebuilt the nodes one at a time, being careful to cordon and drain nodes as I went, and boom when I took down the initial node.
Still has the multi-master issue. As soon as the node that started the cluster goes down, you end up with the error that a leader cannot be found.
It appears as though a mesh is not created when attaching new master nodes; master+n is always attempting to connect to the original master even if it goes away. So all the risks of single-master, with more CPU usage. I'm really struggling to see the benefit. Possibly this is just a documentation issue, since you should use a load-balanced address rather than the IP address of master1 when setting up new nodes.
Trying to decide between single master or etcd at this point.
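On the "always reconnecting to the original master" point: the usual mitigation is to give joining servers a fixed registration address (a load balancer or round-robin DNS record in front of all server nodes) rather than the first node's IP. A sketch, with placeholder hostnames and token:

```shell
# k3s-servers.example.com load-balances across all server nodes, so
# joins and reconnects do not depend on node 1 staying up.
k3s server \
  --server https://k3s-servers.example.com:6443 \
  --token=SECRET
```

The same fixed address is then also what agents and kubeconfigs should point at.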
Version:
v1.17.2+k3s1
Describe the bug
We were trying to set up a 2-node cluster with dqlite. It seems that if even 1 node goes down, the k3s kubectl commands don't work.
To Reproduce
Bring up a 2-node cluster and shut one of the nodes down. All k3s kubectl commands stop working. In our configuration both nodes are masters, and going forward all nodes will be configured the same way (all are master + worker nodes).
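A sketch of the reproduction steps above; node names and the token are placeholders, and `--cluster-init` is the embedded-dqlite HA flag of this k3s era (verify against your release):

```shell
# Node 1: start the first server with the embedded dqlite datastore.
k3s server --cluster-init --token=SECRET

# Node 2: join as a second server.
k3s server --server https://node1:6443 --token=SECRET

# Now stop k3s on either node...
systemctl stop k3s

# ...and on the surviving node, kubectl calls fail: with 1 of 2 Raft
# voters gone there is no majority, so no leader can be elected.
k3s kubectl get nodes
```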
Expected behavior
Expectation is that even if 1 of the 2 nodes is down, the k3s cluster should keep working and all kubectl commands should work.
Actual behavior
k3s kubectl commands don't work
Additional context