Add embedded etcd support #1770
This looks good. I feel like this is a more dependable solution in the context of Kubernetes, because the interaction between etcd and Kubernetes is also tested upstream, unlike dqlite :-)
What's the migration path to convert a dqlite cluster to an etcd cluster?
@aarononeal there are not currently any tools to perform datastore migration. You're basically rebuilding the cluster and redeploying your manifests.
@aarononeal I don't think anyone was running a dqlite cluster for anything serious anyway; it didn't work so well and didn't give high availability, due to the implementation being experimental and having some shortcomings :)
Well, I was about to... I know, can't wait forever... and that along with the missing migration part... so just in time. Is there a (rough) timeline available for the embedded etcd support? If I can assist in testing, count me in.
Dqlite aside, is there a k3s migration path from single master to HA cluster and back, or does that kind of move also require rebuilding the cluster? I could imagine folks starting with a single master and wanting to move to HA later without such pains. It could have also made for a good migration path here, because then I could have dumped dqlite back to sqlite before moving forward to etcd again.
@aarononeal you can go from single master to multi-master and back again with any external datastore. Using the built-in sqlite datastore limits you to only a single master. The biggest limitation is just that there's no good way to migrate between datastores.
Since I have to move from embedded to external, I was considering using lxd and juju as a fast way to deploy etcd. Is that a bad idea given that lxd clustering relies on dqlite? I'm not up to speed on the problems. Eventually I would prefer to stick with the embedded k3s option to avoid the added lxd and operational dependencies.
dqlite comes from lxd, but it appears that they don't have the same issues with master transitions as kine. I haven't played with juju; I just run etcd as a systemd unit on each of my nodes alongside k3s.
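For anyone curious what that setup looks like, a minimal unit along these lines might work. This is only a sketch: the node names, IPs, and data dir are placeholders, and the TLS flags you'd want in production are omitted for brevity.

    [Unit]
    Description=etcd key-value store
    After=network-online.target

    [Service]
    # Illustrative three-node cluster; replace names and IPs with your own.
    ExecStart=/usr/local/bin/etcd \
      --name node1 \
      --data-dir /var/lib/etcd \
      --listen-client-urls http://10.0.0.1:2379 \
      --advertise-client-urls http://10.0.0.1:2379 \
      --listen-peer-urls http://10.0.0.1:2380 \
      --initial-advertise-peer-urls http://10.0.0.1:2380 \
      --initial-cluster node1=http://10.0.0.1:2380,node2=http://10.0.0.2:2380,node3=http://10.0.0.3:2380 \
      --initial-cluster-state new
    Restart=always

    [Install]
    WantedBy=multi-user.target

An external cluster like this is what you'd point k3s at via its --datastore-endpoint flag.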
Since #1760 is closed, I am including @brandond's concerns here:
This could be a major turnoff for people running SD cards or eMMC flash, since these are notorious for failing due to excess write operations. A parameter to run etcd in memory might be an option for devices with 4 GB of RAM?
Yeah, etcd is crazy demanding of low I/O latency, and issues about 14 fsyncs/second on my 3-node cluster with basically nothing going on. Definitely not as easy to run on low-end hardware as kine with sqlite. I ran etcd for a while on tmpfs while waiting for some more SSDs to come in. It worked OK but took a lot of memory, since it doesn't purge the WAL very aggressively.
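For the record, the tmpfs approach is just a mount over the data directory (the path here is a hypothetical default; note that etcd state on tmpfs is lost on reboot or power loss):

    # Back etcd's data dir with RAM; all etcd state is lost on power loss.
    mount -t tmpfs -o size=2g tmpfs /var/lib/etcd
    # Or via /etc/fstab so the mount (not the data) survives reboots:
    # tmpfs  /var/lib/etcd  tmpfs  size=2g  0  0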
Possibly relevant recent change to etcd:
This replaces dqlite with etcd. The same UX as dqlite is followed, so there is no change to the CLI args.
We are going to go ahead with merging this PR, but there is still a lot of testing to do: handling upgrades and improving the cluster bootstrap procedure. We are very concerned about the I/O demands of etcd and will continue to look at that. Right now dqlite is broken (due to our integration, not dqlite itself) and not really usable, so moving to etcd at least gets us to a functioning embedded option. sqlite or mysql is still a much better option for lower resource usage. This work is targeted to be included in k3s 1.19 (which should be released in the first half of August, according to the k8s 1.19 release schedule).
@@ -389,6 +389,8 @@ func (e *ETCD) cluster(ctx context.Context, forceNew bool, options executor.Init
 		ClientCertAuth: true,
 		TrustedCAFile:  e.config.Runtime.ETCDPeerCA,
 	},
+	ElectionTimeout: 5000,
Should this be adjusted to a customizable parameter?
I'll address the customization of etcd in a follow-up PR. We will probably need to take an approach where you can specify the etcd conf file and we merge it, similar to the containerd config. There are too many params to add to the CLI to address everything.
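For illustration, the kind of fragment that could be merged might look like this. The k3s merge mechanism is hypothetical (it's deferred to a follow-up PR), but heartbeat-interval and election-timeout are standard etcd --config-file keys, in milliseconds, and etcd's tuning docs suggest the election timeout be roughly 10x the heartbeat interval:

    # Illustrative etcd config-file fragment (values in ms).
    heartbeat-interval: 500
    election-timeout: 5000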
Just wondering... was rqlite ever considered?
@srdjan rqlite was considered, but we preferred dqlite at the time; the exact reasoning I don't remember. At this point it doesn't really matter, because the key takeaway is that it's infeasible for the core k3s team to maintain alternative Raft-based systems. We are switching to etcd primarily because it's well tested and known, not really because it's technically superior or even liked. 😀
Thank you for the timeline! Quick question (don't know if this is an obvious one or even a rhetorical one, but): for the time being, why can't we use an etcd docker/container - one that forms a dedicated etcd cluster - on each of the k3s master nodes, which would then kind of mimic/resemble "embedded" etcd support? In my reasoning, we wouldn't have to buy/administer/etc. dedicated separate additional hardware to run an etcd cluster.
@remkolems I think having a separate etcd cluster, even if running on the same nodes as the K8s master, would introduce considerable operational complexity, which goes against the goals of k3s. I think embedding etcd in k3s is a great idea, as long as etcd can be tuned to work on lower-powered devices that don't have the fastest storage.
This is going to be a kubeadm killer... yet you will also have to pay the price of a bigger binary...
We'll have to wait and see what the binary impact is, but I imagine it won't be a deal breaker in reality. Many container images are fairly sizeable in their own right. At a pinch, I imagine there might be a way to tweak the build scripts to exclude it. (Go build tags maybe?)
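To sketch the build-tag idea (purely hypothetical — no such tag or file exists in the k3s tree today), the embedded-etcd code path could live behind a constraint, with a stub compiled otherwise:

    // etcd_on.go — hypothetical file, built by default.
    // +build !no_embedded_etcd

    package datastore

    const embeddedEtcdSupported = true

    // etcd_off.go — hypothetical stub, selected with `go build -tags no_embedded_etcd`,
    // which would keep the etcd dependency (and its size) out of the binary.
    // +build no_embedded_etcd

    package datastore

    const embeddedEtcdSupported = false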
Why was dqlite dropped in k3s? microk8s is still "happily" using it and claiming HA.
dqlite and microk8s are both made and maintained by Canonical, so it makes sense that they want to dogfood it. Besides that, when k3s was using dqlite it was very unstable for me. What's wrong with using plain ol' reliable etcd?
@cawoodm dqlite support - or actually a simple wrapper of dqlite for the Kine sqlite backend - is pretty shoddy, to say the least. It does not handle most of the etcd operations: no user support, no proper MVCC support, and only essential sync-based CRUD operations. The database support is really bad for a real cluster. No wonder k3s switched away from dqlite to embedded etcd instead. Considering this, would you still use microk8s as your main cluster? Spoiler alert: I worked on a PR to switch Kine to a GORM backend, and I found out the egregious details. By the way, one can actually implement the etcd API fairly easily because it is gRPC-based; I have a simple Rust implementation of Kine that I hope will replace what we have today.
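To illustrate why the etcd surface is approachable, here is a toy sketch in Go (Go rather than the commenter's Rust, to match the rest of this thread, and using etcd 3.4 import paths). An in-memory map stands in for a real backend and most operations are stubbed — nothing like a complete Kine replacement, just the shape of the idea:

    // Toy shim serving a sliver of the etcd v3 gRPC KV API, in the spirit of kine.
    package main

    import (
        "context"
        "net"
        "sync"

        pb "go.etcd.io/etcd/etcdserver/etcdserverpb"
        "go.etcd.io/etcd/mvcc/mvccpb"
        "google.golang.org/grpc"
        "google.golang.org/grpc/codes"
        "google.golang.org/grpc/status"
    )

    // mapKV serves Range/Put from an in-memory map; everything else is stubbed.
    type mapKV struct {
        mu   sync.RWMutex
        data map[string][]byte
    }

    func (m *mapKV) Range(ctx context.Context, r *pb.RangeRequest) (*pb.RangeResponse, error) {
        m.mu.RLock()
        defer m.mu.RUnlock()
        resp := &pb.RangeResponse{Header: &pb.ResponseHeader{}}
        if v, ok := m.data[string(r.Key)]; ok {
            resp.Kvs = append(resp.Kvs, &mvccpb.KeyValue{Key: r.Key, Value: v})
            resp.Count = 1
        }
        return resp, nil
    }

    func (m *mapKV) Put(ctx context.Context, r *pb.PutRequest) (*pb.PutResponse, error) {
        m.mu.Lock()
        defer m.mu.Unlock()
        m.data[string(r.Key)] = r.Value
        return &pb.PutResponse{Header: &pb.ResponseHeader{}}, nil
    }

    func (m *mapKV) DeleteRange(ctx context.Context, r *pb.DeleteRangeRequest) (*pb.DeleteRangeResponse, error) {
        return nil, status.Error(codes.Unimplemented, "not implemented in this sketch")
    }

    func (m *mapKV) Txn(ctx context.Context, r *pb.TxnRequest) (*pb.TxnResponse, error) {
        return nil, status.Error(codes.Unimplemented, "not implemented in this sketch")
    }

    func (m *mapKV) Compact(ctx context.Context, r *pb.CompactionRequest) (*pb.CompactionResponse, error) {
        return nil, status.Error(codes.Unimplemented, "not implemented in this sketch")
    }

    func main() {
        lis, err := net.Listen("tcp", ":2379")
        if err != nil {
            panic(err)
        }
        s := grpc.NewServer()
        pb.RegisterKVServer(s, &mapKV{data: map[string][]byte{}})
        s.Serve(lis)
    }

The real work in a serious implementation is of course in what's stubbed here: revisions, watches, leases, and transactions.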
Adrian Goins mentioned in his HA video that dqlite was not reliable, but gave no details. Then he proceeds to create an HA k3s cluster, but it's a very complex process. We're trying to decide whether k3s is the way to go for HA on-prem Kubernetes. Seemingly better support by Canonical had us leaning towards microk8s, until we heard vague doubts about dqlite.
The process of using k3s with etcd is really not hard; in fact, etcd is the default now in k3s, so there's no special configuration needed unless you need to tweak etcd settings.
I too found
Start one node with --cluster-init. Start more nodes (either server or agent) with --server pointed at the first node. Not sure how that's complex?
I'm trying to set up a 2-node HA cluster with external postgres. Several days of trying have not been successful.
That doesn't have anything to do with etcd or dqlite.
Yeah, I'm not sure what your problem has to do with etcd, since you're not using etcd. I'm going to lock this conversation; anyone who is having problems (with etcd or otherwise) should open an issue instead of commenting on this PR.
This PR swaps dqlite for etcd for the embedded HA option; sqlite remains the default storage option. The UX for using etcd is exactly the same as dqlite: to enable etcd you must run one server with server --cluster-init and then join other servers with server -s URL -t token. This PR still needs a lot more testing and some bumpy edges rounded out.
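Concretely, under that UX a multi-server cluster might be brought up like this (a sketch; the hostname and token value are placeholders):

    # First server initializes the embedded etcd cluster:
    k3s server --cluster-init

    # Additional servers join it (-s/--server, -t/--token):
    k3s server -s https://server1:6443 -t <token>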