
Add embedded etcd support #1770

Merged: 7 commits merged into k3s-io:master on Jun 7, 2020

Conversation

ibuildthecloud
Contributor

This PR swaps dqlite for etcd as the embedded HA option; sqlite remains the default storage option. The UX for using etcd is exactly the same as for dqlite: to enable etcd you run one server with server --cluster-init and then join other servers with server -s URL -t token.

This PR needs a lot more testing, and some bumpy edges still need to be rounded out.
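To make the bootstrap flow concrete, a minimal sketch of the commands described above (hostname, port, and token value are placeholders):

```
# First server: initialize a new embedded etcd cluster
k3s server --cluster-init --token MY_SECRET

# Additional servers: join by pointing at the first one
k3s server -s https://server-1:6443 -t MY_SECRET
```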

@leolb-aphp

This looks good. This feels like a more dependable solution in the context of Kubernetes, because the interaction between etcd and Kubernetes is also tested upstream, unlike dqlite :-)

@aarononeal

What's the migration path to convert a dqlite cluster to an etcd cluster?

@brandond
Member

@aarononeal there are currently no tools to perform datastore migration. You're basically rebuilding the cluster and redeploying your manifests.

@leolb-aphp

@aarononeal I don't think anyone was running a dqlite cluster for anything serious anyway. It didn't work so well and didn't deliver high availability, since the implementation was experimental and had some shortcomings :)

@remkolems

> @aarononeal I don't think anyone was running a dqlite cluster for anything serious anyway. It didn't work so well and didn't deliver high availability, since the implementation was experimental and had some shortcomings :)

Well, I was about to... I know, I can't wait forever... and that, along with the missing migration path, makes this just in time.

Is there a (rough) timeline available for the embedded etcd support? If I can assist in testing, count me in.

@aarononeal

Dqlite aside, is there a K3S migration path from a single master to an HA cluster and back, or does that kind of move also require rebuilding the cluster?

I could imagine folks starting with a single master and wanting to move to HA later without such pains.

That could also have made for a good migration path here, because then I could have dumped dqlite back to sqlite before moving forward to etcd again.

@brandond
Member

brandond commented May 19, 2020

@aarononeal you can go from single master to multi-master and back again with any external datastore. Using the built-in sqlite datastore limits you to a single master. The biggest limitation is just that there's no good way to migrate between datastores.
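For reference, pointing servers at an external datastore is done with k3s's --datastore-endpoint flag; a sketch with a placeholder Postgres connection string:

```
# Any number of servers can share one external datastore
k3s server --datastore-endpoint="postgres://user:pass@db-host:5432/k3s"
```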

@aarononeal

Since I have to move from embedded to external, I was considering using lxd and juju as a fast way to deploy etcd.

Is that a bad idea given lxd clustering relies on dqlite? I'm not up to speed on the problems.

Eventually I would prefer to stick with the embedded k3s option to avoid the added lxd and operational dependencies.

@brandond
Member

dqlite comes from lxd, but it appears they don't have the same issues with master transitions as kine does. I haven't played with juju; I just run etcd as a systemd unit on each of my nodes alongside k3s.
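For anyone curious, a rough sketch of the etcd invocation such a systemd unit might wrap (node names and addresses are placeholders, not a description of the actual setup):

```
etcd --name node1 \
  --data-dir /var/lib/etcd \
  --listen-peer-urls https://10.0.0.1:2380 \
  --initial-advertise-peer-urls https://10.0.0.1:2380 \
  --listen-client-urls https://10.0.0.1:2379,https://127.0.0.1:2379 \
  --advertise-client-urls https://10.0.0.1:2379 \
  --initial-cluster node1=https://10.0.0.1:2380,node2=https://10.0.0.2:2380,node3=https://10.0.0.3:2380
```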

@wilmardo

Since #1760 is closed, I am including @brandond's concerns here:

> My biggest surprise when I switched from dqlite to etcd is how demanding etcd is of low-latency disk. For my tiny 3-node cluster that's basically doing nothing (7 etcd events/sec), etcd issues 40 write ops/sec and 15 fsyncs/sec. It also expects these to reliably execute in less than 10ms, or nodes start getting evicted from the cluster due to election timeouts. 7200 RPM rotational disks could not reliably deliver this; I had to put the etcd database and journal on ramdisk until I was able to get some additional SSD to throw at it.

#1760 (comment)

This could be a major turnoff for people running SD cards or eMMC flash, since these are notorious for failing due to excess write operations. A parameter to run etcd in memory might be an option for devices with 4GB of RAM?
Also referencing this discussion over at kubernetes-sigs/kind#845
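A common way to check whether a disk can keep up is the fio fsync benchmark from the etcd hardware guidance; the 99th-percentile fdatasync latency should land well under 10ms (the directory is a placeholder on the disk under test):

```
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd-test --size=22m --bs=2300 --name=etcd-perf
```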

@brandond
Member

brandond commented May 26, 2020

Yeah, etcd is crazy demanding of low IO latency, and issues about 14 fsyncs/second on my 3-node cluster with basically nothing going on. Definitely not as easy to run on low-end hardware as kine with sqlite.

I ran etcd for a while on tmpfs while waiting for some more SSDs to come in. It worked OK but took a lot of memory, since etcd doesn't purge the WAL very aggressively.
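For the record, the tmpfs workaround is roughly the following (the path is k3s's server database directory, assumed here; note that all etcd data is lost on reboot, so this trades durability for latency):

```
# Mount before starting k3s; sized generously because the WAL grows
mount -t tmpfs -o size=2g tmpfs /var/lib/rancher/k3s/server/db
```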

@brandond
Member

Possibly relevant recent change to etcd:
etcd-io/etcd#11946

@ibuildthecloud ibuildthecloud marked this pull request as ready for review June 6, 2020 23:28
@ibuildthecloud
Contributor Author

We are going to go ahead with merging this PR, but there is still a lot of testing to do, upgrade handling to work out, and the cluster bootstrap procedure to improve. We are very concerned about the I/O demands of etcd and will continue to look at that. Right now dqlite is broken (due to our integration, not dqlite itself) and not really usable, so moving to etcd at least gets us to a functioning embedded option. sqlite or mysql is still a much better option for lower resource usage.

This work is targeted to be included in k3s 1.19 (which should be released in the first half of August, according to the k8s 1.19 release schedule).

Review thread on the diff:

```
@@ -389,6 +389,8 @@ func (e *ETCD) cluster(ctx context.Context, forceNew bool, options executor.Init
	ClientCertAuth: true,
	TrustedCAFile:  e.config.Runtime.ETCDPeerCA,
},
ElectionTimeout: 5000,
```
Contributor

adjust to customizable parameters?

Contributor Author

I'll address the customization of etcd in a follow-up PR. We will probably need an approach where you can specify an etcd config file and we merge it, similar to the containerd config. There are too many params to add to the CLI to address everything.
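For context, these are the tuning knobs on etcd itself that correspond to the hardcoded value above; how (or whether) they get exposed through the k3s CLI or a merged config file is exactly what that follow-up would decide:

```
# etcd's own flags; values are in milliseconds
etcd --election-timeout 5000 --heartbeat-interval 500
```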

@srdjan

srdjan commented Jun 7, 2020

just wondering... was rqlite ever considered?

@ibuildthecloud
Contributor Author

@srdjan rqlite was considered, but we preferred dqlite at the time; the exact reasoning I don't remember. At this point it doesn't really matter, because the key takeaway is that it's infeasible for the core k3s team to maintain alternative raft-based systems. We are switching to etcd primarily because it's well tested and well known, not really because it's technically superior or even liked. 😀

@ibuildthecloud ibuildthecloud merged commit fe73379 into k3s-io:master Jun 7, 2020
@remkolems

> We are going to go ahead with merging this PR, but there is still a lot of testing to do, upgrade handling to work out, and the cluster bootstrap procedure to improve. We are very concerned about the I/O demands of etcd and will continue to look at that. Right now dqlite is broken (due to our integration, not dqlite itself) and not really usable, so moving to etcd at least gets us to a functioning embedded option. sqlite or mysql is still a much better option for lower resource usage.
>
> This work is targeted to be included in k3s 1.19 (which should be released in the first half of August, according to the k8s 1.19 release schedule).

Thank you for the timeline!

Quick question (I don't know if this is an obvious one or even a rhetorical one, but):

For the time being, why can't we run an etcd container on each of the k3s master nodes, forming a dedicated etcd cluster that mimics the "embedded" etcd support?

My reasoning is that we then don't have to buy and administer dedicated additional hardware to run an etcd cluster.
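That pattern is already possible by treating the containerized etcd as an external datastore; a hedged sketch (image tag, addresses, and paths are illustrative, and a real 3-node cluster additionally needs the --initial-cluster peer flags):

```
# Run etcd in a container on each master node
docker run -d --name etcd \
  -p 2379:2379 -p 2380:2380 \
  -v /var/lib/etcd:/etcd-data \
  quay.io/coreos/etcd:v3.4.9 \
  etcd --name node1 --data-dir /etcd-data \
       --listen-client-urls http://0.0.0.0:2379 \
       --advertise-client-urls http://10.0.0.1:2379

# Point k3s at it as an external datastore
k3s server --datastore-endpoint="http://127.0.0.1:2379"
```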

@nustiueudinastea

@remkolems I think having a separate etcd cluster, even one running on the same nodes as the K8s masters, would introduce considerable operational complexity, which goes against the goals of k3s. I think embedding etcd in k3s is a great idea, as long as etcd can be tuned to work on lower-powered devices that don't have the fastest storage.

@stevefan1999-personal

This is going to be a kubeadm killer... yet you will also have to pay the price of a bigger binary...

@jamesorlakin

We'll have to wait and see what the binary-size impact is, but I imagine it won't be a deal breaker in reality. Many container images are fairly sizeable in their own right.

At a pinch, I imagine there might be a way to tweak the build scripts to exclude it (Go build tags, maybe?), as sketched below.
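Nothing like that exists in the build today, so purely as a sketch of the idea, with a hypothetical tag name and the usual k3s build path assumed:

```
# Hypothetical: compile k3s without embedded etcd, assuming the
# etcd code were gated behind a "no_etcd" Go build tag
go build -tags no_etcd -o k3s ./cmd/k3s
```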

@cawoodm

cawoodm commented Apr 10, 2022

Why was dqlite dropped in k3s? microk8s is still "happily" using it and claiming HA.

@onedr0p
Contributor

onedr0p commented Apr 10, 2022

dqlite and microk8s are both made and maintained by Canonical, so it makes sense that they want to dogfood it. Besides that, when k3s was using dqlite it was very unstable for me. What's wrong with using plain ol' reliable etcd?

@stevefan1999-personal

@cawoodm dqlite support (really a thin dqlite wrapper around Kine's sqlite backend) is pretty poor, to say the least. It does not handle most of the etcd operations: no user support, no proper MVCC support, and only the essential synchronous CRUD operations. The database support is really bad for a real cluster. No wonder k3s switched away from dqlite to embedded etcd. Considering this, would you still use microk8s as your main cluster?

Spoiler alert: I worked on a PR to switch Kine to a GORM backend, and that's where I found out the egregious details. By the way, one can fairly easily implement the etcd API because it is gRPC-based; I have a simple Rust implementation of Kine that I'm hoping can replace the current one.
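To illustrate the surface area involved: even basic etcd operations carry MVCC revision semantics that a kine backend has to emulate. Against a real etcd, for example (keys, values, and the revision number are placeholders):

```
etcdctl put /registry/example/foo 'v1'      # writes at revision N
etcdctl put /registry/example/foo 'v2'      # writes at revision N+1
etcdctl get /registry/example/foo --rev=5   # MVCC: read the key at an older revision
etcdctl watch /registry/example --prefix    # stream subsequent changes
```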

@cawoodm

cawoodm commented Apr 10, 2022

Adrian Goins mentioned in his HA video that dqlite was not reliable, but gave no details. He then proceeded to create an HA K3S cluster, but it's a very complex process. We're trying to decide whether K3S is the way to go for HA on-prem Kubernetes. Seemingly better support from Canonical had us leaning towards microk8s, until we heard vague doubts about dqlite.

@onedr0p
Contributor

onedr0p commented Apr 10, 2022

Using k3s with etcd is really not hard; in fact, etcd is now the default embedded HA datastore in k3s, so there's no special configuration needed unless you need to tweak etcd settings.

@aarononeal

I too found dqlite incredibly unstable. Botched syncing took down 4 different clusters during node reboots. It wasn't stable for production.

@brandond
Member

Start one node with --cluster-init. Start more nodes (either server or agent) with --server pointed at the first node. Not sure how that's complex?
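Spelled out with placeholder values:

```
# Node 1: initialize the cluster
k3s server --cluster-init

# Additional control-plane nodes
k3s server --server https://node1:6443 --token MY_SECRET

# Workers
k3s agent --server https://node1:6443 --token MY_SECRET
```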

@cawoodm

cawoodm commented Apr 10, 2022

> Using k3s with etcd is really not hard; in fact, etcd is now the default embedded HA datastore in k3s, so there's no special configuration needed unless you need to tweak etcd settings.

I'm trying to set up 2-node HA with external postgres. Several days of trying have not been successful.
#5406

@onedr0p
Contributor

onedr0p commented Apr 10, 2022

That doesn't have anything to do with etcd or dqlite, nor are you giving enough information to help debug the issue. I suggest you open an issue with more information.

@brandond
Member

Yeah, I'm not sure what your problem has to do with etcd, since you're not using etcd. I'm going to lock this conversation; anyone who is having problems (with etcd or otherwise) should open an issue instead of commenting on this PR.

@k3s-io k3s-io locked as off-topic and limited conversation to collaborators Apr 10, 2022