WIP: Delete dqlite #1760

Closed
wants to merge 1 commit into from


Contributor

@ibuildthecloud ibuildthecloud commented May 5, 2020

dqlite is still supported through kine, but we are deleting the native integration of dqlite into k3s. We are instead going to switch to etcd for the embedded HA option. Since etcd is generally well supported in the k8s ecosystem, it seems only natural to use it. A follow-up PR is coming shortly to add etcd support.

@ibuildthecloud ibuildthecloud changed the title from "Delete dqlite" to "WIP: Delete dqlite" May 5, 2020
@ibuildthecloud
Contributor Author

There's no point in merging this until we have the etcd replacement. But I'm just separating this out to keep the PRs cleaner and smaller for this major change.

@brandond
Member

brandond commented May 6, 2020

I'm super curious to see how this pans out.

My biggest surprise when I switched from dqlite to etcd is how demanding etcd is of low-latency disk. For my tiny 3-node cluster that's basically doing nothing (7 etcd events/sec) etcd issues 40 write ops/sec and 15 fsync/sec. It also expects these to reliably execute in less than 10ms, or nodes start getting evicted from the cluster due to election timeouts. 7200 RPM rotational disks could not reliably deliver this; I had to put the etcd database and journal on ramdisk until I was able to get some additional SSD to throw at it.

@ibuildthecloud
Contributor Author

@brandond You bring up some excellent points here. Fundamentally dqlite and etcd are the same beast, as they are both built on Raft. But dqlite has the main advantage that it uses sqlite under the hood and not boltdb. This is one of the reasons we picked it over etcd to start with. We can tune some of the timeouts and heartbeat intervals for etcd, but the iops and fsync rates could be an issue. I'm curious if you have some actual stats of iops/fsync for dqlite vs etcd running the same workload. This would be a show-stopping issue if we can't make etcd more tolerant of slow disks.

@ibuildthecloud
Contributor Author

Included in #1770
