Inconsistent Raft Latency on Different Infrastructures #578
Comments
@aminst thanks for the great report! We'd love to help figure out what you are seeing. I have a few questions:
You might be interested in this issue we fixed recently: in some situations it could artificially inflate commit latencies under high throughput by up to 128x the actual disk write time. But if you see this latency even on single writes on an unloaded cluster, then it's not likely to be this.
Hi @banks, thanks for your response.
Thank you again for your help!
Hi @banks
@aminst glad we could help; other folks here also reviewed and contributed to that response, so I can't take all the credit!

In general, fsyncs are the slowest thing in Raft (or in any database that actually has strong durability guarantees). The main way to improve throughput and lower latency is to issue many writes in parallel: this library will "group commit", that is, batch all writes made in parallel to disk, so you can get up to 64 writes (by default) for the price of one! This is also the reason why Raft (and most consensus-based systems) are not very effective at utilizing disk hardware. Modern SSDs have huge IOPS available, but only if your workload can issue lots (usually thousands) of larger reads or writes in parallel to take advantage of the parallel chips/controllers/channels etc. in the device.

There is one optimization I'm considering for this library that could help reduce latency: writing to the leader's disk in parallel with replicating to the followers. It's in Diego's Raft thesis (section 10.2.1), and I have a pretty good idea, and a prototype, of how to make it happen in this library, but I'm not sure if/when we'll get to it. Today a commit has to wait for the leader's disk write and then the RTT plus the follower's disk write; with this change it would effectively wait only for the RTT and one disk write, since those happen in parallel with the leader's write. So if your disk writes take 5ms and the RTT is 0.1ms, today a commit takes at least 5 + 0.1 + 5 = 10.1ms, whereas in theory it could be done in just 0.1 + 5 = 5.1ms.

I'm going to close this since it sounds like you've worked out what is going on, but feel free to let us know if you have more feedback or info on this!
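To make the group-commit point concrete, here is a minimal sketch of issuing writes in parallel through this library's `Apply` API. It assumes `r` is an already-bootstrapped `*raft.Raft` on the current leader; the payload format, goroutine count, and timeout are illustrative assumptions, not details from this thread.

```go
package probe

import (
	"fmt"
	"sync"
	"time"

	"github.com/hashicorp/raft"
)

// applyParallel submits n writes concurrently. Writes that arrive while a
// disk flush is in flight are batched ("group committed") into the next
// fsync, so n parallel applies can cost far fewer than n fsyncs.
func applyParallel(r *raft.Raft, n int) {
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			f := r.Apply([]byte(fmt.Sprintf("op-%d", i)), 10*time.Second)
			if err := f.Error(); err != nil { // blocks until committed or failed
				fmt.Printf("apply %d failed: %v\n", i, err)
			}
		}(i)
	}
	wg.Wait()
	fmt.Printf("%d parallel applies took %s\n", n, time.Since(start))
}
```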
Hi @banks
I love your enthusiasm to contribute, thank you! To be very honest, we've not yet found a great model for taking on really major community contributions to Raft. The issue is that Raft is very easy to break and very hard to have confidence in, both in the completeness of our tests and in the thoroughness with which we've understood a change. That means that even internally it takes a lengthy cost/benefit discussion to justify the risk of making a significant change that will impact the core reliability of all our products! Coordinating all of that around community contributions is something we've not yet achieved, though I'd love to find ways to be more open to community input.

In this case I think the best next step is for me to write up a GitHub issue with my current thoughts and experiments around this. One really useful contribution to that effort would be others testing and validating those changes, or helping us think of ways to reduce the risk. I'll try to do that next week.

In general, it's usually best to open an issue if you have an idea for something that might help, so we can think through the motivation and design together before you invest too much work in it, only to find we're not able to justify the risk or the time investment needed to build enough confidence in the correctness and benefit to get it merged.
Thank you for your encouraging words. |
@aminst I've opened #579 as a WIP experiment with my current thinking on how we can improve this. It will likely remain WIP until we've done a bunch more testing to check that the performance improvement is actually enough to warrant pursuing it further, and then we'll need to work out how to convince ourselves it's correct and thoroughly tested!
Issue Summary
I've encountered an issue with HashiCorp's Raft library where I observe inconsistent Raft latencies across different infrastructures. This issue seems specific to HashiCorp Raft, as I've successfully used Etcd on the same infrastructure without encountering similar problems.
Description
I have extensively tested the HashiCorp Raft library on three different infrastructures: my local machine, physical servers, and GCP (Google Cloud Platform). While I don't encounter issues with other, non-Raft communication, I'm seeing significant differences in Raft replication latency between them.
This inconsistency in replication latency is puzzling, as I would expect more uniform behavior across different infrastructures, especially when using the same Raft library.
Environment
Steps to Reproduce
Use the Raft Example and measure the replication latency of a single write (see the sketch below).
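For concreteness, here is a minimal sketch (not part of the original report) of how one might time a single replicated write with this library. It assumes `r` is a `*raft.Raft` on the current leader, set up as in the example; the payload and timeout are arbitrary.

```go
package probe

import (
	"time"

	"github.com/hashicorp/raft"
)

// measureCommitLatency times one replicated write on the leader. Error()
// blocks until the entry is committed and applied to the FSM, so the elapsed
// time covers the leader's fsync, the replication RTT, and a follower fsync.
func measureCommitLatency(r *raft.Raft) (time.Duration, error) {
	start := time.Now()
	f := r.Apply([]byte("latency-probe"), 5*time.Second)
	if err := f.Error(); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}
```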