Scaling and availability docs with leader election advice #58
Conversation
Want to make it easier for users to find and make a choice. Signed-off-by: clux <[email protected]>
Really cool! FWIW this looks good to me. I'm currently on the road so it's hard for me to give a thorough review but this is great :D
…ritism kubert is the most expensive thing to adopt from this list in terms of deps. Signed-off-by: clux <[email protected]>
leader-election is ultimately one optimization for scaling and it's probably more worth highlighting other ways than to shoehorn people into this one solution that only solves one part of the problem. Signed-off-by: clux <[email protected]>
IMO: leader election is a HA strategy, not a scaling mechanism (outside of enabling sharding, but that requires a pretty different LE API than "simple LE for HA"). It's still a valid thing to think about, but it's also definitely the odd one out in the list.
yeah, i agree. in my mind this is OK because a large class of controller scaling concerns arise from HA concerns (controller is too slow, too much resource pressure to do all the work fast, too slow occasionally / too slow on reschedules), and it felt nice to categorise them like this when the default recommendation is 1 replica. do you have a different idea for where to include leader election documentation-wise?
I'd make a separate HA page (also including stuff like "what's the default behaviour when using a single-replica deployment?" for a baseline) and link between them as appropriate (for example: "these are the downsides to blindly increasing your replica count!" without considering LE/sharding).
But also don't let that bikeshedding block this PR, we can always reorganize it later.
I do like that idea. It feels like it would be nicer at the very least if i had more concrete things to say about leader election and HA other than "hey, this thing can help". Perhaps as it stands it would make both docs quite skinny, but would have to try writing for a while. For now I will raise a follow-up issue to think about it, because long-term a dedicated doc for HA feels nice once we have more concrete thoughts on LE.
Following along on the conversation around HA. I agree a separate page on HA would be good to have.
I do like that idea. It feels like it would be nicer at the very least if i had more concrete things to say about leader election and HA other than "hey, this thing can help". Perhaps as it stands it would make both docs quite skinny, but would have to try writing for a while.
I'm just spitballing over here, but is leader election always required for a controller? Leader election imo is valuable when you need to do some sort of write against the API Server and when it is impossible to avoid racy / undefined behaviour. There might be an opportunity here to document in which situations you'd actually need leader election to scale past 1 replica. For example:
- A controller might not have to do any writes, instead, it might reconcile its internal state with some Kubernetes resource.
- A controller might do a write based on a reconciliation and some internal state from an external client. In such cases, leader election might not be needed, unless the same client connects to more than one instance.
The second example is quite contrived, I'll be the first to admit it. I guess my point is that we already say:
When running the default 1 replica controller you have a de-facto leader for life. Unless you have strong latency/consistency requirements, leader election is not a required solution.
If we want to pad the docs a bit and expand on HA, we could go into when leader election would be needed, and the typical type of work controllers tend to do in a k8s cluster. It's probably a bit hard to write them though because we might not want to prescribe too much...
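For readers who do conclude they need it, something like the following is roughly what lease-based leader election looks like with the `kube` and `k8s-openapi` crates. This is a sketch, not code from the PR: the Lease name, namespace handling, and field manager are placeholders, and expiry/renewal/conflict handling is deliberately omitted (ready-made helpers such as kubert's lease support cover those details).

```rust
use k8s_openapi::api::coordination::v1::Lease;
use kube::api::{Api, Patch, PatchParams};

/// Try to claim a coordination.k8s.io Lease for this replica.
/// Deliberately naive: a real implementation must also honour
/// leaseDurationSeconds/renewTime expiry, renew periodically, and deal with
/// the race between the read below and the apply.
async fn try_claim_lease(leases: &Api<Lease>, identity: &str) -> Result<bool, kube::Error> {
    // Back off if someone else currently holds the lease (expiry check omitted).
    let holder = leases
        .get_opt("controller-leader")
        .await?
        .and_then(|l| l.spec)
        .and_then(|s| s.holder_identity);
    if matches!(holder.as_deref(), Some(h) if h != identity) {
        return Ok(false);
    }
    // Claim (or renew) the lease via server-side apply under our field manager.
    let desired = serde_json::json!({
        "apiVersion": "coordination.k8s.io/v1",
        "kind": "Lease",
        "metadata": { "name": "controller-leader" },
        "spec": { "holderIdentity": identity, "leaseDurationSeconds": 15 }
    });
    let pp = PatchParams::apply("my-controller");
    leases.patch("controller-leader", &pp, &Patch::Apply(&desired)).await?;
    Ok(true)
}
```

A caller would typically loop on this, renewing well within the lease duration and only running the Controller while the claim holds.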
Your examples make sense to me. To argue a little more recklessly, you could make an argument that they are not necessary at all:
If you enforce such a structure, then LE is a purely tail-latency focused performance optimisation.
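To make the "such a structure" idea concrete, here is a rough sketch (not from the PR) of a convergent write: server-side apply of only the fields the controller owns, computed from observed inputs rather than from previously written state, so two racing replicas settle on the same object. The ConfigMap name and field manager are made up.

```rust
use k8s_openapi::api::core::v1::ConfigMap;
use kube::api::{Api, Patch, PatchParams};

/// Convergent write: apply exactly the fields this controller owns, derived
/// from its inputs. Running this concurrently from two replicas yields the
/// same object, so leader election only trims duplicated work and tail
/// latency rather than protecting correctness.
async fn write_owned_fields(cms: &Api<ConfigMap>, computed_state: &str) -> Result<(), kube::Error> {
    let desired = serde_json::json!({
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": { "name": "reconciled-output" },
        "data": { "state": computed_state }
    });
    let pp = PatchParams::apply("my-controller").force();
    cms.patch("reconciled-output", &pp, &Patch::Apply(&desired)).await?;
    Ok(())
}
```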
Have separated the doc into scaling and availability.
This looks good to me, thanks a lot @clux for doing all of this work!
I feel like the availability page solves some of the concerns that were highlighted earlier. I didn't think that any section in particular was lacking substance. I also think this is likely going to be improved as more questions get asked from the community and more features get added to the project.
In terms of my review, I left some nitpicky comments (mostly from proofreading), don't think any of them are blocking.
Maybe a bit of a controversial idea but in the future we could potentially suggest useful metrics to inform some of these decisions that we talk about in scaling (e.g. how do we know how to improve our algorithm, which metrics should we look at in a normal controller, and so on). I read this article a while back and had this in the back of my head as an example.
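As a loose illustration of that idea (not something this PR adds), a reconcile-duration histogram is the kind of signal that could feed those decisions; the metric name, labels, and the prometheus crate usage below are assumptions, not existing controller-rs code.

```rust
use prometheus::{register_histogram_vec, HistogramVec};
use std::{sync::LazyLock, time::Instant};

// Hypothetical reconcile-duration histogram: its tail quantiles and error
// split are one way to judge whether the algorithm, the requeue policy, or
// the replica count is the thing worth improving.
static RECONCILE_DURATION: LazyLock<HistogramVec> = LazyLock::new(|| {
    register_histogram_vec!(
        "controller_reconcile_duration_seconds",
        "Wall-clock time spent inside a single reconcile call",
        &["outcome"]
    )
    .expect("valid metric definition")
});

fn observe_reconcile<T, E>(result: &Result<T, E>, started: Instant) {
    let outcome = if result.is_ok() { "ok" } else { "error" };
    RECONCILE_DURATION
        .with_label_values(&[outcome])
        .observe(started.elapsed().as_secs_f64());
}
```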
docs/controllers/availability.md (outdated)

## High Availability

At a certain point, the slowdown caused by pod reschedules is going to dominate the latency metrics. Thus, having more than one replica (and having HA) is a requirement for further reducing tail latencies.
TIOLI: Is it worth pointing out or making explicit that HA also avoids having a single point of failure? I'll leave the judgement up to you, just driving by.
yeah, i had totally neglected the redundancy aspect. have restructured the HA section now, focusing more on why and how HA for controllers differs from normal apps. have not explicitly mentioned SPOF because it feels a bit loaded (all controllers are in a sense SPOFs of their own domain, and a replica only addresses how quickly that failure is recovered from).
I did find a way to do a shoutout for the rollout problem though. Anyway 7cca11d
Yeah, the https://kube.rs/controllers/observability/#what-metrics (used by controller-rs) does not provide a good way to measure how much latency is incurred by the controller and its scheduling/queuing system, only the time taken to directly process it on the user side. This is probably worth raising an issue about on kube, because I don't see how we can measure this without having it hooked in somehow.
so have made the callout a little more poignant Signed-off-by: clux <[email protected]>
Merging for now. As stated, I'm sure there will be some follow-ups here and there. The docs will be online at:
Feel free to click the edit link on those docs for anything that stands out.
Want to make it easier for users to make informed choices (and selfishly have something to point to for repeat questions).
For kube-rs/kube#485
mostly RENDERED
EDIT: this PR originally started out being solely about leader election, but has changed focus to scaling generally, of which leader election solves one particular aspect.