Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scaling and availability docs with leader election advice #58

Merged
merged 18 commits into from
Apr 24, 2024

Conversation

clux
Copy link
Member

@clux clux commented Mar 18, 2024

Want to make it easier for users to make informed choices (and selfishly have something to point to for repeat questions).

For kube-rs/kube#485

mostly RENDERED

EDIT: this PR originally started out being solely about leader election, but have changed focus to scaling generally in which leader election solves a particular aspect of this.

Want to make it easies for users to find and make a choice.

Signed-off-by: clux <[email protected]>
Copy link
Contributor

@mateiidavid mateiidavid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really cool! FWIW this looks good to me. I'm currently on the road so it's hard for me to give a thorough review but this is great :D

docs/controllers/leaderelection.md Outdated Show resolved Hide resolved
…ritism

kubert is the most expensive thing to adopt from this list in terms of deps.

Signed-off-by: clux <[email protected]>
@clux clux mentioned this pull request Mar 18, 2024
19 tasks
docs/controllers/leaderelection.md Outdated Show resolved Hide resolved
docs/controllers/leaderelection.md Outdated Show resolved Hide resolved
docs/controllers/leaderelection.md Outdated Show resolved Hide resolved
clux and others added 3 commits March 19, 2024 10:20
Co-authored-by: Natalie Klestrup Röijezon <[email protected]>
Signed-off-by: Eirik A <[email protected]>
leader-election is ultimately one optimization for scaling and it's probably
more worth highlighting other ways than tho shoehorn people into this one
solution that only solves one part of the problem.

Signed-off-by: clux <[email protected]>
@clux clux changed the title Leader election stub linking to implementations Scaling doc with links to leader election strategies Apr 20, 2024
@nightkr
Copy link
Member

nightkr commented Apr 22, 2024

IMO: leader election is a HA strategy, not a scaling mechanism (outside of enabling sharding, but that requires a pretty different LE API than "simple LE for HA"). It's still a valid thing to think about, but it's also definitely the odd one out in the list.

@clux
Copy link
Member Author

clux commented Apr 22, 2024

yeah, i agree. in my mind this is OK because a large class of controller scaling concerns arise from HA concerns (controller is too slow, too much resource pressure to do all the work fast, too slow occasionally / too slow on reschedules), and it felt nice to categorise them like this when the default recommendation is 1 replica.

do you have a different idea for where to include leader election documentation wise?

@nightkr
Copy link
Member

nightkr commented Apr 22, 2024

I'd make a separate HA page (also including stuff like "what's the default behaviour when using a single-replica deployment?" for a baseline) and link between them as appropriate (for example: "these are the downsides to blindly increasing your replica count!" without considering LE/sharding).

@nightkr
Copy link
Member

nightkr commented Apr 22, 2024

But also don't let that bikeshedding block this PR, we can always reorganize it later.

@clux
Copy link
Member Author

clux commented Apr 22, 2024

I do like that idea. It feels like it would be nicer at the very least if i had more concrete things to say about leader election and HA other than "hey, this thing can help". Perhaps as it stands it would make both docs quite skinny, but would have to try writing for a while.

For now I will raise a follow-up issue to think about it, because long-term a dedicated doc for HA feels nice once we have more concrete thoughts on LE.

@clux clux changed the title Scaling doc with links to leader election strategies Scaling doc with links to leader election crates Apr 22, 2024
Copy link
Contributor

@mateiidavid mateiidavid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following along on the conversation around HA. I agree a separate page on HA would be good to have.

I do like that idea. It feels like it would be nicer at the very least if i had more concrete things to say about leader election and HA other than "hey, this thing can help". Perhaps as it stands it would make both docs quite skinny, but would have to try writing for a while.

I'm just spitballing over here, but is leader election always required for a controller? Leader election imo is valuable when you need to do some sort of write against the API Server and when it is impossible to avoid racey / undefined behaviour. There might be an opportunity here to document in which situations you'd actually need leader election to scale past 1 replica. For example:

  • A controller might not have to do any writes, instead, it might reconcile its internal state with some Kubernetes resource.
  • A controller might do a write based on a reconciliation and some internal state from an external client. In such cases, leader election might not be needed, unless the same client connects to more than one instance.

The second example is quite contrived, I'll be the first to admit it. I guess my point is that we already say:

When running the default 1 replica controller you have a de-facto leader for life. Unless you have strong latency/consistency requirements, leader election is not a required solution.

If we want to pad the docs a bit and expand on HA, we could go into when leader election would be needed, and the typical type of work controllers tend to do in a k8s cluster. It's probably a bit hard to write them though because we might not want to prescribe too much...

docs/controllers/scaling.md Outdated Show resolved Hide resolved
docs/controllers/scaling.md Outdated Show resolved Hide resolved
docs/controllers/scaling.md Show resolved Hide resolved
clux and others added 3 commits April 22, 2024 19:33
@clux
Copy link
Member Author

clux commented Apr 22, 2024

I'm just spitballing over here, but is leader election always required for a controller?

Your examples makes sense to me. To argue a little more recklessly, you could make an argument that they are not necessary at all:

  • in 1 replica case you can always use a downtime based statefulset rolling upgrade to avoid racey writes
  • in >1 replica, you can shard your resources explicitly (ns, labels) or implicitly (by using an agent pattern) to become a de-facto leader of your own domain

If you enforce such a structure, then LE is a purely tail-latency focused performance optimisation.

@clux
Copy link
Member Author

clux commented Apr 22, 2024

Have separated the doc into availability (containing an HA section and a LE section) and reduced the scaling doc. It's a little empty on the LE side, but have tried to add some more useful info.

Signed-off-by: clux <[email protected]>
Copy link
Contributor

@mateiidavid mateiidavid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks a lot @clux for doing all of this work!

I feel like the availability page solves some of the concerns that were highlighted earlier. I didn't think that any section in particular was lacking substance. I also think this is likely going to be improved as more questions get asked from the community and more features get added to the project.

In terms of my review, I left some nitpicky comments (mostly from proofreading), don't think any of them are blocking.

Maybe a bit of a controversial idea but in the future we could potentially suggest useful metrics to inform some of these decisions that we talk about in scaling (e.g. how do we know how to improve our algorithm, which metrics should we look at in a normal controller, and so on). I read this article a while back and had this in the back of my head as an example.

docs/controllers/availability.md Show resolved Hide resolved

## High Availability

At a certain point, the slowdown caused by pod reschedules is going to dominate the latency metrics. Thus, having more than one replica (and having HA) is a requirement for further reducing tail latencies.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIOLI: Is it worth pointing or making explicit that HA also avoids having a single point of failure? I'll leave the judgement up to you, just driving by.

Copy link
Member Author

@clux clux Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i had totally neglected the redundancy aspect. have restructured the HA section now more focusing on why HA is different than normal apps and how. have not explicitly mentioned SPOF because it feels a bit loaded (all controllers are in a sense SPOFS of their own domain and a replica is only addressing the speed of how that can fail).

I did find a way to do a shoutout for the rollout problem though. Anyway 7cca11d

docs/controllers/scaling.md Outdated Show resolved Hide resolved
docs/controllers/scaling.md Outdated Show resolved Hide resolved
docs/controllers/scaling.md Outdated Show resolved Hide resolved
@clux
Copy link
Member Author

clux commented Apr 23, 2024

Maybe a bit of a controversial idea but in the future we could potentially suggest useful metrics to inform some of these decisions that we talk about in scaling (e.g. how do we know how to improve our algorithm, which metrics should we look at in a normal controller, and so on). I read this article a while back and had this in the back of my head as an example.

Yeah, the https://kube.rs/controllers/observability/#what-metrics (used by controller-rs) does not provide a good way to measure how much latency is incurred by controller and its scheduling/queuing system, only time taken to directly process it on the user side. This is probably worth raising an issue about on kube, because I don't see how we can measure this without having it hooked in somehow.

@clux clux changed the title Scaling doc with links to leader election crates Scaling and availability docs with leader election advice Apr 23, 2024
@clux
Copy link
Member Author

clux commented Apr 24, 2024

Merging for now. As stated, I'm sure there will be some follow-ups here and there. The docs will be online at:

Feel free to click the edit link on those docs on anything that stands out.

@clux clux merged commit 8628d19 into main Apr 24, 2024
1 check passed
@clux clux deleted the leader-election-stub branch April 24, 2024 10:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants