Scaling and availability docs with leader election advice #58

clux · 2024-03-18T19:10:57Z

Want to make it easier for users to make informed choices (and selfishly have something to point to for repeat questions).

For kube-rs/kube#485

mostly RENDERED

EDIT: this PR originally started out being solely about leader election, but it has changed focus to scaling generally, of which leader election solves one particular aspect.

mateiidavid

Really cool! FWIW this looks good to me. I'm currently on the road so it's hard for me to give a thorough review but this is great :D

nightkr · 2024-04-22T10:46:43Z

IMO: leader election is a HA strategy, not a scaling mechanism (outside of enabling sharding, but that requires a pretty different LE API than "simple LE for HA"). It's still a valid thing to think about, but it's also definitely the odd one out in the list.

clux · 2024-04-22T10:56:53Z

yeah, i agree. in my mind this is OK because a large class of controller scaling concerns arise from HA concerns (controller is too slow, too much resource pressure to do all the work fast, too slow occasionally / too slow on reschedules), and it felt nice to categorise them like this when the default recommendation is 1 replica.

do you have a different idea for where to include leader election documentation-wise?

nightkr · 2024-04-22T11:00:08Z

I'd make a separate HA page (also including stuff like "what's the default behaviour when using a single-replica deployment?" for a baseline) and link between them as appropriate (for example: "these are the downsides to blindly increasing your replica count!" without considering LE/sharding).

nightkr · 2024-04-22T11:05:23Z

But also don't let that bikeshedding block this PR, we can always reorganize it later.

clux · 2024-04-22T11:12:04Z

I do like that idea. It feels like it would be nicer at the very least if i had more concrete things to say about leader election and HA other than "hey, this thing can help". Perhaps as it stands it would make both docs quite skinny, but would have to try writing for a while.

For now I will raise a follow-up issue to think about it, because long-term a dedicated doc for HA feels nice once we have more concrete thoughts on LE.

mateiidavid

Following along on the conversation around HA. I agree a separate page on HA would be good to have.

> I do like that idea. It feels like it would be nicer at the very least if i had more concrete things to say about leader election and HA other than "hey, this thing can help". Perhaps as it stands it would make both docs quite skinny, but would have to try writing for a while.

I'm just spitballing over here, but is leader election always required for a controller? Leader election imo is valuable when you need to do some sort of write against the API Server and when it is impossible to avoid racy / undefined behaviour. There might be an opportunity here to document in which situations you'd actually need leader election to scale past 1 replica. For example:

- A controller might not have to do any writes; instead, it might reconcile its internal state with some Kubernetes resource.
- A controller might do a write based on a reconciliation and some internal state from an external client. In such cases, leader election might not be needed, unless the same client connects to more than one instance.

The second example is quite contrived, I'll be the first to admit it. I guess my point is that we already say:

> When running the default 1 replica controller you have a de-facto leader for life. Unless you have strong latency/consistency requirements, leader election is not a required solution.

If we want to pad the docs a bit and expand on HA, we could go into when leader election would be needed, and the typical type of work controllers tend to do in a k8s cluster. It's probably a bit hard to write them though because we might not want to prescribe too much...

clux · 2024-04-22T20:03:08Z

> I'm just spitballing over here, but is leader election always required for a controller?

Your examples make sense to me. To argue a little more recklessly, you could make an argument that leader election is not necessary at all:

- in the 1 replica case you can always use a downtime-based statefulset rolling upgrade to avoid racy writes
- with >1 replica, you can shard your resources explicitly (ns, labels) or implicitly (by using an agent pattern) so that each replica becomes the de-facto leader of its own domain

If you enforce such a structure, then LE is a purely tail-latency focused performance optimisation.
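
A minimal sketch of the explicit sharding idea above, using only the Rust standard library. Everything here is an assumption for illustration: the helper names `fnv1a`, `shard_for` and `owns` are hypothetical and not part of kube, and hashing the namespace name is just one possible way to assign namespaces to replicas. Each replica would need to know its own shard index and the total shard count from somewhere (for example from its deployment configuration), which is not shown.

```rust
/// FNV-1a 64-bit hash. Small, dependency-free, and it gives the same
/// result on every replica regardless of platform or compiler version.
fn fnv1a(input: &str) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in input.as_bytes() {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Hypothetical helper: index of the shard that owns objects in `namespace`.
fn shard_for(namespace: &str, shard_count: u64) -> u64 {
    assert!(shard_count > 0, "need at least one shard");
    fnv1a(namespace) % shard_count
}

/// Hypothetical helper: should the replica with `shard_index` reconcile
/// objects in `namespace`? Exactly one index in 0..shard_count returns true.
fn owns(namespace: &str, shard_index: u64, shard_count: u64) -> bool {
    shard_for(namespace, shard_count) == shard_index
}
```

A replica could call `owns(namespace, my_index, shard_count)` at the top of its reconciler and skip objects it does not own, or use the same mapping to decide which namespaces to watch at all. Note that changing `shard_count` moves namespaces between replicas, so a rollout that changes it would need the same care around racy writes as any other rollout.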

clux · 2024-04-22T23:46:52Z

Have separated the doc into availability (containing an HA section and a LE section) and reduced the scaling doc. It's a little empty on the LE side, but have tried to add some more useful info.

mateiidavid

This looks good to me, thanks a lot @clux for doing all of this work!

I feel like the availability page solves some of the concerns that were highlighted earlier. I didn't think that any section in particular was lacking substance. I also think this is likely going to be improved as more questions get asked from the community and more features get added to the project.

In terms of my review, I left some nitpicky comments (mostly from proofreading), don't think any of them are blocking.

Maybe a bit of a controversial idea but in the future we could potentially suggest useful metrics to inform some of these decisions that we talk about in scaling (e.g. how do we know how to improve our algorithm, which metrics should we look at in a normal controller, and so on). I read this article a while back and had this in the back of my head as an example.

mateiidavid · 2024-04-23T18:14:13Z

docs/controllers/availability.md

+
+## High Availability
+
+At a certain point, the slowdown caused by pod reschedules is going to dominate the latency metrics. Thus, having more than one replica (and having HA) is a requirement for further reducing tail latencies.


TIOLI: Is it worth pointing out or making explicit that HA also avoids having a single point of failure? I'll leave the judgement up to you, just driving by.

clux · 2024-04-23

yeah, i had totally neglected the redundancy aspect. have now restructured the HA section to focus more on why and how HA for controllers differs from normal apps. have not explicitly mentioned SPOF because it feels a bit loaded (all controllers are in a sense SPOFs of their own domain, and a replica only addresses the speed at which that can fail).

I did find a way to do a shoutout for the rollout problem though. Anyway 7cca11d

clux · 2024-04-23T20:33:45Z

> Maybe a bit of a controversial idea but in the future we could potentially suggest useful metrics to inform some of these decisions that we talk about in scaling (e.g. how do we know how to improve our algorithm, which metrics should we look at in a normal controller, and so on). I read this article a while back and had this in the back of my head as an example.

Yeah, the metrics described at https://kube.rs/controllers/observability/#what-metrics (used by controller-rs) do not provide a good way to measure how much latency is incurred by the controller and its scheduling/queuing system, only the time taken to process an object directly on the user side. This is probably worth raising an issue about on kube, because I don't see how we can measure this without having it hooked in somehow.

clux · 2024-04-24T10:45:19Z

Merging for now. As stated, I'm sure there will be some follow-ups here and there. The docs will be online at:

Feel free to click the edit link on those docs on anything that stands out.

Leader election stub linking to implementations

51645f1

Want to make it easies for users to find and make a choice. Signed-off-by: clux <[email protected]>

clux requested review from nightkr and mateiidavid March 18, 2024 19:15

mateiidavid approved these changes Mar 18, 2024

docs/controllers/leaderelection.md (outdated, resolved)

link to linkerd source impl, but move it to the bottom to avoid favou…

bb71701

…ritism kubert is the most expensive thing to adopt from this list in terms of deps. Signed-off-by: clux <[email protected]>

clux mentioned this pull request Mar 18, 2024

write a controller guide #5

Open

19 tasks

nightkr reviewed Mar 19, 2024

docs/controllers/leaderelection.md (outdated, resolved)

docs/controllers/leaderelection.md (outdated, resolved)

docs/controllers/leaderelection.md (outdated, resolved)

clux and others added 3 commits March 19, 2024 10:20

Apply suggestions from code review

62f4125

Co-authored-by: Natalie Klestrup Röijezon <[email protected]> Signed-off-by: Eirik A <[email protected]>

reduce subjectivity and add a scaling requirements/safety preamble

803e46b

Signed-off-by: clux <[email protected]>

change leaderelection to be a scaling doc

d074d85

leader-election is ultimately one optimization for scaling and it's probably more worth highlighting other ways than tho shoehorn people into this one solution that only solves one part of the problem. Signed-off-by: clux <[email protected]>

clux changed the title ~~Leader election stub linking to implementations~~ Scaling doc with links to leader election strategies Apr 20, 2024

clux mentioned this pull request Apr 20, 2024

Change Controller concurrency defaults based on available CPU kube-rs/kube#1473

Open

fix stray sentence and turn into a note

cb52ddb

Signed-off-by: clux <[email protected]>

clux requested review from mateiidavid and nightkr April 20, 2024 11:15

clux added 2 commits April 22, 2024 11:28

better linkability and remove examples of 3rd party code

3f37c1e

Signed-off-by: clux <[email protected]>

minor sentence tweaks

6556d3c

Signed-off-by: clux <[email protected]>

clux changed the title ~~Scaling doc with links to leader election strategies~~ Scaling doc with links to leader election crates Apr 22, 2024

mateiidavid reviewed Apr 22, 2024

docs/controllers/scaling.md (outdated, resolved)

docs/controllers/scaling.md (outdated, resolved)

docs/controllers/scaling.md (resolved)

clux and others added 3 commits April 22, 2024 19:33

Update docs/controllers/scaling.md

1ee50b5

Co-authored-by: Matei David <[email protected]> Signed-off-by: Eirik A <[email protected]>

Update docs/controllers/scaling.md

a9afe7e

Co-authored-by: Matei David <[email protected]> Signed-off-by: Eirik A <[email protected]>

expand on more implicit sharding strategies

bda37fe

Signed-off-by: clux <[email protected]>

split scaling into scaling + availability

d66699e

Signed-off-by: clux <[email protected]>

quick proofread

90bfaa7

Signed-off-by: clux <[email protected]>

mateiidavid approved these changes Apr 23, 2024

clux added 2 commits April 23, 2024 20:25

stray word + s/cpu/CPU

73c51fe

Signed-off-by: clux <[email protected]>

restructure HA section to talk about availability concerns + rollout …

7cca11d

…issue Signed-off-by: clux <[email protected]>

accidentally a word

e332dfb

Signed-off-by: clux <[email protected]>

clux changed the title ~~Scaling doc with links to leader election crates~~ Scaling and availability docs with leader election advice Apr 23, 2024

clux added 2 commits April 23, 2024 21:43

motivation restructure, apparently we don't allow doubly list indents

ae197a4

so have made the callout a little more poignant Signed-off-by: clux <[email protected]>

point on responsiveness and relations

e3e5e6a

Signed-off-by: clux <[email protected]>

clux merged commit 8628d19 into main Apr 24, 2024
1 check passed

clux deleted the leader-election-stub branch April 24, 2024 10:45
