
KEP: Consistent Reads from Cache #1404

Merged: 12 commits merged into kubernetes:master on Feb 20, 2020

Conversation

@jpbetz (Contributor) commented Dec 11, 2019

This idea has been around for a while and there is interest in working on it for 1.18, so I've put it in KEP form.

cc @wojtek-t @jingyih @liggitt

@k8s-ci-robot added labels cncf-cla: yes, kind/kep, sig/api-machinery, size/L on Dec 11, 2019
@jpbetz force-pushed the consistent-reads-from-cache-kep branch 2 times, most recently from 7f3d10b to 6104f8a, on December 11, 2019 01:34
@jpbetz force-pushed the consistent-reads-from-cache-kep branch from 6104f8a to 535968b on December 11, 2019 01:46
@mm4tt commented Dec 17, 2019

Interesting idea!

I took a cursory glance and a question came to mind: how would this work for paginated list calls? I couldn't find anything about that in the proposal.

@wojtek-t (Member) replied:

I took a cursory glance and a question came to mind: how would this work for paginated list calls? I couldn't find anything about that in the proposal.

It's not solved. That said, I don't think we necessarily need to address that from day one.
In my opinion, the biggest gain comes from node-originating requests (e.g. a kubelet listing the pods scheduled on its node). For those requests, the size of the response is small (it fits in a single page, assuming you don't make pages extremely small), whereas the number of objects to process is proportional to cluster size (so fairly large).

As an example, in a 5k-node cluster with 30 pods/node, listing "pods on node X" returns 30 items, but serving it from etcd means listing 150k pods, deserializing them, filtering them, and returning those 30. Serving from the cache means just filtering and returning those 30, and once kubernetes/kubernetes#85445 merges (and one more small PR on top of it), the work will be proportional to the size of the result.

So yes - this doesn't immediately solve every problem, but it may visibly help with many of them.

I can imagine an algorithm like the following (a rough sketch in Go follows the list):

  • if a limit is set, check whether selectors are also set - if not, fall back to etcd; if so, assume a high probability that the result will not be too large
  • ensure the cache is up to date
  • try to serve the list from the cache
  • if the result is too large, fall back to listing from etcd
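
For concreteness, here is a minimal Go sketch of the fallback flow in the list above. Every type, threshold, and helper in it (ListOptions, maxCachedResultSize, listFromEtcd, listFromCache, waitForCacheFresh) is invented for illustration; this is not the actual apiserver storage code.

```go
package sketch

import "context"

// Hypothetical types and helpers standing in for the real storage layer.
type ListOptions struct {
	Limit         int64
	LabelSelector string
	FieldSelector string
}

type List struct{ Items []interface{} }

const maxCachedResultSize = 10000 // assumed "too large for one page" threshold

func selectorsSet(o ListOptions) bool {
	return o.LabelSelector != "" || o.FieldSelector != ""
}

func listFromEtcd(ctx context.Context, o ListOptions) (*List, error)  { return &List{}, nil }
func listFromCache(ctx context.Context, o ListOptions) (*List, error) { return &List{}, nil }
func waitForCacheFresh(ctx context.Context) error                     { return nil }

// serveList sketches the fallback flow described in the list above.
func serveList(ctx context.Context, opts ListOptions) (*List, error) {
	// A limit with no selectors suggests a potentially huge result: fall back
	// to etcd, which already knows how to paginate.
	if opts.Limit > 0 && !selectorsSet(opts) {
		return listFromEtcd(ctx, opts)
	}
	// Ensure the watch cache is up to date, then try to serve from it.
	if err := waitForCacheFresh(ctx); err != nil {
		return listFromEtcd(ctx, opts)
	}
	result, err := listFromCache(ctx, opts)
	if err != nil {
		return listFromEtcd(ctx, opts)
	}
	// If the filtered result is still too large for a single page, fall back.
	if int64(len(result.Items)) > maxCachedResultSize {
		return listFromEtcd(ctx, opts)
	}
	return result, nil
}
```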

@jpbetz - while the above algorithm is probably not something we want to put in the KEP (do we?), the explanation above is probably worth including (parts of it should go into the motivation section, I think).

@jpbetz force-pushed the consistent-reads-from-cache-kep branch 2 times, most recently from 454a0d2 to a5df1d7, on December 31, 2019 00:46
@jpbetz force-pushed the consistent-reads-from-cache-kep branch from a5df1d7 to 7ef2bbd on December 31, 2019 00:52
@jpbetz (Contributor, Author) commented Jan 7, 2020

re #1404 (comment)

@jpbetz - while the above algorithm is probably not something we want to put in the KEP (do we?), the explanation above is probably worth including (parts of it should go into the motivation section, I think).

This is very useful. I've added it to the motivation section.

For pagination, I've added a small section about it and also listed it in the graduation criteria as something we'd need to address fully before going to beta.

@jpbetz (Contributor, Author) commented Jan 7, 2020

To clarify - I'm not worried about the fallback case - there the additional latency sounds reasonable to me (because it means the apiserver has some problems with its watch cache).
What I am worried about is requesting this progress notify explicitly.

I'm a bit worried about that too.

I've restructured the doc to propose two alternatives: 1) use WithProgressNotify to enable automatic watch updates, and 2) use WatchProgressRequest to request watch updates when needed. If we can demonstrate that alternative 1 is sufficient, I don't think we need to explicitly request progress notifies at all. Let's figure that out experimentally, trying alternative 1 first as you suggested.
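
For readers following along, here is a rough, standalone sketch of what the two alternatives look like against etcd's Go client (clientv3, import path as of etcd 3.4; WithProgressNotify is a watch option, and RequestProgress sends an explicit progress request on the watch stream). The key prefix and the handling loop are illustrative only, not what the KEP specifies.

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
	ctx := context.Background()

	// Alternative 1: etcd periodically sends progress notifications on this
	// watch; each one tells us the revision the watch (and thus the cache)
	// is current through, even when nothing under the prefix changed.
	wch := cli.Watch(ctx, "/registry/pods/",
		clientv3.WithPrefix(), clientv3.WithProgressNotify())

	// Alternative 2: explicitly ask for a progress notification on the watch
	// stream when a consistent read needs the cache to catch up (etcd >= 3.4).
	if err := cli.RequestProgress(ctx); err != nil {
		log.Printf("progress request failed: %v", err)
	}

	for resp := range wch {
		if resp.IsProgressNotify() {
			log.Printf("watch is current through revision %d", resp.Header.Revision)
			continue
		}
		for _, ev := range resp.Events {
			log.Printf("event: %s %s", ev.Type, ev.Kv.Key)
		}
	}
}
```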

@jpbetz (Contributor, Author) commented Feb 18, 2020

@lavalamp, @deads2k Can we merge this PR as provisional? @jfbai has proposed an improvement to this KEP in kubernetes/kubernetes#88264 and I'd like to get it merged so we can more easily collaborate. All outstanding comments on this PR have been addressed.

@jpbetz force-pushed the consistent-reads-from-cache-kep branch from 71057a1 to 5234cc7 on February 18, 2020 22:25
@lavalamp (Member) commented:

I think I'm OK with provisional. @deads2k?

@jfbai commented Feb 19, 2020

@lavalamp, @deads2k Can we merge this PR as provisional? @jfbai has proposed an improvement to this KEP in kubernetes/kubernetes#88264 and I'd like to get it merged so we can more easily collaborate. All outstanding comments on this PR have been addressed.

Sure, I will update the idea using WithLastRev after this KEP is merged.

@deads2k (Contributor) commented Feb 19, 2020

I don't see a section talking about how I can opt out of this behavior and still get a "normal" quorum read. Given the difficulty we've had with nearly every watch cache feature, I think we'll need that ability for our own debugging if nothing else. When these problems struck in the past, we were able to find an affected cluster and use a mix of cached and live requests to figure out what was happening. I don't want to give up that ability entirely.

I don't think this is a case of "then everyone will use it", because the normal flow should be good enough to convince people to use it. If it's not, and we truly want to force them, it could become an ACL check, but hopefully the feature is good enough that everyone wants to use it.

@deads2k (Contributor) commented Feb 19, 2020

Add the "UNRESOLVED" markers on the sections I commented on (https://github.com/kubernetes/enhancements/pull/1545/files#diff-bb6317ae71eb96981343e559d9598d80R38) and I can go for provisional


- Resolve the "stale read" problem (https://github.com/kubernetes/kubernetes/issues/59848)
- Improve the scalability and performance of Kubernetes for Get and List requests, when the watch cache is enabled

Contributor review comment on the excerpt above:

If you aren't going to try to avoid allowing true quorum reads as we have them today, please list that as an explicit non-goal. Also, per my overall comment, I think this is a shortcoming and should be addressed. It doesn't need to be a default, but it should be possible.

Reply from @jpbetz (Contributor, Author):

I think this needs more discussion. I've added an UNRESOLVED marker in non-goals.

@jpbetz (Contributor, Author) commented Feb 19, 2020

Thanks @deads2k. Feedback is applied and <<[UNRESOLVED]>>...<<[/UNRESOLVED]>> sections have been added.

@wojtek-t (Member) commented:

I would like to make one more, deeper pass on it before we move it to implementable, and I won't have time for that in the next 2.5 weeks, but I'm definitely fine with provisional.

@jpbetz (Contributor, Author) commented Feb 19, 2020

I would like to make one more, deeper pass on it before we move it to implementable, and I won't have time for that in the next 2.5 weeks, but I'm definitely fine with provisional.

Sounds good. I've listed @wojtek-t, @lavalamp and @deads2k as the approvers for this to make sure we get your feedback.

@jpbetz force-pushed the consistent-reads-from-cache-kep branch from 9d890f9 to 4fd8777 on February 19, 2020 22:15
@jpbetz (Contributor, Author) commented Feb 20, 2020

@deads2k, @lavalamp Would one of you tag this?


Create etcd watches with `WithProgressNotify` enabled (available in all etcd 3.x versions).

When `WithProgressNotify` is enabled on an etcd watch, etcd sends progress
@xiang90 commented on the excerpt above, Feb 20, 2020:

The proposal is to make the duration configurable on a per-watch-stream basis.


When a consistent LIST request is received and the watch cache is enabled:

- Get the current revision from etcd for the resource type being served. The returned revision is strongly consistent (guaranteed to be the latest revision via a quorum read).
Review comment on the excerpt above:

What does "current revision" mean exactly? We can get the largest modified revision for the listed resources (vs current revision of the entire key space). This can reduce the block time (for less frequently changed objects, there should be just no blocking).

Alternatively, we can add a new watch request type "urgent" or something to let etcd simply deliver all pending watch events or an empty result (if there is no new changes).
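
Tying the quoted flow and this comment together, below is a rough Go sketch of a consistent LIST served from the cache: take a cheap quorum read against etcd only to learn the current (whole-key-space) revision from the response header, wait for the watch cache to observe at least that revision, then serve from the cache. The cache type and its waitUntilFresh/list methods are invented names for illustration, not the real apiserver API.

```go
package cacher

import (
	"context"

	clientv3 "go.etcd.io/etcd/clientv3"
)

// Hypothetical stand-in for the apiserver watch cache.
type watchCache struct{}

func (c *watchCache) waitUntilFresh(ctx context.Context, rev int64) error { return nil }
func (c *watchCache) list(prefix string) []interface{}                    { return nil }

func consistentList(ctx context.Context, cli *clientv3.Client, cache *watchCache, prefix string) ([]interface{}, error) {
	// A count-only range is a cheap linearizable (quorum) read; its response
	// header carries the current revision of the whole etcd key space.
	resp, err := cli.Get(ctx, prefix, clientv3.WithPrefix(), clientv3.WithCountOnly())
	if err != nil {
		return nil, err
	}
	currentRev := resp.Header.Revision

	// Block until the cache has seen at least that revision; progress
	// notifications (discussed above) bound how long this can take even when
	// there are no writes under the prefix.
	if err := cache.waitUntilFresh(ctx, currentRev); err != nil {
		return nil, err
	}

	// The cached data is now at least as fresh as the quorum read was.
	return cache.list(prefix), nil
}
```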

### Non-Goals

<<[UNRESOLVED @deads]>>
- Avoid allowing true quorum reads. We should think carefully about this, see: https://github.com/kubernetes/enhancements/pull/1404#discussion_r381528406
@xiang90 commented on the non-goal above, Feb 20, 2020:

i feel the "right and traditional" solution for this is to enable v system leasing in the apiserver etcd pkg. here is the description on how it works: http://web.stanford.edu/class/cs240/readings/89-leases.pdf. we already implemented it here: https://github.com/etcd-io/etcd/pull/8341/files. we can port the code into the apiserver side if we want. but this is pretty heavy :P

@deads2k (Contributor) commented Feb 20, 2020

/lgtm

I think this is a good point to start iterating from. The unresolved sections will make sure we don't accidentally skip agreeing on what's in and out.

@k8s-ci-robot added the lgtm label on Feb 20, 2020
@k8s-ci-robot commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, jpbetz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Feb 20, 2020
@k8s-ci-robot merged commit 2a27144 into kubernetes:master on Feb 20, 2020
@k8s-ci-robot added this to the v1.18 milestone on Feb 20, 2020
### Non-Goals

<<[UNRESOLVED @deads]>>
- Avoid allowing true quorum reads. We should think carefully about this, see: https://github.com/kubernetes/enhancements/pull/1404#discussion_r381528406
Member review comment on the non-goal above:

Why would we stop permitting true quorum reads? I didn't get that at all from this doc when I read it.
