KEP: Consistent Reads from Cache #1404
Conversation
Interesting idea! I took a cursory glance and a question came to mind. How would this work for paginated list calls? I couldn't find anything in the proposal on that.
It's not solved. That said, I don't think we necessarily need to address that from day 1. As an example, in a 5k-node cluster with 30 pods/node, listing "pods from node X" returns 30 items, but serving that from etcd means listing 150k pods, deserializing them, filtering them, and returning those 30. Serving it from the cache means just filtering and returning those 30, and once kubernetes/kubernetes#85445 merges (and one more small PR on top of it), the cost will be proportional to the size of the result. So yes - this doesn't immediately solve every problem, but it may visibly help with many of them. I can imagine an algorithm along these lines (sketched below).
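For illustration only, here is a minimal sketch of the cache-side filtering described above, using a stripped-down `Pod` type and a plain slice as the cache snapshot (both hypothetical, not the actual apiserver structures):

```go
package main

import "fmt"

// Pod is a stripped-down stand-in for the real API object; only the fields
// needed for this example are included.
type Pod struct {
	Name     string
	NodeName string
}

// listPodsForNode serves a "pods from node X" LIST out of an in-memory
// snapshot: the work is filtering already-decoded objects and copying the
// few that match, rather than reading and deserializing every pod from etcd.
func listPodsForNode(snapshot []Pod, nodeName string) []Pod {
	var result []Pod
	for _, p := range snapshot {
		if p.NodeName == nodeName {
			result = append(result, p)
		}
	}
	return result
}

func main() {
	snapshot := []Pod{
		{Name: "a", NodeName: "node-1"},
		{Name: "b", NodeName: "node-2"},
		{Name: "c", NodeName: "node-1"},
	}
	fmt.Println(listPodsForNode(snapshot, "node-1")) // [{a node-1} {c node-1}]
}
```

The point is that the cost is dominated by scanning the already-decoded in-memory snapshot, not by transferring and deserializing 150k objects from etcd.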
@jpbetz - while the above algorithm is probably not something we want to put in the KEP (do we?), the explanation above is probably worth including (parts of it should go into the motivation section, I think).
This is very useful. I've added it to the motivation section. For pagination, I've added a small section about it and also listed it in the graduation criteria section as something we'd need to address fully before going to beta.
I'm a bit worried about that too. I've restructured the doc to propose two alternatives: 1) Use WithProgressNotify to enable automatic watch updates, and 2) Use WatchProgressRequest to request watch updates when needed. If we can demonstrate alternative 1 is sufficient, I don't think we need to explicitly request progress notifies at all. Let's figure that out experimentally, trying alternative 1 first like you suggested.
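For reference, a rough sketch of what the two alternatives look like at the etcd client level; the endpoint, the `/registry/pods/` prefix, and the timing are illustrative assumptions, and the `WatchProgressRequest` mechanism is exposed in the Go client as `RequestProgress` (etcd 3.4+):

```go
package main

import (
	"context"
	"log"
	"time"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"}, // illustrative endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx := context.Background()

	// Alternative 1: ask etcd to push periodic progress notifications on this
	// watch even when no events occur for the watched keys.
	wch := cli.Watch(ctx, "/registry/pods/",
		clientv3.WithPrefix(), clientv3.WithProgressNotify())

	// Alternative 2: explicitly request a progress notification on the watch
	// streams whenever the cache needs to learn how fresh it is.
	go func() {
		time.Sleep(2 * time.Second)
		if err := cli.RequestProgress(ctx); err != nil {
			log.Println("progress request failed:", err)
		}
	}()

	for resp := range wch {
		if resp.IsProgressNotify() {
			// The header revision is the store revision this watch has
			// observed; the cache is at least as fresh as that revision.
			log.Println("progress notify at revision", resp.Header.Revision)
			continue
		}
		for _, ev := range resp.Events {
			log.Printf("%s %s at revision %d", ev.Type, ev.Kv.Key, ev.Kv.ModRevision)
		}
	}
}
```

Either way, the watch cache learns a revision it is known to have reached; the difference is whether that signal arrives on a fixed schedule or on demand.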
@lavalamp, @deads2k Can we merge this PR as provisional? @jfbai has proposed an improvement to this KEP in kubernetes/kubernetes#88264 and I'd like to get it merged so we can more easily collaborate. All outstanding comments on this PR have been addressed.
I think I'm OK with provisional. @deads2k?
Sure, I will update the idea using `WatchProgressRequest`.
I don't see a section talking about how I get to opt out of this behavior and still get a "normal" quorum read. Given the difficulty we've had with nearly every watch cache feature, I think we'll need that ability for our own debugging if nothing else. When these problems struck in the past, we were able to find an affected cluster and use a mix of cached and live requests to figure out what was happening. I don't want to give up that ability entirely. I don't think this is a case of "then everyone will use it", because the normal flow should be good enough to convince people to use it. If it's not and we truly want to force them, it could become an ACL check, but hopefully the feature is good enough that everyone wants to use it.
Add the "UNRESOLVED" markers on the sections I commented on (https://github.com/kubernetes/enhancements/pull/1545/files#diff-bb6317ae71eb96981343e559d9598d80R38) and I can go for provisional |
- Resolve the "stale read" problem (https://github.com/kubernetes/kubernetes/issues/59848)
- Improve the scalability and performance of Kubernetes for Get and List requests, when the watch cache is enabled
If you aren't going to try to avoid allowing true quorum reads as we have them today, please list that as an explicit non-goal. Also, per my overall comment, I think this is a shortcoming and should be addressed. It doesn't need to be a default, but it should be possible.
I think this needs more discussion. I've added an unresolved marker in non-goals.
Thanks @deads2k. Feedback is applied and the `UNRESOLVED` markers have been added.
I would like to make one more, deeper pass on it before we move it to implementable, and I won't have time for that in the next 2.5 weeks, but I'm definitely fine with provisional.
Create etcd watches with `WithProgressNotify` enabled (available in all etcd 3.x versions).

When `WithProgressNotify` is enabled on an etcd watch, etcd sends periodic progress notifications to the watcher even when there are no new events to deliver.
The proposal is to make the duration configurable on a per-watch-stream basis.
When a consistent LIST request is received and the watch cache is enabled:

- Get the current revision from etcd for the resource type being served. The returned revision is strongly consistent (guaranteed to be the latest revision via a quorum read).
What does "current revision" mean exactly? We can get the largest modified revision for the listed resources (vs current revision of the entire key space). This can reduce the block time (for less frequently changed objects, there should be just no blocking).
Alternatively, we can add a new watch request type "urgent" or something to let etcd simply deliver all pending watch events or an empty result (if there is no new changes).
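To make the "current revision" step concrete, here is a minimal sketch under the assumption that a cheap count-only quorum read is used to learn the revision the cache must catch up to; the endpoint, the `/registry/pods/` prefix, and the helper name are illustrative, not the actual apiserver code:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.etcd.io/etcd/clientv3"
)

// consistentRevision performs a linearizable (quorum) read and returns the
// current store revision from the response header. A count-only range keeps
// the read cheap: no keys or values are transferred.
func consistentRevision(ctx context.Context, cli *clientv3.Client, prefix string) (int64, error) {
	resp, err := cli.Get(ctx, prefix, clientv3.WithPrefix(), clientv3.WithCountOnly())
	if err != nil {
		return 0, err
	}
	return resp.Header.Revision, nil
}

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"}, // illustrative endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	rev, err := consistentRevision(context.Background(), cli, "/registry/pods/")
	if err != nil {
		log.Fatal(err)
	}
	// The watch cache would block the LIST until it has observed events (or a
	// progress notification) at or beyond rev, then serve the list from memory.
	fmt.Println("serve the cached LIST once the cache reaches revision", rev)
}
```

If only the largest modified revision of the listed keys mattered, as suggested above, the same read could use the client's `WithLastRev` option instead of a count-only range; that detail doesn't change the overall flow.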
### Non-Goals

<<[UNRESOLVED @deads]>>
- Avoid allowing true quorum reads. We should think carefully about this, see: https://github.com/kubernetes/enhancements/pull/1404#discussion_r381528406
I feel the "right and traditional" solution for this is to enable V-system-style leasing in the apiserver etcd pkg. Here is the description of how it works: http://web.stanford.edu/class/cs240/readings/89-leases.pdf. We already implemented it here: https://github.com/etcd-io/etcd/pull/8341/files. We can port the code into the apiserver side if we want, but this is pretty heavy :P
/lgtm I think this is a good point to start iterating from. The unresolved sections will make sure we don't accidentally skip agreeing on what's in and out.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: deads2k, jpbetz.
### Non-Goals

<<[UNRESOLVED @deads]>>
- Avoid allowing true quorum reads. We should think carefully about this, see: https://github.com/kubernetes/enhancements/pull/1404#discussion_r381528406
Why would we stop permitting true quorum reads? I didn't get that at all out of this doc when I read it?
This idea has been around for a while and there is interest in working on it for 1.18, so I've put it in KEP form.
cc @wojtek-t @jingyih @liggitt