Add new sections for CRD storageversion updates #4551
Conversation

richabanker commented Mar 15, 2024
- One-line PR description: Add an 'Agreements' section for CRD storageversion updates
- Issue link: StorageVersion API for HA API servers #2339
cc @roycaihw
Force-pushed from 7043b7a to 1b1ade7
cc @jpbetz
@@ -347,6 +349,30 @@ correct order.
[enables]:https://github.com/kubernetes/kubernetes/blob/220498b83af8b5cbf8c1c1a012b64c956d3ebf9b/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go#L703
[filter]:#updating-storageversion

#### Agreements

1. Storageversion updates will be triggered by CRD creates and updates
IIUC you mean SV updates are triggered at CRD create/update watch events.
[Edited on 3/22]: To clarify, I assumed that the proposal here was also "Storageversion updates will not be triggered by CR write requests"-- which is something the old PR does. Please let me know if I misunderstood.
How do we clean up SV if a CRD is deleted?
I agree that we want to reconcile SV with CRDs.
We should be careful how we word this. I'd like to make sure we always reconcile SVs. If we take a purely edge-triggered approach, then if the apiserver (in a single control plane node configuration) crashes, it could miss the event. So we should perform reconciliation fully (i.e. take a level-triggered approach). It's quite possible that during a relist we'll find that an SV is not reconciled with its CRD; we should update it at that time.
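For illustration, here is a minimal sketch of such a level-triggered pass. The `GetSV`/`UpdateSV` callbacks and the overall shape are assumptions made for the sketch, not the actual apiextensions-apiserver code:

```go
package svreconcile

import (
	"context"
	"fmt"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
)

// GetSV and UpdateSV stand in for whatever client the real controller would use
// to read and write StorageVersion objects (hypothetical, for this sketch only).
type GetSV func(ctx context.Context, crdName string) (string, error)
type UpdateSV func(ctx context.Context, crdName, version string) error

// ReconcileAll walks every known CRD (e.g. after an informer relist) and repairs
// any StorageVersion that drifted from the CRD's storage version, so a missed
// watch event cannot leave a stale SV behind.
func ReconcileAll(ctx context.Context, crds []*apiextensionsv1.CustomResourceDefinition, get GetSV, update UpdateSV) error {
	for _, crd := range crds {
		want := storedVersion(crd)
		got, err := get(ctx, crd.Name)
		if err != nil {
			return fmt.Errorf("reading StorageVersion for %s: %w", crd.Name, err)
		}
		if got != want {
			// Level-triggered: fix drift whenever it is observed, not only when
			// a CRD create/update event arrives.
			if err := update(ctx, crd.Name, want); err != nil {
				return fmt.Errorf("updating StorageVersion for %s: %w", crd.Name, err)
			}
		}
	}
	return nil
}

// storedVersion returns the version marked as the storage version in the CRD spec.
func storedVersion(crd *apiextensionsv1.CustomResourceDefinition) string {
	for _, v := range crd.Spec.Versions {
		if v.Storage {
			return v.Name
		}
	}
	return ""
}
```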
IIUC you mean SV updates are triggered at CRD create/update watch events.
That's right. Updated to reflect the same.
How do we clean up SV if a CRD is deleted?
Yeah, I've been thinking about that. We need to introduce a delete capability. There was probably no need for a delete-SV behavior for built-in resources, so I guess we don't have a deleteSV() in staging/src/k8s.io/apiserver/pkg/storageversion as of now?
We need to introduce a delete capability
Agreed. Let's make sure we have a cleanup strategy
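As a rough sketch of what that cleanup could look like, assuming the StorageVersion object for a CRD is named `<group>.<plural>` (mirroring the built-in convention); no such helper exists in k8s.io/apiserver/pkg/storageversion today:

```go
package svcleanup

import (
	"context"
	"fmt"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// DeleteStorageVersionForCRD removes the StorageVersion object tracking a
// deleted CRD. NotFound errors are ignored so the call is idempotent.
// The object naming scheme is an assumption for this sketch.
func DeleteStorageVersionForCRD(ctx context.Context, client kubernetes.Interface, crd *apiextensionsv1.CustomResourceDefinition) error {
	name := fmt.Sprintf("%s.%s", crd.Spec.Group, crd.Spec.Names.Plural)
	err := client.InternalV1alpha1().StorageVersions().Delete(ctx, name, metav1.DeleteOptions{})
	if apierrors.IsNotFound(err) {
		return nil
	}
	return err
}
```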
How do we clean up SV if a CRD is deleted?
Why do we need to clean up SVs when a CRD is deleted?
If a CRD is deleted, it should trigger garbage collection, which means individual CRs (of that CRD) get deleted, which means we have no corresponding entry in etcd for individual objects (of that CRD).
That's a fair point. We could set each SV object's owner reference to its respective CRD, so once the CRDs go away, so should the SV objects. @roycaihw WDYT?
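A sketch of that owner-reference idea, stamping the StorageVersion with its CRD as owner so the garbage collector removes it when the CRD is deleted (illustrative only, not text from the PR):

```go
package svowner

import (
	apiserverinternalv1alpha1 "k8s.io/api/apiserverinternal/v1alpha1"
	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// withCRDOwner returns a copy of sv owned by crd; deleting the CRD then
// triggers deletion of the StorageVersion via the garbage collector
// (both objects are cluster-scoped, so the reference is legal).
func withCRDOwner(sv *apiserverinternalv1alpha1.StorageVersion, crd *apiextensionsv1.CustomResourceDefinition) *apiserverinternalv1alpha1.StorageVersion {
	out := sv.DeepCopy()
	out.OwnerReferences = []metav1.OwnerReference{{
		APIVersion: apiextensionsv1.SchemeGroupVersion.String(),
		Kind:       "CustomResourceDefinition",
		Name:       crd.Name,
		UID:        crd.UID,
	}}
	return out
}
```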
[Edited on 3/22]: To clarify, I assumed that the proposal here was also "Storageversion updates will not be triggered by CR write requests"-- which is something the old PR does. Please let me know if I misunderstood.
Yes, this is correct. I have now clearly laid out the 2 scenarios where we will trigger SV updates: 1. when a CRD create/update watch event is received, and 2. when the apiserver is started.
I'd like to make sure we always reconcile SVs. If we take a purely edge triggered approach then if the apiserver (in a single control plane node configuration) crashes, it could miss the event. So we should perform reconciliation fully (i.e. take a level triggered approach)
@jpbetz fully agreed, thanks for raising this. I have updated the KEP to reflect that we will perform SV updates in 2 scenarios:
- edge triggered scenario: when a CRD is created/updated
- level triggered scenario: when the apiserver starts
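A minimal wiring sketch of those two trigger points, assuming a hypothetical reconciler interface and client-go 1.26+ (where AddEventHandler returns a registration and an error):

```go
package svtrigger

import (
	"context"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	"k8s.io/client-go/tools/cache"
)

// reconciler is a stand-in for whatever component updates StorageVersions.
type reconciler interface {
	ReconcileOne(ctx context.Context, crd *apiextensionsv1.CustomResourceDefinition) error
	ReconcileAll(ctx context.Context) error
}

// start performs the level-triggered pass first, then registers edge-triggered handlers.
func start(ctx context.Context, crdInformer cache.SharedIndexInformer, r reconciler) error {
	// Level-triggered: on apiserver startup, reconcile every CRD before serving CR writes.
	if err := r.ReconcileAll(ctx); err != nil {
		return err
	}
	// Edge-triggered: react to CRD create/update watch events.
	_, err := crdInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			if crd, ok := obj.(*apiextensionsv1.CustomResourceDefinition); ok {
				_ = r.ReconcileOne(ctx, crd)
			}
		},
		UpdateFunc: func(_, newObj interface{}) {
			if crd, ok := newObj.(*apiextensionsv1.CustomResourceDefinition); ok {
				_ = r.ReconcileOne(ctx, crd)
			}
		},
	})
	return err
}
```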
#### Limitations

When a storageversion of a CRD is updated, we will ensure that all new CR writes:
1. wait for the latest storageversion to be published
If a CRD is being updated to store v2 instead of v1, and two requests come in at the same time:
- Create a new CR
- Update the CRD to store v3 instead of v2
Which one is the latest storageversion here? v2 or v3?
For "at the same time": without a happens-before constraint, I don't think we should define the behavior.
Our guarantee should be that once a response to the CRD Update is sent, a happens-before is established, and all future CR creates store v3 (because clients can observe this ordering). After the CRD update is received and before the response is sent, storage at either version is possible.
Depends on which request is processed by the apiserver first. The following scenarios are possible:

- CR request is received first: it will wait for the v2 update to finish.
  - 1a. If no new CRD update has been received yet, the storageversion is successfully updated to v2, and the CR is written in the latest storageversion at this point, which is v2. The CRD update to v3 is then received, and the storageversion is updated to v3.
  - 1b. If a new CRD update to v3 is received at this time, our async SV update loop will switch to processing the latest storageversion, v3, and will never process the v2 update. CR writes waiting for the v2 update to finish will time out.
- CRD request is received first: our async SV update loop switches to processing the v3 update. If a CR request is now received, it will wait for the v3 update to finish and will be served in the latest storageversion at this point, v3.
#### Limitations

When a storageversion of a CRD is updated, we will ensure that all new CR writes:
1. wait for the latest storageversion to be published
This implies that we will take a write outage for the duration of the storageversion update. We will allow this for the following reasons:
1. blocking CR writes till the SV update is finished is in line with how we handle requests for built-in resources
2. if we update the CRD handler to the new handler **before** a storageversion update completes, we risk a server crash, and objects may be written in the new version while the storageversion API does not reflect that
I don't think it's possible to always avoid this situation. I do think it's possible to always update the CRD handler before SV updates, but I don't think we can guarantee atomicity of the CRD handler update and the SV update (consider network partitions or process crashes happening at inconvenient times). So while it seems nice to try to get the storageversion upgraded in the happy path, I think we also need to tolerate the skew.
If we want to hold all writes indefinitely until the SV is updated, we should spell out the requirements carefully. What happens on apiserver startup when a CRD has been updated to a new storage version but the SV hasn't been updated yet? What happens when the SV update request times out?
In both cases, my understanding is that we lose some form of availability of the CRD until the SV update completes. We should call that out in the limitations and say exactly what is unavailable (CR writes only?).
What happens on apiserver startup when a CRD has been updated to a new storage version but the SV hasn't been updated yet? What happens when the SV update request times out?

As discussed on chat, we will ensure that SVs are always reconciled when the server is started/restarted, before we serve any CR writes.
I have updated this part to explicitly state the 2 scenarios where we will take an outage on CR write requests.
About the SV update timing out, I have included a line saying we will reattempt the update a certain number of times before failing the SV update and, consequently, the dependent CR writes. Does this look ok?
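A small sketch of that bounded retry, using client-go's retry helper; the attempt count and the update callback are assumptions, not values taken from the KEP:

```go
package svretry

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/util/retry"
)

// publishWithRetry reattempts the StorageVersion update a few times before
// giving up; callers would then fail the dependent CR writes with the error.
func publishWithRetry(ctx context.Context, update func(ctx context.Context) error) error {
	backoff := wait.Backoff{Steps: 3, Duration: 50 * time.Millisecond, Factor: 2.0, Jitter: 0.1}
	return retry.OnError(backoff, func(error) bool { return true }, func() error {
		return update(ctx)
	})
}
```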
1. wait for the CRD handler to be replaced

This implies that we will take a write outage for the duration of the storageversion update. We will allow this for the following reasons:
1. blocking CR writes till the SV update is finished is in line with how we handle requests for built-in resources
This is slightly different from built-ins, right? Built-in blocking can only occur once, on apiserver startup, while CR writes can be blocked every time a CRD is updated. Nothing wrong with maintaining the same approach, just something to consider.
That's right: for CRDs, we will always block new CR writes until the latest SV update is finished. Allowing CR writes while the SV update is pending can cause problems, like the old handler not being able to understand the new CRD version that may be persisted in etcd by another server.
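As a rough illustration of that blocking behaviour (the wrapper shape is assumed; it is not the actual custom resource handler): CR writes wait on a signal that the SV updater closes once the latest update is published, and are rejected after a timeout so clients can retry.

```go
package svblock

import (
	"net/http"
	"time"
)

// blockUntilPublished holds CR write requests until svPublished is closed
// (i.e. the latest StorageVersion update completed), or rejects them with
// 503 after the timeout.
func blockUntilPublished(next http.Handler, svPublished <-chan struct{}, timeout time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-svPublished:
			next.ServeHTTP(w, r)
		case <-time.After(timeout):
			http.Error(w, "storage version update in progress", http.StatusServiceUnavailable)
		}
	})
}
```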
…ers/README.md Co-authored-by: Joe Betz <[email protected]>
/lgtm Looks like all my concerns are addressed. I think this reflects my understanding of how it should work.
1. old storage of the same CRD is deleted
2. pending CR writes using old storageversion of the same CRD are completed
2. We will do this to prevent publishing a newer storageversion while a pending CR write is still in progress. Otherwise, if the pending CR write finishes writing the CR using an old version, the storageversion API would not reflect that and the object will remain in etcd, encoded in an old version forever
4. We will block new CR writes for a CRD until we have published its latest storageversion. This is discussed more in the [limitations] section
Curious about the meaning of "published" in this context. Does this imply that any future reads of the StorageVersion resource are guaranteed to reflect the update? (asking as I am not sure myself about what guarantee the apiserver itself provides when it responds to the UPDATE request)
Does this imply that any future reads of the StorageVersion resource are guaranteed to reflect the update?
Yes, this was the intention of this statement: once a CRD update happens and its storageversion has been updated (or published to the SV API), only then will we unblock the CR writes.
@@ -347,6 +349,39 @@ correct order.
[enables]:https://github.com/kubernetes/kubernetes/blob/220498b83af8b5cbf8c1c1a012b64c956d3ebf9b/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go#L703
[filter]:#updating-storageversion

#### Agreements
What's an “agreement” in this context?
Like a contract / set of principles that we propose to follow while processing CR/CRD requests that may need to interact with possible storageversion updates (for the same CRD) happening in parallel.
New changes are detected. LGTM label has been removed.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: richabanker. The full list of commands accepted by this bot can be found here.
It's pretty detailed and I'm open to change as we go, but it seems like a reasonable place to start.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. Please send feedback to sig-contributor-experience at kubernetes/community.

/close
@k8s-triage-robot: Closed this PR in response to the /close above.