
Add new sections for CRD storageversion updates #4551

Closed
wants to merge 5 commits

Conversation

@richabanker (Contributor)

  • One-line PR description: Add an 'Agreements' section for CRD storageversion updates

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 15, 2024
@k8s-ci-robot k8s-ci-robot requested a review from fedebongio March 15, 2024 00:02
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Mar 15, 2024
@k8s-ci-robot k8s-ci-robot requested a review from jpbetz March 15, 2024 00:02
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 15, 2024
@richabanker richabanker changed the title Add an Agreements section for CRD storageversion updates Add new sections for CRD storageversion updates Mar 15, 2024
@richabanker (Contributor Author)

cc @roycaihw

@richabanker (Contributor Author)

cc @alexzielenski

@richabanker richabanker force-pushed the sv-api-2 branch 3 times, most recently from 7043b7a to 1b1ade7 on March 15, 2024 00:40
@richabanker (Contributor Author)

cc @jpbetz

@@ -347,6 +349,30 @@ correct order.
[enables]:https://github.com/kubernetes/kubernetes/blob/220498b83af8b5cbf8c1c1a012b64c956d3ebf9b/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go#L703
[filter]:#updating-storageversion

#### Agreements

1. Storageversion updates will be triggered by CRD creates and updates
@roycaihw (Member) commented Mar 22, 2024:

IIUC you mean SV updates are triggered at CRD create/update watch events.

[Edited on 3/22]: To clarify, I assumed that the proposal here was also "Storageversion updates will not be triggered by CR write requests"-- which is something the old PR does. Please let me know if I misunderstood.

Member:

How do we clean up SV if a CRD is deleted?

Contributor:

I agree that we want to reconcile SV with CRDs.

We should be careful how we word this. I'd like to make sure we always reconcile SVs. If we take a purely edge-triggered approach, then if the apiserver (in a single control plane node configuration) crashes, it could miss the event. So we should perform reconciliation fully (i.e. take a level-triggered approach). It's quite possible that during a relist we'll find that an SV is not reconciled with the CRDs; we should update it at that time.

@richabanker (Contributor Author):

IIUC you mean SV updates are triggered at CRD create/update watch events.

That's right. Updated to reflect the same.

@richabanker (Contributor Author):

How do we clean up SV if a CRD is deleted?

Yeah, been thinking about that. We need to introduce a delete capability. There was probably no need for a delete-SV behavior for built-in resources, so I guess we don't have a deleteSV() in staging/src/k8s.io/apiserver/pkg/storageversion as of now?

Member:

We need to introduce a delete capability

Agreed. Let's make sure we have a cleanup strategy.

Member:

How do we clean up SV if a CRD is deleted?

Why do we need to clean up SVs when a CRD is deleted?

If a CRD is deleted, it should trigger garbage collection, which means individual CRs (of that CRD) get deleted, which means we have no corresponding entry in etcd for individual objects (of that CRD).

@richabanker (Contributor Author):

That's a fair point. For all SV objects, we make their owner ref the respective CRD; so once the CRDs go away, so should the SV objects. @roycaihw WDYT?
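
For illustration only, a minimal Go sketch (not code from this PR) of what stamping a StorageVersion object with an owner reference to its CRD could look like; the imported client types are real, but the withCRDOwnerRef helper is a hypothetical name:

```go
package sketch

import (
	apiserverinternalv1alpha1 "k8s.io/api/apiserverinternal/v1alpha1"
	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// withCRDOwnerRef (hypothetical helper) sets the CRD as the controlling owner
// of the StorageVersion object, so that garbage collection removes the SV once
// the CRD is deleted.
func withCRDOwnerRef(sv *apiserverinternalv1alpha1.StorageVersion, crd *apiextensionsv1.CustomResourceDefinition) {
	sv.OwnerReferences = []metav1.OwnerReference{
		*metav1.NewControllerRef(crd, apiextensionsv1.SchemeGroupVersion.WithKind("CustomResourceDefinition")),
	}
}
```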

@richabanker (Contributor Author):

[Edited on 3/22]: To clarify, I assumed that the proposal here was also "Storageversion updates will not be triggered by CR write requests"-- which is something the old PR does. Please let me know if I misunderstood.

Yes, this is correct. I have now clearly laid out the two scenarios in which we will trigger SV updates: 1. when a CRD create/update watch event is received, and 2. when the apiserver is started.

@richabanker (Contributor Author):

I'd like to make sure we always reconcile SVs. If we take a purely edge triggered approach then if the apiserver (in a single control plane node configuration) crashes, it could miss the event. So we should perform reconciliation fully (i.e. take a level triggered approach)

@jpbetz fully agreed, thanks for raising this. I have updated the KEP to reflect that we will perform SV updates in 2 scenarios:

  1. edge-triggered scenario: when a CRD is created/updated
  2. level-triggered scenario: when the apiserver starts (see the sketch below)
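
As a rough illustration of the level-triggered path (not the PR's actual code; the updateSV callback is a stand-in for whatever mechanism the apiextensions-apiserver uses to publish a StorageVersion), the startup reconciliation could look roughly like this:

```go
package sketch

import (
	"context"
	"fmt"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	listers "k8s.io/apiextensions-apiserver/pkg/client/listers/apiextensions/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// reconcileAllStorageVersions relists every CRD and re-publishes its storage
// version, so an update missed while the apiserver was down is still applied.
// updateSV is a hypothetical callback that performs the actual SV update.
func reconcileAllStorageVersions(
	ctx context.Context,
	crdLister listers.CustomResourceDefinitionLister,
	updateSV func(ctx context.Context, crd *apiextensionsv1.CustomResourceDefinition) error,
) error {
	crds, err := crdLister.List(labels.Everything())
	if err != nil {
		return err
	}
	for _, crd := range crds {
		if err := updateSV(ctx, crd); err != nil {
			return fmt.Errorf("reconciling storageversion for CRD %s: %w", crd.Name, err)
		}
	}
	return nil
}
```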

#### Limitations

When a storageversion of a CRD is updated, we will ensure that all new CR writes:
1. wait for the latest storageversion to be published
Member:

If a CRD is being updated to store v2 instead of v1, and two requests come in at the same time:

  1. Create a new CR
  2. Update the CRD to store v3 instead of v2

Which one is the latest storageversion here? v2 or v3?

Contributor:

For "at the same time": without a happens-before constraint, I don't think we should define the behavior.

Our guarantee should be that once a response to the CRD Update is sent, a happens-before is established, and all future CR creates store v3 (because clients can observe this ordering). After the CRD update is received and before the response is sent, storage at either version is possible.

@richabanker (Contributor Author) commented Mar 22, 2024:

It depends on which request the apiserver processes first. The following scenarios are possible:

  1. The CR request is received first: it will wait for the v2 update to finish.

    1a. If no new CRD update has been received yet, the storageversion is successfully updated to v2, and the CR is written in the latest storageversion at this point, which is v2. When the CRD update to v3 is then received, the storageversion is updated to v3.

    1b. If a new CRD update to v3 is received at this time, our async SV update loop will switch to processing the latest storageversion (v3) and will never process the v2 update. CR writes waiting for the v2 update to finish will time out.

  2. The CRD request is received first: our async SV update loop switches to processing the v3 update. If a CR request is received now, it will wait for the v3 update to finish and will be served in the latest storageversion at this point, v3.
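
A rough sketch, assuming a simple "latest wins" queue rather than whatever the implementation actually uses, of the async SV update loop behavior described in 1b: a superseded v2 update is silently replaced by v3 and never processed.

```go
package sketch

import (
	"context"
	"sync"
)

type pendingUpdate struct {
	crdName        string
	storageVersion string
}

// svUpdateQueue keeps only the most recent pending storage version update;
// a superseded update (e.g. v2 replaced by v3) is dropped unprocessed.
type svUpdateQueue struct {
	mu      sync.Mutex
	pending *pendingUpdate
	notify  chan struct{} // buffered with capacity 1
}

func newSVUpdateQueue() *svUpdateQueue {
	return &svUpdateQueue{notify: make(chan struct{}, 1)}
}

func (q *svUpdateQueue) enqueue(u *pendingUpdate) {
	q.mu.Lock()
	q.pending = u // replaces any not-yet-processed update
	q.mu.Unlock()
	select {
	case q.notify <- struct{}{}:
	default: // a wakeup is already pending
	}
}

// run processes the latest pending update; process is a hypothetical callback
// that publishes the StorageVersion and unblocks waiting CR writes.
func (q *svUpdateQueue) run(ctx context.Context, process func(*pendingUpdate) error) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-q.notify:
		}
		q.mu.Lock()
		u := q.pending
		q.pending = nil
		q.mu.Unlock()
		if u != nil {
			_ = process(u)
		}
	}
}
```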


#### Limitations

When a storageversion of a CRD is updated, we will ensure that all new CR writes:
1. wait for the latest storageversion to be published

This implies that we will take a write outage for the duration of the storageversion update. We will allow this for the following reasons:
1. blocking CR writes until the SV update is finished is in line with how we handle requests for built-in resources
2. if we update the CRD handler to the new handler **before** a storageversion update completes, we risk a server crash, and objects may be written in the new version while the storageversion API would not reflect that
@jpbetz (Contributor) commented Mar 22, 2024:

I don't think it's possible to always avoid this situation. I do think it's possible to always update the CRD handler before SV updates, but I don't think we can guarantee atomicity of the CRD handler update and the SV update (consider network partitions or process crashes happening at inconvenient times). So while it seems nice to try to get the storageversion upgraded in the happy path, I think we also need to tolerate the skew.

@jpbetz (Contributor) commented Mar 22, 2024:

If we want to hold all writes indefinitely until the SV is updated, we should spell out the requirements carefully. What happens on apiserver startup when a CRD has been updated to a new storage version but the SV hasn't been updated yet? What happens when the SV update request times out?

In both cases, my understanding is that we lose some form of availability of the CRD until the SV update completes. We should call that out in the limitations and say exactly what is unavailable (CR writes only?).

@richabanker (Contributor Author) commented Mar 25, 2024:

What happens on apiserver startup when a CRD has been updated to a new storage version but the SV hasn't been updated yet. What happens when updating the SV and the request times out?

As discussed on chat, we will ensure that SVs are always reconciled when the server is started/restarted, before we serve any CR writes.
I have updated this part to explicitly state the two scenarios where we will take an outage on CR write requests.

About the SV update timing out: I have included a line saying we will reattempt the update a certain number of times before failing the SV update and, consequently, the dependent CR writes. Does this look ok?
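
A minimal sketch of the bounded-retry behavior mentioned above, using client-go's generic retry helper; the backoff values and the updateStorageVersion callback are illustrative assumptions, not numbers from the KEP:

```go
package sketch

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/util/retry"
)

// svUpdateBackoff bounds the reattempts: 5 tries with exponential backoff
// (placeholder values, not from the KEP).
var svUpdateBackoff = wait.Backoff{Steps: 5, Duration: 100 * time.Millisecond, Factor: 2.0, Jitter: 0.1}

// updateWithRetry retries the SV update a bounded number of times; if all
// attempts fail, the error propagates and the dependent CR writes fail too.
func updateWithRetry(ctx context.Context, updateStorageVersion func(context.Context) error) error {
	return retry.OnError(svUpdateBackoff, func(err error) bool {
		// Retry on any error until the steps are exhausted; a real
		// implementation would likely be more selective.
		return true
	}, func() error {
		return updateStorageVersion(ctx)
	})
}
```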

1. wait for the CRD handler to be replaced

This implies that we will take a write outage for the duration of the storageversion update. We will allow this for the following reasons:
1. blocking CR writes until the SV update is finished is in line with how we handle requests for built-in resources
Member:

This is slightly different from built-ins, right? Built-in blocking can only occur once, on apiserver startup, while CR writes can be blocked every time a CRD is updated? Nothing wrong with maintaining the same approach, just something to consider.

@richabanker (Contributor Author):

That's right: for CRDs, we will always block new CR writes until the latest SV update is finished. Allowing CR writes while the SV update is pending can cause problems, like the old handler not being able to understand the new CRD version that may be persisted in etcd by another server.
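
To make the blocking behavior concrete, here is a hypothetical sketch (not the PR's design, just one way to express it) of gating CR writes on the latest StorageVersion having been published:

```go
package sketch

import (
	"context"
	"fmt"
	"sync"
)

// svGate gates CR writes on the latest StorageVersion having been published.
// The SV update path is expected to close ready[crdName] once the update for
// that CRD is published (and to swap in a fresh channel when a newer CRD
// update begins).
type svGate struct {
	mu    sync.Mutex
	ready map[string]chan struct{}
}

// waitForPublished blocks a CR write until the latest storageversion of its
// CRD is published, or until the request context (deadline) expires.
func (g *svGate) waitForPublished(ctx context.Context, crdName string) error {
	g.mu.Lock()
	ch, ok := g.ready[crdName]
	g.mu.Unlock()
	if !ok {
		return fmt.Errorf("no storageversion tracked for CRD %s", crdName)
	}
	select {
	case <-ch: // latest storageversion published; allow the CR write
		return nil
	case <-ctx.Done(): // the SV update did not finish in time; fail the write
		return ctx.Err()
	}
}
```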

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 25, 2024
@richabanker (Contributor Author)

cc @deads2k @liggitt for their thoughts

@jpbetz (Contributor) commented Mar 28, 2024:

/lgtm

Looks like all my concerns are addressed. I think this reflects my understanding of how it should work.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 28, 2024
1. old storage of the same CRD is deleted
2. pending CR writes using old storageversion of the same CRD are completed
2. We will do this to prevent publishing a newer storageversion while a pending CR write is still in progress. Otherwise, if the pending CR write finishes writing the CR using an old version, the storageversion API would not reflect that and the object will remain in etcd, encoded in an old version forever
4. We will block new CR writes for a CRD until we have published its latest storageversion. This is discussed more in the [limitations] section
@alexzielenski (Contributor) commented Mar 28, 2024:

Curious about the meaning of "published" in this context. Does this imply that any future reads of the StorageVersion resource are guaranteed to reflect the update? (asking as I am not sure myself about what guarantee the apiserver itself provides when it responds to the UPDATE request)

@richabanker (Contributor Author):

Does this imply that any future reads of the StorageVersion resource are guaranteed to reflect the update?

Yes, that was the intention of this statement: once a CRD update happens and its storageversion has been updated (i.e. published to the SV API), only then will we unblock the CR writes.

@@ -347,6 +349,39 @@ correct order.
[enables]:https://github.com/kubernetes/kubernetes/blob/220498b83af8b5cbf8c1c1a012b64c956d3ebf9b/staging/src/k8s.io/apiextensions-apiserver/pkg/apiserver/customresource_handler.go#L703
[filter]:#updating-storageversion

#### Agreements
Contributor:

What's an “agreement” in this context?

@richabanker (Contributor Author):

Like a contract/set of principles that we propose to follow while processing CR/CRD requests that may need to interact with possible storageversion updates (for the same CRD) happening in parallel.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 9, 2024
@k8s-ci-robot (Contributor)

New changes are detected. LGTM label has been removed.

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: richabanker
Once this PR has been reviewed and has the lgtm label, please ask for approval from jpbetz. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@deads2k (Contributor) commented Apr 10, 2024:

It's pretty detailed and I'm open to change as we go, but it seems like a reasonable place to start.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 9, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 8, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor)

@k8s-triage-robot: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels
  • cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
  • kind/kep - Categorizes KEP tracking issues and PRs modifying the KEP directory
  • lifecycle/rotten - Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • sig/api-machinery - Categorizes an issue or PR as relevant to SIG API Machinery.
  • size/M - Denotes a PR that changes 30-99 lines, ignoring generated files.
10 participants