-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-0661: StatefulSet volume resize #3412
Conversation
areller
commented
Jun 19, 2022
•
edited
Loading
edited
- One-line PR description: This enhancement proposes to add the ability to resize volumes associated with StatefulSet via modifications to its specification.
- Issue link: Support Volume Expansion Through StatefulSets #661
- Other comments: This is a fork of KEP-0661: StatefulSet volume resize #1848 and KEP-0661: StatefulSet volume resize #2842 the former has been long abandoned, and with the latter, after talking with the author, we've decided to continue the work in this PR.
Welcome @areller! |
Hi @areller. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@gnufied
Ideally we should probably strive to validate as much as possible during admission of the statefulset update. To give an example why it's a good idea, a lot of people that use kubernetes, might have some kind of a CI/CD system that deploys their changes. If an error happens, but it only happens on a controller (and reported as an event on the statefulset object), and not during admission, the CI/CD might not report those errors back to the user, making them think that every change had been applied successfully, not realizing that the resize had actually failed. To allow statefulset validation to access other objects (i.e. a storageclass), I think we'll need to create an admission plugin, or use the one that already exists for validating that the storage class of a PVC supports expansion when resizing it https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/admission/storage/persistentvolume/resize/admission.go I'm not sure if it's going to cover all failure scenarios. volume expansion could fail for reasons other than storage class doesn't support expansion. Maybe the user tries to resize to a size that isn't supported by underlying driver for example.
According to https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1790-recover-resize-failure we allow users to decrease the request size of a PVC under certain circumstances, so I don't think we should prevent users to decrease request size in the PVC template. but maybe we can allow that only after checking the
I personally don't think we should support that in the scope of this proposal. i.e. we should either reject resize during admission if offline expansion is required (assuming it's possible to check if a storage class requires offline expansion during admission. not sure about that), or fail in the statefulset controller. If we end up supporting offline expansions, either in this KEP or in a follow up, we should do everything to avoid restarting pods, unless online expansion is supported.
As mentions in previous PRs, I think it's a good idea to not include PVC template updates in revision control, for reasons a. If user issues an update to the pod template of the statefulset and a resize to the PVC template at the same time, we want to allow them to rollback the pod template section. including PVC template updates in revision control will make it impossible since the rollback would try to rollback the PVC resize, which we don't want to do.
I don't think that the statefulset controller should wait for the resize to be finished on the PVC itself Though, if we decide to implement a solution for offline expansion as pat of this proposal, we'd probably need to wait for resize to finish, to know when to restart the pods https://github.com/kubernetes/enhancements/pull/660/files#r254095445
As mentioned before, we should make best effort to catch as many errors during admission. Let me know what you think and what other questions you had |
/assign @gnufied @xing-yang |
/ok-to-test |
I’m going to help review this, is there any additional context not captured in this kep or the previous PR I need to be aware of? |
Hi @smarterclayton I think that the README describes it pretty well |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
what happens when the resize of one PVC fails but the others succeed? maybe due to the last one putting it over quota/capacity on the backend? Unlike KEP 1790, you couldn't revert the PVC size in the stateful set spec because some already have allocatedResources of the new size, so the failing PVC is back to the original issue of resize failing over and over. |
I was thinking about producing an error event on the statefulset object and let the user handle it. not sure if there's a way to know ahead of time whether all resize operations going to succeed or some will fail |
Is there any update on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit concerned about two major issues in this proposal:
- The perma beta on resize functionality.
- The proposed changes to statefulset controller, which will make the controller more complicated, and do not provide sufficient guarantees about reliability of the operations. I'd like to see a more strict coordination mechanism which ensures proper user-feedback from the resize operation.
- [ ] (R) Production readiness review approved | ||
- [ ] "Implementation History" section is up-to-date for milestone | ||
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] | ||
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure appropriate boxes are checked.
[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md | ||
--> | ||
|
||
Kubernetes has supported volume expansion as a beta feature since `v1.16`. However, this feature support is only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we link to appropriate enhancement here, I believe you're talking about https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/556-csi-volume-resizing, are you?
Second, more important question, why it's beta for more than 10 releases? How this falls into https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/1635-prevent-permabeta which I guess is a question for sig-storage more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Volume expansion is GA now. I believe this part of the KEP is outdated - https://kubernetes.io/blog/2022/05/05/volume-expansion-ga/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a link there, so future me (and other readers) can easily find it?
- StatefulSet controller creates the PVCs from `volumeClaimTemplates` and associates them with the pods | ||
spawned from it. Reconciliation logic runs for every change made to the StatefulSet | ||
`Spec` and `Status` reflects the progress towards making the real world state equal to the | ||
`Spec`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: .spec
and .status
are lower-cased, generally.
`Spec`. | ||
- The validation logic in the API server is responsible for restricting the fields | ||
which can be edited in a StatefulSet specification. | ||
- Part of the validation that the API server performs for objects (eg `StatefulSet`, `PVC`) happens in the object's API server code and part happens in an admission plugin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recall any validation happening in admission, admission plugins are optional, all the validation we do is either inside the apiserver. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the PVC validation that happens in admission plugins? I'll admit, I'm not familiar with that part of the code base 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of it happens here https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/admission/storage/persistentvolume/resize/admission.go
Part of the validation that happens in that admission is making sure that the storage class supports expansions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're saying this is "layout the various components involved in creating and managing the persistent volumes associated with StatefulSet
" can you make sure this is properly described such that all readers of this proposal can follow what are the actors and what is the relationship between them.
|
||
The following changes are proposed to achieve the goal: | ||
1. Make the `volumeClaimTemplates` storage size (`.spec.resources.request.storage`) editable by modification to the api server | ||
validation. We should allow the user to both increase or decrease the storage size in case of expansion failures. Allowing the user to decrease the storage size will give them the ability to address failed expansions, as described in [KEP 1790](../1790-recover-resize-failure/README.md). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In non-goals section you've explicitly mentioned Shrinking of volume is not allowed.
so I'm confused with increase or decrease
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to add more clarity here. The current plan I have in mind is to allow you to decrease the size of the PVC to the highest (across all replicas) capacity and not lower. Allowing the user to decrease the size up to the highest capacity will allow to recover from failed resizes
* It's an over simplification of the volume resize process or of how StatefulSet reconciliation works - but it should suffice in the context of this discussion. | ||
* The numbers I denote on the arrows do not indicate the order in which operations happen. They are there for easier identification. | ||
|
||
### Possible Errors/Failures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to match the current KEP template, see https://github.com/kubernetes/enhancements/blob/master/keps/NNNN-kep-template/README.md Here you should start the Risks and Mitigations
section.
|
||
* The storage class of the volume we're trying to resize doesn't allow for expansions. | ||
* It shouldn't be a problem to have this validation in the StatefulSet validation/admission and we'll talk about it during [Validation](#validation). | ||
* We are trying to decrease the size of the volume. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm again confused, so we allow both increase and decrease or only the former?
|
||
### Validation | ||
|
||
We'd ideally want as much validation to happen as early as possible, ie during StatefulSet admission/validation. In reality, perfect validation during StatefulSet admission might be very hard to achieve as it involves communication with parts that the StatefulSet admission plugin won't have direct access to, like the CSI driver. Nonetheless, we'll discuss all the possible validation that we could potentially be performing during admission whether or not we'll actually end up performing them in the final implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
admission and validation are two separate processes, see https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/. I'm a bit confused when you're using both, since in the StatefulSet case the only validation that's happening is inside the kube-apiserver. Maybe having a glossary at the beginning to clearly denote when you're talking about apiserver validation, when about admission validation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StatefulSet admission plugin won't have direct access to
there's no such thing, StatefulSet validation happens within kube-apiserver, only. There can be additional validation, but those are optional.
|
||
#### Validation Of The Storage Class | ||
|
||
We'd want to validate that the storage class of the volume we're trying to resize allows for expansions. This validation can be performed easily by creating a plugin admission for the StatefulSet object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need validating admission for it and not do that in the kube-apisever directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm talking about creating a plugin similar to https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/admission/storage/persistentvolume/resize/admission.go
this may very well be a misunderstanding of mine but I notice that we usually don't do complex validation in the apis package https://github.com/kubernetes/kubernetes/tree/master/pkg/apis (by complex I mean having to access other API objects) and delegate those kinds of more complex validation to an admission plugin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other comment about introducing all the "actors" and their relationship in the introduction such that this enhancement will be better understood.
``` | ||
|
||
Some notes about the new fields | ||
* The `finishedReconciliationGeneration` field is optional and won't appear if the PVC template is newly created and isn't reconciled yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just observedGeneration
since it's part of the volumeClaimTemplates array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand. My understanding is that observedGeneration
is the last generation
of the statefulset that the statefulset has observed. This can't be used as a reliable feedback mechanism, because it wouldn't give the user feedback that the volume has finished expanding.
finishedReconciliationGeneration
on the other hand will be incremented to the latest generation only once the statefulset controller has observed that the PVC has been fully expanded to the requested size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
osbserrvedGeneration
is in every object, so I'm proposing to use that instead of adding a new one. But your understanding of the mechanism in case of StatefulSet is correct. I'm proposing to use the exact same mechanism (that users are already familiar with) also for volumes, does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you're suggesting to repurpose observedGeneration? Ie modify the statefulset controller to only update it when PVC reconciliation finishes? It's possible and one more benefit is that existing clients won't have to be modified to get feedback on volume reconciliation
but two things that are worth mentioning
- With the new field, we get per volume granularity (imagine a statefulset with two volume templates)
- Some clients might be interested in getting feedback for when the pods of the statefulset get fully reconciled and wouldn't care about the volume. With the new field, there's better granularity and clients can choose what kind of feedback they need
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be wrong, but I think you can achieve the exact same results with observedGeneration
like I mentioned before.
|
- "@harshanarayana" | ||
owning-sig: sig-apps | ||
participating-sigs: | ||
- sig-storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the current location of the proposal, you should make sig-storage as the owning-sig and sig-apps as participating.
milestone: | ||
alpha: "v1.24" | ||
beta: "v1.25" | ||
stable: "v1.26" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The milestones should be updated accordingly too.
[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md | ||
--> | ||
|
||
Kubernetes has supported volume expansion as a beta feature since `v1.16`. However, this feature support is only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a link there, so future me (and other readers) can easily find it?
I've asked to link that GA blog post in other comment, so we're on the same page ;)
I'm asking for a more detailed description of the interactions, such that I can better understand the flow of information. As it currently stands I'm having hard time to properly grasp the scope and guarantees of the flow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few more comments, but the proposal has still several missing bits.
`Spec`. | ||
- The validation logic in the API server is responsible for restricting the fields | ||
which can be edited in a StatefulSet specification. | ||
- Part of the validation that the API server performs for objects (eg `StatefulSet`, `PVC`) happens in the object's API server code and part happens in an admission plugin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're saying this is "layout the various components involved in creating and managing the persistent volumes associated with StatefulSet
" can you make sure this is properly described such that all readers of this proposal can follow what are the actors and what is the relationship between them.
|
||
#### Validation Of The Storage Class | ||
|
||
We'd want to validate that the storage class of the volume we're trying to resize allows for expansions. This validation can be performed easily by creating a plugin admission for the StatefulSet object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other comment about introducing all the "actors" and their relationship in the introduction such that this enhancement will be better understood.
``` | ||
|
||
Some notes about the new fields | ||
* The `finishedReconciliationGeneration` field is optional and won't appear if the PVC template is newly created and isn't reconciled yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
osbserrvedGeneration
is in every object, so I'm proposing to use that instead of adding a new one. But your understanding of the mechanism in case of StatefulSet is correct. I'm proposing to use the exact same mechanism (that users are already familiar with) also for volumes, does that make sense?
Kubernetes has supported volume expansion as a beta feature since `v1.16`. However, this feature support is only | ||
limited to expanding volumes by editing the PVCs as an adhoc operation. Expansion of the volumes associated with a | ||
`StatefulSet` created via the `volumeClaimTemplate` is not supported directly via the modification of the `volumeClaimTemplate` | ||
construct of the `StatefulSet`. This enhancement proposes to add the ability to resize volumes associated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
construct of the `StatefulSet`. This enhancement proposes to add the ability to resize volumes associated | |
construct of the `StatefulSet`. This enhancement proposes to add the ability to resize volumes associated |
Any news on this KEP? |
I want go ahead and finish this KEP, is there anything I can do? |
- Ensure statefulset controller revision works properly after resize. | ||
- Test resize failure recovery after [KEP 1790](../1790-recover-resize-failure/README.md) has been implemented. | ||
|
||
* We'll need to make sure that while the feature flag is disabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can there be a way to disable it on a per-STS basis? We have a use case for some of our STS where we want different replicas to have different size volumes.
@areller Is this still being worked on? I would like to contribute or take over this KEP if it's stale |
We really need this ability and we would like to contribute to this Kep. |
We are working on #4651 to keep this going. Comments are welcomed! |
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/reopen |
@kfox1111: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
For those wondering why this was closed, there is a new KEP-4650 which basically replaced this. |