KEP-4192: Move Storage Version Migrator in-tree #4240
Conversation
nilekhc
commented
Sep 25, 2023
- One-line PR description: Move Storage Version Migrator in-tree.
- Issue link: Move Storage Version Migrator in-tree #4192
- Other comments:
When the Migrator controller is integrated in-tree, it will leverage the `Streaming List` approach to obtain and monitor resources while executing storage version migrations, as opposed to using the resource-intensive `chunked list` method. In cases where, for any reason, the client cannot establish a streaming watch connection with the API server, it will fall back to the standard `chunked list` method, retaining the older LIST/WATCH semantics.

- _Open Question_:
  - Does `Streaming List` support inconsistent lists? Currently, with chunked lists, we receive inconsistent lists. We need to determine if we can achieve the same with Streaming List. Depending on the outcome, we may consider removing the [ContinueToken](https://github.com/kubernetes-sigs/kube-storage-version-migrator/blob/60dee538334c2366994c2323c0db5db8ab4d2838/pkg/apis/migration/v1alpha1/types.go#L63) from the API.
@deads2k @jpbetz @wojtek-t @p0lyn0mial in an attempt to verify this, I hacked in continue token support to streaming watch. While that implementation is obviously not what we would ship, would y'all be okay with the alpha SVM code simply using streaming watch and assuming it will work (i.e. for large resource sets it will fail to migrate if the watch cache or etcd are unable to provide a consistent list over the slow migration loop that SVM performs). Assuming that streaming watch gives lexicographical ordering of events (it appears to, though I do not know if that is on purpose), it can support inconsistent continue tokens, at least if the caller has a way to bypass the watch cache. @wojtek-t @p0lyn0mial is that something y'all would be willing to support as part of the streaming watch KEP?
Unfortunately, streaming doesn't support pagination (passing a continuation token) because the watchcache doesn't have built-in support for this mechanism.
I know that @wojtek-t considered adding support (as part of a new KEP) to the watchcache and even discussed it at a meeting, but I don’t know the current status.
As for using streaming in general, we plan to enable this feature for some component in this iteration. So, we really don't know if this mechanism works. We are about to find out.
I personally don't have anything against using the new mechanism for the alpha version of SVM.
> Assuming that streaming watch gives lexicographical ordering of events

mhm, streaming as in "give me a consistent list (snapshot) of data as a stream" gives you a random order. The main use case is that you collect the data received from the server in some local store and only then operate on it (you don’t process individual events).

mhm, watching in general gives you a deterministic order (but not lexicographical) - the events are delivered in the order in which they occurred.
Assuming you cannot use an informer (I haven't checked), please check out https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/tools/cache/reflector.go#L330 to see how you could implement fallback logic.
At some point I also wanted to change client-go's LIST method to follow streaming, but I haven't looked into it yet.
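If it helps, here is a minimal Go sketch of that fallback idea (illustrative only, not the eventual SVM code): try a streaming watch-list via `SendInitialEvents`, and fall back to a chunked LIST with `Limit`/`Continue` if the streaming request cannot be established.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/dynamic"
	"k8s.io/utils/ptr"
)

// listWithFallback tries a streaming watch-list first and falls back to a
// chunked LIST if the streaming request cannot be established.
func listWithFallback(ctx context.Context, client dynamic.Interface, gvr schema.GroupVersionResource) error {
	// Streaming list: a WATCH that replays the current state as synthetic
	// ADDED events before switching to ordinary watch events.
	w, err := client.Resource(gvr).Watch(ctx, metav1.ListOptions{
		SendInitialEvents:    ptr.To(true),
		ResourceVersionMatch: metav1.ResourceVersionMatchNotOlderThan,
		AllowWatchBookmarks:  true,
	})
	if err == nil {
		defer w.Stop()
		for ev := range w.ResultChan() {
			if ev.Type == watch.Bookmark {
				// Real code should check for the "k8s.io/initial-events-end"
				// annotation; the first bookmark is used here for brevity.
				return nil
			}
			fmt.Printf("streamed event: %v\n", ev.Type)
		}
		return nil
	}

	// Fallback: classic chunked LIST using Limit/Continue.
	opts := metav1.ListOptions{Limit: 500}
	for {
		list, err := client.Resource(gvr).List(ctx, opts)
		if err != nil {
			return err
		}
		fmt.Printf("listed %d items\n", len(list.Items))
		if list.GetContinue() == "" {
			return nil
		}
		opts.Continue = list.GetContinue()
	}
}
```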
FWIW - I opened kubernetes/kubernetes#120897 to fix the ordering issue for watch-cache, so once that merges, the ordering problem is solved. But I'm still far from being convinced about supporting continue token...
> What I don't like is supporting continueToken for resuming watchlist, because the semantics of this would be strange

Yeah, I'm not crazy about doing something like this. It makes resumption subtly semantically different, and I don't really like the idea of giving people foot guns.
This is a tough problem, but I'd be disappointed if it prevented us from making forward progress in 1.29.
How about we stick with paginated list for alpha and add a beta graduation criteria line item like "Resolve API server memory utilization problems with paginated list by enabling watch-list to handle large lists (either via an 'inconsistent pagination token' mechanism or via some other approach)"?
I'm ok with deferring to beta, but I see having a predictable and limited memory footprint as critical at scale. The list OOMing problem, even with low hundreds of items, is a serious problem.
I definitely agree that resolving it should block Beta.
But for Alpha, I'm with Joe - I suggest proceeding with paginated lists to unblock progress.
Can we mark it explicitly as
### APIs to move

#### We will move the following [APIs](https://github.com/kubernetes-sigs/kube-storage-version-migrator/blob/60dee538334c2366994c2323c0db5db8ab4d2838/pkg/apis/migration/v1alpha1/types.go) in-tree:

- `v1alpha1` of `storageversionmigrations.migration.k8s.io`
Would have expected some apiserver-related group.
- Automatic storage version migration for CRDs
- Make it easy for Kubernetes developers to drop old API schemas by guaranteeing that storage migration is run automatically on SV hash changes (should this also be on a timer or API server identity?)
- Automated Storage Version Migration via the hash exposed by the `StorageVersion` API
I would say that we generally want to get to this point where the migration happens automatically.
But making it a follow-up is probably reasonable for the sake of making progress.
Right, we all agree that we need this to happen automatically, but we have tried to keep this KEP small so that we don't stall like the previous efforts.
}

// Describes the state of a migration at a certain point.
type MigrationCondition struct {
Can we (for the sake of unification) use metav1.Condition instead:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1497
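For illustration, a hypothetical sketch of what the status could look like if the bespoke condition type were swapped for the shared `metav1.Condition` (type and field names here are illustrative, not the final API):

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// StorageVersionMigrationStatus is a hypothetical shape of the status if the
// bespoke MigrationCondition type were replaced with the shared metav1.Condition.
type StorageVersionMigrationStatus struct {
	// Conditions follow the standard Kubernetes convention: each entry carries
	// type, status, reason, message, observedGeneration, and lastTransitionTime.
	// +optional
	// +listType=map
	// +listMapKey=type
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```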
@wojtek-t Thanks for the review. I have updated it.
When the Migrator controller is integrated in-tree, it will leverage the `Streaming List` approach to obtain and monitor resources while executing storage version migrations, as opposed to using the resource-intensive `chunked list` method. In cases where, for any reason, the client cannot establish a streaming watch connection with the API server, it will fall back to the standard `chunked list` method, retaining the older LIST/WATCH semantics.

- _Open Question_:
  - Does `Streaming List` support inconsistent lists? Currently, with chunked lists, we receive inconsistent lists. We need to determine if we can achieve the same with Streaming List. Depending on the outcome, we may consider removing the [ContinueToken](https://github.com/kubernetes-sigs/kube-storage-version-migrator/blob/60dee538334c2366994c2323c0db5db8ab4d2838/pkg/apis/migration/v1alpha1/types.go#L63) from the API.
There are a couple of different things mentioned here that we're touching on.

> Assuming that streaming watch gives lexicographical ordering of events

This is only true for the etcd implementation (because the list from etcd is returned in order).
This is not true for the watchcache implementation as of now:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/cacher/watch_cache_interval.go#L120-L150
The fix would be trivial though - it just requires adding sorting there [and we should probably do that].

> it can support inconsistent continue tokens

My problem with that is that I'm not sure what that would effectively mean from the API perspective.
The way watch-list currently works is that:

1. it lists all items [providing guarantees about freshness based on input parameters]
2. [if requested] it returns a bookmark showing that the "list" operation finished
3. it continues to watch for items from that point [as part of the same request]

If we decide to support a continuation token, (3) no longer makes sense to me [we started the list from the middle of the collection, but the watch is actually watching the whole collection].
[I guess we would have to stop after point (2) then, but that still seems a bit sloppy...]
We also don't return a continuation token when list-watch fails in the middle of the listing phase [though we can probably do that].

> at least if the caller has a way to bypass the watch cache.

If the caller bypasses the watchcache, then the gain from watch-list is zero, because the etcd implementation is doing a regular list from etcd. [We basically needed to ensure that the API is supported if the watch cache is disabled, so we have both watchcache and etcd implementations, but the assumption is that the watchcache implementation is the one that will generally be used, and this is the one that actually provides real benefits.]
That said, the watchcache doesn't support pagination...

> would y'all be okay with the alpha SVM code simply using streaming watch and assuming it will work

That sounds reasonable to me for Alpha...
... but we need some solution for Beta.
And here my problem is that, based on what I wrote above, I'm not sure that supporting continuation for watch-list is the right path forward. And if not, we would need a fallback to regular paginated lists [and if so, maybe we should just use paginated lists?]
// The resource that is being migrated. The migrator sends requests to
// the endpoint serving the resource.
// Immutable.
Resource GroupVersionResource `json:"resource"`
When reading this it sounded like this is the GVR being migrated "from". Is that correct? Should we clarify further?
I updated the comment and the field name as well. PTAL. I'm open to other suggestions as well. 😄
// The storage version hash to avoid races.
StorageVersionHash string `json:"storageVersionHash,omitempty"`
Sorry if this is obvious, but what's this for?
I added it to resolve - https://github.com/kubernetes-sigs/kube-storage-version-migrator/blob/60dee538334c2366994c2323c0db5db8ab4d2838/pkg/apis/migration/v1alpha1/types.go#L64-L65
If I understand this correctly, we can use the hash to validate whether the resource still needs migration.
However, upon further examination, I realize that in the out-of-tree SVM migration, the Migration Resource is created automatically, making it easier to populate this field. With the current implementation, we are asking the user to create the Migration Resource. This may not provide the best user experience, as users would need to know the appropriate hash. WDYT?
cc @enj
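For context, the hash in question is the one the API server already advertises per resource through discovery (`metav1.APIResource.StorageVersionHash`). A rough sketch of how a controller (or a user-facing tool) could look it up to decide whether a recorded migration is stale; the helper name is illustrative:

```go
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
)

// currentStorageVersionHash returns the storage version hash the API server
// advertises for a resource via discovery. Comparing it with the hash recorded
// on a migration request is one way to detect that the migration is stale.
func currentStorageVersionHash(dc discovery.DiscoveryInterface, groupVersion, resource string) (string, error) {
	resources, err := dc.ServerResourcesForGroupVersion(groupVersion)
	if err != nil {
		return "", err
	}
	for _, r := range resources.APIResources {
		if r.Name == resource {
			// StorageVersionHash is populated when the StorageVersionHash
			// feature is enabled on the API server.
			return r.StorageVersionHash, nil
		}
	}
	return "", fmt.Errorf("resource %q not found in %s", resource, groupVersion)
}
```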
### [Controller](https://github.com/kubernetes-sigs/kube-storage-version-migrator/tree/60dee538334c2366994c2323c0db5db8ab4d2838/pkg/controller) to move

#### [Migrator Controller](https://github.com/kubernetes-sigs/kube-storage-version-migrator/tree/60dee538334c2366994c2323c0db5db8ab4d2838/pkg/migrator)

Currently, the Storage Version Migrator comprises two controllers: the `Trigger` controller and the `Migrator` controller. The Trigger controller performs resource discovery, identifying supported resources with the preferred server version every `10 minutes`. Subsequently, the Trigger controller creates the `StorageVersionMigration` resource to initiate the migration process. The Migrator controller then picks up this resource and executes the actual migration.
Is this design level-triggered or edge-triggered? If edge-triggered, how do we ensure that we don't miss (fail to observe) storage version changes?
When transitioning the Storage Version Migrator in-tree, we will move only the Migrator controller, as a component of KCM. The creation of the Migration resource will be deferred to the user, instead of being triggered automatically.
With this approach, we are requiring users to be more proactive and knowledgeable about when they need storage migration. The automatic trigger for Storage Migration is deferred to future enhancements. In this KEP, we are scoping the work to having the controller and feature in-tree so that we can further enhance it.
We have listed it as one of the unclear goals in the KEP.
### UNCLEAR Goals and/or Non-Goals
- Automatic storage version migration for CRDs
- Make it easy for Kubernetes developers to drop old API schemas by guaranteeing that storage migration is run automatically on SV hash changes (should this also be on a timer or API server identity?)
- Automated Storage Version Migration via the hash exposed by the `StorageVersion` API
Okay, I think this answers my question. Can we make it super clear in the KEP that there will be no automatic triggering? I somehow missed that when first reading it.
In the Proposal section, I updated this:
**[UNCLEAR]** Create a new controller that watches the storage version API and automatically triggers migrations based on changes (to the SV API)
to:
For Alpha, we will not trigger automatic storage version migration, and it will be deferred to the user.
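Since triggering is left to the user for Alpha, a user (or a script) would create the migration object directly. A rough sketch using the dynamic client and the out-of-tree `migration.k8s.io/v1alpha1` group referenced in this KEP; the final in-tree group, kind, and fields may differ:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// createMigration sketches how a user could request a storage version
// migration for secrets, given that automatic triggering is deferred.
func createMigration(ctx context.Context, client dynamic.Interface) error {
	gvr := schema.GroupVersionResource{
		Group:    "migration.k8s.io",
		Version:  "v1alpha1",
		Resource: "storageversionmigrations",
	}
	svm := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "migration.k8s.io/v1alpha1",
		"kind":       "StorageVersionMigration",
		"metadata":   map[string]interface{}{"generateName": "secrets-migration-"},
		"spec": map[string]interface{}{
			// The resource whose stored objects should be rewritten.
			"resource": map[string]interface{}{
				"group":    "",
				"version":  "v1",
				"resource": "secrets",
			},
		},
	}}
	_, err := client.Resource(gvr).Create(ctx, svm, metav1.CreateOptions{})
	return err
}
```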
### Rollout, Upgrade and Rollback Planning

###### How can a rollout or rollback fail? Can it impact already running workloads?

This may impact workloads, as issuing too many Storage Version Migration requests can increase the latency on the API server.
Also explain the OOM risk when paginating lists whose items are large, due to memory multiplication during encoding.
I updated the answer to add this context. PTAL. Thanks.
PRR looks good, but I think we agree that "on by default" (beta) requires resolving the "this may cause kube-apiserver OOMs and cascading failure even for pagination values due to memory explosion" issue. The 6G allocation caused by 400 one-meg secrets was subpar and not the worst case.
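(Rough arithmetic from those numbers: 400 secrets at roughly 1 MB each is about 400 MB of raw object data, yet around 6 GB was allocated, i.e. roughly a 15x amplification from the repeated decode/convert/encode copies made while serving paginated LISTs.)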
/approve

Leaving lgtm with @deads2k

/lgtm

[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: deads2k, jpbetz, nilekhc.
- Feature is enabled by default
- All of the above documented tests are complete

### Upgrade / Downgrade Strategy
Can you please fill in the upgrade/downgrade strategy and version skew strategy?
###### Will enabling / using this feature result in any new API calls?

Yes. Creation of the Migration Request, which creates the `StorageVersionMigration` resource.
That's actually the least important part.
The most important ones are the LIST requests it sends for the resource types to be migrated, and an UPDATE request for each resource of that type to actually migrate it...
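To make the traffic pattern concrete, a minimal sketch of the per-resource-type calls (this illustrates the general technique, not the exact in-tree implementation): a paginated LIST of every object, plus one unmodified UPDATE per item so the API server re-encodes it at the current storage version.

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// migrate lists every object of the given resource type in chunks and writes
// each one back unchanged, forcing re-serialization at the current storage version.
func migrate(ctx context.Context, client dynamic.Interface, gvr schema.GroupVersionResource) error {
	opts := metav1.ListOptions{Limit: 500}
	for {
		list, err := client.Resource(gvr).List(ctx, opts)
		if err != nil {
			return err
		}
		for i := range list.Items {
			item := &list.Items[i]
			// A no-op update: the object is written back unchanged so the API
			// server stores it at its current storage version.
			// (Real code would need conflict/retry handling.)
			if _, err := client.Resource(gvr).Namespace(item.GetNamespace()).
				Update(ctx, item, metav1.UpdateOptions{}); err != nil {
				return err
			}
		}
		if list.GetContinue() == "" {
			return nil
		}
		opts.Continue = list.GetContinue()
	}
}
```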
Added it in #4276.