diff --git a/keps/prod-readiness/sig-api-machinery/3352.yaml b/keps/prod-readiness/sig-api-machinery/3352.yaml new file mode 100644 index 00000000000..caffcbade46 --- /dev/null +++ b/keps/prod-readiness/sig-api-machinery/3352.yaml @@ -0,0 +1,3 @@ +kep-number: 3352 +alpha: + approver: "@deads2k" diff --git a/keps/sig-api-machinery/3352-aggregated-discovery/README.md b/keps/sig-api-machinery/3352-aggregated-discovery/README.md new file mode 100644 index 00000000000..9e874c06585 --- /dev/null +++ b/keps/sig-api-machinery/3352-aggregated-discovery/README.md @@ -0,0 +1,874 @@ + +# KEP-3352: Aggregated Discovery + + + + + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [API](#api) + - [Aggregation](#aggregation) + - [Client](#client) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) + - [Alpha](#alpha) + - [Beta](#beta) + - [GA](#ga) + - [Deprecation](#deprecation) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone +/ release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP + dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as + `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG + Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance + Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake + free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA + Endpoints](https://github.com/kubernetes/community/pull/1806) + must be hit by [Conformance + Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in + [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, + links to mailing list discussions/SIG meetings, relevant + PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + + + + +The operations that a Kubernetes API server supports are reported +through a collection of small documents partitioned by group-version. +All clients of Kubernetes APIs must send a request to every +group-version in order to "discover" the available APIs. This causes a +storm of requests for clusters and is a source of latency and +throttling. This KEP proposes centralizing the "discovery" mechanism +into one document so that clients only need to make one request to the +API server to retrieve all the operations available. + +## Motivation + + + +All clients and users of Kubernetes APIs usually first need to +“discover” what the available APIs are and how they can be used. These +APIs are described through a mechanism called “Discovery” which is +typically queried to then build the requests to correct APIs. +Unfortunately, the “Discovery” API is made of lots of small objects +that need to be queried individually, causing possibly a lot of delay +due to the latency of each individual request (up to 80 requests, with +most objects being less than 1,024 bytes). The more numerous the APIs +provided by the Kubernetes cluster, the more requests need to be +performed. + +The most well known Kubernetes client that uses the discovery +mechanism is `kubectl`, and more specifically the +`CachedDiscoveryClient` in `client-go`. To mitigate some of this +latency, kubectl has implemented a 6 hour timer during which the +discovery API is not refreshed. The drawback of this approach is that +the freshness of the cache is doubtful and the entire discovery API +needs to be refreshed after 6 hours, even if it hasn’t expired. + +This not only impacts kubectl, but all clients of kubernetes. We can +do better. + +### Goals + +- Fix the discovery storm issue currently present in kubectl +- Aggregate the discovery documents for all Kubernetes types + + + +### Non-Goals + + + +Since the current discovery separated by group-version is already GA, +removal of the endpoint will not be attempted. There are still use +cases for publishing the discovery document per group-version and this +KEP will solely focus on introducing the new aggregated endpoint. + +## Proposal + +We are proposing a new endpoint `/discovery/v1` as an aggregated +endpoint for all discovery documents. Discovery documents can +currently be found under `apis//` and `/api/v1` for +the legacy group version. This discovery endpoint will support +publishing an ETag so clients who already have the latest version of +the aggregated discovery can avoid redownloading the document. + +We will add a new controller responsible for aggregating the discovery +documents when a resource on the cluster changes. There will be no +conflicts when aggregating since each discovery document is +self-contained. + +### Notes/Constraints/Caveats (Optional) + + + +### Risks and Mitigations + + + +## Design Details + + + +We will expose a endpoint `/discovery` that will support JSON, and +protobuf and gzip compression. The endpoint will serve the aggregated +discovery document for all types that a Kubernetes cluster supports. + +### API + +The contents of this endpoint will be an `APIGroupList`, which is the +same type that is returned in the discovery document at `/apis`. The +`APIGroupList` contains a list of `APIGroup` types which includes a +list of +[`GroupVersionForDiscovery`](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1071) +types. We will modify this `GroupVersionForDiscovery` type to include +a list of +[`APIResource`](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1080). +This list of `APIResource` is what is currently published at the +`/apis//version` endpoint and is what will be aggregated into +the new endpoint that we publish. + +The endpoint will also publish an ETag calculated based on a hash of +the data for clients. + +This is approximately what the new API will look like (conflicting names will be renamed) + +```go +// APIGroupList is a list of APIGroup, to allow clients to discover the API at +// /apis. +type APIGroupList struct { + TypeMeta `json:",inline"` + // groups is a list of APIGroup. + Groups []APIGroup `json:"groups" protobuf:"bytes,1,rep,name=groups"` +} + +// APIGroup contains the name, the supported versions, and the preferred version +// of a group. +type APIGroup struct { + TypeMeta `json:",inline"` + // name is the name of the group. + Name string `json:"name" protobuf:"bytes,1,opt,name=name"` + // versions are the versions supported in this group. + // This will be sorted in descending order based on the preferred version + Versions []GroupVersionForDiscovery `json:"versions" protobuf:"bytes,2,rep,name=versions"` + // PreferredVersion will be removed for the new Discovery API + // The GroupVersionForDiscovery will be sorted based on the preferred version + // PreferredVersion GroupVersionForDiscovery `json:"preferredVersion,omitempty" protobuf:"bytes,3,opt,name=preferredVersion"` + + // ServerAddresssByClientCIDRs will be removed for the new Discovery API + //ServerAddressByClientCIDRs []ServerAddressByClientCIDR `json:"serverAddressByClientCIDRs,omitempty" protobuf:"bytes,4,rep,name=serverAddressByClientCIDRs"` +} + +// GroupVersion contains the "group/version" and "version" string of a version. +// It is made a struct to keep extensibility. +type GroupVersionForDiscovery struct { + // groupVersion specifies the API group and version in the form "group/version" + // This will be removed for the new discovery + // GroupVersion string `json:"groupVersion" protobuf:"bytes,1,opt,name=groupVersion"` + // version specifies the version in the form of "version". This is to save + // the clients the trouble of splitting the GroupVersion. + Version string `json:"version" protobuf:"bytes,2,opt,name=version"` + // resources contains the name of the resources and if they are namespaced. + APIResources []APIResource `json:"resources" protobuf:"bytes,2,rep,name=resources"` + + // LastContacted is the last time that the apiserver has successfully reached the + // corresponding group version's discovery document. This will be nil if the group-version + // has not been aggregated yet (APIResources will be empty). To maintain consistency across scenarios with multiple + // apiservers, this time will be quantized down to the nearest fifteen minutes. + LastContacted *time.Time `json:"lastContacted" protobuf:"bytes,opt,name=lastContacted"` +} + +// APIResource specifies the name of a resource and whether it is namespaced. +type APIResource struct { + // name is the plural name of the resource. + Name string `json:"name" protobuf:"bytes,1,opt,name=name"` + // singularName is the singular name of the resource. This allows clients to handle plural and singular opaquely. + // The singularName is more correct for reporting status on a single item and both singular and plural are allowed + // from the kubectl CLI interface. + SingularName string `json:"singularName" protobuf:"bytes,6,opt,name=singularName"` + // namespaced indicates if a resource is namespaced or not. + Namespaced bool `json:"namespaced" protobuf:"varint,2,opt,name=namespaced"` + // group is the preferred group of the resource. Empty implies the group of the containing resource list. + // For subresources, this may have a different value, for example: Scale". + Group string `json:"group,omitempty" protobuf:"bytes,8,opt,name=group"` + // version is the preferred version of the resource. Empty implies the version of the containing resource list + // For subresources, this may have a different value, for example: v1 (while inside a v1beta1 version of the core resource's group)". + Version string `json:"version,omitempty" protobuf:"bytes,9,opt,name=version"` + // kind is the kind for the resource (e.g. 'Foo' is the kind for a resource 'foo') + Kind string `json:"kind" protobuf:"bytes,3,opt,name=kind"` + // verbs is a list of supported kube verbs (this includes get, list, watch, create, + // update, patch, delete, deletecollection, and proxy) + Verbs Verbs `json:"verbs" protobuf:"bytes,4,opt,name=verbs"` + // shortNames is a list of suggested short names of the resource. + ShortNames []string `json:"shortNames,omitempty" protobuf:"bytes,5,rep,name=shortNames"` + // categories is a list of the grouped resources this resource belongs to (e.g. 'all') + Categories []string `json:"categories,omitempty" protobuf:"bytes,7,rep,name=categories"` + + // StorageVersionHash will be removed in the new Discovery API + // StorageVersionHash string `json:"storageVersionHash,omitempty" protobuf:"bytes,10,opt,name=storageVersionHash"` +} +``` + +### Aggregation + +For the aggregation layer on the server, a new controller will be +created to aggregate discovery for built-in types, apiextensions types +(CRDs), and types from aggregated api servers. + +A post start hook will be added and the kube-apiserver health check +should only pass if the discovery document is ready. Since aggregated +api servers may take longer to respond and we do not want to delay +cluster startup, the health check will only block on the local api +servers (built-ins and CRDs) to have their discovery ready. For api +servers that have not been aggregated, their group-versions will be +published with an empty resource list and a null value for +`lastContacted` to indicate that they have not synced yet. + +### Client + +The `client-go` interface will be modified to add a new method to +retrieve the aggregated discovery document and `kubectl` will be the +initial candidate. As a starting point, `kubectl api-resources` should +use the aggregated discovery document instead of sending a storm of +requests. + +### Test Plan + + + +[x] I/we understand the owners of the involved components may require +updates to existing tests to make this code solid enough prior to +committing the changes necessary to implement this enhancement. + +##### Prerequisite testing updates + + + +##### Unit tests + + + + + +This will be implemented in a new package in kube-aggregator. + +##### Integration tests + + + +For alpha, integration tests will be added to exercise the new +aggregated discovery code path. + +##### e2e tests + + + +For alpha, tests will be added to exercise the new aggregated +discovery code path for kubectl, both on the server and client side. + +### Graduation Criteria + + + +#### Alpha + +- Feature implemented behind a feature flag +- Initial e2e tests completed and enabled +- At least one client (kubectl) has an implementation to use the + aggregated discovery feature + +#### Beta + +- kubectl uses the aggregated discovery feature by default + +#### GA + +- TBD + +**Note:** Generally we also wait at least two releases between beta +and GA/stable, because there's no opportunity for user feedback, or +even bug reports, in back-to-back releases. + +**For non-optional features moving to GA, the graduation criteria must +include [conformance tests].** + +[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md + +#### Deprecation + + +### Upgrade / Downgrade Strategy + +Aggregated discovery will be behind a feature gate. It is an in-memory +feature and upgrade/downgrade is not a problem. + +### Version Skew Strategy + +## Production Readiness Review Questionnaire + + + +### Feature Enablement and Rollback + + + +###### How can this feature be enabled / disabled in a live cluster? + +- [x] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: AggregatedDiscovery + - Components depending on the feature gate: kube-apiserver + +###### Does enabling the feature change any default behavior? + +No + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + + Yes, the feature may be disabled by reverting the feature flag. + +###### What happens if we reenable the feature if it was previously rolled back? + +The feature does not depend on state, and can be disabled/enabled at +will. + +###### Are there any tests for feature enablement/disablement? + + + +A test will be added to ensure that the RESTMapper functionality works +properly both when the feature is enabled and disabled. + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + + +###### What specific metrics should inform a rollback? + + + +High latency or failure of a metric in the newly added discovery +aggregation controller. If the `/discovery/v1` endpoint is not +reachable or if there are errors from the endpoint, that could be a +sign of rollback as well. + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + + + +###### How can someone using this feature know that it is working for their instance? + + + +`/discovery/v1` endpoint is populated with discovery information, and all expected group-versions are present. + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + + +- [x] Metrics + - Metric name: `aggregator_discovery_aggregation_duration` + - Components exposing the metric: `kube-server` + - This is a metric for exposing the time it took to aggregate all the + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + + +Yes. A metric for the regeneration count of the discovery document. `aggregator_discovery_aggregation_count` + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + + + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + + + +###### Will enabling / using this feature result in introducing new API types? + + + +###### Will enabling / using this feature result in any new calls to the cloud provider? + + + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + + + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + + + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + + + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? + +###### What are other known failure modes? + + + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + + + +## Drawbacks + +With aggregation, the size of the aggregated discovery document could +be an issue in the future since clients will need to download the +entire document on any resource update. At the moment, even with 3000 +CRDs (already very unlikely), the total size is still smaller than +1MB. + +## Alternatives + + + +An alternative that was considered is [Discovery Cache +Busting](https://docs.google.com/document/d/1AulBtUYjVcc4s809YSQq4bdRdDO3byY7ew9za4Ortj4). +Cache-busting allows clients to know if the files need to be +downloaded at all, and in most cases the download can be skipped +entirely. This typically works by including a hash of the resource +content in its name, while marking the objects as never-expiring using +cache control headers. Clients can then recognize if the names have +changed or stayed the same, and re-use files that have kept the same +name without downloading them again. + +Aggregated Discovery was selected because of the amount of requests that are saved both on startup and on changes involving multiple group versions. For a full comparison between Discovery Cache Busting and Aggregated Discovery, please refer to the [Google Doc](https://docs.google.com/document/d/1sdf8nz5iTi86ErQy9OVxvQh_0RWfeU3Vyu0nlA10LNM). diff --git a/keps/sig-api-machinery/3352-aggregated-discovery/kep.yaml b/keps/sig-api-machinery/3352-aggregated-discovery/kep.yaml new file mode 100644 index 00000000000..32f20b651b2 --- /dev/null +++ b/keps/sig-api-machinery/3352-aggregated-discovery/kep.yaml @@ -0,0 +1,48 @@ +title: Aggregated Discovery +kep-number: 3352 +authors: + - "@alexzielenski" + - "@jefftree" +owning-sig: sig-api-machinery +participating-sigs: + - sig-cli +status: implementable +creation-date: 2022-06-07 +reviewers: + - "@apelisse" + - "@seans3" +approvers: + - "@deads2k" + - "@lavalamp" + +##### WARNING !!! ###### +# prr-approvers has been moved to its own location +# You should create your own in keps/prod-readiness +# Please make a copy of keps/prod-readiness/template/nnnn.yaml +# to keps/prod-readiness/sig-xxxxx/00000.yaml (replace with kep number) +prr-approvers: + - "@deads2k" + +# The target maturity stage in the current dev cycle for this KEP. +stage: alpha + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.25" + +# The milestone at which this feature was, or is targeted to be, at each stage. +milestone: + alpha: "v1.25" + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +feature-gates: + - name: AggregatedDiscovery + components: + - kube-apiserver +disable-supported: true + +# The following PRR answers are required at beta release +# metrics: +# - my_feature_metric