
KEP: Namespace Initializer #2543

Closed

Conversation

@easeway (Author) commented Aug 15, 2018

This KEP proposes a specialized initializer mechanism for namespaces, based on the discussion on #2177

I submit this proposal because the generic Initializer is planned to be deprecated, however the mechanism is very useful for container resources like namespace, and I hope we can keep this mechanism, at least specialize it and move it towards beta/GA. If the generic Initializer will move on to beta/GA, I will abandon this proposal.

Note: this proposal requires changes in core Kubernetes, and adding a new field in stable API object.

/sig architecture
/sig api-machinery
/sig auth

/cc @bgrant0607
/cc @smarterclayton
/cc @jbeda
/cc @lavalamp
/cc @liggitt
/cc @tallclair
/cc @davidopp

@k8s-ci-robot k8s-ci-robot added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 15, 2018
@k8s-ci-robot (Contributor)

Hi @easeway. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 15, 2018
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: jbeda

If they are not already assigned, you can assign the PR to them by writing /assign @jbeda in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@easeway force-pushed the kep-namespace-initializer branch from 26d7797 to 3b26a6b on August 15, 2018 18:24
@smarterclayton (Contributor) commented Aug 15, 2018 via email

@lavalamp (Member)

Added to my queue.

Initial question, just based on your comment: how confident are we that namespace will always be the only container resource? When I started writing this sentence I thought 80%, but by the end I'm thinking 60%.

@smarterclayton (Contributor)

I'm not sure. Everyone talks a big game about a new container resource, but in 4 years nothing has gotten close.

#### Core Kubernetes Changes

The generic Initializers mechanism uses `metadata.Initializers` to record all pending initializations.
As that mechanism is slated for deprecation, for namespaces an `Initializers` field is added to `NamespaceSpec` as an _alpha_ field:
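A minimal sketch of how such an alpha field could behave, using illustrative stand-in types rather than the real Kubernetes API (the exact shape is defined by the KEP, not here):

```go
package main

import "fmt"

// Stand-in types sketching the proposed alpha field; the names follow the
// KEP text, but these shapes are illustrative, not the real k8s API.
type NamespaceInitializer struct {
	Name string // name of a pending initializer, e.g. "policy.example.com"
}

type NamespaceInitializers struct {
	// Pending lists initializers that have not yet acted on the namespace.
	Pending []NamespaceInitializer
}

type NamespaceSpec struct {
	Finalizers []string // existing v1 field
	// Initializers is the proposed alpha field holding pending initializers.
	Initializers *NamespaceInitializers
}

// initialized reports whether the namespace has no pending initializers.
func initialized(spec NamespaceSpec) bool {
	return spec.Initializers == nil || len(spec.Initializers.Pending) == 0
}

func main() {
	ns := NamespaceSpec{
		Initializers: &NamespaceInitializers{
			Pending: []NamespaceInitializer{{Name: "policy.example.com"}},
		},
	}
	fmt.Println(initialized(ns)) // false: one initializer still pending

	ns.Initializers.Pending = nil
	fmt.Println(initialized(ns)) // true
}
```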
Member

I would actually prefer to leave it in metadata.

I think it'd be OK/interesting to define a ContainerMetadata type which includes this and use it only for namespaces.

Even if namespace is the only container type k8s ever has, I think initializers still need to be understood by the machinery in many places, and therefore spec is not a good choice.

Member

After reading the rest of the proposal, I'm on the fence about this; I'll leave it here as a discussion point. I think the doc could maybe explain why it wasn't chosen in an alternatives section?

Author

If we decide to keep it in metadata, I will be happy with that, and we could then keep the existing Initializers and move it forward. I'd be happy to abandon this PR if we can move Initializers forward.

Contributor

> Even if namespace is the only container type k8s ever has, I think initializers still need to be understood by the machinery in many places and therefore spec not a good choice.

The way this is designed, I don't think initialization is a generic concept. I think it is specific to namespaces as the only scoping resource we have. I'd rather have metadata without initializers and the field in the spec of the namespace.

Member

I'm not quite ready to give up on the idea that we might eventually gain another scoping resource. We could make a ContainerMetadata which is ObjectMetadata (inlined) + initializers; that way we wouldn't impact other resources.

@easeway (Author), Oct 23, 2018

@lavalamp IMO ContainerMetadata is a good idea if we foresee that namespace will not be the only container resource. I will put it in the proposal.

Contributor

In V1 I would much rather have concrete initializers on namespace aligned with a concrete finalizers field which won't change for v1, and worry about hypothetical other initializers as the cases come up.

The current initializers mechanism lacks the ability to allow resources to have their own initialization flow, and at the limit initialization is a domain problem (for instance, a job graph) which shouldn’t be reduced to a generic field.

```go
// Finalizers is an opaque list of values that must be empty to permanently remove object from storage.
// More info: https://kubernetes.io/docs/tasks/administer-cluster/namespaces/
// +optional
Finalizers []FinalizerName `json:"finalizers,omitempty" protobuf:"bytes,1,rep,name=finalizers,casttype=FinalizerName"`
```
Member

Finalizers is already in metadata, why would it be in spec, too?

Author

The v1 Namespace today has Finalizers in spec. This PR was opened on the assumption that the generic Initializers will be removed from metadata, so specifically for Namespace both Initializers and Finalizers are put in spec, for consistency. If we decide to keep Initializers in metadata, I agree we should use Finalizers in metadata too.

Contributor

> Finalizers is already in metadata, why would it be in spec, too?

namespaces predate "normal" finalizers, so this field is how namespaces do it.

Member

They do it wrong and should change in v2.

Author

I'd like to keep the scope to adding fields to the v1 Namespace object, rather than making a v2 Namespace, which would take longer.

Contributor

We’ve said consistency within an API trumps “doing it the way we wanted to do it” for keeping users on a clear path. I think this is just another example of that.

@justaugustus (Member)

/kind kep

```go
Pending []NamespaceInitializer `json:"pending" protobuf:"bytes,1,rep,name=pending" patchStrategy:"merge" patchMergeKey:"name"`
// If result is set with the Failure field, the object will be persisted to storage and then deleted,
// ensuring that other clients can observe the deletion.
Result *metav1.Status `json:"result,omitempty" protobuf:"bytes,2,opt,name=result"`
```
Contributor

this looks like a status condition to me

Author

Agreed. This could be expressed as a status condition too. Since the design is copied from Initializers without change, I'd prefer to keep it as is and keep the scope small. We can evolve it later if a status condition turns out to be the better place.


```go
// NamespaceInitializerConfiguration describes the configuration of initializers.
type NamespaceInitializerConfiguration struct {
	// ... (remaining fields elided in this review excerpt)
}
```
Contributor

Seems like this should actually be a normal admission plugin configuration done via file.

Member

Why would we have a file based configuration? Those are hard to manipulate at runtime. I much prefer API resources absent some sort of justification (e.g., security).

Author

File-based configuration is less flexible: it has to be in place during cluster bootstrapping, and changes require restarting the control plane.


- Namespace initialization controllers register themselves by creating `NamespaceInitializerConfiguration` resources
- When a namespace is newly created:
  - The `NamespaceInitializers` admission controller aggregates all `NamespaceInitializerConfiguration` resources and generates `spec.Initializers` in the namespace resource
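The registration step above can be sketched roughly as follows; the type shapes here are illustrative stand-ins for the proposed resources, not the real API:

```go
package main

import "fmt"

// Illustrative stand-ins for the resources named in the flow above.
type NamespaceInitializerConfiguration struct {
	Name string // initializer name registered by a controller
}

type Namespace struct {
	Name                string
	PendingInitializers []string // sketch of spec.initializers.pending
}

// admitNamespace mimics what the proposed NamespaceInitializers admission
// controller would do on CREATE: copy every registered configuration into
// the new namespace's pending list.
func admitNamespace(ns *Namespace, configs []NamespaceInitializerConfiguration) {
	for _, c := range configs {
		ns.PendingInitializers = append(ns.PendingInitializers, c.Name)
	}
}

func main() {
	configs := []NamespaceInitializerConfiguration{
		{Name: "quota.example.com"},
		{Name: "rbac.example.com"},
	}
	ns := &Namespace{Name: "team-a"}
	admitNamespace(ns, configs)
	fmt.Println(ns.PendingInitializers) // [quota.example.com rbac.example.com]
}
```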
Contributor

Why wouldn't we tell people to use a webhook admission plugin to add themselves to the list of initializers for a namespace?

@liggitt (Member), Oct 23, 2018

how do namespaces decide whether they are subject to being run through an admission plugin w.r.t. NamespaceSelector?

it gets a little weird to use this to hook in policy when the thing used to select whether it applies to a namespace is an attribute of the incoming object:

https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/namespace/matcher.go#L89-L97

Member

IIRC namespace objects have their own labels tested before running through the admission controller, for better or worse.

Author

@deads2k Yes, we can definitely tell people to use a webhook admission plugin. However, adding a webhook admission plugin requires additional work: setting up a webserver with TLS certificates, creating webhook configurations, exposing services, etc. The stability of webhook plugins also impacts requests in the API path. NamespaceInitializerConfiguration saves this extra work; people only need to write a controller. It also lets us focus on the stability of the NamespaceInitializers admission controller.

This design is copied directly from the existing Initializers: https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#initializers. IMO the original design is good, as it offers a simplified configuration in addition to writing your own admission webhook plugin.

Author

@liggitt Not sure if I understand the comment correctly. The NamespaceInitializers admission controller runs through all namespaces without filtering, and currently there's no filtering/selection mechanism in NamespaceInitializerConfiguration. That means the configurations are applied to all newly created namespaces, leaving it to the actual controllers performing initialization to decide what to do with a particular namespace.

@liggitt (Member), Oct 23, 2018

For the primary use case given (self-service namespace creation/population, #2177), a webhook is still required to capture the info about the creating user, right? The initializer name populated in the namespace doesn't have enough information for auto-policy-grants to know who to grant permissions to.

Author

Yes, for the particular use case in #2177. I'd love to separate that problem from this PR and discuss it in #2177. However, one of the major dependencies of that issue on this PR is that we need this initialization mechanism to preserve compatibility of API behavior, so an existing CLI/script won't break while a namespace is being initialized. That's also the reason we need this change in core Kubernetes; there's no other way to achieve it, e.g. with a webhook admission plugin.

Contributor

> @deads2k Yes. We can definitely tell people to use a webhook admission plugin. However, adding a webhook admission plugin requires additional work, including setting up a webserver with TLS certificates, creating webhook configurations, exposing services, etc. And the stability of webhook plugins also impacts the request in API path. With NamespaceInitializersConfiguration saves these extra work that people only need to write a controller. It also helps we can focus on the stability of NamespaceInitializers admin controller.

This design is trying to avoid that at the cost of adding additional complexity to core. In addition, doing it this way restricts options that admission plugins would have about precisely when and how particular initializers are applied. Using a webhook, I'm sure that the openshift use-cases could be covered. Using what's described here, I'm not guaranteed that it would be.

Author

> about precisely when and how particular initializers are applied

Is that specifically about the ordering of initializers? If so, using a webhook admission plugin (I mean MutatingAdmissionWebhook) doesn't guarantee order either. They are really the same: creating a MutatingWebhookConfiguration vs creating a NamespaceInitializerConfiguration. And it's really difficult to define/enforce ordering unless the webhook plugins (or initializers) are clearly predefined at cluster bootstrapping time and dynamic deployment of arbitrary webhooks/initializers is disallowed.

Anyway, NamespaceInitializerConfiguration doesn't block the use of webhook admission plugins; it remains possible and reasonable to use one depending on the use case. However, we did hear complaints about the extra work/complexity of building a webhook when the developer only wants a controller to populate a namespace while it is being initialized. That demonstrates the value of NamespaceInitializerConfiguration.

Alternatively, I'm open to not implementing NamespaceInitializerConfiguration at the beginning and focusing only on the initialization mechanism; with that alone, a webhook plugin is required to add initializers.

Contributor

Maybe we want to have an easy-to-consume initializer implementation as part of controller-runtime that someone can quickly iterate on.

@easeway force-pushed the kep-namespace-initializer branch from 3b26a6b to 7c62fa5 on October 23, 2018 18:53
- Once the namespace resource is persisted, the following happens in parallel:
  - Namespace initialization controllers get notified about the newly created namespace,
    and they start creating resources inside the namespace, updating `spec.Initializers` to remove themselves once their initialization is done
  - The storage layer waits until `spec.Initializers` becomes empty before returning the response to the client, unless `includeUninitialized` is true in the request.
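The controller side of this flow, removing its own name from the pending list once it is done, might look like this sketch (stand-in types, not the real API):

```go
package main

import "fmt"

// Illustrative stand-in for a namespace carrying pending initializers.
type Namespace struct {
	Name                string
	PendingInitializers []string
}

// markInitialized removes the given initializer from the pending list,
// which is what an initialization controller would do via an update/patch
// once it has finished populating the namespace.
func markInitialized(ns *Namespace, initializer string) {
	kept := ns.PendingInitializers[:0]
	for _, p := range ns.PendingInitializers {
		if p != initializer {
			kept = append(kept, p)
		}
	}
	ns.PendingInitializers = kept
}

func main() {
	ns := &Namespace{
		Name:                "team-a",
		PendingInitializers: []string{"quota.example.com", "rbac.example.com"},
	}
	markInitialized(ns, "quota.example.com")
	fmt.Println(ns.PendingInitializers) // [rbac.example.com]
	markInitialized(ns, "rbac.example.com")
	fmt.Println(len(ns.PendingInitializers) == 0) // true: creation response may now be returned
}
```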
Member

Holding the creation request until async initializers observe and act on the object being created leads to a lot of complexity. In the original design, it was paired with the includeUninitialized option on list/watch (which was also extremely complex), in order to prevent controllers from acting on uninitialized objects (like running a pod or provisioning a service load balancer before it was initialized).

I'm not sure the same requirements exist for container-type objects (a namespace doesn't "do" anything... it's not a workload that could be prematurely executed, etc). I'm not eager to attempt introducing the "exists but is hidden unless you pass a special flag" behavior again if we don't have to.

Author

I understand that holding the creation request is complex and that using includeUninitialized can be problematic. However, this seems to be the only way to ensure behavioral compatibility with existing client code, which assumes a resource is ready upon receiving the creation response. That's also a reason to introduce a status condition in the namespace resource, with which we can gradually improve client code (at least client-go and kubectl) to determine the readiness of a resource from the status condition.

For example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ns1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: ns1
  name: app1
...
```

The above manifest is widely applied with a single `kubectl apply -f`. Without holding the namespace creation request, this breaks if the namespace is not yet ready (still being initialized by some controller) when the creation response reaches kubectl. For example, in self-service namespace creation, the creator may only have permission to create a namespace and must wait for NamespaceTemplate to populate admin permissions inside it. The single `kubectl apply` will break without an initialization mechanism.

I think a reasonable plan is to still include the `includeUninitialized` flag mechanism to make sure nothing breaks, and to deprecate this complex logic later, once most clients are patched to work with the status condition (this will take a significantly long time, probably a few years).

```go
// An initializer is a controller which enforces some system invariant at object creation time.
// This field is a list of initializers that have not yet acted on this object. If nil or empty,
// this object has been completely initialized. Otherwise, the object is considered uninitialized
// and is hidden (in list/watch and get calls) from clients that haven't explicitly asked to
// observe uninitialized objects.
```
Member

> is hidden (in list/watch and get calls) from clients that haven't explicitly asked to observe uninitialized objects.

this proved very problematic when implementing initializers before (and we never successfully resolved the issues)

Author

I understand it's problematic. Please see my comments above for a plan to eventually fix this.

Member

Independent of the outcome of the hold-until-initialized behavior during creation, I don't understand why we need to modify get/list/watch this way for container objects (objects that don't have behavior associated with them). Can you elaborate on what use cases the get/list/watch changes enable?

Author

It's about a concurrency issue.

In general, one user creates a container object; before it is fully initialized (e.g. policy objects populated), another user (or controller) gets/lists/watches that kind of container object. Without hiding, they may be notified about the creation of the container object and move forward immediately, assuming the new container object is ready for use.

Here's a more specific use case:

In a corp, the cluster admin team shares a cluster with multiple development teams. The cluster admin team sets up a namespace population mechanism to populate common policy objects into all namespaces. Development team A wants to enforce additional policies for its own project, for example ResourceQuota on testing Pods. They created a namespace for running their automation pipeline, which includes a controller that watches namespaces created for sub-teams and creates a ResourceQuota in each.

In this case, when a team lead creates a namespace for a sub-team, the controller enforcing ResourceQuota should not be notified that the namespace is ready before namespace population has fully initialized it with the policies defined by the cluster admin team, as RBAC rules may not yet be set up for the service account running the controller. This demonstrates the need for an `includeUninitialized` flag on get/list/watch besides create.

@liggitt (Member), Oct 23, 2018

> the controller enforcing ResourceQuota should not be notified about the readiness of the namespace, as RBAC rules may not be setup for service account running the controller

I don't think that is particularly convincing, or justifies the complexity of hiding objects from list/watch. A controller that needs to react to populate a "container" object should be able to retry/backoff. The primary reasons for hiding uninitialized objects from listers/watchers were around "point of no return"-type operations like starting a pod or actuating something described in an object's spec, and wanting to prevent that from happening until the object was initialized.

Author

I accept your point and agree that hiding is not necessary for get/list/watch, specifically for container resources, as long as status conditions are available. Because there is no behavior associated with container resources, there are no point-of-no-return operations; clients can always observe the status condition to decide the readiness of the resource.
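A sketch of the status-condition-based readiness check discussed here; the `Initialized` condition type name is an assumption for illustration, not something the KEP fixes:

```go
package main

import "fmt"

// Illustrative stand-ins: a consumer inspects a status condition instead of
// relying on uninitialized namespaces being hidden from list/watch.
type Condition struct {
	Type   string
	Status string // "True" | "False" | "Unknown"
}

type Namespace struct {
	Name       string
	Conditions []Condition
}

// isInitialized checks the assumed "Initialized" condition; a controller
// seeing false would requeue with backoff rather than act on the namespace.
func isInitialized(ns Namespace) bool {
	for _, c := range ns.Conditions {
		if c.Type == "Initialized" { // assumed condition type
			return c.Status == "True"
		}
	}
	return false
}

func main() {
	ns := Namespace{
		Name:       "team-a",
		Conditions: []Condition{{Type: "Initialized", Status: "False"}},
	}
	fmt.Println(isInitialized(ns)) // false: requeue with backoff

	ns.Conditions[0].Status = "True"
	fmt.Println(isInitialized(ns)) // true: safe to create ResourceQuota etc.
}
```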

Contributor

Which all namespace consumers already have to do today (service accounts, quota)

@easeway force-pushed the kep-namespace-initializer branch from 7c62fa5 to cd160fd on October 23, 2018 20:42
@easeway force-pushed the kep-namespace-initializer branch from cd160fd to 14d2614 on November 19, 2018 19:59
@easeway (Author) commented Nov 19, 2018

Updated the KEP according to the Oct 24, 2018 SIG API Machinery meeting:

  • Removed the mechanism holding the response;
  • Added a section discussing security: a proposal to use an initialize verb in RBAC.

@justaugustus (Member)

REMINDER: KEPs are moving to k/enhancements on November 30. Please attempt to merge this KEP before then to signal consensus.
For more details on this change, review this thread.

Any questions regarding this move should be directed to that thread and not asked on GitHub.

@justaugustus (Member)

KEPs have moved to k/enhancements.
This PR will be closed and any additional changes to this KEP should be submitted to k/enhancements.
For more details on this change, review this thread.

Any questions regarding this move should be directed to that thread and not asked on GitHub.
/close

@k8s-ci-robot (Contributor)

@justaugustus: Closed this PR.

