
KEP: Namespace Initializer #2543

Closed

Conversation

@easeway (Author) commented Aug 15, 2018

This KEP proposes a specialized initializer mechanism for namespaces, based on the discussion on #2177

I submit this proposal because the generic Initializer is planned to be deprecated, however the mechanism is very useful for container resources like namespace, and I hope we can keep this mechanism, at least specialize it and move it towards beta/GA. If the generic Initializer will move on to beta/GA, I will abandon this proposal.

Note: this proposal requires changes in core Kubernetes, and adding a new field in stable API object.

/sig architecture
/sig api-machinery
/sig auth

/cc @bgrant0607
/cc @smarterclayton
/cc @jbeda
/cc @lavalamp
/cc @liggitt
/cc @tallclair
/cc @davidopp

@k8s-ci-robot k8s-ci-robot added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 15, 2018
@k8s-ci-robot (Contributor)

Hi @easeway. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 15, 2018
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: jbeda

If they are not already assigned, you can assign the PR to them by writing /assign @jbeda in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@easeway force-pushed the kep-namespace-initializer branch from 26d7797 to 3b26a6b on August 15, 2018 18:24
@smarterclayton (Contributor) commented Aug 15, 2018 via email

@lavalamp (Member)

Added to my queue.

Initial question, just based on your comment: how confident are we that namespace will always be the only container resource? When I started writing this sentence I thought 80%, but by the end I'm thinking 60%.

@smarterclayton (Contributor)

I'm not sure. Everyone talks a big game about a new container resource, but in 4 years nothing has gotten close.

#### Core Kubernetes Changes

The generic Initializers mechanism uses `metadata.Initializers` to record all pending initializations.
As that mechanism is slated for deprecation, for namespaces an `Initializers` field is added to `NamespaceSpec` as an _alpha_ field:
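A minimal sketch of how such an alpha field could behave, using illustrative stand-in types rather than the real Kubernetes API (the exact shape is defined by the KEP, not here):

```go
package main

import "fmt"

// Stand-in types sketching the proposed alpha field; the names follow the
// KEP text, but these shapes are illustrative, not the real k8s API.
type NamespaceInitializer struct {
	Name string // name of a pending initializer, e.g. "policy.example.com"
}

type NamespaceInitializers struct {
	// Pending lists initializers that have not yet acted on the namespace.
	Pending []NamespaceInitializer
}

type NamespaceSpec struct {
	Finalizers []string // existing v1 field
	// Initializers is the proposed alpha field holding pending initializers.
	Initializers *NamespaceInitializers
}

// initialized reports whether the namespace has no pending initializers.
func initialized(spec NamespaceSpec) bool {
	return spec.Initializers == nil || len(spec.Initializers.Pending) == 0
}

func main() {
	ns := NamespaceSpec{
		Initializers: &NamespaceInitializers{
			Pending: []NamespaceInitializer{{Name: "policy.example.com"}},
		},
	}
	fmt.Println(initialized(ns)) // false: one initializer still pending

	ns.Initializers.Pending = nil
	fmt.Println(initialized(ns)) // true
}
```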
Member

I would actually prefer to leave it in metadata.

I think it'd be OK/interesting to define a ContainerMetadata type which includes this and use it only for namespaces.

Even if namespace is the only container type k8s ever has, I think initializers still need to be understood by the machinery in many places, and therefore spec is not a good choice.

Member

After reading the rest of the proposal, I'm on the fence about this; I'll leave it here as a discussion point. I think the doc could maybe explain why it wasn't chosen in an alternatives section?

Author

If we decide to keep it in metadata, I will be happy with that, and we could then keep the existing Initializers and move it forward. I'd be happy to abandon this PR if we can move Initializers forward.

Contributor

> Even if namespace is the only container type k8s ever has, I think initializers still need to be understood by the machinery in many places and therefore spec not a good choice.

The way this is designed, I don't think initialization is a generic concept. I think it is specific to namespaces as the only scoping resource we have. I'd rather have metadata without initializers and the field in the spec of the namespace.

Member

I'm not quite ready to give up on the idea that we might eventually gain another scoping resource. We could make a ContainerMetadata which is ObjectMetadata (inlined) + initializers; that way we wouldn't impact other resources.

@easeway (Author), Oct 23, 2018

@lavalamp IMO ContainerMetadata is a good idea if we foresee that namespace will not be the only container resource. I will put it in the proposal.

Contributor

In V1 I would much rather have concrete initializers on namespace aligned with a concrete finalizers field which won't change for v1, and worry about hypothetical other initializers as the cases come up.

The current initializers mechanism lacks the ability to allow resources to have their own initialization flow, and at the limit initialization is a domain problem (for instance, a job graph) which shouldn’t be reduced to a generic field.

```go
// Finalizers is an opaque list of values that must be empty to permanently remove object from storage.
// More info: https://kubernetes.io/docs/tasks/administer-cluster/namespaces/
// +optional
Finalizers []FinalizerName `json:"finalizers,omitempty" protobuf:"bytes,1,rep,name=finalizers,casttype=FinalizerName"`
```
Member

Finalizers is already in metadata, why would it be in spec, too?

Author

The v1 Namespace today has Finalizers in spec. This PR was opened on the assumption that the generic Initializers will be removed from metadata, so specifically for Namespace both Initializers and Finalizers are put in spec, for consistency. If we decide to keep Initializers in metadata, I agree we should use Finalizers in metadata too.

Contributor

> Finalizers is already in metadata, why would it be in spec, too?

namespaces predate "normal" finalizers, so this field is how namespaces do it.

Member

They do it wrong and should change in v2.

Author

I'd like to keep the scope to adding fields to the v1 Namespace object, rather than making a v2 Namespace, which would take longer.

Contributor

We’ve said consistency within an API trumps “doing it the way we wanted to do it” for keeping users on a clear path. I think this is just another example of that.

@justaugustus (Member)

/kind kep

```go
Pending []NamespaceInitializer `json:"pending" protobuf:"bytes,1,rep,name=pending" patchStrategy:"merge" patchMergeKey:"name"`
// If result is set with the Failure field, the object will be persisted to storage and then deleted,
// ensuring that other clients can observe the deletion.
Result *metav1.Status `json:"result,omitempty" protobuf:"bytes,2,opt,name=result"`
```
Contributor

this looks like a status condition to me

Author

Agreed. This could be expressed as a status condition too. Since the design is copied from Initializers without change, I'd prefer to keep it as is and keep the scope small. We can evolve it later if a status condition turns out to be the better place.


```go
// NamespaceInitializerConfiguration describes the configuration of initializers.
type NamespaceInitializerConfiguration struct {
	// ... (remaining fields elided in this review excerpt)
}
```
Contributor

Seems like this should actually be a normal admission plugin configuration done via file.

Member

Why would we have a file based configuration? Those are hard to manipulate at runtime. I much prefer API resources absent some sort of justification (e.g., security).

Author

File-based configuration is less flexible: it has to be in place during cluster bootstrapping, and changes require restarting the control plane.


- Namespace initialization controllers register themselves by creating `NamespaceInitializerConfiguration` resources
- When a namespace is newly created:
  - The `NamespaceInitializers` admission controller aggregates all `NamespaceInitializerConfiguration` resources and generates `spec.Initializers` in the namespace resource
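The registration step above can be sketched roughly as follows; the type shapes here are illustrative stand-ins for the proposed resources, not the real API:

```go
package main

import "fmt"

// Illustrative stand-ins for the resources named in the flow above.
type NamespaceInitializerConfiguration struct {
	Name string // initializer name registered by a controller
}

type Namespace struct {
	Name                string
	PendingInitializers []string // sketch of spec.initializers.pending
}

// admitNamespace mimics what the proposed NamespaceInitializers admission
// controller would do on CREATE: copy every registered configuration into
// the new namespace's pending list.
func admitNamespace(ns *Namespace, configs []NamespaceInitializerConfiguration) {
	for _, c := range configs {
		ns.PendingInitializers = append(ns.PendingInitializers, c.Name)
	}
}

func main() {
	configs := []NamespaceInitializerConfiguration{
		{Name: "quota.example.com"},
		{Name: "rbac.example.com"},
	}
	ns := &Namespace{Name: "team-a"}
	admitNamespace(ns, configs)
	fmt.Println(ns.PendingInitializers) // [quota.example.com rbac.example.com]
}
```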
Contributor

Why wouldn't we tell people to use a webhook admission plugin to add themselves to the list of initializers for a namespace?

@liggitt (Member), Oct 23, 2018

how do namespaces decide whether they are subject to being run through an admission plugin w.r.t. NamespaceSelector?

it gets a little weird to use this to hook in policy when the thing used to select whether it applies to a namespace is an attribute of the incoming object:

https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/admission/plugin/webhook/namespace/matcher.go#L89-L97

Member

IIRC namespace objects have their own labels tested before running through the admission controller, for better or worse.

Author

@deads2k Yes, we can definitely tell people to use a webhook admission plugin. However, adding a webhook admission plugin requires additional work: setting up a webserver with TLS certificates, creating webhook configurations, exposing services, etc. The stability of webhook plugins also impacts requests in the API path. NamespaceInitializerConfiguration saves this extra work; people only need to write a controller. It also lets us focus on the stability of the NamespaceInitializers admission controller.

This design is copied directly from the existing Initializers: https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#initializers. IMO the original design is good, as it offers a simplified configuration in addition to writing your own admission webhook plugin.

Author

@liggitt Not sure if I understand the comment correctly. The NamespaceInitializers admission controller runs through all namespaces without filtering, and currently there's no filtering/selection mechanism in NamespaceInitializerConfiguration. That means the configurations are applied to all newly created namespaces, leaving it to the actual controllers performing initialization to decide what to do with a particular namespace.

@liggitt (Member), Oct 23, 2018

For the primary use case given (self-service namespace creation/population, #2177), a webhook is still required to capture the info about the creating user, right? The initializer name populated in the namespace doesn't have enough information for auto-policy-grants to know who to grant permissions to.

Author

Yes, for the particular use case in #2177. I'd love to separate that problem from this PR and discuss it in #2177. However, one of the major dependencies of that issue on this PR is that we need this initialization mechanism to preserve compatibility of API behavior, so an existing CLI/script won't break while a namespace is being initialized. That's also the reason we need this change in core Kubernetes; there's no other way to achieve it, e.g. with a webhook admission plugin.

Contributor

> @deads2k Yes. We can definitely tell people to use a webhook admission plugin. However, adding a webhook admission plugin requires additional work, including setting up a webserver with TLS certificates, creating webhook configurations, exposing services, etc. And the stability of webhook plugins also impacts the request in API path. With NamespaceInitializersConfiguration saves these extra work that people only need to write a controller. It also helps we can focus on the stability of NamespaceInitializers admin controller.

This design is trying to avoid that at the cost of adding additional complexity to core. In addition, doing it this way restricts options that admission plugins would have about precisely when and how particular initializers are applied. Using a webhook, I'm sure that the openshift use-cases could be covered. Using what's described here, I'm not guaranteed that it would be.

Author

> about precisely when and how particular initializers are applied

Is that specifically about the ordering of initializers? If so, using a webhook admission plugin (I mean MutatingAdmissionWebhook) doesn't guarantee order either. They are really the same: creating a MutatingWebhookConfiguration vs creating a NamespaceInitializerConfiguration. And it's really difficult to define/enforce ordering unless the webhook plugins (or initializers) are clearly predefined at cluster bootstrapping time and dynamic deployment of arbitrary webhooks/initializers is disallowed.

Anyway, NamespaceInitializerConfiguration doesn't block the use of webhook admission plugins; it remains possible and reasonable to use one depending on the use case. However, we did hear complaints about the extra work/complexity of building a webhook when the developer only wants a controller to populate a namespace while it is being initialized. That demonstrates the value of NamespaceInitializerConfiguration.

Alternatively, I'm open to not implementing NamespaceInitializerConfiguration at the beginning and focusing only on the initialization mechanism; with that alone, a webhook plugin is required to add initializers.

Contributor

Maybe we want to have an easy-to-consume initializer implementation as part of controller-runtime that someone can quickly iterate on.

@easeway force-pushed the kep-namespace-initializer branch from 3b26a6b to 7c62fa5 on October 23, 2018 18:53
- Once the namespace resource is persisted, the following happens in parallel:
  - Namespace initialization controllers get notified about the newly created namespace,
    and they start creating resources inside the namespace, updating `spec.Initializers` to remove themselves once their initialization is done
  - The storage layer waits until `spec.Initializers` becomes empty before returning the response to the client, unless `includeUninitialized` is true in the request.
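The controller side of this flow, removing its own name from the pending list once it is done, might look like this sketch (stand-in types, not the real API):

```go
package main

import "fmt"

// Illustrative stand-in for a namespace carrying pending initializers.
type Namespace struct {
	Name                string
	PendingInitializers []string
}

// markInitialized removes the given initializer from the pending list,
// which is what an initialization controller would do via an update/patch
// once it has finished populating the namespace.
func markInitialized(ns *Namespace, initializer string) {
	kept := ns.PendingInitializers[:0]
	for _, p := range ns.PendingInitializers {
		if p != initializer {
			kept = append(kept, p)
		}
	}
	ns.PendingInitializers = kept
}

func main() {
	ns := &Namespace{
		Name:                "team-a",
		PendingInitializers: []string{"quota.example.com", "rbac.example.com"},
	}
	markInitialized(ns, "quota.example.com")
	fmt.Println(ns.PendingInitializers) // [rbac.example.com]
	markInitialized(ns, "rbac.example.com")
	fmt.Println(len(ns.PendingInitializers) == 0) // true: creation response may now be returned
}
```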
Member

Holding the creation request until async initializers observe and act on the object being created leads to a lot of complexity. In the original design, it was paired with the includeUninitialized option on list/watch (which was also extremely complex), in order to prevent controllers from acting on uninitialized objects (like running a pod or provisioning a service load balancer before it was initialized).

I'm not sure the same requirements exist for container-type objects (a namespace doesn't "do" anything... it's not a workload that could be prematurely executed, etc). I'm not eager to attempt introducing the "exists but is hidden unless you pass a special flag" behavior again if we don't have to.

Author

I understand that holding the creation request is complex and that using includeUninitialized can be problematic. However, this seems to be the only way to ensure behavioral compatibility with existing client code, which assumes a resource is ready upon receiving the creation response. That's also a reason to introduce a status condition in the namespace resource, with which we can gradually improve client code (at least client-go and kubectl) to determine the readiness of a resource from the status condition.

For example:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ns1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: ns1
  name: app1
...
```

The above manifest is widely applied with a single `kubectl apply -f`. Without holding the namespace creation request, this breaks if the namespace is not yet ready (still being initialized by some controller) when the creation response reaches kubectl. For example, in self-service namespace creation, the creator may only have permission to create a namespace and must wait for NamespaceTemplate to populate admin permissions inside it. The single `kubectl apply` will break without an initialization mechanism.

I think a reasonable plan is to still include the `includeUninitialized` flag mechanism to make sure nothing breaks, and to deprecate this complex logic later, once most clients are patched to work with the status condition (this will take a significantly long time, probably a few years).

```go
// An initializer is a controller which enforces some system invariant at object creation time.
// This field is a list of initializers that have not yet acted on this object. If nil or empty,
// this object has been completely initialized. Otherwise, the object is considered uninitialized
// and is hidden (in list/watch and get calls) from clients that haven't explicitly asked to
// observe uninitialized objects.
```
Member

> is hidden (in list/watch and get calls) from clients that haven't explicitly asked to observe uninitialized objects.

this proved very problematic when implementing initializers before (and we never successfully resolved the issues)

Author

I understand it's problematic. Please see my comments above for a plan to eventually fix this.

Member

Independent of the outcome of the hold-until-initialized behavior during creation, I don't understand why we need to modify get/list/watch this way for container objects (objects that don't have behavior associated with them). Can you elaborate on what use cases the get/list/watch changes enable?

Author

It's about a concurrency issue.

In general, one user creates a container object; before it is fully initialized (e.g. policy objects populated), another user (or controller) gets/lists/watches that kind of container object. Without hiding, they may be notified about the creation of the container object and move forward immediately, assuming the new container object is ready for use.

Here's a more specific use case:

In a corp, the cluster admin team shares a cluster with multiple development teams. The cluster admin team sets up a namespace population mechanism to populate common policy objects into all namespaces. Development team A wants to enforce additional policies for its own project, for example ResourceQuota on testing Pods. They created a namespace for running their automation pipeline, which includes a controller that watches namespaces created for sub-teams and creates a ResourceQuota in each.

In this case, when a team lead creates a namespace for a sub-team, the controller enforcing ResourceQuota should not be notified that the namespace is ready before namespace population has fully initialized it with the policies defined by the cluster admin team, as RBAC rules may not yet be set up for the service account running the controller. This demonstrates the need for an `includeUninitialized` flag on get/list/watch besides create.

@liggitt (Member), Oct 23, 2018

> the controller enforcing ResourceQuota should not be notified about the readiness of the namespace, as RBAC rules may not be setup for service account running the controller

I don't think that is particularly convincing, or justifies the complexity of hiding objects from list/watch. A controller that needs to react to populate a "container" object should be able to retry/backoff. The primary reasons for hiding uninitialized objects from listers/watchers were around "point of no return"-type operations like starting a pod or actuating something described in an object's spec, and wanting to prevent that from happening until the object was initialized.

Author

I accept your point and agree that hiding is not necessary for get/list/watch, specifically for container resources, as long as status conditions are available. Because there is no behavior associated with container resources, there are no point-of-no-return operations; clients can always observe the status condition to decide the readiness of the resource.
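A sketch of the status-condition-based readiness check discussed here; the `Initialized` condition type name is an assumption for illustration, not something the KEP fixes:

```go
package main

import "fmt"

// Illustrative stand-ins: a consumer inspects a status condition instead of
// relying on uninitialized namespaces being hidden from list/watch.
type Condition struct {
	Type   string
	Status string // "True" | "False" | "Unknown"
}

type Namespace struct {
	Name       string
	Conditions []Condition
}

// isInitialized checks the assumed "Initialized" condition; a controller
// seeing false would requeue with backoff rather than act on the namespace.
func isInitialized(ns Namespace) bool {
	for _, c := range ns.Conditions {
		if c.Type == "Initialized" { // assumed condition type
			return c.Status == "True"
		}
	}
	return false
}

func main() {
	ns := Namespace{
		Name:       "team-a",
		Conditions: []Condition{{Type: "Initialized", Status: "False"}},
	}
	fmt.Println(isInitialized(ns)) // false: requeue with backoff

	ns.Conditions[0].Status = "True"
	fmt.Println(isInitialized(ns)) // true: safe to create ResourceQuota etc.
}
```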

Contributor

Which all namespace consumers already have to do today (service accounts, quota)

@easeway force-pushed the kep-namespace-initializer branch from 7c62fa5 to cd160fd on October 23, 2018 20:42
@easeway force-pushed the kep-namespace-initializer branch from cd160fd to 14d2614 on November 19, 2018 19:59
@easeway (Author) commented Nov 19, 2018

Updated the KEP according to the Oct 24, 2018 SIG API Machinery meeting:

  • Removed the mechanism holding the response;
  • Added a section discussing security: a proposal to use an initialize verb in RBAC.

@justaugustus (Member)

REMINDER: KEPs are moving to k/enhancements on November 30. Please attempt to merge this KEP before then to signal consensus.
For more details on this change, review this thread.

Any questions regarding this move should be directed to that thread and not asked on GitHub.

@justaugustus (Member)

KEPs have moved to k/enhancements.
This PR will be closed and any additional changes to this KEP should be submitted to k/enhancements.
For more details on this change, review this thread.

Any questions regarding this move should be directed to that thread and not asked on GitHub.
/close

@k8s-ci-robot (Contributor)

@justaugustus: Closed this PR.

