KEP: Namespace Initializer #2543
Conversation
@lavalamp this originated from the initializer proposal being considered as a "not sufficiently useful mechanism for anything *other* than namespaces". We ultimately have two options:
1. have a generic initializer field for all resources that isn't tied to a global mechanism, but is instead per resource
2. have an initializer field specifically JUST for namespaces, because namespaces have a unique and novel lifecycle as the only "container" resource we have

I could go either way, but 2 seems more appropriate in the near term than keeping a generic field with a generic structure in place.
Added to my queue. Initial question just based on your comment: how confident are we that namespace will always be the only container resource? When I started writing this sentence I thought 80%, but by the end I'm thinking 60%.
I'm not sure. Everyone talks a big game about a new container resource, but in 4 years nothing has gotten close.
#### Core Kubernetes Changes

The generic Initializers mechanism uses `metadata.Initializers` to record all pending initializations.
As this is going to be deprecated, for namespaces, `Initializers` is added to `NamespaceSpec` as an _alpha_ field:
I would actually prefer to leave it in metadata.
I think it'd be OK/interesting to define a ContainerMetadata type which includes this and use it only for namespaces.
Even if namespace is the only container type k8s ever has, I think initializers still need to be understood by the machinery in many places, and therefore spec is not a good choice.
After reading the rest of the proposal, I'm on the fence about this; I'll leave it here as a discussion point. I think the doc could maybe explain why it wasn't chosen in an alternatives section?
If we decide to put that in metadata, I'd be happy with it. And we can actually keep the existing Initializers and move it forward. I'd be happy to abandon this PR if we can move Initializers forward.
> Even if namespace is the only container type k8s ever has, I think initializers still need to be understood by the machinery in many places and therefore spec not a good choice.

The way this is designed, I don't think initialization is a generic concept. I think it is specific to namespaces as the only scoping resource we have. I'd rather have metadata without initializers and the field in the spec of the namespace.
I'm not quite ready to give up on the idea that we might eventually gain another scoping resource. We could make a ContainerMetadata which is ObjectMetadata (inlined) + initializers; that way we wouldn't impact other resources.
@lavalamp IMO ContainerMetadata is a good idea if we foresee that namespace may not be the only container resource. I will put it in the proposal.
In V1 I would much rather have concrete initializers on namespace aligned with the concrete finalizers field, which won't change for v1, and worry about hypothetical other initializers as cases come up.
The current initializers mechanism lacks the ability to allow resources to have their own initialization flow, and at the limit initialization is a domain problem (for instance, a job graph) which shouldn't be reduced to a generic field.
```go
// Finalizers is an opaque list of values that must be empty to permanently remove object from storage.
// More info: https://kubernetes.io/docs/tasks/administer-cluster/namespaces/
// +optional
Finalizers []FinalizerName `json:"finalizers,omitempty" protobuf:"bytes,1,rep,name=finalizers,casttype=FinalizerName"`
```
Finalizers is already in metadata, why would it be in spec, too?
The v1 Namespace today has Finalizers in spec. This PR was opened on the assumption that the generic Initializers field is going to be removed from metadata, so specifically for Namespace, both Initializers and Finalizers are put in spec for consistency. If we decide to keep Initializers in metadata, I agree we should use Finalizers in metadata too.
> Finalizers is already in metadata, why would it be in spec, too?

namespaces predate "normal" finalizers, so this field is how namespaces do it.
They do it wrong and should change in v2.
I'd like to keep the scope to adding fields to v1 Namespace objects, not making a v2 namespace which will take longer.
We’ve said consistency within an API trumps “doing it the way we wanted to do it” for keeping users on a clear path. I think this is just another example of that.
/kind kep
```go
Pending []NamespaceInitializer `json:"pending" protobuf:"bytes,1,rep,name=pending" patchStrategy:"merge" patchMergeKey:"name"`
// If result is set with the Failure field, the object will be persisted to storage and then deleted,
// ensuring that other clients can observe the deletion.
Result *metav1.Status `json:"result,omitempty" protobuf:"bytes,2,opt,name=result"`
```
this looks like a status condition to me
Agreed, this could be expressed as a status condition too. Since the design is copied from Initializers unchanged, I'd prefer to keep it as is and keep the scope small. We can evolve it later if a status condition turns out to be a better place.
```go
// NamespaceInitializerConfiguration describes the configuration of initializers.
type NamespaceInitializerConfiguration struct {
	// (fields elided)
}
```
Seems like this should actually be a normal admission plugin configuration done via file.
Why would we have a file based configuration? Those are hard to manipulate at runtime. I much prefer API resources absent some sort of justification (e.g., security).
Using file-based configuration is less flexible, as it requires the configuration during cluster bootstrapping, and changes will require restarting the control plane.
- The namespace initialization controllers register themselves by creating `NamespaceInitializerConfiguration` resources
- When a namespace is newly created
  - The `NamespaceInitializers` admission controller aggregates all `NamespaceInitializerConfiguration` resources and generates `spec.Initializers` in the namespace resource
Why wouldn't we tell people to use a webhook admission plugin to add themselves to the list of initializers for a namespace?
How do namespaces decide whether they are subject to being run through an admission plugin w.r.t. `NamespaceSelector`?
It gets a little weird to use this to hook in policy when the thing used to select whether it applies to a namespace is an attribute of the incoming object.
IIRC namespace objects have their own labels tested before running through the admission controller, for better or worse.
@deads2k Yes, we can definitely tell people to use a webhook admission plugin. However, adding a webhook admission plugin requires additional work, including setting up a webserver with TLS certificates, creating webhook configurations, exposing services, etc. The stability of webhook plugins also impacts requests in the API path. `NamespaceInitializerConfiguration` saves this extra work: people only need to write a controller. It also lets us focus on the stability of the `NamespaceInitializers` admission controller.
This design is completely copied from the existing Initializers: https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#initializers, and IMO the original design is good as it offers a simplified configuration in addition to writing your own admission webhook plugin.
@liggitt Not sure if I understand the comment correctly. The `NamespaceInitializers` admission controller runs through all namespaces without filtering, and currently there's no filtering/selection mechanism in `NamespaceInitializerConfiguration`. That means the configurations are applied to all newly created namespaces, leaving it to the actual controllers performing initialization work to decide what to do with a particular namespace.
For the primary use case given (self-service namespace creation/population - #2177), a webhook is still required to capture the info about the creating user, right? The initializer name populated in the namespace doesn't have enough information for auto-policy-grants to know who to grant permissions to.
Yes, for the particular use case in #2177. I'd love to separate this problem from this PR and discuss it in #2177. However, one of the major dependencies of this PR is that we need this initialization mechanism to ensure API behavior compatibility: an existing CLI/script won't break while a namespace is being initialized. That's also the reason we need this change in core Kubernetes; there's no other way (e.g. a webhook admission plugin) to achieve that.
> @deads2k Yes. We can definitely tell people to use a webhook admission plugin. However, adding a webhook admission plugin requires additional work, including setting up a webserver with TLS certificates, creating webhook configurations, exposing services, etc. And the stability of webhook plugins also impacts the request in API path. With `NamespaceInitializersConfiguration` saves these extra work that people only need to write a controller. It also helps we can focus on the stability of NamespaceInitializers admin controller.

This design is trying to avoid that at the cost of adding additional complexity to core. In addition, doing it this way restricts options that admission plugins would have about precisely when and how particular initializers are applied. Using a webhook, I'm sure that the openshift use-cases could be covered. Using what's described here, I'm not guaranteed that it would be.
> about precisely when and how particular initializers are applied

Is that specifically talking about the order of initializers? If yes, using a webhook admission plugin (I mean MutatingAdmissionWebhook) doesn't guarantee the order either. Actually they are the same: creating `MutatingWebhookConfiguration` vs creating `NamespaceInitializerConfiguration`. And it's really difficult to define/enforce the ordering unless the webhook plugins (or initializers) are clearly predefined at cluster bootstrapping time and dynamic deployment of arbitrary webhooks/initializers is disallowed.
Anyway, `NamespaceInitializerConfiguration` doesn't block the use of webhook admission plugins. It's still possible/reasonable to use a webhook admission plugin depending on the particular use case. However, we did hear complaints about the extra work/complexity of building a webhook when the developer only wants a controller to populate a namespace while it's being initialized. That demonstrates the value of `NamespaceInitializerConfiguration`.
Alternatively, I'm open to not implementing `NamespaceInitializerConfiguration` at the beginning and focusing only on the initialization mechanism; with that alone, a webhook plugin is required to add initializers.
Maybe we want to have an easy-to-consume initializer implementation as part of controller-runtime that someone can quickly iterate on.
- Once the namespace resource is persisted, the following happens in parallel:
  - Namespace initialization controllers get notified about the newly created namespace, and they start creating resources inside the namespace and also update `spec.Initializers` to remove themselves once their initialization is done
  - The storage layer will wait until `spec.Initializers` becomes empty before returning the response back to the client, unless `includeUninitialized` is true in the request.
Holding the creation request until async initializers observe and act on the object being created leads to a lot of complexity. In the original design, it was paired with the `includeUninitialized` option on list/watch (which was also extremely complex), in order to prevent controllers from acting on uninitialized objects (like running a pod or provisioning a service load balancer before it was initialized).
I'm not sure the same requirements exist for container-type objects (a namespace doesn't "do" anything; it's not a workload that could be prematurely executed, etc.). I'm not eager to attempt introducing the "exists but is hidden unless you pass a special flag" behavior again if we don't have to.
I understand holding the creation request is complex and using `includeUninitialized` can be problematic. However, this seems to be the only way to ensure behavioral compatibility with existing client code, which assumes the readiness of a resource upon receiving the creation response. That's also a reason to introduce a status condition in the namespace resource, with which we can gradually improve client code (at least client-go and kubectl) to determine the readiness of a resource based on the status condition.
For example:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ns1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: ns1
  name: app1
...
```
There is a lot of usage of the above manifest with a single `kubectl apply -f`. Without holding the namespace creation request, it will break if the namespace is not yet ready (because it is still being initialized by some controller) by the time the creation response reaches kubectl. For example, in self-service namespace creation, the creator may only have permission to create a namespace, and waits for `NamespaceTemplate` to populate `admin` permissions inside the namespace. The single `kubectl apply` will break without the initialization mechanism.
I think a reasonable plan is to still include the mechanism of the `includeUninitialized` flag to make sure nothing breaks, and later deprecate this complex logic when most clients are patched and working with the status condition (this will take a significantly long time, probably a few years).
```go
// An initializer is a controller which enforces some system invariant at object creation time.
// This field is a list of initializers that have not yet acted on this object. If nil or empty,
// this object has been completely initialized. Otherwise, the object is considered uninitialized
// and is hidden (in list/watch and get calls) from clients that haven't explicitly asked to
```
> is hidden (in list/watch and get calls) from clients that haven't explicitly asked to observe uninitialized objects.

this proved very problematic when implementing initializers before (and we never successfully resolved the issues)
I understand it's problematic. Please see my comments above for a plan to eventually fix this.
Independent of the outcome of the hold-until-initialized behavior during creation, I don't understand why we need to modify get/list/watch this way for container objects (objects that don't have behavior associated with them). Can you elaborate on what use cases the get/list/watch changes enable?
It's about a concurrency issue.
In general, while one user is creating a container object, before it is fully initialized (e.g. policy objects populated, etc.), another user (or controller) may get/list/watch this kind of container object. Without hiding, they may get notified about the creation of the container object and move forward immediately, assuming the new container object is ready for use.
Here's a more specific use case:
In a corp, the cluster admin team shares a cluster with multiple development teams. The cluster admin team sets up a namespace population mechanism to populate common policy objects into all namespaces. Development team A wants to enforce specific policies according to the needs of their own project, for example ResourceQuota on testing Pods. They created a namespace for running their automation pipeline, which includes a controller watching namespaces created for sub-teams and creating ResourceQuota in each namespace.
In this case, when a team lead creates a namespace for a sub-team, before namespace population fully initializes the namespace with the policies defined by the cluster admin team, the controller enforcing ResourceQuota should not be notified about the readiness of the namespace, as RBAC rules may not yet be set up for the service account running the controller. This demonstrates the need for the `includeUninitialized` flag for get/list/watch besides create.
> the controller enforcing ResourceQuota should not be notified about the readiness of the namespace, as RBAC rules may not be setup for service account running the controller

I don't think that is particularly convincing, or justifies the complexity of hiding objects from list/watch. A controller that needs to react to populate a "container" object should be able to retry/backoff. The primary reasons for hiding uninitialized objects from listers/watchers were around "point of no return"-type operations like starting a pod or actuating something described in an object's spec, and wanting to prevent that from happening until the object was initialized.
I accept your point and agree that hiding is not necessary for "get/list/watch" specifically for container resources as long as status conditions are available. Because there's no behavior associated with container resources, and thus no "point of no return" operations. The clients can always observe status condition to decide the readiness of the resource.
Which all namespace consumers already have to do today (service accounts, quota)
Updated the KEP according to the Oct 24, 2018 SIG API Machinery meeting.
REMINDER: KEPs are moving to k/enhancements on November 30. Please attempt to merge this KEP before then to signal consensus. Any questions regarding this move should be directed to that thread and not asked on GitHub.
KEPs have moved to k/enhancements. Any questions regarding this move should be directed to that thread and not asked on GitHub.
@justaugustus: Closed this PR.
This KEP proposes a specialized initializer mechanism for namespaces, based on the discussion in #2177.
I submit this proposal because the generic Initializer is planned to be deprecated; however, the mechanism is very useful for container resources like namespaces, and I hope we can keep it, at least specialized for namespaces and moved towards beta/GA. If the generic Initializer moves on to beta/GA, I will abandon this proposal.
Note: this proposal requires changes in core Kubernetes and adds a new field to a stable API object.
/sig architecture
/sig api-machinery
/sig auth
/cc @bgrant0607
/cc @smarterclayton
/cc @jbeda
/cc @lavalamp
/cc @liggitt
/cc @tallclair
/cc @davidopp