Skip to content
This repository has been archived by the owner on Oct 7, 2020. It is now read-only.

Prune Istio Namespace created by Operator when deleting Operator cr #484

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

irisdingbj
Copy link
Member

@irisdingbj irisdingbj commented Oct 30, 2019

istio/istio#18384

  • Do not update Istio namespace if the namespace is not created by Istio Operator

  • Prune Istio Namespace if it is created by Istio Operator

  • Leave Istio namespace if it is not created by Istio Operator

@irisdingbj irisdingbj requested a review from a team as a code owner October 30, 2019 08:32
@googlebot googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Oct 30, 2019
@istio-testing istio-testing added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 30, 2019
@irisdingbj
Copy link
Member Author

istio/istio#18384

if listenerErr != nil {
log.Errorf("unexpected error occurred invoking ResourceError on listener: %s", listenerErr)
} else if update {
if patch, err = h.CreatePatch(receiver, mutatedObj); err == nil && patch != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe it's better to merge the two if condition checks.

@@ -40,6 +40,17 @@ func (h *HelmReconciler) Prune(all bool) error {
if err != nil {
allErrors = append(allErrors, err)
}
//prune namespace created by istio operator at last
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a dangerous operation since it will delete anything that the user created in the namespace too. At the very least, we should create a flag to guard this - but IMO it may be better to leave this step up to the user since it's not very onerous. Thoughts @howardjohn @sdake ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not as familiar with common operator patterns, but it seems dangerous to me. I think we should error on the side of NOT deleting stuff - will a user be more upset if they have an empty namespace around, or if we delete their critical applications?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, we shouldn't be creating or deleting namespaces. If we are creating namespaces, access should be restricted as, by definition, the operator is managing the lifecycle of that namespace.

Typically, an operator will create resources in the same namespace as the CR defining the install/behavior or in a completely segregated namespace, where all interaction would be through the CR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree, at the minimum this should be guarded with a flag... like you see a confirmation - do you really want to delete this github repo?

Also, how do you know if the namespace isn't there prior to install of the operator?

Copy link
Member Author

@irisdingbj irisdingbj Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the namespace is created by operator, we will mark it via adding related operator lables and annotations, eg: install.operator.istio.io/owner-generation.

If the namespace is created prior to install of the operator, this PR will make sure operator will not touch the namespace(not adding related labels and annotations)

So in this way, we can know whether the namespace is created by operator or not. and we delete them only if it is created by operator.

Copy link
Member

@sdake sdake Oct 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, we shouldn't be creating or deleting namespaces. If we are creating namespaces, access should be restricted as, by definition, the operator is managing the lifecycle of that namespace.

Typically, an operator will create resources in the same namespace as the CR defining the install/behavior or in a completely segregated namespace, where all interaction would be through the CR.

Assuming the Istio CR is registered in the istio-operator namespace Is this not handled by https://github.com/istio/operator/blob/master/deploy/operator.yaml#L33-L34? If the answer is yes, perhaps there is some way to enforce the exact behavior we want as you have specified.

@irisdingbj
Copy link
Member Author

@ostromart @howardjohn @rcernich
I agree the comments that we should not delete the namespace if it is created by user.

But if the namespace is created by our operator. It means operator need manage the lifecycle of the namespace as said by @rcernich

So this is what the PR want to do:

Delete the ns if it is create by operator.
Leave the ns untouched(not add related operator label and annotation) and do not delete the ns if it is not created by operator.

mutatedObj, err = patch.Apply()
if err == nil {
if err = h.customizer.Listener().ResourceUpdated(mutatedObj, receiver); err != nil {
log.Errorf("unexpected error occurred during postprocessing of updated resource: %s", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In very overloaded clusters where the resource request is not processed, simply logging the Error and moving on is not sufficient.

} else {
listenerErr := h.customizer.Listener().ResourceError(obj, err)
if listenerErr != nil {
log.Errorf("unexpected error occurred invoking ResourceError on listener: %s", listenerErr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same problem with error processing. There are other instances of this poor error handling in this function that you didn't add. Perhaps you can get it right in your changes and return with a followup PR to address the problems in the remainder of this code?

Cheers
-steve

@sdake
Copy link
Member

sdake commented Oct 31, 2019

/test gencheck_operator

@sdake
Copy link
Member

sdake commented Oct 31, 2019

@ostromart @howardjohn @rcernich
I agree the comments that we should not delete the namespace if it is created by user.

But if the namespace is created by our operator. It means operator need manage the lifecycle of the namespace as said by @rcernich

So this is what the PR want to do:

Delete the ns if it is create by operator.
Leave the ns untouched(not add related operator label and annotation) and do not delete the ns if it is not created by operator.

Agree. Objects created by automation should be deleted by said automation. The objection is the side effect of deleting a namespace could have unintended consequences of deleting objects erroneously added by a user in the deployed namespace.

This should be simple enough to control given the manifest we ship and the WATCH_NAMESPACE environment variable. If a CR is registered in this WATCH_NAMESPACE and the CR's https://github.com/istio/operator/blob/master/pkg/apis/istio/v1alpha2/istiocontrolplane_types.proto#L186 is set and a namespace is created setting appropriate annotations and labels indicating the namespace was created by the operator, the user no longer has an expectation of data stored within this namespace to be retained.

Cheers
-steve

@rcernich
Copy link
Contributor

Regarding the namespaces, we shouldn't be creating them at all. We should verify its existence in the validator and reject the CR if the target namespace does not exist or if the user creating the CR does not have permission to access the specified namespace. I believe this could be a potential security issue, e.g. if a user specifies a rogue image along and requests installation into a system namespace. We should really reconsider the current approach.

@howardjohn
Copy link
Member

+1 to @rcernich shouldn't Istio be installed in the same namespace as the ICP, which implies the namespace already exists?

@ostromart
Copy link
Contributor

We need to create the namespace to accommodate a single line install. There's just no way around it.
The current check in the code is insufficient - what if a user had already created the ns before installing the operator? Either we check for that condition and not delete the ns after, or (better imo) leave the ns alone when deleting. It's such a trivial step, I don't see it as necessary to take the risk or introduce extra code to do it.

@rcernich
Copy link
Contributor

@ostromart the issue is that the user shouldn't be specifying the namespace at all when using the operator. The namespace within which the CR is created is where the operator should create the resources. The operator use case is different from the command-line use case, where the cli is creating resources which need to be created/applied separately.

@ostromart
Copy link
Contributor

@howardjohn there are good reasons for keeping the operator and ICP in a separate ns from Istio, namely creating an administrative boundary and making it easy to uninstall operator without uninstalling Istio. RH take is a step further in their model by having the ICP in a completely different ns. We support that through a flag but currently the flag is set to the operator ns.
The operator install currently creates the istio-operator ns. IMO we should not delete this ns either when operator in removed because a user may be transitioning from using the controller to using istioctl manifest apply, in which case they will want to preserve their ICP CR and use that as an input to istioctl.

@rcernich
Copy link
Contributor

@ostromart, the operator can watch any namespace and should be watching all namespaces. When an ICP is creating in a namespace, that's the namespace that should contain the control plane. The operator should be completely isolated and regular users should be able to create control planes. Typical configuration scenarios for an operator are:

  • Only watch operator namespace (not an option for Istio because of the perms required by the operator)
  • Watch a single namespace (other than operator namespace)
  • Watch a set of namespaces (difficult to implement in practice)
  • Watch all namespaces

The operator install should not create the namespace. The flow should be:

  1. create operator namespace
  2. kubectl apply -n operator-namespace -f operator.yaml

Removal should be: kubectl delete -n operator-namespace -f operator.yaml

The goal: user must be cluster-admin to install operator, but any user should be able to install a control plane.

@irisdingbj
Copy link
Member Author

irisdingbj commented Nov 1, 2019

@ostromart @sdake @howardjohn @rcernich @linsun
Thanks guys for your comments here.

SO we have two namespace here: the operator namespace and the istio installation namespace(eg: istio-system).

I agree we should keep the operator namespace untouched.

Now we will add operator labels and annotations into the istio installation namespace(eg: istio-system) even if it exists before hand.

So we should at least make the logic in pkg/helmreconciler/rendering.go within this PR go into master code. -- This is logic to make sure operator will not touch the pre-existed Istio installation namespace.

For whether we need prune the istio installation namespace(eg:istio-system) and operator namespace, Let's open an issue and move our discussion there to get a solid agreements.

Does this sounds reasonable?

@rcernich
Copy link
Contributor

rcernich commented Nov 1, 2019

The user installing the control plane should create the istio-system namespace. They should create the ICP resource in istio-system. (Replace istio-system with any namespace the user wants to use for installation.)

@ostromart
Copy link
Contributor

Rob, what's the motivation for putting the ICP CR in istio-system? It's possible to create rbac that restricts a user from touching anything other than ICP CR in either ns.

@rcernich
Copy link
Contributor

rcernich commented Nov 1, 2019

Once again, ICP should sit beside the installed control plane. It should not be collocated with the operator itself. The ICP CR specifies the configuration for a control plane. The operator watches namespaces for ICP resources and act on it accordingly. Regular users should have no access to the operator namespace, especially because the operator service account has elevated privileges, which are required to instantiate resources across the cluster. A regular user should be able to create an ICP in a namespace they have access to, so they can install a control plane, without needing to be cluster-admin. This is the basic problem solved by the operator: normal users can manage a control plane installation.

Yes, RBAC can be used to restrict who can create/update/delete ICP CRs, but, as mentioned above, one of the benefits of using an operator is that a user does not need cluster-admin to administer a control plane installation.

@sdake
Copy link
Member

sdake commented Nov 1, 2019

Once again, ICP should sit beside the installed control plane. It should not be collocated with the operator itself. The ICP CR specifies the configuration for a control plane. The operator watches namespaces for ICP resources and act on it accordingly. Regular users should have no access to the operator namespace, especially because the operator service account has elevated privileges, which are required to instantiate resources across the cluster. A regular user should be able to create an ICP in a namespace they have access to, so they can install a control plane, without needing to be cluster-admin. This is the basic problem solved by the operator: normal users can manage a control plane installation.

Yes, RBAC can be used to restrict who can create/update/delete ICP CRs, but, as mentioned above, one of the benefits of using an operator is that a user does not need cluster-admin to administer a control plane installation.

@rcernich sounds consistent to me. Some use cases may wish to restrict who can install a control plane. Istio's control plane uses cluster scoped resources, so we must have the ability to restrict where ICPs can be registered in the model you propose. This more consistent approach doesn't really feel appropriate for a backport.

How can we solve this specific problem without major rework for 1.4?

Cheers
-steve

@rcernich
Copy link
Contributor

rcernich commented Nov 1, 2019

Hey @sdake, I'm not sure about 1.4, but based on my limited understand, we should reconfigure the operator deployment so that the operator watches all namespaces. As for users creating ICP resources, that will already be constrained based on RBAC, so admins will need to grant create/update/delete roles to individual users/groups. For folks used to doing everything as cluster-admin, the only difference they'd see is that they create the ICP resource in whatever namespace (e.g. istio-system). I don't know if the namespace fields in ICP are required, but if they're not, they should just be left blank. If not, the operator should balk if the namespace doesn't match the containing namespace and/or a validator checks the setting and rejects it. I'm guessing, of all this, all that needs to be done is modify the operator deployment yaml and override the target namespace with the namespace containing the ICP before rendering the charts.

@ostromart
Copy link
Contributor

Apologies in advance if I'm missing something obvious as security is not my forte. Just trying to understand your proposal better.
I have no objection to moving the CR out of istio-operator. In the code we have it's already straightforward to do so without any code changes if that's how a user wishes to deploy it. The bit I don't understand is why moving it into istio-system makes it more secure. If you have a model where the user is not allowed to make changes to the control plane except through the CR, and you don't find restricting access to only ICP CR through RBAC sufficiently secure, how does having the CR in istio-system solve the problem? Afaict you're just creating a problem that the user must now create istio-system so they can populate the CR - so you give up the one line install and delete.
I can see an argument where the CR lives in a completely different ns (say istio-operator-cr) and user only has access to that, but I'm missing why it should be in istio-system.

@rcernich
Copy link
Contributor

rcernich commented Nov 1, 2019

The operator namespace should be managed by cluster-admin, so moving ICP out of operator namespace allows control planes to be created with lower level permissions, although they would still need permissions to create/update/delete ICP resources. When using Operator Lifecycle Manager to install the operator, roles for managing the operator's associated CRDs are created along with the operator and bound into the standard admin/edit/view roles. This effectively means each namespace's administrators may instantiate an ICP resource, and thus their own control plane. This will become much more important when Istio supports multitenancy (similar to Maistra). It also becomes more important as the roles required by the components themselves are reduced.

Also, this is the general convention used when working with operators, although most of the k8s oriented examples seem to just dump everything into default namespace (which might be great for kicking tires, but falls short when working in a managed cluster).

HTH

@liminw
Copy link

liminw commented Nov 1, 2019

In general I agree with the principles @rcernich mentioned here. Operator is cluster/mesh admin, which has the privilege to read all ICPs from different namespaces. Each namespace admin is responsible for managing the resource in his own namespace.

From the examples we discussed, I only see "operator" and "istio-system" namespaces are mentioned. Both should be managed by cluster/mesh admins. Do we have use cases that the ICPs are namespace specific (hence should be set by namespace admins)?

@howardjohn
Copy link
Member

howardjohn commented Nov 1, 2019 via email

@rcernich
Copy link
Contributor

rcernich commented Nov 1, 2019

@liminw, two things:

  1. Service mesh admin need not be the same as cluster-admin.
  2. We (Red Hat) would like to contribute the work we did to support multitenancy in the Maistra project back upstream into Istio proper. This work includes a reduction in the scope of permissions assigned to Istio component service accounts, along with moving functionality requiring elevated privileges into the operator. This means that any user can install a service mesh and more than one service mesh can be installed in a cluster.

The second point is what is really driving the rationale behind this. The goal of the operator should be to allow self-provisioning within a managed k8s cluster. At present, this would mean that a service mesh admin would have greater privileges than a regular user, but would not have cluster-admin privileges.

HTH

@ostromart
Copy link
Contributor

The operator already supports the CR being in any namespace the admin doing the install chooses. I'm still wondering why that should be istio-system today, given that we're trying to restrict user privileges, rather than another ns altogether? Per Limin's point, istio-system should also be for admins. User should only be able to interact with Istio installation through the CR they control.

@jmazzitelli
Copy link
Member

istio-system should also be for admins. User should only be able to interact with Istio installation through the CR they control.

The user that creates the control plane namespace (e.g. istio-system) IS the "admin" (for that control plane/mesh). The idea is you have one operator listening/watching all namespaces. Bob creates/installs Istio via a CR he creates in his bob-istio-system and Mary creates/installs Istio via her own CR in mary-istio-system. Each is the admin for their own mesh - each has permissions to their own control plane namespace. They should never have permission to even see istio-operator namespace let alone create objects in that namespace.

@liminw
Copy link

liminw commented Nov 1, 2019

From @rcernich and @jmazzitelli 's comments, there are multiple control planes shared by different tenants, and all of them share the same operator. This requirement seems reasonable for such use case. I think we should follow @rcernich 's suggestion to allow the tenants to own their individual configuration.

@ostromart
Copy link
Contributor

I completely agree that for multi-tenant and/or managed case it makes sense not have the CR in istio-operator. I do still wonder why the cluster admin would install a managed Istio for a user and then afterwards allow them to interact directly with the control plane resources, rather than having a separate config namespace and having them do it solely through the CR. It seems difficult to support as a managed solution, but there's nothing preventing this in the operator - just listen to all namespaces. We can make both changes in a new PR.
In practical terms for this PR, I think at least everyone agrees that the operator should not delete the ns? Sorry @irisdingbj for hijacking your PR for this lengthy discussion...

@rcernich
Copy link
Contributor

rcernich commented Nov 1, 2019

@ostromart, cluster admin just installs the operater. A mesh admin (designated by cluster-admin) manages the control plane entirely: creates namespace, creates CR, etc. This mesh admin should be managing the control plane installation through the CR, not manually. The mesh admin should not be modifying the resources managed by the operator.

@jmazzitelli
Copy link
Member

and then afterwards allow them to interact directly with the control plane resources, rather than having a separate config namespace and having them do it solely through the CR.

Just to be clear, the tenant owner should NOT be allowed to interact directly with control plane resources that are created by the operator. I think the way you are thinking it should work, is how it works (minus the part about needing a separate config namespace). The tenant owner should configure things solely through the CR - this is true. The tenant owner does not interact directly with control plane resources. They will not (should not) have rights to do that - I believe this will be the job of the cluster-admin to provide the correct permissions to the tenant owners (right @rcernich?). But the tenant owner will have the rights to create, modify, and delete CRs. It should be fine to be able to put the CR directly in the control plane namespace.

So, in short (and @rcernich can explain the RBAC issues more in depth and correct me where I am wrong, but this is how I understand it) the tenant owner will not have the necessary permissions to directly manipulate the control plane resources - the tenant owner will only have permissions to manipulate the CR itself. It is the job of the cluster-admin to provide the correct permissions to the proper tenant owners.

@rcernich
Copy link
Contributor

rcernich commented Nov 4, 2019

Hey @jmazzitelli, nothing prevents namespace admins from interacting with those objects and currently we don't watch for changes of those resources to overwrite any changes made by somebody other than the operator. Additionally, because we use three way patching during updates, it's likely that any changes made by a user will be persisted through updates, unless they changed something that is explicitly set through the operator. This was done on purpose to give users the ability to workaround any issues or temporarily change things based on operational needs (i.e. escape hatch). That said, they really shouldn't be messing with things, as they can put things into a bad state (e.g. deleting a deployment). There are also practical concerns given the amount of settings that might have to be exposed. In reality, this is no different than working with the charts/templates directly.

Having said all that, the user should be doing everything through the CR, even though they may have the ability to interact with the resources created by the operator. (Sort of like buying a car and deciding to work on it yourself.)

@jmazzitelli
Copy link
Member

Sort of like buying a car and deciding to work on it yourself.

OK, got it.

I tell people "don't mess with the operator-created resources - just edit the CR to modify your operand installation." But good analogy with the car - if you know how to change your own oil - feel free. Just don't complain to me when you seize the engine. :)

@sdake
Copy link
Member

sdake commented Nov 4, 2019

After reading through this (long) review, I am unclear on the next actions for @irisdingbj to make to get this PR merged. Options could be simple like "change X" or "change Y" in this specific code. If the answer is more complex, as this review would seem to indicate, then please suggest filing a bug around the points of contention so they may be addressed in followup PRs.

It is unreasonable to expect @irisdingbj to resolve all of this thread in one PR - the PR would be massive. I strongly prefer forward iteraataive progress over a stalled PR.

Cheers
-steve

@rcernich
Copy link
Contributor

rcernich commented Nov 4, 2019

@sdake, agreed. For now, we should not prune namespaces. Other details can be worked through separate PR/issues.

@irisdingbj, sorry for hijacking this PR, but I think it stimulated an important discussion.

@ostromart
Copy link
Contributor

Yes, agreed. Let's not prune the namespace, and move the other changes out to new PRs.

@sdake
Copy link
Member

sdake commented Nov 5, 2019

I am pretty sure that isn't what I just stated in #484 (comment).

To clarify, the right technical choice is to garbage collect any objects created by the operator. This includes namespaces, if they are created by the operator (which they are), even if that has negative side effects because human operators decide to execute operations outside of the best practices operator workflow (which I correctly assume does not include creating Istio resources not via the operator's ICP in the operator's specific namespace). I'm not sure where the confusion came from, but it doesn't matter. Now that is settled...

Given this corrected understanding, what should be done with this specific PR which garbage collects resources created by the operator after the ICP resource is removed. The only viable answer I see is moving forward with this PR after the few nit-picks are resolved, and for @rcernich and @ostromart to open issues tracking improving this behavior for 1.5.

Cheers
-steve

@linsun
Copy link
Member

linsun commented Nov 5, 2019

Wow, a long thread... just to make sure we are on the same page:

  • option 1: the code does this today.
    user deploys ICP resource in istio-operator ns, and the operator creates the istio-system ns and installs istio in that ns. Of course, user can choose to specify a ns if he/she doesn't want to use the default istio-system namespace.

  • option 2:
    user creates namespace X (e.g. istio-system), and deploy ICP resource in that namespace X. namespace X is also where istio is installed.... we require user to create the namespace ahead of time.

I think this PR is for cleaning up with option 1, while a few folks believe we should do option 2 instead thus no namespace cleanup would be required. I do feel option 2 is particular useful for multitenancy install scenarios.

@jmazzitelli
Copy link
Member

I do feel option 2 is particular useful for multitenancy install scenarios.

Yes, exactly why option 2 is good - multi-tenancy use-cases.

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Feb 12, 2020
@istio-testing
Copy link

@irisdingbj: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. needs-rebase Indicates a PR needs to be rebased before being merged size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.