Support For Cluster Addons #400

Closed
mhenriks opened this issue Aug 13, 2018 · 16 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@mhenriks

KubeVirt [1] is a cluster addon for running virtual machines in Kubernetes. I am looking into creating an Operator and integrating with OLM in order to better manage the lifecycle of KubeVirt installations. But since KubeVirt is a cluster addon, it has some unique requirements/restrictions.

The main thing is that I only want a single instance of the KubeVirt application to exist in the entire cluster. It can be installed into any namespace, but only once. So if I create an instance of my KubeVirtApplication CRD in namespace "kubevirt-system", I shouldn't be able to create another instance in that namespace or in any other namespace. I'm sure that there are ways to implement this restriction in code, but I think some SDK support would be nice, assuming this is in line with the philosophy of the Operator Framework.

What do you think? Is this within the scope of the Operator Framework? Should there be additional support in the SDK/OLM for this case?

[1] https://github.com/kubevirt/kubevirt

@hasbro17
Contributor

@mhenriks It's hard to say if the SDK could support something like this.
I'm not entirely sure how you would enforce no more than one CRD instance (CR) across all namespaces.

First off I'm assuming the operator would have to be a cluster-wide operator that watches all namespaces.
After the first CR is created, the operator would have to ignore all other CRs of the same type (across all namespaces).

I don't know if it's possible to restrict the creation of any new CRs. Maybe through an Admission Webhook but I think that would be beyond the scope of what an operator is expected to do.

The operator could possibly ignore all CRs after the first one. But I'm not clear on the implementation details. Sounds a bit like leader election between CRs of the same type.
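One way to picture "ignore all CRs after the first one" is a deterministic oldest-wins rule. The following is only a sketch under stated assumptions: the `CR` type, the `CreatedUnix` field (standing in for the object's creation timestamp), and `pickActive` are hypothetical names, not part of the Operator SDK.

```go
package main

import "fmt"

// CR is a hypothetical, stripped-down view of a custom resource.
// CreatedUnix stands in for metadata.creationTimestamp (Unix seconds).
type CR struct {
	Namespace   string
	Name        string
	CreatedUnix int64
}

// older reports whether a should win over b: the earlier creation time
// wins, and namespace/name break ties so every replica agrees.
func older(a, b CR) bool {
	if a.CreatedUnix != b.CreatedUnix {
		return a.CreatedUnix < b.CreatedUnix
	}
	if a.Namespace != b.Namespace {
		return a.Namespace < b.Namespace
	}
	return a.Name < b.Name
}

// pickActive returns the index of the single CR the operator acts on,
// or -1 if the list is empty. All other CRs would be ignored.
func pickActive(crs []CR) int {
	best := -1
	for i := range crs {
		if best == -1 || older(crs[i], crs[best]) {
			best = i
		}
	}
	return best
}

func main() {
	crs := []CR{
		{Namespace: "other-ns", Name: "kubevirt", CreatedUnix: 200},
		{Namespace: "kubevirt-system", Name: "kubevirt", CreatedUnix: 100},
	}
	i := pickActive(crs)
	fmt.Printf("active: %s/%s\n", crs[i].Namespace, crs[i].Name)
}
```

Because the rule depends only on the listed objects, it needs no extra coordination state, though it does assume the operator can list CRs across all namespaces.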

If you could elaborate on any ideas you have for implementing this restriction in an operator, that would give us more context to judge whether the SDK is the right place for it.

/cc @ecordell I don't know if this is something that could be enforced via OLM. Or if this model of a cluster-wide operator and singleton CR even works with OLM.

@fabiand

fabiand commented Aug 14, 2018

First off I'm assuming the operator would have to be a cluster-wide operator that watches all namespaces.

Maybe it helps to say that I could imagine the specific namespace the add-on should run in being fixed rather than variable.
Just like it is with some add-ons in, e.g., OpenShift Origin (IIRC).

@fabiand

fabiand commented Aug 14, 2018

xref #236

@ecordell
Member

@mhenriks I think the best approach here would be to use ResourceQuotas (see kubernetes/kubernetes#64201 for enabling ObjectCount quotas for CRs).
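For illustration, an object-count quota for a custom resource uses a key of the form `count/<resource>.<group>`. The sketch below builds such a ResourceQuota manifest as JSON. The resource name `kubevirtapplications`, the group `kubevirt.io`, and the quota name `kubevirt-singleton` are assumptions made for the example. Note that a ResourceQuota is namespaced, so on its own it would cap the count per namespace rather than across the cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal structs mirroring the ResourceQuota fields used here.
type metadata struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type quotaSpec struct {
	Hard map[string]string `json:"hard"`
}

type resourceQuota struct {
	APIVersion string    `json:"apiVersion"`
	Kind       string    `json:"kind"`
	Metadata   metadata  `json:"metadata"`
	Spec       quotaSpec `json:"spec"`
}

// quotaKey builds the object-count key for a resource in a non-core group.
func quotaKey(resource, group string) string {
	return "count/" + resource + "." + group
}

// singletonQuota limits the given resource to one object in one namespace.
func singletonQuota(namespace, resource, group string) resourceQuota {
	return resourceQuota{
		APIVersion: "v1",
		Kind:       "ResourceQuota",
		Metadata:   metadata{Name: "kubevirt-singleton", Namespace: namespace}, // hypothetical name
		Spec:       quotaSpec{Hard: map[string]string{quotaKey(resource, group): "1"}},
	}
}

func main() {
	// Resource and group names are assumptions for this example.
	q := singletonQuota("kubevirt-system", "kubevirtapplications", "kubevirt.io")
	out, err := json.MarshalIndent(q, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```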

If you need to restrict the number before that feature is ready, I would just check the cluster state in the operator and write out a status ("status: failed, reason: there's already one in the namespace")
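A rough sketch of that interim approach: the operator lists all CRs, treats the oldest as the legitimate one, and writes a failed status on the rest. The `CR` and `Status` types, the `Ready`/`Failed` phase names, and `markDuplicates` are hypothetical and only illustrate the idea; a real operator would persist the status through the API server.

```go
package main

import "fmt"

// Status is a hypothetical status block for the custom resource.
type Status struct {
	Phase  string
	Reason string
}

// CR is a hypothetical, stripped-down custom resource.
// CreatedUnix stands in for metadata.creationTimestamp (Unix seconds).
type CR struct {
	Namespace   string
	Name        string
	CreatedUnix int64
	Status      Status
}

func (c CR) key() string { return c.Namespace + "/" + c.Name }

// markDuplicates sets the status in place: the oldest CR becomes Ready,
// every other CR is marked Failed with a reason naming the existing one.
func markDuplicates(crs []CR) {
	if len(crs) == 0 {
		return
	}
	first := 0
	for i := range crs {
		a, b := crs[i], crs[first]
		if a.CreatedUnix < b.CreatedUnix ||
			(a.CreatedUnix == b.CreatedUnix && a.key() < b.key()) {
			first = i
		}
	}
	for i := range crs {
		if i == first {
			crs[i].Status = Status{Phase: "Ready"}
			continue
		}
		crs[i].Status = Status{
			Phase:  "Failed",
			Reason: "there's already one in the cluster: " + crs[first].key(),
		}
	}
}

func main() {
	crs := []CR{
		{Namespace: "other-ns", Name: "kubevirt", CreatedUnix: 200},
		{Namespace: "kubevirt-system", Name: "kubevirt", CreatedUnix: 100},
	}
	markDuplicates(crs)
	for _, c := range crs {
		fmt.Printf("%s: %s %s\n", c.key(), c.Status.Phase, c.Status.Reason)
	}
}
```

This does not prevent the extra CR from being created; it only makes the rejection visible to whoever created it.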

Should there be additional support in the SDK/OLM for this case?

I'd like to chat more offline about the requirements for KubeVirt; there might be some things we want to change to better support KubeVirt's deployment with OLM.

@fabiand

fabiand commented Aug 14, 2018

+1, and let me highlight that KubeVirt is just one example of an add-on. Hopefully there is no need to highlight this ;)

@mhenriks
Author

@hasbro17 yeah, as far as implementation goes, I was originally thinking that a validating webhook would be one way to go. But ResourceQuotas may be better.
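For comparison, the decision a validating webhook would make can be sketched as follows. The structs are a minimal subset of the `AdmissionReview` JSON shape (the `admission.k8s.io/v1` version is assumed here), and `existing` is a hypothetical input: the number of CRs already in the cluster, which a real webhook would have to obtain by listing across namespaces.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal subset of the AdmissionReview JSON shape.
type admissionRequest struct {
	UID       string `json:"uid"`
	Operation string `json:"operation"`
}

type status struct {
	Message string `json:"message"`
}

type admissionResponse struct {
	UID     string  `json:"uid"`
	Allowed bool    `json:"allowed"`
	Status  *status `json:"status,omitempty"`
}

type admissionReview struct {
	APIVersion string             `json:"apiVersion"`
	Kind       string             `json:"kind"`
	Request    *admissionRequest  `json:"request,omitempty"`
	Response   *admissionResponse `json:"response,omitempty"`
}

// review denies a CREATE when at least one CR already exists anywhere
// in the cluster; every other operation is allowed.
func review(in admissionReview, existing int) admissionReview {
	out := admissionReview{APIVersion: in.APIVersion, Kind: in.Kind}
	if in.Request == nil {
		out.Response = &admissionResponse{Status: &status{Message: "missing request"}}
		return out
	}
	resp := &admissionResponse{UID: in.Request.UID, Allowed: true}
	if in.Request.Operation == "CREATE" && existing > 0 {
		resp.Allowed = false
		resp.Status = &status{Message: "only one instance is allowed per cluster"}
	}
	out.Response = resp
	return out
}

func main() {
	in := admissionReview{
		APIVersion: "admission.k8s.io/v1",
		Kind:       "AdmissionReview",
		Request:    &admissionRequest{UID: "example-uid", Operation: "CREATE"},
	}
	out, err := json.Marshal(review(in, 1))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A check like this is not atomic: two concurrent CREATE requests could both observe zero existing CRs, so the operator would likely still need a fallback for duplicates.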

@ecordell I'll be in touch!

@dmage

dmage commented Aug 23, 2018

The OpenShift integrated registry is a singleton too. If all CRs share the same storage, they'll effectively just be replicas and everything will work fine. In normal scenarios, though, there should be exactly one CR in the cluster, and perhaps that should be enforced.

/cc @bparees @legionus

@fabiand

fabiand commented Aug 23, 2018

I think some work that needs to happen before the operator is adjusted is to work out how addons should be laid out in general.

In the past we saw that infrastructure components land in kube-system, but we now also see that there are dedicated namespaces.

The insight might be that the creation of certain CRs is limited to a namespace, or that the corresponding CRDs are namespaced.

@bparees
Contributor

bparees commented Aug 23, 2018

@dmage I imagine we'd also want to enforce that there is exactly one instance of the registry operator (running in the openshift-image-registry namespace). Presumably that is also true for other cluster singleton operators.

For 4.0 I suspect we're just going to live with "don't install the operator in another namespace and don't create extra registry CR instances" though.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 9, 2019
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 8, 2019
@fabiand

fabiand commented Jun 13, 2019 via email

@openshift-ci-robot openshift-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 13, 2019
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 11, 2019
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 11, 2019
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.


8 participants