Global and local operators ignore locks and contend resources #2508
Comments
Hi @nicolaferraro, thanks for your input. To make this easier to address, here is a summary of the scenario. WHAT: concurrency errors occur when two operators deal with the same resources. I am afraid this is not currently supported; however, I am flagging this one as …
@joelanford, @dmesser I think it is good for you to be aware of this one.
@nicolaferraro You're likely running into this issue because the global operator and the local operator are in different namespaces and therefore create separate locks in their respective namespaces. The intention of the locking mechanism is that operator instances operating in the same scope (i.e. watching the same namespaces) should share the same lock. So you could run a deployment for your operator that has 3 replicas, and only 1 would run at a time. It is not recommended to have different operator instances running with overlapping watches, because of the exact issue you're running into. Can you explain what your use case is for running a global and a local operator that overlap on their watched namespaces? If you have a global operator already deployed that watches all namespaces, what is the purpose of running an additional local operator that watches just one of those namespaces?
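For reference, the leader-for-life lock described above is normally taken in the operator's main function via the SDK's leader package; a minimal sketch follows (the lock name "app-operator-lock" is illustrative, not from this issue):

```go
package main

import (
	"context"
	"log"

	"github.com/operator-framework/operator-sdk/pkg/leader"
)

func main() {
	ctx := context.TODO()

	// Become the leader before starting the manager. This creates a ConfigMap
	// lock in the operator's own namespace; replicas of the same deployment
	// block here until the lock is free, so only one instance reconciles.
	if err := leader.Become(ctx, "app-operator-lock"); err != nil {
		log.Fatal("failed to become leader: ", err)
	}

	// ... set up and start the controller-runtime manager here.
}
```

Because the lock lives in the operator's own namespace, a global operator and a local operator deployed in different namespaces each end up with their own lock, which is why they never block each other.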
It's not a use case, but some people recently experienced "deployments rolling like crazy" and they discovered that the cause was a double operator. In particular, in one case the user installed "knative-camel-sources" from OperatorHub, not knowing that that package depends on "camel-k", so it also installs "camel-k" globally under the hood (and you're not aware of this fact if you don't carefully review the install plan). Later, the same user wanted to try "camel-k" but didn't install it via OperatorHub; instead she instantiated the operator in the namespace directly (that install mode is available as an alternative to OperatorHub, mostly because OperatorHub is not common in vanilla Kubernetes or OpenShift 3.x). So the new operator was created in the local namespace because the user didn't know there was already one. Since this is technically possible, I'd like to avoid conflicts and "crazy deployments" and gracefully signal the problem via warning events. I was thinking of extending the lock API to add a method that checks whether a local operator already "owns" a namespace, to be used as a preliminary check by the global operator only. That should prevent at least the situation described above. If that is OK, I can provide a PR.
Before moving forward on discussion of implementation options, I'd like to get some feedback on this issue from the OLM maintainers. It does seem like there could be a UX improvement here; I'm just not sure exactly what the best approach would be.
This might work if the global operator starts after the local operator. But if the global operator starts first and takes a lock (which would be in its own namespace), it is unlikely that the local operator would know about that, since it isn't aware of anything outside its namespace and may not even have permission to access objects in other namespaces to make that check.
Yeah, from the OLM side this is something we are actively looking to improve from a UX perspective. Right now (especially from the UI) there isn't a good indicator of what dependencies are installed when you create a subscription. There is also some good feedback here around how to provide safeguards against installing things in multiple places (especially with overlapping watches like this).
To be clear, the approach I was proposing is:
This guarantees that the two operators eventually don't overlap and the local operator always takes precedence, no matter when it starts. I understand that the solution does not cover cases like two operators running externally to … Given the current approach to locking, which is based on OwnerReferences, a resource cannot be owned by another resource in a different namespace (there's no …
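To illustrate the constraint referenced above: the leader-for-life lock is, roughly, a ConfigMap owned by the operator's Pod, and both live in the same namespace because OwnerReferences cannot point across namespaces. A simplified sketch of the lock object's shape, under that assumption (field values are illustrative, not the SDK's exact code):

```go
package leaderlock

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// lockConfigMap sketches the shape of a leader-for-life lock: a ConfigMap in
// the operator's own namespace whose OwnerReference points at the operator
// Pod. Because OwnerReferences cannot cross namespaces, a local operator in
// "default" cannot own (or express ownership of) a lock that lives in the
// global operator's namespace.
func lockConfigMap(operatorPod *corev1.Pod, lockName string) *corev1.ConfigMap {
	return &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      lockName,
			Namespace: operatorPod.Namespace, // same namespace as the owning Pod, by necessity
			OwnerReferences: []metav1.OwnerReference{
				*metav1.NewControllerRef(operatorPod, corev1.SchemeGroupVersion.WithKind("Pod")),
			},
		},
	}
}
```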
The scenario described here is what OLM was designed to avoid - intersecting sets of watch namespaces between two Operators. By installing the operator manually in a namespace, the user bypassed that safeguard. I don't think we should re-implement this safeguard at the SDK level, but rather explain that this is exactly what OLM is for.
I agree that OLM prevents this problem from happening. But is OLM a prerequisite for using operator-sdk?
OLM is not a prerequisite for using the SDK. But lifecycling the Operator is not in the scope of the SDK.
Closing as not a bug. Happy to continue the discussion to make sure we incorporate this (and any other feedback).
Bug Report
What did you do?
Installed an operator with `WATCH_NAMESPACE=` (nothing) in the `global` namespace and a local operator (`WATCH_NAMESPACE=default`) in the `default` namespace.

Both operators try to interact with my custom resource. There's a lock ConfigMap in the namespace, but the global operator just ignores it.
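For context, the watch scope described above is driven by the `WATCH_NAMESPACE` environment variable, which the operator's main function typically feeds into the controller-runtime manager. A minimal sketch, assuming the SDK scaffolding and the `manager.Options.Namespace` field of controller-runtime versions current at the time:

```go
package main

import (
	"log"
	"os"

	"sigs.k8s.io/controller-runtime/pkg/client/config"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

func main() {
	// An empty WATCH_NAMESPACE means "watch all namespaces" (the global
	// operator); "default" restricts the manager to that single namespace
	// (the local operator). Both setups therefore reconcile resources in
	// "default", which is the overlap described in this issue.
	watchNamespace := os.Getenv("WATCH_NAMESPACE")

	mgr, err := manager.New(config.GetConfigOrDie(), manager.Options{
		Namespace: watchNamespace,
	})
	if err != nil {
		log.Fatal(err)
	}

	// ... register controllers and call mgr.Start(...) here.
	_ = mgr
}
```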
What did you expect to see?
The local operator takes precedence and wins, and the global one ignores the namespace.
What did you see instead? Under which circumstances?
They try to interact in parallel on the same resource and generate lots of conflicts and misbehavior in the operand.
Environment
operator-sdk version: 21a93ca
kubectl version:
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.6", GitCommit:"abdda3f9fefa29172298a2e42f5102e777a8ec25", GitTreeState:"clean", BuildDate:"2019-05-08T13:53:53Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-03-25T15:45:25Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Kubernetes cluster kind: OpenShift 4.2
Operator language: Go
Possible Solution
The `leader` package may expose some APIs to check for the existence of a lock. The same package may expose an API to let a local operator gracefully steal a (globally owned) lock.
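A hedged sketch of what the "check for an existing lock" part could look like; `LockExists` is a hypothetical name and is not part of the SDK's actual `leader` package (the "steal the lock" operation is omitted here):

```go
package leader

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// LockExists is a hypothetical addition: it reports whether a leader lock
// ConfigMap with the given name already exists in the target namespace, so
// a global operator could perform a preliminary check and leave alone any
// namespace that a local operator already "owns".
func LockExists(ctx context.Context, c client.Client, namespace, lockName string) (bool, error) {
	cm := &corev1.ConfigMap{}
	err := c.Get(ctx, client.ObjectKey{Namespace: namespace, Name: lockName}, cm)
	if apierrors.IsNotFound(err) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}
```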
Additional context
Looking for a way to solve this problem or overcome it.