Watches are not working for some controller after informers_map.go:204: watch of *v1alpha1.SFService ended with: too old resource version: 79387199 (79398464)
#869
Comments
@vivekzhere the log message is generally nothing to worry about and expected to occasionally appear: kubernetes/kubernetes#22024 (comment)
That sounds like a bug. Does your reconciler still get requests and block when processing them, or does it not even get them?
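One quick way to check this is to log at the top of `Reconcile`. A minimal sketch, assuming the pre-0.5 signature used by controller-runtime 0.4.0 (the version mentioned below); the `SFServiceReconciler` type is hypothetical:

```go
package controllers

import (
	"log"

	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// SFServiceReconciler is a hypothetical reconciler used only to show
// where a "request received" log could go.
type SFServiceReconciler struct{}

// Reconcile uses the pre-v0.5 signature (no context argument),
// matching controller-runtime 0.4.0.
func (r *SFServiceReconciler) Reconcile(req reconcile.Request) (reconcile.Result, error) {
	// If this line never shows up for one CR type, that controller is not
	// receiving events at all; if it shows up and then nothing follows,
	// the reconcile is blocking.
	log.Printf("reconcile request received: %s", req.NamespacedName)

	// ... actual reconciliation logic ...
	return reconcile.Result{}, nil
}
```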
I don't follow, are you referring to leader election with the …? Could you list the sequence in which the instances get started/stopped and get the leader lease?
No. The reconciler for this one controller is not getting any requests at all.
Yes. I am referring to leader election with the …
There are two instances of the operator running (…). PS: We have now identified that the issue happens only during upgrades that require recreating these VMs; in those cases it happens every time. We do not see the issue during upgrades that only replace the operator binaries and do not recreate the VMs.
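For context, leader election in controller-runtime 0.4.0 is typically enabled on the manager roughly as in this sketch; the lock name and namespace below are placeholders, not the operator's actual values:

```go
package main

import (
	"log"

	"sigs.k8s.io/controller-runtime/pkg/client/config"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/manager/signals"
)

func main() {
	// With two operator replicas, only the instance currently holding the
	// leader lease runs its controllers; the other blocks waiting to acquire it.
	mgr, err := manager.New(config.GetConfigOrDie(), manager.Options{
		LeaderElection:          true,
		LeaderElectionID:        "my-operator-lock",      // hypothetical lock name
		LeaderElectionNamespace: "my-operator-namespace", // hypothetical namespace
	})
	if err != nil {
		log.Fatalf("unable to create manager: %v", err)
	}

	// Controllers would be registered with mgr here.

	// In 0.4.0 Start takes a stop channel and blocks until it is closed.
	if err := mgr.Start(signals.SetupSignalHandler()); err != nil {
		log.Fatalf("manager exited: %v", err)
	}
}
```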
Okay, so are you sure this is an issue with controller-runtime then?
@alvaroaleman We are not really sure this is a controller-runtime issue anymore. We were doing … Elaborating a little more on our setup: we have two API servers also deployed along with the operator. On the VM on which …
/triage support
We have four controllers for four different CRs in our operator. Once in a while we see logs like the one in the title.
These come for three of the CRs; for one CR this log does not appear, and after that the controller for that resource stops processing any requests. Only a restart of the operator fixes it.
The CR whose watch is failing has a huge number of resources (~30K), while the other resources number fewer than 10K each.
We also have two instances of the operator running with leader election. We see a pattern where the issue happens when the standby becomes the leader, goes down a few minutes later (for an update), and the other instance becomes the leader again. In the first switchover the log appears for all four resources, but in the second switchover only three resources are watched.
We are using controller-runtime 0.4.0.
(kubernetes/kubernetes#22024)
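For illustration, four controllers for four CR types are typically wired to a single manager roughly as in the sketch below; the import path and the single shared reconciler are simplifications, not the operator's actual code:

```go
package main

import (
	"k8s.io/apimachinery/pkg/runtime"

	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	// Hypothetical import path for the operator's API types.
	v1alpha1 "example.com/operator/api/v1alpha1"
)

// setupControllers registers one controller, and therefore one informer
// watch, per CR type with the shared manager. If the watch for one type
// dies and is not re-established, only that controller stops receiving
// reconcile requests while the other three keep working.
func setupControllers(mgr manager.Manager, r reconcile.Reconciler) error {
	// In a real operator each CR type would normally get its own reconciler;
	// a single one is used here only to keep the sketch short.
	for _, obj := range []runtime.Object{
		&v1alpha1.SFService{},
		// ... the three other CR types watched by the operator ...
	} {
		if err := builder.ControllerManagedBy(mgr).For(obj).Complete(r); err != nil {
			return err
		}
	}
	return nil
}
```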