-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Watches on remote cluster expires every 10s #8893
Comments
I think the problem is bigger than just us adding indexes on client creation. The same can happen for every Get/List call in our reconcilers. All of them can trigger the creation of an informer. I think we have to take a closer look how all of this works in CR. I think there could be a way to use a different http client for the synchronous calls during informer creation vs ListWatch. Code is roughly here: https://github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/cache/internal/informers.go#L377-L503 This could e.g. work by passing into the cache:
POC: https://github.com/sbueringer/cluster-api/pull/151/files# But not sure if that covers everything relevant. It should be definitely cleaner to implement this in CR. I think my current POC is just exploting the current behavior which is not guaranteed to stay the same. |
/triage accepted |
/priority important-soon |
/assign @sbueringer |
This issue is labeled with You can:
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted |
/triage accepted |
This issue is labeled with You can:
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted |
/triage accepted |
Notes to myself. We can play around with the 10s timeout we add to the rest config and the 10s timeout we add in Get/List calls to check if any changes to them affect the 10s timeouts. |
What steps did you take and what happened?
When we create watches on the remote cluster using the cluster cache tracker, watches are created using the same rest config that the cache uses, and currently this rest config has a timeout of 10s.
This leads to having watches on remote clusters expire every 10s, and even if this doesn't seem an issue thanks to watch bookmarks, this isn't ideal and we should figure out a way to have a longer timeout for the cache than the one used for the remote clients.
What did you expect to happen?
Cluster API to create watches on remote machines with longer timeouts
Cluster API version
main, old versions
Kubernetes version
No response
Anything else you would like to add?
The tricky part of this is that we want to keep the 10-second/short timeout for the remote client, and we also want to use it while creating a new remote client because this timeout is blocking a reconcile loop in a synchronous way.
However, unfortunately, when creating a new remote client we are also creating indexes on the cache, and thus a longer timeout on the cache will block unless we find a way to initiate indexes asyc
Label(s) to be applied
/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.
The text was updated successfully, but these errors were encountered: