sinker occasionally stops reconciling some ResourceSyncs #159
Comments
When you say they don't turn up, are there some resources that don't get created at all until you restart, or is it just that some resources are getting out of sync?
i have observed both cases
How far out-of-sync have you seen resources getting? I've been doing some reading on how Tokio works and I think we would need to export the metrics to Prometheus in order to fully figure out exactly what is happening, but it's possible we may be able to fix it (or at least alleviate it) by increasing the number of worker threads spawned by the Tokio runtime.
it's not that i see it falling behind a bit, it's that i see it doing nothing for hours or days. perhaps we could start by emitting metrics about the number of reconciliations it is performing, and alert on a drop in that number? then at least we know when we need to intervene before it causes an incident.
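A rough sketch of what that could look like, using only the Rust standard library. Everything here is hypothetical: `ReconcileStats`, `record`, and `is_stalled` are illustrative names, not part of sinker, and a real setup would more likely export the counter to Prometheus and put the threshold in an alerting rule.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical in-process counters; not taken from sinker's code.
pub struct ReconcileStats {
    /// Reference point for the timestamps stored below.
    started: Instant,
    /// Total number of reconciliations recorded so far.
    total: AtomicU64,
    /// Milliseconds since `started` at the most recent reconciliation.
    last_ms: AtomicU64,
}

impl ReconcileStats {
    pub fn new() -> Self {
        ReconcileStats {
            started: Instant::now(),
            total: AtomicU64::new(0),
            last_ms: AtomicU64::new(0),
        }
    }

    /// Call once for every completed reconciliation.
    pub fn record(&self) {
        self.total.fetch_add(1, Ordering::Relaxed);
        let elapsed_ms = self.started.elapsed().as_millis() as u64;
        self.last_ms.store(elapsed_ms, Ordering::Relaxed);
    }

    /// The value a metrics endpoint would expose as a counter.
    pub fn total(&self) -> u64 {
        self.total.load(Ordering::Relaxed)
    }

    /// True when nothing has been recorded for longer than `max_idle`.
    pub fn is_stalled(&self, max_idle: Duration) -> bool {
        let now_ms = self.started.elapsed().as_millis() as u64;
        let idle_ms = now_ms.saturating_sub(self.last_ms.load(Ordering::Relaxed));
        Duration::from_millis(idle_ms) > max_idle
    }
}
```

The idea would be to call `record()` at the end of each reconcile and have something outside the reconcile path (a liveness probe or an alert) check `is_stalled`, so that a wedged reconciler cannot also silence the check.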
After doing some research, I think we may be experiencing this bug, which was fixed in a release of the
i've seen a few cases where i don't see the expected copied sinker resources turning up. i go check the sinker logs and grep for the name and don't see a log message for reconciling it. at this time i usually see log messages for only one or a few resources, suggesting it's not reconciling most things anymore.
i then restart the sinker pod and it goes back to operating normally and the expected copied resources turn up.
e.g. logs before a restart (i'm grepping here for a cluster ID so we're not seeing all logs):
and then after a restart:
i wonder if some failures that tie up the reconcilers fill up a pool of tokio threads or something? and the others just stop? 🤷‍♀️
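To make that guess concrete, here is a small stand-in using plain OS threads from the Rust standard library. It is not Tokio and not sinker's code: `FixedPool` and `healthy_job_runs_within` are made-up names, and the "stuck" jobs simply block on a channel. It only illustrates the suspected failure mode, where every worker in a fixed-size pool is held by a job that never returns, so later work sits in the queue.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical fixed-size pool: `size` threads pull jobs from one shared queue.
pub struct FixedPool {
    sender: mpsc::Sender<Job>,
}

impl FixedPool {
    pub fn new(size: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        for _ in 0..size {
            let receiver = Arc::clone(&receiver);
            thread::spawn(move || loop {
                // The lock is held only while dequeuing, not while the job runs.
                let job = match receiver.lock().unwrap().recv() {
                    Ok(job) => job,
                    Err(_) => break, // queue closed, worker exits
                };
                job();
            });
        }
        FixedPool { sender }
    }

    pub fn submit<F: FnOnce() + Send + 'static>(&self, f: F) {
        self.sender.send(Box::new(f)).expect("pool is shut down");
    }
}

/// Queues `stuck_jobs` jobs that block, then one healthy job, and reports
/// whether the healthy job managed to run within `wait`.
pub fn healthy_job_runs_within(pool_size: usize, stuck_jobs: usize, wait: Duration) -> bool {
    let pool = FixedPool::new(pool_size);
    let mut releases = Vec::new();
    for _ in 0..stuck_jobs {
        let (tx, rx) = mpsc::channel::<()>();
        releases.push(tx);
        // Blocks until the matching sender is dropped below.
        pool.submit(move || {
            let _ = rx.recv();
        });
    }
    let (done_tx, done_rx) = mpsc::channel();
    pool.submit(move || {
        let _ = done_tx.send(());
    });
    let ran = done_rx.recv_timeout(wait).is_ok();
    drop(releases); // unblock the stuck jobs so the workers can exit
    ran
}
```

With two workers and one stuck job the healthy job still runs; with two workers and two stuck jobs it never gets picked up, which would look a lot like "only one or a few resources" still being reconciled.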