Unexpected, irregular reprocessing of resources by tekton-pipelines-controller #3676
Comments
Issues go stale after 90d of inactivity. /lifecycle stale Send feedback to tektoncd/plumbing.
Sorry for not responding earlier. This should be considered expected behavior. Reconcilers are configured to periodically rescan all resources in case they missed a previous update. Normally this has no effect, aside from a (hopefully) brief spike in the workqueue depth, and no noticeable additional latency for in-progress PipelineRuns.
Thank you @imjasonh for picking this up! Do you know if the periodic reprocessing is deterministic, e.g. every x hours after the container started? Is this basically https://github.com/knative/pkg/blob/main/controller/controller.go#L52, or did I miss where Tekton overrides this? Some background info: we currently have ~15k TaskRuns and ~8k PipelineRuns in the cluster. With these numbers, the reprocessing takes ~10 min. This is not ideal, but OK. To lower the impact, we also apply sharding into 3 buckets. (Really looking forward to the Results project in this context ;))
I believe the resync period is measured from the time the informer started (which is the time the container started), not, for example, from the last time each object was reconciled. That 10-hour resync period is the one Tekton uses; we don't override it AFAIK. Results is probably going to be your best bet long-term; if you want to try it out and give feedback, that would probably be helpful to that project. Until then, sharding is a reasonable band-aid. I'm sorry I don't have better solutions than that at the moment.
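For illustration, here is a minimal, self-contained sketch using plain client-go (not Tekton's actual wiring, which goes through knative/pkg injection) of how a shared informer factory is given a resync period. Every cached object is re-delivered roughly once per period, counted from informer start, which matches the behavior described above.

```go
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumption: a kubeconfig in the default location; a real controller
	// would typically use in-cluster config instead.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// The resync period is fixed when the factory is created. Every object in
	// the informer cache is delivered again as a (no-op) update roughly every
	// resyncPeriod, measured from informer start, not from the last reconcile.
	resyncPeriod := 10 * time.Hour
	factory := informers.NewSharedInformerFactory(client, resyncPeriod)

	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			// On a periodic resync, oldObj and newObj describe the same
			// unchanged object, which is why these events normally have no
			// visible effect beyond a spike in workqueue depth.
		},
	})

	stopCh := make(chan struct{})
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	<-stopCh
}
```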
That's perfectly fine :). I'm just not sure whether I'm seeing the (expected) periodic reprocessing or something in addition. My guess: a lost connection to the API server. I'll try to monitor this on my end and potentially re-open the issue.
We learned to live with this, but the 10h window is still quite annoying, as it makes it impossible for us to find a timeframe during a business day in which no developer is interrupted (some start their day early, some late); we use scheduled controller restarts as a workaround. Imo it would be good to either
I see it as a defect, and the user experience is quite bad: nobody likes to see a stuck build. I suggest this gets reopened and improved.
1500 Pipelines here and we are heavily impacted.
This should allow an advanced user/cluster-admin to configure the resyncPeriod to a value that fits their cluster instead of relying on the default 10h one. This is related to #3676. Signed-off-by: Vincent Demeester <[email protected]>
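As a rough illustration of the idea in that commit message only (the flag name and wiring below are hypothetical and not necessarily what the actual change does), a controller binary could accept a configurable resync period like this:

```go
package main

import (
	"flag"
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Hypothetical flag name; the real change may expose this differently.
	resyncPeriod := flag.Duration("resync-period", 10*time.Hour,
		"How often the informers re-deliver every cached object to the reconcilers.")
	flag.Parse()

	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// A cluster admin can now trade freshness guarantees against the size of
	// the periodic workqueue spike, e.g. --resync-period=24h on large clusters.
	factory := informers.NewSharedInformerFactory(client, *resyncPeriod)

	stopCh := make(chan struct{})
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	<-stopCh
}
```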
Hi,
In our Tekton deployment, I noticed that the work queue (tekton_workqueue_depth) of the Tekton Pipelines controller sometimes spikes. Based on the size of the spike, this looks like a full reprocessing of, e.g., the PipelineRuns in the cluster. In the log, I can see the following messages when this happens. This happens irregularly.
I suspect that the connection between the controller and the API server fails, but this is just a guess. Is there any way to find out what causes the reprocessing? (One way to check is sketched after the version info below.)
Thanks,
Fabian
Additional Info
Kubernetes version: v1.18.12
Tekton Pipelines version: v0.19.0
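To check whether the workqueue spikes line up with the expected periodic resync, the sketch below prints the controller pod's start time and the next few expected resync points so they can be compared against the spike timestamps (the namespace and label selector are assumptions about a default Tekton install; adjust as needed).

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Assumed namespace and label for the controller pod; adjust for your install.
	pods, err := client.CoreV1().Pods("tekton-pipelines").List(context.TODO(),
		metav1.ListOptions{LabelSelector: "app=tekton-pipelines-controller"})
	if err != nil {
		panic(err)
	}

	const resync = 10 * time.Hour
	for _, p := range pods.Items {
		if p.Status.StartTime == nil {
			continue
		}
		start := p.Status.StartTime.Time
		fmt.Printf("%s started at %s; periodic resyncs expected roughly every 10h after that:\n",
			p.Name, start)
		for i := 1; i <= 3; i++ {
			fmt.Println("  ~", start.Add(time.Duration(i)*resync))
		}
	}
}
```

Spikes that do not fall near these times would point at something else, for example the informer re-listing after a dropped watch connection, as suspected above.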