sinker occasionally stops reconciling some ResourceSyncs #159

Closed
lukebond opened this issue Sep 4, 2024 · 5 comments · Fixed by #165

@lukebond
Contributor

lukebond commented Sep 4, 2024

i've seen a few cases where the expected copied sinker resources don't turn up. when i check the sinker logs and grep for the resource name, there's no log message for reconciling it. at that point i usually see log messages for only one or a few resources, suggesting it's no longer reconciling most things.

i then restart the sinker pod and it goes back to operating normally and the expected copied resources turn up.

e.g. logs before a restart (i'm grepping here for a cluster ID so we're not seeing all logs):

$ kubectl --context dex@cst-prod01-us-east-1 logs -n sinker sinker-5855b9c8ff-jw8hc -f | grep d7cdb6df-9a73-4b3f-8b8b-1945a30c9403
2024-09-04T10:58:05.293818Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map"
2024-09-04T10:58:05.844387Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map" target_ref=GVKN { api_version: "v1", kind: "ConfigMap", name: "authz-tokens" }
2024-09-04T10:58:15.695361Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map"
2024-09-04T10:58:16.253315Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map" target_ref=GVKN { api_version: "v1", kind: "ConfigMap", name: "authz-tokens" }
2024-09-04T10:58:25.893820Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map"
2024-09-04T10:58:26.442377Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map" target_ref=GVKN { api_version: "v1", kind: "ConfigMap", name: "authz-tokens" }
2024-09-04T10:58:35.997219Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map"
2024-09-04T10:58:36.581932Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map" target_ref=GVKN { api_version: "v1", kind: "ConfigMap", name: "authz-tokens" }
2024-09-04T10:58:46.038353Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map"
2024-09-04T10:58:46.584337Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map" target_ref=GVKN { api_version: "v1", kind: "ConfigMap", name: "authz-tokens" }
2024-09-04T10:58:56.083440Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map"
2024-09-04T10:58:56.691433Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map" target_ref=GVKN { api_version: "v1", kind: "ConfigMap", name: "authz-tokens" }
2024-09-04T10:59:06.140468Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map"
2024-09-04T10:59:06.713835Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map" target_ref=GVKN { api_version: "v1", kind: "ConfigMap", name: "authz-tokens" }
2024-09-04T10:59:16.441489Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map"

and then after a restart:

$ kubectl --context dex@cst-prod01-us-east-1 logs -n sinker sinker-7b67fcf586-rzgzc -f | grep d7cdb6df-9a73-4b3f-8b8b-1945a30c9403
2024-09-04T11:17:29.064891Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-node-sa-serviceaccount-sync"
2024-09-04T11:17:29.088956Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-grafana-oauth-client-credentials"
2024-09-04T11:17:29.111114Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-catalog-statefulset"
2024-09-04T11:17:29.111325Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-node-getter-binding-clusterrolebinding-sync"
2024-09-04T11:17:29.111517Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-authz-tokens-config-map"
2024-09-04T11:17:29.112543Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-external-resizer-role-clusterrole-sync"
2024-09-04T11:17:29.113433Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-querier-deployment"
2024-09-04T11:17:29.113535Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-node-daemonset-sync"
2024-09-04T11:17:29.115101Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-attacher-binding-clusterrolebinding-sync"
2024-09-04T11:17:29.116683Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-snapshotter-binding-clusterrolebinding-sync"
2024-09-04T11:17:29.117127Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ocirepository-sinkercontainer"
2024-09-04T11:17:29.117651Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-leases-rolebinding-rolebinding-sync"
2024-09-04T11:17:29.119050Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-external-provisioner-role-clusterrole-sync"
2024-09-04T11:17:29.119155Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-resizer-binding-clusterrolebinding-sync"
2024-09-04T11:17:29.118520Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-central-alertmanager-kube-rbac-proxy-token"
2024-09-04T11:17:29.119232Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-zerossl-eab"
2024-09-04T11:17:29.119385Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-controller-sa-serviceaccount-sync"
2024-09-04T11:17:29.119434Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-initial-token"
2024-09-04T11:17:29.120287Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-gar-docker-secret-sync"
2024-09-04T11:17:29.120726Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-gar-docker-secret-sync"
2024-09-04T11:17:29.123098Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-kustomization-iox-manifests"
2024-09-04T11:17:29.125541Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-leases-role-role-sync"
2024-09-04T11:17:29.128931Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-node-role-clusterrole-sync"
2024-09-04T11:17:29.129266Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-kustomization-sinkercontainer"
2024-09-04T11:17:29.129345Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-external-snapshotter-role-clusterrole-sync"
2024-09-04T11:17:29.129368Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ingester-statefulset"
2024-09-04T11:17:29.129516Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-controller-deployment-sync"
2024-09-04T11:17:29.130068Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-compactor-statefulset"
2024-09-04T11:17:29.131166Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-zerossl-eab-secret-sync"
2024-09-04T11:17:29.131436Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-external-attacher-role-clusterrole-sync"
2024-09-04T11:17:29.131566Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-router-deployment"
2024-09-04T11:17:29.131801Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ocirepository-iox-manifests"
2024-09-04T11:17:29.131890Z  INFO sinker::controller: running reconciler name="thanos-ruler-prometheus-rules-d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5"
2024-09-04T11:17:29.132017Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs.csi.aws.com-csidriver-sync"
2024-09-04T11:17:29.132700Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-aws-secret-secret-sync"
2024-09-04T11:17:29.133923Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-gar-docker-secret"
2024-09-04T11:17:29.133923Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-controller-poddisruptionbudget-sync"
2024-09-04T11:17:29.134588Z  INFO sinker::controller: running reconciler name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-provisioner-binding-clusterrolebinding-sync"
2024-09-04T11:18:09.151874Z  WARN sinker::controller: reconcile failed name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-resizer-binding-clusterrolebinding-sync" error=Kube Error: HyperError: error trying to connect: deadline has elapsed
2024-09-04T11:18:09.153339Z  WARN sinker::controller: reconcile failed name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-central-alertmanager-kube-rbac-proxy-token" error=Kube Error: HyperError: error trying to connect: deadline has elapsed
2024-09-04T11:18:09.173177Z  WARN sinker::controller: reconcile failed name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ingester-statefulset" error=Kube Error: HyperError: error trying to connect: deadline has elapsed
2024-09-04T11:18:09.176223Z  WARN sinker::controller: reconcile failed name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-node-sa-serviceaccount-sync" error=Kube Error: HyperError: error trying to connect: deadline has elapsed
2024-09-04T11:18:09.208395Z  WARN sinker::controller: reconcile failed name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-compactor-statefulset" error=Kube Error: HyperError: error trying to connect: deadline has elapsed
2024-09-04T11:18:09.231329Z  WARN sinker::controller: reconcile failed name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-node-role-clusterrole-sync" error=Kube Error: HyperError: error trying to connect: deadline has elapsed
2024-09-04T11:18:09.261486Z  WARN sinker::controller: reconcile failed name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-external-resizer-role-clusterrole-sync" error=Kube Error: HyperError: error trying to connect: deadline has elapsed
2024-09-04T11:18:09.275825Z  WARN sinker::controller: reconcile failed name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-controller-sa-serviceaccount-sync" error=Kube Error: HyperError: error trying to connect: deadline has elapsed
2024-09-04T11:18:09.280841Z  WARN sinker::controller: reconcile failed name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-grafana-oauth-client-credentials" error=Kube Error: HyperError: error trying to connect: deadline has elapsed
2024-09-04T11:18:10.792700Z  WARN sinker::controller: reconcile failed name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-external-snapshotter-role-clusterrole-sync" error=Kube Error: HyperError: error trying to connect: deadline has elapsed
2024-09-04T11:18:10.795764Z  WARN sinker::controller: reconcile failed name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-node-getter-binding-clusterrolebinding-sync" error=Kube Error: HyperError: error trying to connect: deadline has elapsed
2024-09-04T11:18:11.541928Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-aws-secret-secret-sync" target_ref=GVKN { api_version: "v1", kind: "Secret", name: "aws-secret" }
2024-09-04T11:18:12.137013Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-provisioner-binding-clusterrolebinding-sync" target_ref=GVKN { api_version: "rbac.authorization.k8s.io/v1", kind: "ClusterRoleBinding", name: "ebs-csi-provisioner-binding" }
2024-09-04T11:18:12.684563Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-external-provisioner-role-clusterrole-sync" target_ref=GVKN { api_version: "rbac.authorization.k8s.io/v1", kind: "ClusterRole", name: "ebs-external-provisioner-role" }
2024-09-04T11:18:13.221171Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-node-daemonset-sync" target_ref=GVKN { api_version: "apps/v1", kind: "DaemonSet", name: "ebs-csi-node" }
2024-09-04T11:18:13.243943Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs.csi.aws.com-csidriver-sync" target_ref=GVKN { api_version: "storage.k8s.io/v1", kind: "CSIDriver", name: "ebs.csi.aws.com" }
2024-09-04T11:18:13.398741Z  INFO sinker::controller: successfully reconciled name="thanos-ruler-prometheus-rules-d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5" target_ref=GVKN { api_version: "monitoring.coreos.com/v1", kind: "PrometheusRule", name: "thanos-ruler-prometheus-rules-d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5" }
2024-09-04T11:18:14.535559Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-gar-docker-secret-sync" target_ref=GVKN { api_version: "v1", kind: "Secret", name: "gar-docker-secret" }
2024-09-04T11:18:15.024072Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-zerossl-eab" target_ref=GVKN { api_version: "v1", kind: "Secret", name: "zerossl-eab" }
2024-09-04T11:18:16.606474Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-zerossl-eab-secret-sync" target_ref=GVKN { api_version: "v1", kind: "Secret", name: "zerossl-eab" }
2024-09-04T11:18:17.186399Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-controller-poddisruptionbudget-sync" target_ref=GVKN { api_version: "policy/v1", kind: "PodDisruptionBudget", name: "ebs-csi-controller" }
2024-09-04T11:18:18.537320Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-attacher-binding-clusterrolebinding-sync" target_ref=GVKN { api_version: "rbac.authorization.k8s.io/v1", kind: "ClusterRoleBinding", name: "ebs-csi-attacher-binding" }
2024-09-04T11:18:18.788128Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ocirepository-iox-manifests" target_ref=GVKN { api_version: "source.toolkit.fluxcd.io/v1beta2", kind: "OCIRepository", name: "iox-manifests" }
2024-09-04T11:18:19.196806Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-external-attacher-role-clusterrole-sync" target_ref=GVKN { api_version: "rbac.authorization.k8s.io/v1", kind: "ClusterRole", name: "ebs-external-attacher-role" }
2024-09-04T11:18:20.055754Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-initial-token" target_ref=GVKN { api_version: "v1", kind: "Secret", name: "d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-secret-initial-token" }
2024-09-04T11:18:20.704758Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-snapshotter-binding-clusterrolebinding-sync" target_ref=GVKN { api_version: "rbac.authorization.k8s.io/v1", kind: "ClusterRoleBinding", name: "ebs-csi-snapshotter-binding" }
2024-09-04T11:18:21.738944Z  INFO sinker::controller: successfully reconciled name="d7cdb6df-9a73-4b3f-8b8b-1945a30c9403-mhxk5-ebs-csi-leases-rolebinding-rolebinding-sync" target_ref=GVKN { api_version: "rbac.authorization.k8s.io/v1", kind: "RoleBinding", name: "ebs-csi-leases-rolebinding" }
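
One way to compare two excerpts like the ones above without eyeballing them is to count `running reconciler` lines per ResourceSync and record the latest timestamp for each. The following is a minimal sketch that assumes the log format shown above; `parse_reconcile_start` and `summarize_reconciles` are hypothetical helper names, not part of sinker.

```rust
use std::collections::BTreeMap;

/// Extracts (timestamp, name) from a `running reconciler` log line.
/// Returns None for any other line (shell prompts, WARN lines, and so on).
fn parse_reconcile_start(line: &str) -> Option<(&str, &str)> {
    // The timestamp is the first space-delimited field on the line.
    let (timestamp, rest) = line.split_once(' ')?;
    let (_, after) = rest.split_once("running reconciler name=\"")?;
    let (name, _) = after.split_once('"')?;
    Some((timestamp, name))
}

/// Maps each ResourceSync name to (number of reconcile starts, latest timestamp).
pub fn summarize_reconciles(logs: &str) -> BTreeMap<String, (usize, String)> {
    let mut summary: BTreeMap<String, (usize, String)> = BTreeMap::new();
    for (timestamp, name) in logs.lines().filter_map(parse_reconcile_start) {
        let entry = summary
            .entry(name.to_string())
            .or_insert((0, String::new()));
        entry.0 += 1;
        // The timestamps are fixed-width UTC strings, so comparing them as
        // text orders them chronologically. Taking the maximum (rather than
        // the last line seen) tolerates slightly out-of-order log lines.
        if timestamp > entry.1.as_str() {
            entry.1 = timestamp.to_string();
        }
    }
    summary
}
```

Run over the pre-restart excerpt, this would report a single name; run over the post-restart excerpt, it would report one entry per ResourceSync, which makes the gap easy to see.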

i wonder if some failures that tie up the reconcilers fill up a pool of tokio threads or something, and the others just stop? 🤷‍♀️
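
The hypothesis above can be illustrated with a small std-only model: a fixed-size pool whose workers are all occupied by jobs that never return, so later jobs never start. This is a sketch of the general failure mode only, not of Tokio's scheduler (async tasks normally yield a worker at each `.await`, so only blocking work pins a thread) and not of sinker's code. `Pool` and `starvation_demo` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::Duration;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// A fixed-size pool: `workers` threads pull jobs from one shared FIFO queue.
pub struct Pool {
    queue: Option<Sender<Job>>,
    handles: Vec<JoinHandle<()>>,
}

impl Pool {
    pub fn new(workers: usize) -> Self {
        let (tx, rx) = channel::<Job>();
        let rx = Arc::new(Mutex::new(rx));
        let handles = (0..workers)
            .map(|_| {
                let rx = Arc::clone(&rx);
                thread::spawn(move || loop {
                    // The lock guard is dropped at the end of this statement,
                    // so a running job does not keep other workers off the queue.
                    let next = rx.lock().unwrap().recv();
                    match next {
                        Ok(job) => job(),
                        Err(_) => break, // queue closed: no more jobs will arrive
                    }
                })
            })
            .collect();
        Pool { queue: Some(tx), handles }
    }

    pub fn submit(&self, job: impl FnOnce() + Send + 'static) {
        if let Some(queue) = &self.queue {
            let _ = queue.send(Box::new(job));
        }
    }

    /// Closes the queue and waits for every worker to drain it and exit.
    pub fn shutdown(mut self) {
        drop(self.queue.take());
        for handle in self.handles.drain(..) {
            handle.join().unwrap();
        }
    }
}

/// Returns how many quick jobs had finished (before, after) the stuck jobs
/// were released.
pub fn starvation_demo() -> (usize, usize) {
    let pool = Pool::new(2);
    let quick_done = Arc::new(AtomicUsize::new(0));

    // Two "hung reconciles": each blocks until its sender is dropped,
    // which occupies both workers.
    let mut releases = Vec::new();
    for _ in 0..2 {
        let (release, wait) = channel::<()>();
        releases.push(release);
        pool.submit(move || {
            let _ = wait.recv();
        });
    }

    // Three quick jobs queue up behind them.
    for _ in 0..3 {
        let quick_done = Arc::clone(&quick_done);
        pool.submit(move || {
            quick_done.fetch_add(1, Ordering::SeqCst);
        });
    }

    // Give the workers time to run; the quick jobs still cannot start.
    thread::sleep(Duration::from_millis(50));
    let before = quick_done.load(Ordering::SeqCst);

    drop(releases); // wakes the stuck jobs, freeing the workers
    pool.shutdown();
    (before, quick_done.load(Ordering::SeqCst))
}
```

In this model the quick jobs make no progress until the stuck ones are released, which resembles the "only one or a few resources" symptom, though it does not show that this is what sinker is doing.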

@zach-robinson-dev
Contributor

When you say they don't turn up, are there some resources that don't get created at all until you restart, or is it just that some resources are getting out of sync?

zach-robinson-dev self-assigned this Sep 4, 2024
@lukebond
Contributor Author

lukebond commented Sep 4, 2024

i have observed both cases

@zach-robinson-dev
Contributor

How far out of sync have you seen resources getting? I've been doing some reading on how Tokio works, and I think we would need to export the metrics to Prometheus to figure out exactly what is happening, but we may be able to fix it (or at least alleviate it) by increasing the number of worker threads spawned by the Tokio runtime.

@lukebond
Contributor Author

it's not that i see it falling behind a bit, it's that i see it doing nothing for hours or days.

perhaps we could start by emitting metrics about the number of reconciliations it is performing, and alert on a drop in that number? then at least we know when we need to intervene before it causes an incident.
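
A minimal sketch of that idea, assuming a plain in-process counter rather than any particular metrics crate. `ReconcileCounter` and `DropDetector` are hypothetical names, not part of sinker. Each sampling window is compared against the last healthy one because, in the logs above, one ResourceSync kept reconciling roughly every 10 seconds while the rest had stopped, so a check for zero reconciliations alone would likely not have fired.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Total number of reconciliations completed since startup
/// (a hypothetical metric; the reconciler would call `inc` on each success).
pub struct ReconcileCounter(AtomicU64);

impl ReconcileCounter {
    pub const fn new() -> Self {
        Self(AtomicU64::new(0))
    }

    pub fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    pub fn total(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Flags a sampling window whose reconcile count falls below
/// `min_ratio` times the count of the last healthy window.
pub struct DropDetector {
    last_total: u64,
    baseline: Option<u64>,
    min_ratio: f64,
}

impl DropDetector {
    pub fn new(min_ratio: f64) -> Self {
        Self { last_total: 0, baseline: None, min_ratio }
    }

    /// Call once per window with the counter's current total.
    /// Returns true when this window counts as a drop.
    pub fn sample(&mut self, total: u64) -> bool {
        let delta = total.saturating_sub(self.last_total);
        self.last_total = total;
        match self.baseline {
            // Keep the old baseline while alerting, so the alert stays
            // active for as long as throughput remains low.
            Some(base) if (delta as f64) < (base as f64) * self.min_ratio => true,
            // Healthy (or first) window: it becomes the new baseline.
            _ => {
                self.baseline = Some(delta);
                false
            }
        }
    }
}
```

A caller might sample once a minute with something like `DropDetector::new(0.5)` and page when `sample` returns true. A slow decline, where each window is only slightly lower than the previous one, would slip past this check, so a real alert would probably want a longer-lived baseline.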

@zach-robinson-dev
Contributor

After doing some research, I think we may be experiencing this bug, which was fixed in a release of the kube-rs library after the one that we are using currently.
