-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[processor/k8sattributes] Handle all resource deletion event types #27847
Merged
dmitryax
merged 1 commit into
open-telemetry:main
from
c-kruse:k8sattributes-deletefinalstateunknown
Oct 19, 2023
Merged
[processor/k8sattributes] Handle all resource deletion event types #27847
dmitryax
merged 1 commit into
open-telemetry:main
from
c-kruse:k8sattributes-deletefinalstateunknown
Oct 19, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The k8s go client's cache expects OnDelete handlers to handle objects of type DeletedFinalStateUnknown when the cache's watch mechanism misses a delete and notices later. This changes the processor to handle such deletes as if they were normal, rather than logging an error and dropping the change. Signed-off-by: Christian Kruse <[email protected]>
c-kruse
requested review from
dmitryax,
fatsheep9146 and
TylerHelmuth
as code owners
October 19, 2023 02:52
dmitryax
approved these changes
Oct 19, 2023
Thanks @c-kruse! |
martin-majlis-s1
pushed a commit
to scalyr/opentelemetry-collector-contrib
that referenced
this pull request
Oct 20, 2023
…pen-telemetry#27847) **Description:** The k8s go client's cache expects OnDelete handlers to handle objects of type DeletedFinalStateUnknown when the cache's watch mechanism misses a delete and notices later. This changes the processor to handle such deletes as if they were normal, rather than logging an error and dropping the change. **Link to tracking Issue:** open-telemetry#27632 **Testing:** Only what you see in the unit tests. I am open to suggestions, but I don't see this being a code path we can reasonably cover in the e2e test suite. Verified manually locally on a kind cluster. * Stood up two deployments loosely based off e2e testing resources, one w/ a collector built from this branch and the other docker.io/otel/opentelemetry-collector-contrib:latest. * Both included an additional container in the collector pod I used to fiddle with iptables rules. * Added rules to reject traffic to/from the kube api server * Deleted some namespaces containing deployments generating telemetry. * Restored connectivity by removing the iptables rules. * Observed the collector built from this branch was silent (aside from the junk the k8s client logs due to the broken connection) * Observed the latest ([0.87.0](https://hub.docker.com/layers/otel/opentelemetry-collector-contrib/0.87.0/images/sha256-77cdd395b828b09cb920c671966f09a87a40611aa6107443146086f2046f4a9a?context=explore)) collector logged a handful of errors for the deleted resources (api_v1.Pod, and apps_v1.ReplicaSet. I probably just didn't wait long enough for Namespace.) ``` 2023-10-19T02:18:37.781Z error kube/client.go:236 object received was not of type api_v1.Pod {"kind": "processor", "name": "k8sattributes", "pipeline": "metrics", "received": {"Key":"src1/telemetrygen-patched-766d55cbcb-8zktr","Obj":{"metadata":{"name":"telemetrygen-patched-766d55cbcb-8zktr","namespace":"src1","uid":"be5d2268-c8b0-434d-b3b8-8b18083c7a8b","creat ionTimestamp":"2023-10-19T02:01:08Z","labels":{"app":"telemetrygen-patched","pod-template-hash":"766d55cbcb"},"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"telemetrygen-patched-766d55cbcb","uid":"a887d67a-d5d6-4269-b520-45dbb4f1cd82","controller":true,"blockOwnerDeletion":true}]},"spec":{"containers":[{"name":"telemetrygen","image":"localhost/telemetrygen :latest","resources":{}}],"nodeName":"manual-e2e-testing-control-plane"},"status":{"podIP":"10.244.0.56","startTime":"2023-10-19T02:01:08Z","containerStatuses":[{"name":"telemetrygen","state":{},"lastState":{},"ready":false,"restartCount":0,"image":"","imageID":"","containerID":"containerd://2821ef32cd8bf93a13414504c0f8f0c016c84be49d6ffdbd475d7e4681e90c51"}]}}}} github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sattributesprocessor/internal/kube.(*WatchClient).handlePodDelete github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/internal/kube/client.go:236 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete k8s.io/[email protected]/tools/cache/controller.go:253 ... 2023-10-19T02:19:03.970Z error kube/client.go:868 object received was not of type apps_v1.ReplicaSet {"kind": "processor", "name": "k8sattributes", "pipeline": "metrics", "received": {"Key":"src1/telemetrygen-stable-5c444bb8b8","Obj":{"metadata":{"name":"telemetrygen-stable-5c444bb8b8","namespace":"src1","uid":"d37707ff-b308-4339-8543-a1caf5705ea8","creationTimestamp":null,"ownerReferences":[{"apiVersion":"apps/v1","kind":"Deployment","name":"telemetrygen-stable","uid":"c421276e-e1bf-40c5-85e1-e92e30363da5","controller":true,"blockOwnerDeletion":true}]},"spec":{"selector":null,"template":{"metadata":{"creationTimestamp":null},"spec":{"containers":null}}},"status":{"replicas":0}}}} github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sattributesprocessor/internal/kube.(*WatchClient).handleReplicaSetDelete github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/internal/kube/client.go:868 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete k8s.io/[email protected]/tools/cache/controller.go:253 k8s.io/client-go/tools/cache.(*processorListener).run.func1 k8s.io/[email protected]/tools/cache/shared_informer.go:979 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1 ... ``` **Documentation:** N/A - it is not clear to me whether or not this should land on the changelog. Its impact on users is marginal. Signed-off-by: Christian Kruse <[email protected]>
sigilioso
pushed a commit
to carlossscastro/opentelemetry-collector-contrib
that referenced
this pull request
Oct 27, 2023
…pen-telemetry#27847) **Description:** The k8s go client's cache expects OnDelete handlers to handle objects of type DeletedFinalStateUnknown when the cache's watch mechanism misses a delete and notices later. This changes the processor to handle such deletes as if they were normal, rather than logging an error and dropping the change. **Link to tracking Issue:** open-telemetry#27632 **Testing:** Only what you see in the unit tests. I am open to suggestions, but I don't see this being a code path we can reasonably cover in the e2e test suite. Verified manually locally on a kind cluster. * Stood up two deployments loosely based off e2e testing resources, one w/ a collector built from this branch and the other docker.io/otel/opentelemetry-collector-contrib:latest. * Both included an additional container in the collector pod I used to fiddle with iptables rules. * Added rules to reject traffic to/from the kube api server * Deleted some namespaces containing deployments generating telemetry. * Restored connectivity by removing the iptables rules. * Observed the collector built from this branch was silent (aside from the junk the k8s client logs due to the broken connection) * Observed the latest ([0.87.0](https://hub.docker.com/layers/otel/opentelemetry-collector-contrib/0.87.0/images/sha256-77cdd395b828b09cb920c671966f09a87a40611aa6107443146086f2046f4a9a?context=explore)) collector logged a handful of errors for the deleted resources (api_v1.Pod, and apps_v1.ReplicaSet. I probably just didn't wait long enough for Namespace.) ``` 2023-10-19T02:18:37.781Z error kube/client.go:236 object received was not of type api_v1.Pod {"kind": "processor", "name": "k8sattributes", "pipeline": "metrics", "received": {"Key":"src1/telemetrygen-patched-766d55cbcb-8zktr","Obj":{"metadata":{"name":"telemetrygen-patched-766d55cbcb-8zktr","namespace":"src1","uid":"be5d2268-c8b0-434d-b3b8-8b18083c7a8b","creat ionTimestamp":"2023-10-19T02:01:08Z","labels":{"app":"telemetrygen-patched","pod-template-hash":"766d55cbcb"},"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"telemetrygen-patched-766d55cbcb","uid":"a887d67a-d5d6-4269-b520-45dbb4f1cd82","controller":true,"blockOwnerDeletion":true}]},"spec":{"containers":[{"name":"telemetrygen","image":"localhost/telemetrygen :latest","resources":{}}],"nodeName":"manual-e2e-testing-control-plane"},"status":{"podIP":"10.244.0.56","startTime":"2023-10-19T02:01:08Z","containerStatuses":[{"name":"telemetrygen","state":{},"lastState":{},"ready":false,"restartCount":0,"image":"","imageID":"","containerID":"containerd://2821ef32cd8bf93a13414504c0f8f0c016c84be49d6ffdbd475d7e4681e90c51"}]}}}} github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sattributesprocessor/internal/kube.(*WatchClient).handlePodDelete github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/internal/kube/client.go:236 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete k8s.io/[email protected]/tools/cache/controller.go:253 ... 2023-10-19T02:19:03.970Z error kube/client.go:868 object received was not of type apps_v1.ReplicaSet {"kind": "processor", "name": "k8sattributes", "pipeline": "metrics", "received": {"Key":"src1/telemetrygen-stable-5c444bb8b8","Obj":{"metadata":{"name":"telemetrygen-stable-5c444bb8b8","namespace":"src1","uid":"d37707ff-b308-4339-8543-a1caf5705ea8","creationTimestamp":null,"ownerReferences":[{"apiVersion":"apps/v1","kind":"Deployment","name":"telemetrygen-stable","uid":"c421276e-e1bf-40c5-85e1-e92e30363da5","controller":true,"blockOwnerDeletion":true}]},"spec":{"selector":null,"template":{"metadata":{"creationTimestamp":null},"spec":{"containers":null}}},"status":{"replicas":0}}}} github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sattributesprocessor/internal/kube.(*WatchClient).handleReplicaSetDelete github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/internal/kube/client.go:868 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete k8s.io/[email protected]/tools/cache/controller.go:253 k8s.io/client-go/tools/cache.(*processorListener).run.func1 k8s.io/[email protected]/tools/cache/shared_informer.go:979 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1 ... ``` **Documentation:** N/A - it is not clear to me whether or not this should land on the changelog. Its impact on users is marginal. Signed-off-by: Christian Kruse <[email protected]>
jmsnll
pushed a commit
to jmsnll/opentelemetry-collector-contrib
that referenced
this pull request
Nov 12, 2023
…pen-telemetry#27847) **Description:** The k8s go client's cache expects OnDelete handlers to handle objects of type DeletedFinalStateUnknown when the cache's watch mechanism misses a delete and notices later. This changes the processor to handle such deletes as if they were normal, rather than logging an error and dropping the change. **Link to tracking Issue:** open-telemetry#27632 **Testing:** Only what you see in the unit tests. I am open to suggestions, but I don't see this being a code path we can reasonably cover in the e2e test suite. Verified manually locally on a kind cluster. * Stood up two deployments loosely based off e2e testing resources, one w/ a collector built from this branch and the other docker.io/otel/opentelemetry-collector-contrib:latest. * Both included an additional container in the collector pod I used to fiddle with iptables rules. * Added rules to reject traffic to/from the kube api server * Deleted some namespaces containing deployments generating telemetry. * Restored connectivity by removing the iptables rules. * Observed the collector built from this branch was silent (aside from the junk the k8s client logs due to the broken connection) * Observed the latest ([0.87.0](https://hub.docker.com/layers/otel/opentelemetry-collector-contrib/0.87.0/images/sha256-77cdd395b828b09cb920c671966f09a87a40611aa6107443146086f2046f4a9a?context=explore)) collector logged a handful of errors for the deleted resources (api_v1.Pod, and apps_v1.ReplicaSet. I probably just didn't wait long enough for Namespace.) ``` 2023-10-19T02:18:37.781Z error kube/client.go:236 object received was not of type api_v1.Pod {"kind": "processor", "name": "k8sattributes", "pipeline": "metrics", "received": {"Key":"src1/telemetrygen-patched-766d55cbcb-8zktr","Obj":{"metadata":{"name":"telemetrygen-patched-766d55cbcb-8zktr","namespace":"src1","uid":"be5d2268-c8b0-434d-b3b8-8b18083c7a8b","creat ionTimestamp":"2023-10-19T02:01:08Z","labels":{"app":"telemetrygen-patched","pod-template-hash":"766d55cbcb"},"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"telemetrygen-patched-766d55cbcb","uid":"a887d67a-d5d6-4269-b520-45dbb4f1cd82","controller":true,"blockOwnerDeletion":true}]},"spec":{"containers":[{"name":"telemetrygen","image":"localhost/telemetrygen :latest","resources":{}}],"nodeName":"manual-e2e-testing-control-plane"},"status":{"podIP":"10.244.0.56","startTime":"2023-10-19T02:01:08Z","containerStatuses":[{"name":"telemetrygen","state":{},"lastState":{},"ready":false,"restartCount":0,"image":"","imageID":"","containerID":"containerd://2821ef32cd8bf93a13414504c0f8f0c016c84be49d6ffdbd475d7e4681e90c51"}]}}}} github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sattributesprocessor/internal/kube.(*WatchClient).handlePodDelete github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/internal/kube/client.go:236 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete k8s.io/[email protected]/tools/cache/controller.go:253 ... 2023-10-19T02:19:03.970Z error kube/client.go:868 object received was not of type apps_v1.ReplicaSet {"kind": "processor", "name": "k8sattributes", "pipeline": "metrics", "received": {"Key":"src1/telemetrygen-stable-5c444bb8b8","Obj":{"metadata":{"name":"telemetrygen-stable-5c444bb8b8","namespace":"src1","uid":"d37707ff-b308-4339-8543-a1caf5705ea8","creationTimestamp":null,"ownerReferences":[{"apiVersion":"apps/v1","kind":"Deployment","name":"telemetrygen-stable","uid":"c421276e-e1bf-40c5-85e1-e92e30363da5","controller":true,"blockOwnerDeletion":true}]},"spec":{"selector":null,"template":{"metadata":{"creationTimestamp":null},"spec":{"containers":null}}},"status":{"replicas":0}}}} github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sattributesprocessor/internal/kube.(*WatchClient).handleReplicaSetDelete github.com/open-telemetry/opentelemetry-collector-contrib/processor/[email protected]/internal/kube/client.go:868 k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete k8s.io/[email protected]/tools/cache/controller.go:253 k8s.io/client-go/tools/cache.(*processorListener).run.func1 k8s.io/[email protected]/tools/cache/shared_informer.go:979 k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1 ... ``` **Documentation:** N/A - it is not clear to me whether or not this should land on the changelog. Its impact on users is marginal. Signed-off-by: Christian Kruse <[email protected]>
abhinavdahiya
added a commit
to abhinavdahiya/eks-node-viewer
that referenced
this pull request
May 8, 2024
OnDelete event handler can receive https://pkg.go.dev/k8s.io/client-go/tools/cache#DeletedFinalStateUnknown in cases where delete event is missed during reconnection. for example, observed this ``` created by main.startMonitor in goroutine 1 /home/runner/work/eks-node-viewer/eks-node-viewer/cmd/eks-node-viewer/main.go:207 +0x728 panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Node [recovered] panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Node [recovered] panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Node ``` Followed similar solution to open-telemetry/opentelemetry-collector-contrib#27847
tzneal
pushed a commit
to awslabs/eks-node-viewer
that referenced
this pull request
May 14, 2024
OnDelete event handler can receive https://pkg.go.dev/k8s.io/client-go/tools/cache#DeletedFinalStateUnknown in cases where delete event is missed during reconnection. for example, observed this ``` created by main.startMonitor in goroutine 1 /home/runner/work/eks-node-viewer/eks-node-viewer/cmd/eks-node-viewer/main.go:207 +0x728 panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Node [recovered] panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Node [recovered] panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Node ``` Followed similar solution to open-telemetry/opentelemetry-collector-contrib#27847
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
processor/k8sattributes
k8s Attributes processor
Skip Changelog
PRs that do not require a CHANGELOG.md entry
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description:
The k8s go client's cache expects OnDelete handlers to handle objects of type DeletedFinalStateUnknown when the cache's watch mechanism misses a delete and notices later. This changes the processor to handle such deletes as if they were normal, rather than logging an error and dropping the change.
Link to tracking Issue: #27632
Testing:
Only what you see in the unit tests. I am open to suggestions, but I don't see this being a code path we can reasonably cover in the e2e test suite.
Verified manually locally on a kind cluster.
Documentation: N/A - it is not clear to me whether or not this should land on the changelog. Its impact on users is marginal.