Ignore the unknown CRD's #1080

Closed
NishantSingh10 opened this issue Jun 1, 2023 · 5 comments
Labels
enhancement (New feature or request), needs-triage (Relates to issues that should be refined)

Comments

@NishantSingh10

Overview

Add an option to ignore unknown CRDs.

Acceptance Criteria

There should be a configuration option that makes Botkube ignore unknown CRDs.

Reason

When I configure Botkube, it works perfectly, but if it finds an unknown CRD, the whole of Botkube fails.

@NishantSingh10 NishantSingh10 added the enhancement label Jun 1, 2023
@pkosiec
Member

pkosiec commented Jun 1, 2023

Hey @NishantSingh10, can you elaborate a bit more?

When I configure Botkube, it works perfectly, but if it finds an unknown CRD, the whole of Botkube fails.

Which issues exactly are you facing? Can you provide the logs?

@pkosiec pkosiec added the needs-triage and needs-more-info labels and removed the enhancement and needs-triage labels Jun 1, 2023
@NishantSingh10
Author

Hi @pkosiec, I discussed the whole problem in detail with one of your team members, Mateusz Szostok, and shared all the details and the logs with him. He helped me figure out this issue. Could you please check with him? That would be really helpful.

@pkosiec
Member

pkosiec commented Jun 1, 2023

Ah, ok - @mszostok please put a bit more description here once you have time. Thanks!

@mszostok mszostok added the enhancement and needs-triage labels and removed the needs-more-info label Jun 1, 2023
@mszostok
Contributor

mszostok commented Jun 1, 2023

Description

The k8s source is now an external plugin. We start only one process and then call the Stream method multiple times. The problem is that if one of those calls fails, the whole plugin is killed, and you are not told about it.
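
The failure mode described here is a process-wide exit triggered from inside one of several Stream calls. A minimal sketch of the alternative, where each stream reports its own error and the others keep running, could look like the following. It is not Botkube's actual code: `streamFunc`, `runStreams`, `errStreamSetup`, and the source names are hypothetical, and the sketch models only the setup step of a stream, not the long-running watch itself.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errStreamSetup is a hypothetical error used to simulate a failing stream.
var errStreamSetup = errors.New("stream setup failed")

// streamFunc is a hypothetical stand-in for the setup step of one Stream call.
type streamFunc func() error

// runStreams starts every stream in its own goroutine and records the
// outcome per source name, so one failing stream does not stop the others.
func runStreams(streams map[string]streamFunc) map[string]error {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]error, len(streams))
	)
	for name, start := range streams {
		wg.Add(1)
		go func(name string, start streamFunc) {
			defer wg.Done()
			err := start() // return the error instead of exiting the process
			mu.Lock()
			results[name] = err
			mu.Unlock()
		}(name, start)
	}
	wg.Wait()
	return results
}

func main() {
	results := runStreams(map[string]streamFunc{
		"source-a": func() error { return nil },
		"source-b": func() error { return errStreamSetup },
	})
	for name, err := range results {
		fmt.Printf("%s: %v\n", name, err)
	}
}
```

Returning the error to the caller leaves the decision (log, notify, retry) with the host process, which is the piece that can actually tell the user something went wrong.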

User facing problem

  1. Configure dedicated K8s sources
  2. Botkube starts and sends the message "My watch begins for cluster 'Stage'!"
  3. You don't receive any k8s alerts
  4. You go to check botkube logs:
    time="2023-05-31T06:29:19Z" level=info msg="Botkube is connecting to Slack..." bot=SocketSlack commGroup=default-group
    time="2023-05-31T06:29:19Z" level=info msg="Botkube connected to Slack!" bot=SocketSlack commGroup=default-group
    time="2023-05-31T06:29:19Z" level=info msg="Starting a new stream for \"k8s-all-events_interactive/true\"."
    time="2023-05-31T06:29:19Z" level=info msg="Starting checker" component="Upgrade Checker"
    time="2023-05-31T06:29:19Z" level=info msg="Start source streaming..." pluginName=botkube/kubernetes sourceName=k8s-all-events
    time="2023-05-31T06:29:19Z" level=info msg="Starting a new stream for \"k8s-controller-events_interactive/true\"."
    time="2023-05-31T06:29:19Z" level=info msg="Start source streaming..." pluginName=botkube/kubernetes sourceName=k8s-controller-events
    time="2023-05-31T06:29:19Z" level=info msg="Starting a new stream for \"k8s-worker-events_interactive/true\"."
    time="2023-05-31T06:29:19Z" level=info msg="Starting controller..." component=Controller
    time="2023-05-31T06:29:19Z" level=info msg="Sending welcome message..." component=Controller
    time="2023-05-31T06:29:19Z" level=debug msg="time=\"2023-05-31T06:29:19Z\" level=info msg=\"Registering filter \\\"ObjectAnnotationChecker\\\" (enabled: true)...\" component=\"Filter Engine\"" logger=stdout plugin=botkube/kubernetes
    time="2023-05-31T06:29:19Z" level=debug msg="time=\"2023-05-31T06:29:19Z\" level=info msg=\"Registering filter \\\"NodeEventsChecker\\\" (enabled: true)...\" component=\"Filter Engine\"" logger=stdout plugin=botkube/kubernetes
    time="2023-05-31T06:29:19Z" level=debug msg="time=\"2023-05-31T06:29:19Z\" level=info msg=\"Registering filter \\\"ObjectAnnotationChecker\\\" (enabled: true)...\" component=\"Filter Engine\"" logger=stdout plugin=botkube/kubernetes
    time="2023-05-31T06:29:19Z" level=debug msg="time=\"2023-05-31T06:29:19Z\" level=info msg=\"Registering filter \\\"NodeEventsChecker\\\" (enabled: true)...\" component=\"Filter Engine\"" logger=stdout plugin=botkube/kubernetes
    time="2023-05-31T06:29:19Z" level=debug msg="time=\"2023-05-31T06:29:19Z\" level=info msg=\"Unable to parse resource: controller.kubeslice.io/v1alpha1/slicerolebindings to register with informer\\n\"" logger=stdout plugin=botkube/kubernetes
    time="2023-05-31T06:29:19Z" level=debug msg="time=\"2023-05-31T06:29:19Z\" level=error msg=\"no matches for controller.kubeslice.io/v1alpha1, Resource=slicerolebindings\" error=\"no matches for controller.kubeslice.io/v1alpha1, Resource=slicerolebindings\" events=\"[create update delete]\"" logger=stdout plugin=botkube/kubernetes
    time="2023-05-31T06:29:20Z" level=info msg="Notified about new release \"v1.0.1\". Finishing..." component="Upgrade Checker"
    time="2023-05-31T06:29:21Z" level=error msg="I0531 06:29:21.155146      33 request.go:682] Waited for 1.197989639s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/api/v1/nodes?limit=500&resourceVersion=0" logger=stderr plugin=botkube/kubernetes
    2023/05/31 06:29:21 rpc error: code = Unavailable desc = error reading from server: EOF
    2023/05/31 06:29:21 rpc error: code = Unavailable desc = error reading from server: EOF
    time="2023-05-31T06:29:21Z" level=error msg="plugin process exited" error="exit status 1" path=/tmp/botkube/source_v1.0.0_kubernetes pid=33 plugin=botkube/kubernetes
    time="2023-05-31T06:29:21Z" level=debug msg="received EOF, stopping recv loop" err="rpc error: code = Unavailable desc = error reading from server: EOF" plugin=botkube/kubernetes subsystem_name=botkube/kubernetes.stdio
    

As you can see, this is bad UX, and it can also lead to some serious issues:

  1. If you are watching only for errors, you may think that everything is OK, when in fact Botkube is simply not alerting you.
  2. You need to figure out by yourself that k8s alerts don't work.
  3. The logs are not helpful either.

The root cause is that the K8s plugin tries to set up a watch loop, and it fails because the CRDs are not recognized by the API server. Here is the code that causes this:

    if err != nil {
        exitOnError(err, s.logger.WithFields(logrus.Fields{
            "events": []config.EventType{
                config.CreateEvent,
                config.UpdateEvent,
                config.DeleteEvent,
            },
            "error": err.Error(),
        }))
    }

Of course, there can be other issues, such as the API server being temporarily unavailable. However, in this particular issue, we can focus only on ignoring/filtering out unknown resources and on improving the notification to include information about started processes.
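
To make the "ignoring/filtering out unknown resources" idea concrete, here is a minimal sketch that splits the configured resources into those that can be watched and those that are skipped with a reason, instead of exiting on the first failure. It is an illustration under stated assumptions, not the plugin's code: `resource`, `lookupFunc`, `knownSet`, `skipped`, and `partitionResources` are hypothetical names, and `knownSet` is an in-memory stand-in for asking the API server whether it recognizes a resource.

```go
package main

import "fmt"

// resource identifies a configured resource, for example
// "controller.kubeslice.io/v1alpha1/slicerolebindings".
type resource string

// lookupFunc reports an error when a resource is not recognized.
// Hypothetical stand-in for a lookup against the API server.
type lookupFunc func(r resource) error

// skipped records a resource that was left out, and why.
type skipped struct {
	Resource resource
	Reason   string
}

// knownSet builds a lookupFunc backed by a fixed set of known resources.
func knownSet(known ...resource) lookupFunc {
	set := make(map[resource]struct{}, len(known))
	for _, r := range known {
		set[r] = struct{}{}
	}
	return func(r resource) error {
		if _, ok := set[r]; !ok {
			return fmt.Errorf("no matches for %s", r)
		}
		return nil
	}
}

// partitionResources keeps every resource the lookup accepts and collects
// the rest together with the lookup error, rather than aborting.
func partitionResources(configured []resource, lookup lookupFunc) (valid []resource, ignored []skipped) {
	for _, r := range configured {
		if err := lookup(r); err != nil {
			ignored = append(ignored, skipped{Resource: r, Reason: err.Error()})
			continue
		}
		valid = append(valid, r)
	}
	return valid, ignored
}

func main() {
	configured := []resource{
		"v1/pods",
		"controller.kubeslice.io/v1alpha1/slicerolebindings",
	}
	valid, ignored := partitionResources(configured, knownSet("v1/pods"))
	fmt.Println("watching:", valid)
	for _, s := range ignored {
		fmt.Printf("ignored %s: %s\n", s.Resource, s.Reason)
	}
}
```

Keeping the reason next to each skipped resource means the same data can later feed a user-facing notification.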

Possible resolution

  • ignore unknown CRDs
  • continue to watch all other resources that are valid
  • post a message on Slack about a possible misconfiguration, with details about which CRDs were ignored and the reason (e.g. not recognized by the API server)
  • expose an option to check the source status; it can still be via @Botkube list sources
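
As a rough illustration of the last two points, the sketch below keeps a per-source status that records which resources were ignored and renders both a notification text and a one-line summary. Everything here is hypothetical (`sourceStatus`, `ignoredResource`, `notice`, `summary`, and the message wording); it only shows one shape such data could take, not how Botkube formats its messages.

```go
package main

import (
	"fmt"
	"strings"
)

// ignoredResource is a resource that was skipped, with the reason.
type ignoredResource struct {
	Name   string
	Reason string
}

// sourceStatus is a hypothetical per-source status record.
type sourceStatus struct {
	Source  string
	Running bool
	Ignored []ignoredResource
}

// notice renders a message about ignored resources, or "" if there are none.
func (s sourceStatus) notice() string {
	if len(s.Ignored) == 0 {
		return ""
	}
	var b strings.Builder
	fmt.Fprintf(&b, "Source %q started with %d ignored resource(s):\n", s.Source, len(s.Ignored))
	for _, r := range s.Ignored {
		fmt.Fprintf(&b, "- %s: %s\n", r.Name, r.Reason)
	}
	return b.String()
}

// summary renders a one-line status, for example for a source listing.
func (s sourceStatus) summary() string {
	state := "stopped"
	if s.Running {
		state = "running"
	}
	return fmt.Sprintf("%s: %s, %d ignored", s.Source, state, len(s.Ignored))
}

func main() {
	st := sourceStatus{
		Source:  "demo-source",
		Running: true,
		Ignored: []ignoredResource{
			{Name: "example.io/v1/foos", Reason: "not recognized by API server"},
		},
	}
	fmt.Print(st.notice())
	fmt.Println(st.summary())
}
```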

Later implement #878.

@pkosiec pkosiec added this to the v1.4.0 milestone Aug 16, 2023
@pkosiec
Member

pkosiec commented Aug 16, 2023

Will be addressed in #878, as such an invalid configuration will cause a plugin restart.

@pkosiec pkosiec closed this as completed Aug 16, 2023
@pkosiec pkosiec closed this as not planned Aug 16, 2023
@pkosiec pkosiec removed this from the v1.4.0 milestone Aug 16, 2023