Watch and respawn started plugins that exited #878

mszostok · 2022-12-06T10:17:31Z

Overview

Watch started source and executor plugins and respawn those that exited. Currently, for sources, the Dispatcher service only gets a client and starts the streaming process.

botkube/internal/source/dispatcher.go

Lines 42 to 62 in 22da62e

    
    sourceClient, err := d.manager.GetSource(pluginName)
    if err != nil {
    	return fmt.Errorf("while getting source client for %s: %w", pluginName, err)
    }

    out, err := sourceClient.Stream(ctx, pluginConfigs)
    if err != nil {
    	return fmt.Errorf("while opening stream for %s: %w", pluginName, err)
    }

    go func() {
    	for {
    		select {
    		case event := <-out.Output:
    			log.WithField("event", string(event)).Debug("Dispatching received event...")
    			d.dispatch(ctx, event, sources)
    		case <-ctx.Done():
    			return
    		}
    	}
    }()

However, each plugin is a separate binary that runs as a sub-process. As a result, if a plugin panics, for example, its sub-process exits and we receive no new events from that source until the Botkube process is restarted. Executors should be covered as well.

We should detect such situations and add a retry mechanism:

- It can be as simple as starting the binary again.
- We can also consider something similar to what Kubernetes does when a Pod is crashing.
- Related issue in the go-plugin repo: Handling plugin crashes hashicorp/go-plugin#31. We can look at how other projects handle such situations.

Acceptance Criteria

- Detect crashing plugins and restart them using a simple retry mechanism.
- Document the implemented approach.
- e2e test coverage:
  - Test an executor: update the echo plugin to fail on a specific command.
  - Consider testing a source as well: update the config map watcher to fail on a given annotated ConfigMap, and test that it works again after that ConfigMap is deleted.
- Have a retry threshold: after X retries, give up restarting the given source/executor and send a message to Slack that it was deactivated because of constant crashing.
- Send a message via the bot that a given plugin is crashing and related events may be lost.
  - Send the message only to the bound channels.
- Make it configurable for the user.
  - Default strategy: keep Botkube running anyway, but still send the message.
  - Alternative strategy: exit Botkube when one plugin is failing, if the user prefers that.
Reason

Make the source plugin dispatcher more reliable, so that our users do not stop receiving source events because of a random plugin crash.


mszostok added enhancement New feature or request needs-triage Relates to issues that should be refined labels Dec 6, 2022

mszostok mentioned this issue Dec 6, 2022

Botkube Plugin System #844

Closed

11 tasks

mszostok changed the title ~~Watch and respawn started plugin source that exited~~ Watch and respawn started plugins that exited Dec 15, 2022

mszostok added this to the v0.18.0 milestone Dec 15, 2022

mszostok mentioned this issue Dec 15, 2022

Missing readiness probe for GKE/Ingress integration #881

Closed

mszostok removed the needs-triage Relates to issues that should be refined label Dec 15, 2022

mszostok mentioned this issue Dec 15, 2022

Return the JSON status object from the healthz endpoint #890

Closed

pkosiec modified the milestone: v0.18.0 Jan 10, 2023

pkosiec removed this from the v0.18.0 milestone Feb 15, 2023

mszostok mentioned this issue May 12, 2023

Add validation that kubeconfig is specified #1063

Merged

mszostok mentioned this issue Jun 1, 2023

Ignore the unknown CRD's #1080

Closed

pkosiec added this to the v1.4.0 milestone Aug 16, 2023

josefkarasek self-assigned this Aug 21, 2023

josefkarasek mentioned this issue Aug 23, 2023

Restart crashed plugins #1204

Closed

pkosiec modified the milestones: v1.4.0, v1.5.0 Sep 12, 2023

josefkarasek mentioned this issue Sep 13, 2023

Show plugin activation status #1256

Merged

josefkarasek mentioned this issue Sep 25, 2023

Describe plugin restart policies kubeshop/botkube-docs#286

Merged

josefkarasek closed this as completed Sep 26, 2023
