Overview
Watch and respawn started plugin sources and executors that exited. Currently, for sources, the Dispatcher service only gets a client and starts the streaming process:
`botkube/internal/source/dispatcher.go`, lines 42 to 62 in `22da62e`:

```go
if err != nil {
	return fmt.Errorf("while getting source client for %s: %w", pluginName, err)
}

out, err := sourceClient.Stream(ctx, pluginConfigs)
if err != nil {
	return fmt.Errorf("while opening stream for %s: %w", pluginName, err)
}

go func() {
	for {
		select {
		case event := <-out.Output:
			log.WithField("event", string(event)).Debug("Dispatching received event...")
			d.dispatch(ctx, event, sources)
		case <-ctx.Done():
			return
		}
	}
}()
```
However, each plugin is a separate binary that runs as a sub-process. As a result, if a given plugin panics, its sub-process exits and we will not receive new events from that source until the Botkube process is restarted. Executors should be covered too.
We should detect such situations and add a retry mechanism:
- It can be as simple as starting the binary once again.
- We can also consider something similar to what Kubernetes does when a Pod is crashing (restart with back-off).
- Related issue on the go-plugin repo: Handling plugin crashes hashicorp/go-plugin#31. We can take a look at how other projects handle such situations.
Acceptance Criteria
- Detect crashing plugins and restart them using a simple retry mechanism.
- Document the implemented approach.
- E2E test coverage:
  - Test executor: update the echo plugin to fail on a specific command.
  - Consider testing a source as well: update the ConfigMap watcher to fail on a given annotated ConfigMap, and verify it works again after deleting it.
- Have a retry threshold; after X retries, give up restarting a given source/executor and send a message to Slack that the source was deactivated because of constant crashing.
  - Send a message via the Bot that a given plugin is crashing and that related events may be lost.
  - Send the message only to the bound channels.
  - Make it configurable for the user:
    - Default strategy: keep running Botkube anyway, but still send the message.
    - Alternative strategy: exit Botkube if one plugin is failing, if the user prefers that.
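The configurable-strategy criterion above could take a shape like the following. This is purely illustrative: `FailurePolicy` and all field names are hypothetical, not the actual Botkube configuration schema:

```go
package main

import "fmt"

// FailurePolicy is a hypothetical shape for the user-facing configuration
// described in the acceptance criteria; names and defaults are illustrative.
type FailurePolicy struct {
	RetryThreshold int    // give up restarting after this many crashes
	OnGiveUp       string // "keepRunning" (default) or "exitBotkube"
	NotifyChannels bool   // send a message to the bound channels only
}

// DefaultFailurePolicy keeps Botkube running and notifies the bound channels,
// matching the default strategy listed above.
func DefaultFailurePolicy() FailurePolicy {
	return FailurePolicy{RetryThreshold: 5, OnGiveUp: "keepRunning", NotifyChannels: true}
}

// shouldExit tells the main process whether a permanently failing plugin
// should take Botkube down, per the user's preference.
func shouldExit(p FailurePolicy) bool {
	return p.OnGiveUp == "exitBotkube"
}

func main() {
	p := DefaultFailurePolicy()
	fmt.Println(shouldExit(p)) // false
}
```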
Reason
Make the source plugin dispatcher more reliable to avoid a situation where our users stop receiving source events because of random plugin crashes.