Update strategy for dealing with failing plugins #708

andrzej-k · 2016-02-11T15:31:51Z

Currently, if a plugin reports an error to the framework, the tasks using that plugin are eventually disabled after 10 consecutive plugin errors. This can permanently disable tasks, for example during intermittent network issues. To start the tasks again, the plugin must be unloaded and loaded again, and the tasks restarted. Possible options to consider:

- Make the task-failing mechanism configurable:
  - allow disabling the mechanism, so that a failing plugin never disables a task
  - allow a configurable number of retries
- Change the default behavior (counting errors at each interval) to something like an exponential back-off algorithm.
- Update the documentation so that plugin authors know in which cases reporting an error to the framework may lead to a task being disabled.
- Distinguish critical plugin errors, which should disable the task, from less severe errors, which should start the exponential back-off algorithm.
- Allow the task flow designer to decide what to do when a plugin reports errors.
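As a rough illustration of the first and fourth options, here is a minimal sketch in Python. It is not the framework's actual code; `FailurePolicy`, `CriticalPluginError`, and `max_consecutive_failures` are hypothetical names made up for this example. It assumes a limit of `None` means "never disable the task" and that a critical error disables the task immediately, regardless of the count.

```python
from dataclasses import dataclass
from typing import Optional


class CriticalPluginError(Exception):
    """Hypothetical marker for errors that should disable a task at once."""


@dataclass
class FailurePolicy:
    # Hypothetical setting: None means "never disable on ordinary errors".
    # The default of 10 mirrors the current behavior described above.
    max_consecutive_failures: Optional[int] = 10
    consecutive_failures: int = 0
    disabled: bool = False

    def record_success(self) -> None:
        # A successful run resets the consecutive-error count.
        self.consecutive_failures = 0

    def record_error(self, err: Exception) -> None:
        # Critical errors disable the task regardless of the count.
        if isinstance(err, CriticalPluginError):
            self.disabled = True
            return
        # Less severe errors only count toward the configured limit.
        self.consecutive_failures += 1
        limit = self.max_consecutive_failures
        if limit is not None and self.consecutive_failures >= limit:
            self.disabled = True


if __name__ == "__main__":
    policy = FailurePolicy(max_consecutive_failures=3)
    for _ in range(3):
        policy.record_error(RuntimeError("network timeout"))
    print(policy.disabled)
```

With `max_consecutive_failures=None`, intermittent network errors alone would never disable the task in this sketch; only a `CriticalPluginError` would.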

jcooklin · 2016-02-12T15:01:48Z

@andrzej-k: In general terms, how do you see something like an exponential back-off algorithm working in this situation? Are you suggesting that a plugin returning errors could result in adjusting the rate/interval of the task(s)?

This relates to #688.

andrzej-k · 2016-02-15T14:09:24Z

@jcooklin: Possibly (to reduce resource utilization and to avoid flooding the user with errors), but at the very least the user should be able to decide whether a failing plugin results in the task being disabled. This could be configurable in the task manifest (per-plugin config), for example as "task_failing_strategy":

- simple (default): as currently, stop the task after x consecutive errors
- back-off: invoke the plugin gradually less often, but keep the task alive
- ignore errors: keep the task alive despite errors

Tolerable downtime per plugin as a configurable item could be another option.
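To make the three strategies more concrete, here is a minimal sketch in Python of how a scheduler might pick the next step for each one. Everything here is an assumption for illustration: the function `next_run`, the strategy strings, the doubling factor, and the `max_interval_s` cap are not part of any existing manifest format or framework API.

```python
def next_run(strategy, consecutive_errors, interval_s,
             max_errors=10, max_interval_s=3600.0):
    """Return (keep_task_alive, seconds_until_next_plugin_call).

    All parameter names and defaults are hypothetical; max_errors=10
    mirrors the current limit mentioned at the top of this issue.
    """
    if strategy == "simple":
        # Current behavior: stop once the error limit is reached.
        return consecutive_errors < max_errors, interval_s
    if strategy == "back-off":
        # Double the wait for each consecutive error, up to a cap,
        # and never disable the task.
        delay = min(interval_s * 2 ** consecutive_errors, max_interval_s)
        return True, delay
    if strategy == "ignore":
        # Keep calling the plugin at the normal interval.
        return True, interval_s
    raise ValueError(f"unknown task_failing_strategy: {strategy!r}")


if __name__ == "__main__":
    for errors in range(5):
        print(errors, next_run("back-off", errors, 1.0))
```

In this sketch the configured interval itself is never rewritten; the back-off only changes the computed delay while errors persist, and the delay returns to `interval_s` as soon as the error count resets to zero.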

IRCody · 2016-08-18T19:55:45Z

I believe the simple case has been covered by #1127. I think adding a way to ignore errors would be nice. I'm not sure how I feel about the back-off, since it seems like it would modify the interval (which is part of the task definition).

mbbroberg added type/feature-or-enhancement type/rfc labels Mar 8, 2016

snapbot added the tracked label Jul 9, 2016

IRCody mentioned this issue Aug 18, 2016

Add support for auto restarting disabled tasks #935

Open
