This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

Update strategy for dealing with failing plugins #708

Open
andrzej-k opened this issue Feb 11, 2016 · 3 comments

Comments

@andrzej-k
Contributor

Currently, if a plugin reports an error to the framework, the tasks using that plugin are eventually disabled after 10 consecutive plugin errors. With intermittent network issues, for example, this can disable tasks permanently. To start the tasks again, the plugin must be unloaded and loaded again, and the tasks restarted. Possible options to consider:

  1. Make the task-failing mechanism configurable:
    • allow disabling the mechanism, so that a failing plugin never disables a task
    • allow a configurable number of retries
  2. Change the default behavior (counting errors at each interval) to introduce something like an exponential back-off algorithm
  3. Update the documentation so that plugin authors know in which cases a plugin reporting an error to the framework may lead to a task being disabled
  4. Distinguish critical plugin errors, which should lead to the task being disabled, from less severe errors, which should trigger the exponential back-off algorithm
  5. Allow the task flow designer to decide what to do when a plugin reports errors
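
To make option 1 concrete, here is a minimal sketch in Python of a consecutive-error counter with a configurable limit. It is not the framework's code: `FailureTracker`, `max_failures` and `record` are hypothetical names, and the only value taken from the description above is the default of 10 consecutive errors. Setting `max_failures` to `None` stands in for "never disable the task".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureTracker:
    """Hypothetical per-task counter of consecutive plugin errors."""

    # Assumption: None means "never disable the task because of plugin errors".
    max_failures: Optional[int] = 10
    consecutive_failures: int = 0
    disabled: bool = False

    def record(self, ok: bool) -> bool:
        """Record one plugin result; return True while the task stays enabled."""
        if self.disabled:
            return False
        if ok:
            # Any success resets the streak, so only consecutive errors count.
            self.consecutive_failures = 0
            return True
        self.consecutive_failures += 1
        if (
            self.max_failures is not None
            and self.consecutive_failures >= self.max_failures
        ):
            self.disabled = True
        return not self.disabled
```

With the default limit, nine errors followed by one success leave the task enabled and reset the streak; with `max_failures=None`, the task stays enabled however many errors arrive.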
@jcooklin
Collaborator

@andrzej-k: In general terms, how do you see something like an exponential back-off algorithm working in this situation? Are you implying that a plugin returning errors could result in adjusting the rate/interval of one or more tasks?

This relates to #688.

@andrzej-k
Contributor Author

@jcooklin: Possibly (to reduce resource utilization and to avoid flooding the user with errors), but at the very least the user should be able to decide whether a failing plugin should result in the task being disabled. This could be configurable in the task manifest (per plugin config), for example as "task_failing_strategy":

  • simple (default) - as currently, stop the task after x number of consecutive errors
  • back-off - invoke the plugin gradually less often, but keep the task alive
  • ignore errors - keep the task alive despite errors

Tolerable downtime per plugin could be another configurable option.
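
As a rough illustration of how the three strategies could differ, the sketch below maps a strategy name and the current error streak to a decision. Everything in it is an assumption made for the example: the function `next_action`, the string spelling `"ignore-errors"`, the doubling factor, and the `max_delay_s` cap are not taken from the framework or from any existing manifest format.

```python
def next_action(strategy, consecutive_errors, interval_s,
                max_errors=10, max_delay_s=3600.0):
    """Return (keep_task_alive, seconds_until_next_plugin_call).

    Hypothetical helper: strategy names follow the list above, while the
    doubling factor and the delay cap are illustrative assumptions.
    """
    if strategy == "simple":
        # Current behavior: stop the task once the error streak hits the limit.
        return (consecutive_errors < max_errors, interval_s)
    if strategy == "back-off":
        # Double the wait for each consecutive error, up to a cap; never stop.
        delay = min(interval_s * 2 ** consecutive_errors, max_delay_s)
        return (True, delay)
    if strategy == "ignore-errors":
        # Keep calling the plugin at the normal interval regardless of errors.
        return (True, interval_s)
    raise ValueError(f"unknown task_failing_strategy: {strategy!r}")
```

For example, with a 1-second interval and three consecutive errors, `next_action("back-off", 3, 1.0)` keeps the task alive and waits 8 seconds before the next call. Note that this sketch delays the next plugin call rather than rewriting the interval stored in the task definition; whether that distinction is acceptable is a design question for the framework.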

@IRCody
Contributor

IRCody commented Aug 18, 2016

I believe the simple case has been covered by #1127. I think adding a way to ignore errors would be nice. I'm not sure how I feel about the back-off, since it seems like it would modify the interval (which is part of the task definition).

5 participants