Take steps to mitigate cascading failure #17

orangejulius · 2018-03-24T19:14:27Z

This package instructs superagent to perform retries of HTTP requests that fail. This is good for fixing issues such as random network glitches, etc. However, if not done carefully it can cause chain reactions that completely break systems operating close to the edge of full capacity.

We should ensure superagent backs off exponentially, with some random jitter added, between request retries.

Without that, a heavily loaded server that does not respond within the timeout will be bombarded with more and more requests in the form of retries. This further increases the load and reduces its capacity to return correct results.

The jitter is necessary because its possible for retry requests to "sync up" through various means, which can cause a normally tolerable load on a downstream server to become too much to handle.

This may require either forking superagent or going back to our own retry code.

For more info see the excellent Addressing Cascading Failures section of the Google SRE book.

missinglink · 2018-03-25T15:14:40Z

There is probably already a module which does what you're looking for.

What about this one? https://www.npmjs.com/package/superagent-retry-delay

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Take steps to mitigate cascading failure #17

Take steps to mitigate cascading failure #17

orangejulius commented Mar 24, 2018 •

edited

Loading

missinglink commented Mar 25, 2018

Take steps to mitigate cascading failure #17

Take steps to mitigate cascading failure #17

Comments

orangejulius commented Mar 24, 2018 • edited Loading

missinglink commented Mar 25, 2018

orangejulius commented Mar 24, 2018 •

edited

Loading