Take steps to mitigate cascading failure #17

Open
orangejulius opened this issue Mar 24, 2018 · 1 comment

Comments

@orangejulius
Member

orangejulius commented Mar 24, 2018

This package instructs superagent to retry HTTP requests that fail. This helps recover from transient problems such as random network glitches. However, if not done carefully, retrying can cause chain reactions that completely break systems operating close to full capacity.

We should ensure superagent backs off exponentially, with some random jitter added, between request retries.

Without that, a heavily loaded server that does not respond within the timeout will be bombarded with more and more requests in the form of retries. This further increases the load and reduces its capacity to return correct results.

The jitter is necessary because it's possible for retry requests to "sync up" through various means, which can cause a normally tolerable load on a downstream server to become too much to handle.
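To make the idea concrete, here is a rough sketch of exponential backoff with random jitter, written independently of superagent. Everything in it is hypothetical and for illustration only: the names `backoffDelay`, `retryWithBackoff`, and `defaultSleep`, and the defaults (`baseMs`, `capMs`, `retries`), are assumptions, not part of superagent or this package. It uses the "full jitter" variant, where each delay is picked uniformly between 0 and an exponentially growing ceiling; other variants would work too.

```javascript
// Hypothetical helpers for illustration; not part of superagent or this package.

// Delay before the next retry: a random value in [0, ceiling), where the
// ceiling doubles with each attempt (0-based) and is capped at capMs.
function backoffDelay(attempt, baseMs = 100, capMs = 10000, random = Math.random) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Resolve after ms milliseconds.
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Call fn until it succeeds or `retries` retries have been used up,
// sleeping for a jittered, exponentially growing delay between attempts.
async function retryWithBackoff(fn, options = {}) {
  const {
    retries = 3,
    baseMs = 100,
    capMs = 10000,
    sleep = defaultSleep,
    random = Math.random,
  } = options;

  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      // Out of retries: surface the last error to the caller.
      if (attempt >= retries) throw err;
      await sleep(backoffDelay(attempt, baseMs, capMs, random));
    }
  }
}
```

Here `fn` would be whatever function issues the HTTP request and returns a promise. Because each client draws its own random delay, retries from many clients spread out over time instead of arriving together, and the cap keeps the wait bounded after many failures.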

This may require either forking superagent or going back to our own retry code.

For more info, see the excellent "Addressing Cascading Failures" chapter of the Google SRE book.

@missinglink
Member

There is probably already a module which does what you're looking for.

What about this one? https://www.npmjs.com/package/superagent-retry-delay
