
Add jitter to retry for autoendpoint #318

Closed
jrconlin opened this issue Jun 27, 2022 · 4 comments · Fixed by #319

@jrconlin (Member)

We're seeing a spike of ddb throttling.

[image: graph showing a spike in DynamoDB throttling events]

The current theory is that this is due to autoendpoint hitting DDB more aggressively per client than the Python version did.

Add jitter to the endpoint's retries to distribute them more evenly.
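For illustration, a minimal sketch of what jitter means here (not autoendpoint's actual code; the `jittered_delay` helper and the 100ms base below are hypothetical): each backoff delay is randomized so that clients that fail at the same moment don't all retry at the same moment.

```rust
use rand::Rng;
use std::time::Duration;

/// Hypothetical helper: exponential backoff (base * 2^attempt) with "full"
/// jitter, i.e. the actual sleep is a random point between zero and the
/// exponential delay, so simultaneous failures don't retry in lockstep.
fn jittered_delay(base: Duration, attempt: u32) -> Duration {
    let max = base.saturating_mul(2u32.saturating_pow(attempt));
    let millis = rand::thread_rng().gen_range(0..=max.as_millis() as u64);
    Duration::from_millis(millis)
}

fn main() {
    // Print the first few retry delays for a 100ms base delay.
    for attempt in 0..4 {
        println!("attempt {attempt}: {:?}", jittered_delay(Duration::from_millis(100), attempt));
    }
}
```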

jrconlin self-assigned this Jun 27, 2022
jrconlin added a commit that referenced this issue Jun 27, 2022
@jrconlin (Member Author)

Note: This issue appears to be more of a problem for the endpoint than for the connection server. To that end, I've only added the 'jitter' feature to the endpoint.

For the modernization effort, which will use the same database method as the endpoint, I'll add 'jitter' as a common feature.

@pjenvey (Member) commented Jun 27, 2022

AFAICT autoendpoint python doesn't retry any db calls, whereas the connection server retries many of its calls.

The Rust connection server added retry logic to the db calls themselves, so with autoendpoint-rs "sharing" db calls (it currently uses reimplementations of them that also include retry logic, but they'll be actually shared one day), it's now retrying where it didn't before.

@pjenvey (Member) commented Jun 27, 2022

The connection server's db calls use futures_backoff, whereas autoendpoint uses the again crate for its retry logic. Both work similarly and default to 1-second backoff delays, though autoendpoint overrides that default to 100ms. Both use an exponential backoff policy, which increases the delay on every retry.
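
For context, a sketch of how jitter can be layered onto that 100ms exponential policy, assuming the again crate's RetryPolicy builder (exponential, with_jitter, with_max_retries); db_call below is a placeholder, not an actual autoendpoint function:

```rust
use again::RetryPolicy;
use std::time::Duration;

// Placeholder standing in for one of the retried DynamoDB operations.
async fn db_call() -> Result<(), std::io::Error> {
    Ok(())
}

#[tokio::main]
async fn main() -> Result<(), std::io::Error> {
    // Exponential backoff starting at 100ms (autoendpoint's override of the
    // 1-second default), with jitter so concurrent callers spread out their retries.
    let policy = RetryPolicy::exponential(Duration::from_millis(100))
        .with_jitter(true)
        .with_max_retries(5);

    policy.retry(|| db_call()).await
}
```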

@jrconlin (Member Author)

Thanks. I was thinking that adding jitter might help distribute the retry load a bit more. I can't see a clear correlation between traffic and throttling events, although the general pattern is there (more events when there's more traffic overall). My hunch is that there's a bit of synchronicity between the retries and incoming traffic, causing a sudden, sub-second spike.

jrconlin added a commit that referenced this issue Jun 28, 2022
* bug: add jitter to retry

Closes: #318